* [PATCH] kvm: better MWAIT emulation for guests
@ 2017-03-09 22:29 Michael S. Tsirkin
  2017-03-10  0:51 ` Gabriel L. Somlo
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-09 22:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, kvm, linux-doc

Some guests call mwait without checking the cpu flags.  We currently
emulate that as a NOP but on VMX we can do better: let guest stop the
CPU until timer or IPI.  CPU will be busy but that isn't any worse than
a NOP emulation.
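
A minimal user-space sketch of what dropping the two exiting controls
means, not kernel code. The bit values are assumed to match the kernel's
CPU_BASED_MWAIT_EXITING and CPU_BASED_MONITOR_EXITING definitions (check
arch/x86/include/asm/vmx.h in your tree); the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match arch/x86/include/asm/vmx.h; verify against your tree. */
#define CPU_BASED_MWAIT_EXITING   0x00000400u
#define CPU_BASED_MONITOR_EXITING 0x20000000u

/*
 * Hypothetical helper: returns true when neither control is set, i.e. a
 * guest MONITOR/MWAIT executes on the CPU instead of causing a VM exit
 * that the hypervisor then emulates (as a NOP, before this patch).
 */
static bool guest_mwait_runs_natively(uint32_t exec_control)
{
	return !(exec_control & (CPU_BASED_MWAIT_EXITING |
				 CPU_BASED_MONITOR_EXITING));
}
```

With both bits set the helper returns false (every MWAIT traps); with
both clear it returns true, which is the behaviour the vmx.c hunk asks for.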

Note that mwait within guests is not the same as on real hardware
because you must halt if you want to go deep into sleep.  Thus it isn't
a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
in the hypervisor leaf instead.
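
A rough sketch of how a guest could consume such a flag, assuming it has
already read EAX of the KVM features leaf (CPUID function 0x40000001 in
mainline KVM). The helper names and the idle policy are hypothetical;
only the bit numbers come from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_UNHALT 7
#define KVM_FEATURE_MWAIT     8	/* bit proposed by this patch */

/* Hypothetical helper: test one bit of the features-leaf EAX value. */
static bool kvm_features_has(uint32_t features_eax, unsigned int bit)
{
	assert(bit < 32);
	return (features_eax >> bit) & 1u;
}

/*
 * Hypothetical policy: idle with MONITOR/MWAIT only when the hypervisor
 * has opted in through the paravirt bit, otherwise fall back to HLT.
 */
static bool guest_may_idle_with_mwait(uint32_t features_eax)
{
	return kvm_features_has(features_eax, KVM_FEATURE_MWAIT);
}
```

The regular CPUID MWAIT flag is deliberately not consulted here, which is
the point of putting the bit in the hypervisor leaf.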

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kvm/cpuid.c                 | 3 +++
 arch/x86/kvm/vmx.c                   | 4 ----
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..5caa234 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_MWAIT                  ||     8 || guest can use monitor/mwait
+                                   ||       || to halt the VCPU.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..9cc77a7 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_MWAIT		8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc..fe3d292 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
 
+		if (this_cpu_has(X86_FEATURE_MWAIT))
+			entry->eax = (1 << KVM_FEATURE_MWAIT);
+
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4bfe349..b167aba 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3547,13 +3547,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
-	printk(KERN_ERR "cleared CPU_BASED_MWAIT_EXITING + CPU_BASED_MONITOR_EXITING\n");
-
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
-- 
MST


* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-09 22:29 [PATCH] kvm: better MWAIT emulation for guests Michael S. Tsirkin
@ 2017-03-10  0:51 ` Gabriel L. Somlo
  2017-03-10  1:12   ` Michael S. Tsirkin
  2017-03-10 23:46 ` Jim Mattson
  2017-03-13 15:46 ` Radim Krčmář
  2 siblings, 1 reply; 19+ messages in thread
From: Gabriel L. Somlo @ 2017-03-10  0:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, kvm, linux-doc

On Fri, Mar 10, 2017 at 12:29:31AM +0200, Michael S. Tsirkin wrote:
> Some guests call mwait without checking the cpu flags.  We currently
> emulate that as a NOP but on VMX we can do better: let guest stop the
> CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> a NOP emulation.

Are you getting an IPI if another VCPU writes to the MONITOR-ed memory
location? If not, you'd be waking up too late and fail to meet the
specified behavior of the MONITOR/MWAIT instruction pair.

> Note that mwait within guests is not the same as on real hardware
> because you must halt if you want to go deep into sleep.  Thus it isn't
> a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> in the hypervisor leaf instead.

Is it a good idea to advertise MWAIT capability to guests? The
misbehaving ones will call it willy-nilly, true, but aren't compliant
ones better off falling back to some alternative method (typically
using a HLT-based idle loop instead of a MONITOR/MWAIT based one) ?

Thanks,
--Gabriel



* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-10  0:51 ` Gabriel L. Somlo
@ 2017-03-10  1:12   ` Michael S. Tsirkin
  2017-03-13  7:44     ` Wanpeng Li
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-10  1:12 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: linux-kernel, Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, kvm, linux-doc

On Thu, Mar 09, 2017 at 07:51:27PM -0500, Gabriel L. Somlo wrote:
> On Fri, Mar 10, 2017 at 12:29:31AM +0200, Michael S. Tsirkin wrote:
> > Some guests call mwait without checking the cpu flags.  We currently
> > emulate that as a NOP but on VMX we can do better: let guest stop the
> > CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> > a NOP emulation.
> 
> Are you getting an IPI if another VCPU writes to the MONITOR-ed memory
> location?

In my testing yes.

> If not, you'd be waking up too late and fail to meet the
> specified behavior of the MONITOR/MWAIT instruction pair.
> 
> > Note that mwait within guests is not the same as on real hardware
> > because you must halt if you want to go deep into sleep.  Thus it isn't
> > a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> > in the hypervisor leaf instead.
> 
> Is it a good idea to advertise MWAIT capability to guests?

I think it isn't so this patch does not do it.

> The
> misbehaving ones will call it willy-nilly, true, but aren't compliant
> ones better off falling back to some alternative method (typically
> using a HLT-based idle loop instead of a MONITOR/MWAIT based one) ?
> 
> Thanks,
> --Gabriel


* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-09 22:29 [PATCH] kvm: better MWAIT emulation for guests Michael S. Tsirkin
  2017-03-10  0:51 ` Gabriel L. Somlo
@ 2017-03-10 23:46 ` Jim Mattson
  2017-03-12  0:01   ` Michael S. Tsirkin
  2017-03-13 15:46 ` Radim Krčmář
  2 siblings, 1 reply; 19+ messages in thread
From: Jim Mattson @ 2017-03-10 23:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: LKML, Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, kvm list, linux-doc

On Thu, Mar 9, 2017 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> Some guests call mwait without checking the cpu flags.  We currently

"Some guests"? What guests other than Mac OS X are so ill-behaved?

> emulate that as a NOP but on VMX we can do better: let guest stop the
> CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> a NOP emulation.
>
> Note that mwait within guests is not the same as on real hardware
> because you must halt if you want to go deep into sleep.  Thus it isn't
> a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> in the hypervisor leaf instead.


* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-10 23:46 ` Jim Mattson
@ 2017-03-12  0:01   ` Michael S. Tsirkin
  2017-03-12 21:18     ` Gabriel L. Somlo
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-12  0:01 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, kvm list, linux-doc

On Fri, Mar 10, 2017 at 03:46:45PM -0800, Jim Mattson wrote:
> On Thu, Mar 9, 2017 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Some guests call mwait without checking the cpu flags.  We currently
> 
> "Some guests"? What guests other than Mac OS X are so ill-behaved?

I heard about Mac OSX only but even that is hearsay for me
so I didn't want to say that explicitly.

> > emulate that as a NOP but on VMX we can do better: let guest stop the
> > CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> > a NOP emulation.
> >
> > Note that mwait within guests is not the same as on real hardware
> > because you must halt if you want to go deep into sleep.  Thus it isn't
> > a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> > in the hypervisor leaf instead.


* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-12  0:01   ` Michael S. Tsirkin
@ 2017-03-12 21:18     ` Gabriel L. Somlo
  0 siblings, 0 replies; 19+ messages in thread
From: Gabriel L. Somlo @ 2017-03-12 21:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jim Mattson, LKML, Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, kvm list, linux-doc

On Sun, Mar 12, 2017 at 02:01:32AM +0200, Michael S. Tsirkin wrote:
> On Fri, Mar 10, 2017 at 03:46:45PM -0800, Jim Mattson wrote:
> > On Thu, Mar 9, 2017 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > Some guests call mwait without checking the cpu flags.  We currently
> > 
> > "Some guests"? What guests other than Mac OS X are so ill-behaved?
> 
> I heard about Mac OSX only but even that is hearsay for me
> so I didn't want to say that explicitly.

As the likely origin of said hearsay, I can confirm that Mac OS X 10.5,
10.6, and 10.7 (Leopard through Lion) had that problem: unless explicitly
provided with the kernel command line argument "idlehalt=0", they'd
implicitly assume MONITOR and MWAIT availability, without checking CPUID.

As of Mountain Lion (10.8.x), that is no longer the case, and OS X
gracefully falls back to an idle loop which doesn't use MONITOR/MWAIT.

Regards,
--Gabriel
 


* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-10  1:12   ` Michael S. Tsirkin
@ 2017-03-13  7:44     ` Wanpeng Li
  0 siblings, 0 replies; 19+ messages in thread
From: Wanpeng Li @ 2017-03-13  7:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gabriel L. Somlo, linux-kernel, Paolo Bonzini,
	Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, kvm, linux-doc, Peter Zijlstra

Cc Peterz,
2017-03-10 9:12 GMT+08:00 Michael S. Tsirkin <mst@redhat.com>:
> On Thu, Mar 09, 2017 at 07:51:27PM -0500, Gabriel L. Somlo wrote:
>> On Fri, Mar 10, 2017 at 12:29:31AM +0200, Michael S. Tsirkin wrote:
>> > Some guests call mwait without checking the cpu flags.  We currently
>> > emulate that as a NOP but on VMX we can do better: let guest stop the
>> > CPU until timer or IPI.  CPU will be busy but that isn't any worse than
>> > a NOP emulation.
>>
>> Are you getting an IPI if another VCPU writes to the MONITOR-ed memory
>> location?
>
> In my testing yes.

Why is there still an IPI if monitor/mwait is used in the guest?

>
>> If not, you'd be waking up too late and fail to meet the
>> specified behavior of the MONITOR/MWAIT instruction pair.
>>
>> > Note that mwait within guests is not the same as on real hardware
>> > because you must halt if you want to go deep into sleep.  Thus it isn't
>> > a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
>> > in the hypervisor leaf instead.
>>
>> Is it a good idea to advertise MWAIT capability to guests?
>
> I think it isn't so this patch does not do it.
>
>> The
>> misbehaving ones will call it willy-nilly, true, but aren't compliant
>> ones better off falling back to some alternative method (typically
>> using a HLT-based idle loop instead of a MONITOR/MWAIT based one) ?
>>
>> Thanks,
>> --Gabriel
>>

[...]

>> > @@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>> >             if (sched_info_on())
>> >                     entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
>> >
>> > +           if (this_cpu_has(X86_FEATURE_MWAIT))
>> > +                   entry->eax = (1 << KVM_FEATURE_MWAIT);

s/=/|=/, otherwise you clobber the feature bits set before it.
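
A small standalone sketch of the difference, using the bit numbers from
kvm_para.h as quoted in the patch. The two function names are
hypothetical and only model the single statement under discussion.

```c
#include <stdint.h>

#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_MWAIT      8

/* As posted: plain assignment discards every bit set earlier in eax. */
static uint32_t add_mwait_as_posted(uint32_t eax)
{
	eax = (1u << KVM_FEATURE_MWAIT);
	return eax;
}

/* As suggested: OR the new bit in, keeping the bits already present. */
static uint32_t add_mwait_with_or(uint32_t eax)
{
	eax |= (1u << KVM_FEATURE_MWAIT);
	return eax;
}
```

Starting from an eax that already has KVM_FEATURE_STEAL_TIME set (0x20),
the posted form returns 0x100 and the steal-time bit is gone, while the
"|=" form returns 0x120.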

Regards,
Wanpeng Li


* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-09 22:29 [PATCH] kvm: better MWAIT emulation for guests Michael S. Tsirkin
  2017-03-10  0:51 ` Gabriel L. Somlo
  2017-03-10 23:46 ` Jim Mattson
@ 2017-03-13 15:46 ` Radim Krčmář
  2017-03-13 16:08   ` Michael S. Tsirkin
  2 siblings, 1 reply; 19+ messages in thread
From: Radim Krčmář @ 2017-03-13 15:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

2017-03-10 00:29+0200, Michael S. Tsirkin:
> Some guests call mwait without checking the cpu flags.  We currently
> emulate that as a NOP but on VMX we can do better: let guest stop the
> CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> a NOP emulation.
> 
> Note that mwait within guests is not the same as on real hardware
> because you must halt if you want to go deep into sleep.

SDM (25.3 CHANGES TO INSTRUCTION BEHAVIOR IN VMX NON-ROOT OPERATION)
says that "MWAIT operates normally".  What is the reason why MWAIT
inside VMX cannot reach the same states as MWAIT outside VMX?

>                                                           Thus it isn't
> a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> in the hypervisor leaf instead.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
  [...]
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> @@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> +		if (this_cpu_has(X86_FEATURE_MWAIT))
> +			entry->eax = (1 << KVM_FEATURE_MWAIT);

I'd rather not add it as a paravirt feature:

 - MWAIT requires the software to provide a target state, but we're not
   doing anything to expose those states.
   The feature would need a very constrained setup, which is hard to
   support.

 - we've had requests to support MWAIT emulation for Linux and fully
   emulating MWAIT would be best.
   MWAIT is not going to be enabled by default, of course; it would be
   targeted at LPAR-like uses of KVM.

What about keeping just the last hunk to improve OS X, for now?

Thanks.

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -3547,13 +3547,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  	      CPU_BASED_USE_IO_BITMAPS |
>  	      CPU_BASED_MOV_DR_EXITING |
>  	      CPU_BASED_USE_TSC_OFFSETING |
> -	      CPU_BASED_MWAIT_EXITING |
> -	      CPU_BASED_MONITOR_EXITING |
>  	      CPU_BASED_INVLPG_EXITING |
>  	      CPU_BASED_RDPMC_EXITING;
>  
> -	printk(KERN_ERR "cleared CPU_BASED_MWAIT_EXITING + CPU_BASED_MONITOR_EXITING\n");
> -
>  	opt = CPU_BASED_TPR_SHADOW |
>  	      CPU_BASED_USE_MSR_BITMAPS |
>  	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
> -- 
> MST

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-13 15:46 ` Radim Krčmář
@ 2017-03-13 16:08   ` Michael S. Tsirkin
  2017-03-13 19:39     ` Radim Krčmář
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-13 16:08 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Mon, Mar 13, 2017 at 04:46:20PM +0100, Radim Krčmář wrote:
> 2017-03-10 00:29+0200, Michael S. Tsirkin:
> > Some guests call mwait without checking the cpu flags.  We currently
> > emulate that as a NOP but on VMX we can do better: let guest stop the
> > CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> > a NOP emulation.
> > 
> > Note that mwait within guests is not the same as on real hardware
> > because you must halt if you want to go deep into sleep.
> 
> SDM (25.3 CHANGES TO INSTRUCTION BEHAVIOR IN VMX NON-ROOT OPERATION)
> says that "MWAIT operates normally".  What is the reason why MWAIT
> inside VMX cannot reach the same states as MWAIT outside VMX?

If you are going into a deep sleep state with huge latency, you are
better off exiting and paying an extra microsecond of latency, since
the chance that some other task will want to be scheduled seems higher.

> >                                                           Thus it isn't
> > a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> > in the hypervisor leaf instead.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
>   [...]
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > @@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > +		if (this_cpu_has(X86_FEATURE_MWAIT))
> > +			entry->eax = (1 << KVM_FEATURE_MWAIT);
> 
> I'd rather not add it as a paravirt feature:
> 
>  - MWAIT requires the software to provide a target state, but we're not
>    doing anything to expose those states.

Current linux guests just discover these states based on
CPU model, so we do expose enough info.

>    The feature would need very constrained setup, which is hard to
>    support

Why would it? It works without any tweaking on several boxes
I own.

>  - we've had requests to support MWAIT emulation for Linux and fully
>    emulating MWAIT would be best.
>    MWAIT is not going to enabled by default, of course; it would be
>    targeted at LPAR-like uses of KVM.

Yes I think this limited emulation is safe to enable by default.
Pretending mwait is equivalent to halt maybe isn't.

> What about keeping just the last hunk to improve OS X, for now?
> 
> Thanks.

IMHO if we have new functionality we are better off creating
some way for guests to discover it is there.
Do we really have to argue about a single bit in HV leaf?
What harm does it do?

> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > @@ -3547,13 +3547,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >  	      CPU_BASED_USE_IO_BITMAPS |
> >  	      CPU_BASED_MOV_DR_EXITING |
> >  	      CPU_BASED_USE_TSC_OFFSETING |
> > -	      CPU_BASED_MWAIT_EXITING |
> > -	      CPU_BASED_MONITOR_EXITING |
> >  	      CPU_BASED_INVLPG_EXITING |
> >  	      CPU_BASED_RDPMC_EXITING;
> >  
> > -	printk(KERN_ERR "cleared CPU_BASED_MWAIT_EXITING + CPU_BASED_MONITOR_EXITING\n");
> > -
> >  	opt = CPU_BASED_TPR_SHADOW |
> >  	      CPU_BASED_USE_MSR_BITMAPS |
> >  	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
> > -- 
> > MST

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-13 16:08   ` Michael S. Tsirkin
@ 2017-03-13 19:39     ` Radim Krčmář
  2017-03-13 20:03       ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Radim Krčmář @ 2017-03-13 19:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

2017-03-13 18:08+0200, Michael S. Tsirkin:
> On Mon, Mar 13, 2017 at 04:46:20PM +0100, Radim Krčmář wrote:
>> 2017-03-10 00:29+0200, Michael S. Tsirkin:
>> > Some guests call mwait without checking the cpu flags.  We currently
>> > emulate that as a NOP but on VMX we can do better: let guest stop the
>> > CPU until timer or IPI.  CPU will be busy but that isn't any worse than
>> > a NOP emulation.
>> > 
>> > Note that mwait within guests is not the same as on real hardware
>> > because you must halt if you want to go deep into sleep.
>> 
>> SDM (25.3 CHANGES TO INSTRUCTION BEHAVIOR IN VMX NON-ROOT OPERATION)
>> says that "MWAIT operates normally".  What is the reason why MWAIT
>> inside VMX cannot reach the same states as MWAIT outside VMX?
> 
> If you are going into a deep sleep state with huge latency you are
> better off exiting and paying an extra microsecond latency
> since a chance some other task will want to schedule seems higher.

Oh, so MWAIT behavior is the same and can reach deep sleep, just the
use-cases differ ... If the guest VCPU is running on an isolated CPU, then
you might want to reach a deep state to save power when there is no better use.

>> >                                                           Thus it isn't
>> > a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
>> > in the hypervisor leaf instead.
>> > 
>> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> > ---
>>   [...]
>> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> > @@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>> > +		if (this_cpu_has(X86_FEATURE_MWAIT))
>> > +			entry->eax = (1 << KVM_FEATURE_MWAIT);
>> 
>> I'd rather not add it as a paravirt feature:
>> 
>>  - MWAIT requires the software to provide a target state, but we're not
>>    doing anything to expose those states.
> 
> Current linux guests just discover these states based on
> CPU model, so we do expose enough info.

Linux still filters the hardcoded hints through CPUID[5].edx, which is 0
in our case.
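
A rough standalone sketch of that kind of filtering, for illustration
only. It assumes the layout of CPUID leaf 5 EDX (one 4-bit sub-state
count per C-state) and the MWAIT hint encoding (bits 7:4 select the
C-state minus one). The function names are hypothetical and this is not
the actual guest driver code.

```c
#include <stdint.h>

/* Number of sub-states CPUID.05H:EDX reports for the C-state that an
 * MWAIT hint (the value passed in EAX) asks for.  EDX holds one 4-bit
 * count per C-state: bits 3:0 for C0, bits 7:4 for C1, and so on.
 * Hint bits 7:4 select the C-state minus one, so hint 0x00 means C1.
 * This sketch treats anything above C7 (including the 0xF encoding
 * for C0) as unsupported. */
static inline unsigned int mwait_hint_substates(uint32_t cpuid5_edx,
						uint32_t hint)
{
	unsigned int cstate = ((hint >> 4) & 0xf) + 1;

	if (cstate > 7)		/* EDX only describes C0..C7 */
		return 0;
	return (cpuid5_edx >> (cstate * 4)) & 0xf;
}

/* A driver applying this filter skips any state whose count is 0. */
static inline int mwait_hint_usable(uint32_t cpuid5_edx, uint32_t hint)
{
	return mwait_hint_substates(cpuid5_edx, hint) != 0;
}
```

With EDX reported as 0, mwait_hint_usable() is false for every hint, so
no hardcoded state survives the filter.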

>>    The feature would need very constrained setup, which is hard to
>>    support
> 
> Why would it? It works without any tweaking on several boxes
> I own.

MWAIT hints do not always mean the same thing, so they could lead to
different power/performance tradeoffs than the application expects.  We should at
least specify that the paravirt feature allows only hint 0.

You probably don't run weird combinations of host/guest CPUs.

>>  - we've had requests to support MWAIT emulation for Linux and fully
>>    emulating MWAIT would be best.
>>    MWAIT is not going to enabled by default, of course; it would be
>>    targeted at LPAR-like uses of KVM.
> 
> Yes I think this limited emulation is safe to enable by default.
> Pretending mwait is equivalent to halt maybe isn't.

Right, we must keep the VCPU thread running when emulating mwait as it
is different from a hlt.

>> What about keeping just the last hunk to improve OS X, for now?
>> 
>> Thanks.
> 
> IMHO if we have a new functionality we are better of creating
> some way for guests to discover it is there.
> 
> Do we really have to argue about a single bit in HV leaf?
> What harm does it do?

It adds code to both guest and hosts and needs documentation ...
The bit is acceptable.  I just see no point in having it when there
already is a detection mechanism for mwait.

In any case, this patch should also remove VM exits under SVM and add
KVM_CAP_MWAIT for userspace.  Userspace can then set the MWAIT feature
if it wishes the guest to use it in a more standard way.

I can do a cleanup of the unused VM exits on top of it.

Thanks.

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-13 19:39     ` Radim Krčmář
@ 2017-03-13 20:03       ` Michael S. Tsirkin
  2017-03-13 21:43         ` Radim Krčmář
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-13 20:03 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Mon, Mar 13, 2017 at 08:39:11PM +0100, Radim Krčmář wrote:
> 2017-03-13 18:08+0200, Michael S. Tsirkin:
> > On Mon, Mar 13, 2017 at 04:46:20PM +0100, Radim Krčmář wrote:
> >> 2017-03-10 00:29+0200, Michael S. Tsirkin:
> >> > Some guests call mwait without checking the cpu flags.  We currently
> >> > emulate that as a NOP but on VMX we can do better: let guest stop the
> >> > CPU until timer or IPI.  CPU will be busy but that isn't any worse than
> >> > a NOP emulation.
> >> > 
> >> > Note that mwait within guests is not the same as on real hardware
> >> > because you must halt if you want to go deep into sleep.
> >> 
> >> SDM (25.3 CHANGES TO INSTRUCTION BEHAVIOR IN VMX NON-ROOT OPERATION)
> >> says that "MWAIT operates normally".  What is the reason why MWAIT
> >> inside VMX cannot reach the same states as MWAIT outside VMX?
> > 
> > If you are going into a deep sleep state with huge latency you are
> > better off exiting and paying an extra microsecond latency
> > since a chance some other task will want to schedule seems higher.
> 
> Oh, so MWAIT behavior is same and can reach deep sleep, just use-cases
> differ ... If the guest VCPU is running on isolated CPU, then you might
> want to reach a deep state to save power when there is no better use.
> 
> >> >                                                           Thus it isn't
> >> > a good idea to use the regular MWAIT flag in CPUID for that.  Add a flag
> >> > in the hypervisor leaf instead.
> >> > 
> >> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> > ---
> >>   [...]
> >> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> >> > @@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> >> > +		if (this_cpu_has(X86_FEATURE_MWAIT))
> >> > +			entry->eax = (1 << KVM_FEATURE_MWAIT);
> >> 
> >> I'd rather not add it as a paravirt feature:
> >> 
> >>  - MWAIT requires the software to provide a target state, but we're not
> >>    doing anything to expose those states.
> > 
> > Current linux guests just discover these states based on
> > CPU model, so we do expose enough info.
> 
> Linux still filters the hardcoded hints through CPUID[5].edx, which is 0
> in our case.
> 
> >>    The feature would need very constrained setup, which is hard to
> >>    support
> > 
> > Why would it? It works without any tweaking on several boxes
> > I own.
> 
> MWAIT hints do not always mean the same, so they could lead to different
> power/performance tradeoffs than the applications expects.  We should at
> least specify that the paravirt feature allows only hint 0.
> 
> You probably don't run weird combinations of host/guest CPUs.
> 
> >>  - we've had requests to support MWAIT emulation for Linux and fully
> >>    emulating MWAIT would be best.
> >>    MWAIT is not going to enabled by default, of course; it would be
> >>    targeted at LPAR-like uses of KVM.
> > 
> > Yes I think this limited emulation is safe to enable by default.
> > Pretending mwait is equivalent to halt maybe isn't.
> 
> Right, we must keep the VCPU thread running when emulating mwait as it
> is different from a hlt.
> 
> >> What about keeping just the last hunk to improve OS X, for now?
> >> 
> >> Thanks.
> > 
> > IMHO if we have a new functionality we are better of creating
> > some way for guests to discover it is there.
> > 
> > Do we really have to argue about a single bit in HV leaf?
> > What harm does it do?
> 
> It adds code to both guest and hosts and needs documentation ...
> The bit is acceptable.  I just see no point in having it when there
> already is a detection mechanism for mwait.

IMHO we don't want to use that standard detection mechanism, at least
not in all cases.

> In any case, this patch should also remove VM exits under SVM

AMD does not have MWAIT AFAIK. In any case, I don't see
why SVM can't be a separate patch.

> and add
> KVM_CAP_MWAIT for userspace.

Sure, why not. Will do.

> Userspace can then set the MWAIT feature
> if it wishes the guest to use it in a more standard way.
> 
> I can do a cleanup due to unused VM exits on top of it.
> 
> Thanks.

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-13 20:03       ` Michael S. Tsirkin
@ 2017-03-13 21:43         ` Radim Krčmář
  2017-03-15 18:14           ` Gabriel L. Somlo
  0 siblings, 1 reply; 19+ messages in thread
From: Radim Krčmář @ 2017-03-13 21:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

2017-03-13 22:03+0200, Michael S. Tsirkin:
> On Mon, Mar 13, 2017 at 08:39:11PM +0100, Radim Krčmář wrote:
> > 2017-03-13 18:08+0200, Michael S. Tsirkin:
> > > On Mon, Mar 13, 2017 at 04:46:20PM +0100, Radim Krčmář wrote:
>> >> What about keeping just the last hunk to improve OS X, for now?
>> > 
>> > IMHO if we have a new functionality we are better of creating
>> > some way for guests to discover it is there.
>> > 
>> > Do we really have to argue about a single bit in HV leaf?
>> > What harm does it do?
>> 
>> It adds code to both guest and hosts and needs documentation ...
>> The bit is acceptable.  I just see no point in having it when there
>> already is a detection mechanism for mwait.
> 
> We don't want to use that standard detection mechanism IMHO at least
> not in all cases.

Enabling mwait by default would make sense if the guest OS monitored its
steal time and disabled mwait when it detects that it is not the main
user of the CPU, because mwait then hurts the host as well as the guest.

This would warrant some kind of paravirt as we still wouldn't want to
have standard mwait by default.  My problem is that the paravirt flag
alone is not enough for normal mwait use on Intel.
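
As a sketch of what such a guest-side check could look like: the
function name, the sampling window and the percentage threshold below
are all hypothetical and only illustrate the idea of turning mwait off
once steal time shows the VCPU is sharing its CPU.

```c
#include <stdint.h>

/* Hypothetical policy: allow MWAIT-based idle only while the fraction
 * of time stolen from this VCPU over the last sampling window stays at
 * or below max_steal_pct percent. */
static inline int mwait_idle_allowed(uint64_t steal_ns, uint64_t window_ns,
				     unsigned int max_steal_pct)
{
	if (window_ns == 0)
		return 0;	/* no data yet: stay on the HLT path */
	/* Compare steal/window <= pct/100 without dividing
	 * (assumes the products fit in 64 bits). */
	return steal_ns * 100 <= window_ns * (uint64_t)max_steal_pct;
}
```

A guest would sample its steal-time counter periodically and fall back
to HLT whenever this returns 0.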

>> In any case, this patch should also remove VM exits under SVM
> 
> AMD does not have MWAIT AFAIK. In any case, I don't see
> why can't SVM be a separate patch.

AMD just doesn't have MWAIT hints. (AMD even has MWAIT in userspace and
MWAITX, but they are not supported by KVM.)

The separate patch would have to be part of the same series as we don't
want to have vendor-specific detection, so I'd just remove these two in
the same patch to simplify handling:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c62b3f..18e53bc185d6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
 	control->iopm_base_pa = iopm_base;

Thanks.

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-13 21:43         ` Radim Krčmář
@ 2017-03-15 18:14           ` Gabriel L. Somlo
  2017-03-15 18:29             ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Gabriel L. Somlo @ 2017-03-15 18:14 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, kvm,
	linux-doc

Michael,

I tested this on OS X 10.7 (Lion), the last version that doesn't check
CPUID for MWAIT support.

I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
first as-is, then with your v2 MWAIT patch applied.

Single-(V)CPU guest works as expected (but then again, single-vcpu
guests worked even back when I tried emulating MWAIT the same as HLT).

When I try starting an SMP guest (with "-smp 4,cores=2"), the guest OS
hangs after generating some output in text/verbose boot mode -- I gave
up waiting for it after about 5 minutes. Works fine before your patch,
which leads me to suspect that, as I feared, MWAIT doesn't wake
immediately upon another VCPU writing to the MONITOR-ed memory location.

Tangentially, I remember back in the days of OS X 10.7, the
alternative to exiting guest mode and emulating MWAIT and MONITOR as
NOPs was to allow them both to run in guest mode.

While poorly documented by Intel at the time, MWAIT at L>0 effectively
behaves as a NOP (i.e., doesn't actually put the physical core into
low-power mode, because doing that would allow a guest to effectively
DoS the host hardware).

Given how unusual it is for a guest to use MONITOR/MWAIT in the first
place, what's wrong with leaving it all as is (i.e., emulated as NOP)?

Thanks,
--Gabriel

On Mon, Mar 13, 2017 at 10:43:55PM +0100, Radim Krčmář wrote:
> 2017-03-13 22:03+0200, Michael S. Tsirkin:
> > On Mon, Mar 13, 2017 at 08:39:11PM +0100, Radim Krčmář wrote:
> > > 2017-03-13 18:08+0200, Michael S. Tsirkin:
> > > > On Mon, Mar 13, 2017 at 04:46:20PM +0100, Radim Krčmář wrote:
> >> >> What about keeping just the last hunk to improve OS X, for now?
> >> > 
> >> > IMHO if we have a new functionality we are better of creating
> >> > some way for guests to discover it is there.
> >> > 
> >> > Do we really have to argue about a single bit in HV leaf?
> >> > What harm does it do?
> >> 
> >> It adds code to both guest and hosts and needs documentation ...
> >> The bit is acceptable.  I just see no point in having it when there
> >> already is a detection mechanism for mwait.
> > 
> > We don't want to use that standard detection mechanism IMHO at least
> > not in all cases.
> 
> Enabling mwait by default would make sense if the guest OS monitored its
> steal time and disabled mwait when it detects that it is not the main
> user of the CPU, because mwait then hurts the host as well as the guest.
> 
> This would warrant some kind of paravirt as we still wouldn't want to
> have standard mwait by default.  My problem is that the paravirt flag
> alone is not enough for a normal mwait use on Intel.
> 
> >> In any case, this patch should also remove VM exits under SVM
> > 
> > AMD does not have MWAIT AFAIK. In any case, I don't see
> > why can't SVM be a separate patch.
> 
> AMD just doesn't have MWAIT hints. (AMD has even MWAIT in userspace and
> MWAITX, but they are not supported by KVM.)
> 
> The separate patch would have to be part of the same series as we don't
> want to have vendor-specific detection, so I'd just remove these two in
> the same patch to simplify handling:
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d1efe2c62b3f..18e53bc185d6 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_intercept(svm, INTERCEPT_CLGI);
>  	set_intercept(svm, INTERCEPT_SKINIT);
>  	set_intercept(svm, INTERCEPT_WBINVD);
> -	set_intercept(svm, INTERCEPT_MONITOR);
> -	set_intercept(svm, INTERCEPT_MWAIT);
>  	set_intercept(svm, INTERCEPT_XSETBV);
>  
>  	control->iopm_base_pa = iopm_base;
> 
> Thanks.

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-15 18:14           ` Gabriel L. Somlo
@ 2017-03-15 18:29             ` Michael S. Tsirkin
  2017-03-15 19:01               ` Gabriel L. Somlo
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-15 18:29 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Radim Krčmář,
	linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Wed, Mar 15, 2017 at 02:14:26PM -0400, Gabriel L. Somlo wrote:
> Michael,
> 
> I tested this on OS X 10.7 (Lion), the last version that doesn't check
> CPUID for MWAIT support.
> 
> I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> first as-is, then with your v2 MWAIT patch applied.
> 
> Single-(V)CPU guest works as expected (but then again, single-vcpu
> guests worked even back when I tried emulating MWAIT the same as HLT).
> 
> When I try starting a SMP guest (with "-smp 4,cores=2"), the guest OS
> hangs after generating some output in text/verbose boot mode -- I gave
> up waiting for it after about 5 minutes. Works fine before your patch,
> which leads me to suspect that, as I feared, MWAIT doesn't wake
> immediately upon another VCPU writing to the MONITOR-ed memory location.
> 
> Tangentially, I remember back in the days of OS X 10.7, the
> alternative to exiting guest mode and emulating MWAIT and MONITOR as
> NOPs was to allow them both to run in guest mode.
> 
> While poorly documented by Intel at the time, MWAIT at L>0 effectively
> behaves as a NOP (i.e., doesn't actually put the physical core into
> low-power mode, because doing that would allow a guest to effectively
> DOS the host hardware).

Thanks for the testing, interesting.
Testing with a Linux guest seems to show it works.
This could be an interrupt thing, not a monitor thing.
Question: does your host CPU have this in its MWAIT leaf (CPUID leaf 5, ECX)?
	Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled

We really should check that before enabling,
I'll add that.

> 
> Given how unusual it is for a guest to use MONITOR/MWAIT in the first
> place, what's wrong with leaving it all as is (i.e., emulated as NOP)?
> 
> Thanks,
> --Gabriel

I'm really looking into ways to use mwait within Linux guests,
this is just a building block that should help Mac OSX
as a side effect (and we do not want it broken if at all possible).

-- 
MST

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-15 18:29             ` Michael S. Tsirkin
@ 2017-03-15 19:01               ` Gabriel L. Somlo
  2017-03-15 19:05                 ` Michael S. Tsirkin
  2017-03-15 19:29                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Gabriel L. Somlo @ 2017-03-15 19:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Radim Krčmář,
	linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Wed, Mar 15, 2017 at 08:29:23PM +0200, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2017 at 02:14:26PM -0400, Gabriel L. Somlo wrote:
> > Michael,
> > 
> > I tested this on OS X 10.7 (Lion), the last version that doesn't check
> > CPUID for MWAIT support.
> > 
> > I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> > first as-is, then with your v2 MWAIT patch applied.
> > 
> > Single-(V)CPU guest works as expected (but then again, single-vcpu
> > guests worked even back when I tried emulating MWAIT the same as HLT).
> > 
> > When I try starting a SMP guest (with "-smp 4,cores=2"), the guest OS
> > hangs after generating some output in text/verbose boot mode -- I gave
> > up waiting for it after about 5 minutes. Works fine before your patch,
> > which leads me to suspect that, as I feared, MWAIT doesn't wake
> > immediately upon another VCPU writing to the MONITOR-ed memory location.
> > 
> > Tangentially, I remember back in the days of OS X 10.7, the
> > alternative to exiting guest mode and emulating MWAIT and MONITOR as
> > NOPs was to allow them both to run in guest mode.
> > 
> > While poorly documented by Intel at the time, MWAIT at L>0 effectively
> > behaves as a NOP (i.e., doesn't actually put the physical core into
> > low-power mode, because doing that would allow a guest to effectively
> > DOS the host hardware).
> 
> Thanks for the testing, interesting.
> Testing with Linux guest seems to show it works.
> This could be an interrupt thing not a monitor thing.
> Question: does your host CPU have this in its MWAIT leaf?
> 	Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled

How would I check for this (I'm sorry, haven't hacked on any KVM
related thing in a while, so I don't have it "cached") :)
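
One way to check from userspace on the host, sketched under the
assumption of a GCC or Clang toolchain that ships <cpuid.h>. The macro
and function names are made up for this example; the bit positions are
those of CPUID leaf 5, ECX.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper: __get_cpuid() */
#endif

#define CPUID_LEAF_MWAIT		5
#define CPUID5_ECX_EXTENSIONS		(1u << 0) /* ECX/EDX enumeration valid */
#define CPUID5_ECX_INTERRUPT_BREAK	(1u << 1) /* IRQ breaks MWAIT with IF=0 */

/* Pure decode step: 1 if ECX of leaf 5 advertises the break-event bit. */
static inline int mwait_break_on_irq(uint32_t ecx)
{
	return (ecx & CPUID5_ECX_EXTENSIONS) &&
	       (ecx & CPUID5_ECX_INTERRUPT_BREAK);
}

/* Query the running CPU; returns -1 if leaf 5 cannot be read. */
static inline int host_mwait_break_on_irq(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(CPUID_LEAF_MWAIT, &eax, &ebx, &ecx, &edx))
		return -1;
	return mwait_break_on_irq(ecx);
#else
	return -1;	/* not an x86 build */
#endif
}
```

Calling host_mwait_break_on_irq() from a small main() on the host (not
inside the guest, where CPUID is virtualized) and printing the result is
enough to answer the question.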

> 
> We really should check that before enabling,
> I'll add that.
> 
> > 
> > Given how unusual it is for a guest to use MONITOR/MWAIT in the first
> > place, what's wrong with leaving it all as is (i.e., emulated as NOP)?
> > 
> 
> I'm really looking into ways to use mwait within Linux guests,
> this is just a building block that should help Mac OSX
> as a side effect (and we do not want it broken if at all possible).

A few years ago I tried really emulating MONITOR and MWAIT for a
project -- while not a total abject failure, the resulting patch
worked only intermittently (on OS X 10.7, which was the hot new thing
at the time, and hadn't started checking CPUID yet).

My collected wisdom on the topic from back then is here:

   http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html

The problem is that MWAIT is required to wake synchronously with
any other "thing" (either another (v)CPU, or DMA, or whatever) writing
to the memory location "marked" by the last preceding MONITOR. While
interrupts of any kind may also wake an MWAIT, it is strictly not allowed
to "miss" a write to the MONITOR-ed memory location. So unless we implement
some sort of condition queue that guarantees re-enabling the "parked" vcpu
on an intercepted write to a specific memory location by another vcpu,
we can't guarantee architecturally correct behavior.
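
The bookkeeping such a condition queue would need can be sketched as a
toy, single-threaded model. All names here are hypothetical, the 64-byte
monitor granularity is an assumption, and this is not KVM code; the hard
part in practice is making every write to the monitored line actually
reach emu_write().

```c
#include <stdint.h>

/* Toy model of one VCPU's monitor state. */
struct vcpu_monitor {
	uintptr_t line;	/* monitored address, rounded down to a line */
	int armed;	/* MONITOR executed, no matching write seen yet */
	int parked;	/* VCPU is blocked inside emulated MWAIT */
};

#define MONITOR_LINE_SIZE 64u	/* assumed monitor granularity */

static inline uintptr_t monitor_line(uintptr_t addr)
{
	return addr & ~(uintptr_t)(MONITOR_LINE_SIZE - 1);
}

/* MONITOR: arm the address range. */
static inline void emu_monitor(struct vcpu_monitor *m, uintptr_t addr)
{
	m->line = monitor_line(addr);
	m->armed = 1;
}

/* MWAIT: park only if the monitor is still armed.  If a write already
 * disarmed it, fall through at once so the wakeup is not missed. */
static inline void emu_mwait(struct vcpu_monitor *m)
{
	m->parked = m->armed;
}

/* Every intercepted write (other VCPU, DMA, ...) must pass through here. */
static inline void emu_write(struct vcpu_monitor *m, uintptr_t addr)
{
	if (m->armed && monitor_line(addr) == m->line) {
		m->armed = 0;
		m->parked = 0;	/* wake the VCPU */
	}
}
```

The case to notice is a write that lands between MONITOR and MWAIT: it
disarms the monitor, so the following emu_mwait() does not park.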

If Linux uses it in a very specific way that can be "faked" even
without ISA compliance, that's OK with me -- but other guest OSs might
take the x86 ISA more literally :)

Let me know if there's anything else you'd like me to test, now that I
have set up a 4.11.0-rc2+ (a.k.a. kvm git master) testing rig...

Regards,
--Gabe

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-15 19:01               ` Gabriel L. Somlo
@ 2017-03-15 19:05                 ` Michael S. Tsirkin
  2017-03-15 19:29                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-15 19:05 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Radim Krčmář,
	linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Wed, Mar 15, 2017 at 03:01:12PM -0400, Gabriel L. Somlo wrote:
> On Wed, Mar 15, 2017 at 08:29:23PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2017 at 02:14:26PM -0400, Gabriel L. Somlo wrote:
> > > Michael,
> > > 
> > > I tested this on OS X 10.7 (Lion), the last version that doesn't check
> > > CPUID for MWAIT support.
> > > 
> > > I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> > > first as-is, then with your v2 MWAIT patch applied.
> > > 
> > > Single-(V)CPU guest works as expected (but then again, single-vcpu
> > > guests worked even back when I tried emulating MWAIT the same as HLT).
> > > 
> > > When I try starting a SMP guest (with "-smp 4,cores=2"), the guest OS
> > > hangs after generating some output in text/verbose boot mode -- I gave
> > > up waiting for it after about 5 minutes. Works fine before your patch,
> > > which leads me to suspect that, as I feared, MWAIT doesn't wake
> > > immediately upon another VCPU writing to the MONITOR-ed memory location.
> > > 
> > > Tangentially, I remember back in the days of OS X 10.7, the
> > > alternative to exiting guest mode and emulating MWAIT and MONITOR as
> > > NOPs was to allow them both to run in guest mode.
> > > 
> > > While poorly documented by Intel at the time, MWAIT at L>0 effectively
> > > behaves as a NOP (i.e., doesn't actually put the physical core into
> > > low-power mode, because doing that would allow a guest to effectively
> > > DOS the host hardware).
> > 
> > Thanks for the testing, interesting.
> > Testing with Linux guest seems to show it works.
> > This could be an interrupt thing not a monitor thing.
> > Question: does your host CPU have this in its MWAIT leaf?
> > 	Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled
> 
> How would I check for this (I'm sorry, haven't hacked on any KVM
> related thing in a while, so I don't have it "cached") :)
> 
> > 
> > We really should check that before enabling,
> > I'll add that.
> > 
> > > 
> > > Given how unusual it is for a guest to use MONITOR/MWAIT in the first
> > > place, what's wrong with leaving it all as is (i.e., emulated as NOP)?
> > > 
> > 
> > I'm really looking into ways to use mwait within Linux guests,
> > this is just a building block that should help Mac OSX
> > as a side effect (and we do not want it broken if at all possible).
> 
> A few years ago I tried really emulating MONITOR and MWAIT for a
> project -- while not a total abject failure, the resulting patch
> worked only intermittently (on OS X 10.7, which was the hot new thing
> at the time, and hadn't started checking CPUID yet).
> 
> My collected wisdom on the topic from back then is here:
> 
>    http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html
> 
> The problem is that MWAIT is required to wake synchronously with
> any other "thing" (either another (v)CPU, or DMA, or whatever) writing
> to the memory location "marked" by the last preceding MONITOR. While
> interrupts of any kind may also wake an MWAIT, it is strictly not allowed
> to "miss" a write to the MONITOR-ed memory location. So unless we implement
> some sort of condition queue that guarantees re-enabling the "parked" vcpu
> on an intercepted write to a specific memory location by another vcpu,
> we can't guarantee architecturally correct behavior.
> 
> If linux uses it in a very specific way that can be "faked" even
> without ISA compliance, that's OK with me -- but other guest OSs might
> take the x86 ISA more literally :)
> 
> Let me know if there's anything else you'd like me to test, now that I
> have set up a 4.11.0-rc2+ (a.k.a. kvm git master) testing rig...
> 
> Regards,
> --Gabe

I'm going to post a patch in a couple of minutes.

-- 
MST

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-15 19:01               ` Gabriel L. Somlo
  2017-03-15 19:05                 ` Michael S. Tsirkin
@ 2017-03-15 19:29                 ` Michael S. Tsirkin
  2017-03-15 19:43                   ` Gabriel L. Somlo
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-15 19:29 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Radim Krčmář,
	linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Wed, Mar 15, 2017 at 03:01:12PM -0400, Gabriel L. Somlo wrote:
> On Wed, Mar 15, 2017 at 08:29:23PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2017 at 02:14:26PM -0400, Gabriel L. Somlo wrote:
> > > Michael,
> > > 
> > > I tested this on OS X 10.7 (Lion), the last version that doesn't check
> > > CPUID for MWAIT support.
> > > 
> > > I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> > > first as-is, then with your v2 MWAIT patch applied.
> > > 
> > > Single-(V)CPU guest works as expected (but then again, single-vcpu
> > > guests worked even back when I tried emulating MWAIT the same as HLT).
> > > 
> > > When I try starting a SMP guest (with "-smp 4,cores=2"), the guest OS
> > > hangs after generating some output in text/verbose boot mode -- I gave
> > > up waiting for it after about 5 minutes. Works fine before your patch,
> > > which leads me to suspect that, as I feared, MWAIT doesn't wake
> > > immediately upon another VCPU writing to the MONITOR-ed memory location.
> > > 
> > > Tangentially, I remember back in the days of OS X 10.7, the
> > > alternative to exiting guest mode and emulating MWAIT and MONITOR as
> > > NOPs was to allow them both to run in guest mode.
> > > 
> > > While poorly documented by Intel at the time, MWAIT at L>0 effectively
> > > behaves as a NOP (i.e., doesn't actually put the physical core into
> > > low-power mode, because doing that would allow a guest to effectively
> > > DOS the host hardware).
> > 
> > Thanks for the testing, interesting.
> > Testing with Linux guest seems to show it works.
> > This could be an interrupt thing not a monitor thing.
> > Question: does your host CPU have this in its MWAIT leaf?
> > 	Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled
> 
> How would I check for this (I'm sorry, haven't hacked on any KVM
> related thing in a while, so I don't have it "cached") :)
> 
> > 
> > We really should check that before enabling,
> > I'll add that.
> > 
> > > 
> > > Given how unusual it is for a guest to use MONITOR/MWAIT in the first
> > > place, what's wrong with leaving it all as is (i.e., emulated as NOP)?
> > > 
> > 
> > I'm really looking into ways to use mwait within Linux guests,
> > this is just a building block that should help Mac OSX
> > as a side effect (and we do not want it broken if at all possible).
> 
> A few years ago I tried really emulating MONITOR and MWAIT for a
> project -- while not a total abject failure, the resulting patch
> worked only intermittently (on OS X 10.7, which was the hot new thing
> at the time, and hadn't started checking CPUID yet).
> 
> My collected wisdom on the topic from back then is here:
> 
>    http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html
> 
> The problem is that MWAIT is required to wake synchronously with
> any other "thing" (either another (v)CPU, or DMA, or whatever) writing
> to the memory location "marked" by the last preceding MONITOR. While
> interrupts of any kind may also wake an MWAIT, it is strictly not allowed
> to "miss" a write to the MONITOR-ed memory location. So unless we implement
> some sort of condition queue that guarantees re-enabling the "parked" vcpu
> on an intercepted write to a specific memory location by another vcpu,
> we can't guarantee architecturally correct behavior.
> 
> If linux uses it in a very specific way that can be "faked" even
> without ISA compliance, that's OK with me -- but other guest OSs might
> take the x86 ISA more literally :)
> 
> Let me know if there's anything else you'd like me to test, now that I
> have set up a 4.11.0-rc2+ (a.k.a. kvm git master) testing rig...
> 
> Regards,
> --Gabe

Doing that correctly in software would be very hard.
I suspect your host CPU has an issue, sent a patch to
detect that. Let's see what happens.
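
For readers following along, here is a minimal sketch of the "condition
queue" idea Gabriel describes above. It is not KVM code: every name in it
(struct vcpu_monitor, emu_monitor, emu_write_intercept, emu_mwait) is
hypothetical, the 64-byte monitor line size is an assumption, and interrupts
as wake events are left out. It also assumes every guest write can be
intercepted and routed to emu_write_intercept(), which is exactly the part
that would be very hard to do in software.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the monitored range is one 64-byte line. */
#define MONITOR_LINE_MASK (~(uintptr_t)63)

/* Hypothetical per-vCPU monitor state; none of these names exist in KVM. */
struct vcpu_monitor {
	pthread_mutex_t lock;
	pthread_cond_t  wake;
	uintptr_t       line;      /* line address armed by MONITOR */
	bool            armed;     /* MONITOR executed, MWAIT not yet done */
	bool            triggered; /* a write hit the line since arming */
};

static void monitor_init(struct vcpu_monitor *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->wake, NULL);
	m->line = 0;
	m->armed = false;
	m->triggered = false;
}

/* MONITOR: remember the line and discard any stale trigger. */
static void emu_monitor(struct vcpu_monitor *m, uintptr_t addr)
{
	pthread_mutex_lock(&m->lock);
	m->line = addr & MONITOR_LINE_MASK;
	m->armed = true;
	m->triggered = false;
	pthread_mutex_unlock(&m->lock);
}

/* Would have to run for every write by any other vCPU or device. */
static void emu_write_intercept(struct vcpu_monitor *m, uintptr_t addr)
{
	pthread_mutex_lock(&m->lock);
	if (m->armed && (addr & MONITOR_LINE_MASK) == m->line) {
		m->triggered = true;
		pthread_cond_signal(&m->wake);
	}
	pthread_mutex_unlock(&m->lock);
}

/*
 * MWAIT: park until a write to the armed line is seen. A write that lands
 * between MONITOR and MWAIT is not missed, because the trigger is recorded
 * under the same lock. Returns true if woken by a write, false if the
 * monitor was not armed (the call then falls through like a NOP).
 */
static bool emu_mwait(struct vcpu_monitor *m)
{
	bool woke_on_write;

	pthread_mutex_lock(&m->lock);
	while (m->armed && !m->triggered)
		pthread_cond_wait(&m->wake, &m->lock);
	woke_on_write = m->triggered;
	m->armed = false;
	m->triggered = false;
	pthread_mutex_unlock(&m->lock);
	return woke_on_write;
}
```

The point of keeping the trigger flag under the lock is the "must not miss
a write" requirement: a store that arrives after MONITOR but before MWAIT
still causes MWAIT to return immediately.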

-- 
MST

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-15 19:29                 ` Michael S. Tsirkin
@ 2017-03-15 19:43                   ` Gabriel L. Somlo
  2017-03-15 20:13                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Gabriel L. Somlo @ 2017-03-15 19:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Radim Krčmář,
	linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

Applies cleanly over git://git.kernel.org/pub/scm/virt/kvm/kvm.git,
but then I get:

  CC [M]  arch/x86/kvm/x86.o
In file included from arch/x86/kvm/x86.c:28:0:
arch/x86/kvm/x86.h: In function ‘kvm_mwait_in_guest’:
arch/x86/kvm/x86.h:231:34: error: ‘CPUID_MWAIT_LEAF’ undeclared (first use in this function)
  if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
                                  ^
arch/x86/kvm/x86.h:231:34: note: each undeclared identifier is reported only once for each function it appears in
arch/x86/kvm/x86.h:234:45: error: ‘mwait_substates’ undeclared (first use in this function)
  cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
                                             ^
arch/x86/kvm/x86.h:236:14: error: ‘CPUID5_ECX_INTERRUPT_BREAK’ undeclared (first use in this function)
  if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
              ^
arch/x86/kvm/x86.h:238:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
scripts/Makefile.build:294: recipe for target 'arch/x86/kvm/x86.o' failed
make[2]: *** [arch/x86/kvm/x86.o] Error 1
scripts/Makefile.build:553: recipe for target 'arch/x86/kvm' failed
make[1]: *** [arch/x86/kvm] Error 2
Makefile:1002: recipe for target 'arch/x86' failed
make: *** [arch/x86] Error 2


Did you accidentally leave out something that went into a .h file
somewhere?
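
For reference, a minimal userspace sketch of the check the failing function
appears to be making. Two of the undeclared identifiers, CPUID_MWAIT_LEAF
and CPUID5_ECX_INTERRUPT_BREAK, are defined in mainline in
arch/x86/include/asm/mwait.h, with the values repeated below. The helper
names are hypothetical, and __get_cpuid() is the GCC/Clang <cpuid.h> helper
rather than the kernel's cpuid(). This is not the missing part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang: __get_cpuid() */
#endif

/* Same values as the kernel's arch/x86/include/asm/mwait.h. */
#define CPUID_MWAIT_LEAF		5
#define CPUID5_ECX_INTERRUPT_BREAK	0x2

/* Hypothetical helper: decode ECX as returned by CPUID leaf 5. */
static bool ecx_has_interrupt_break(uint32_t ecx)
{
	/* Bit 1: interrupts are break events for MWAIT even when disabled. */
	return (ecx & CPUID5_ECX_INTERRUPT_BREAK) != 0;
}

/*
 * Hypothetical helper: query the CPU this code runs on.
 * Returns 1 if the bit is set, 0 if it is clear, and -1 if the MWAIT
 * leaf is not available (or this is not an x86 build).
 */
static int host_mwait_interrupt_break(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the leaf exceeds the maximum level. */
	if (!__get_cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx))
		return -1;
	return ecx_has_interrupt_break(ecx) ? 1 : 0;
#else
	return -1;
#endif
}
```

Run on the host, host_mwait_interrupt_break() also answers the earlier
"does your host CPU have this in its MWAIT leaf?" question. Inside a guest
it reports whatever CPUID values the hypervisor exposes.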

Thx,
--G

On Wed, Mar 15, 2017 at 09:29:57PM +0200, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2017 at 03:01:12PM -0400, Gabriel L. Somlo wrote:
> > On Wed, Mar 15, 2017 at 08:29:23PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Mar 15, 2017 at 02:14:26PM -0400, Gabriel L. Somlo wrote:
> > > > Michael,
> > > > 
> > > > I tested this on OS X 10.7 (Lion), the last version that doesn't check
> > > > CPUID for MWAIT support.
> > > > 
> > > > I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> > > > first as-is, then with your v2 MWAIT patch applied.
> > > > 
> > > > Single-(V)CPU guest works as expected (but then again, single-vcpu
> > > > guests worked even back when I tried emulating MWAIT the same as HLT).
> > > > 
> > > > When I try starting a SMP guest (with "-smp 4,cores=2"), the guest OS
> > > > hangs after generating some output in text/verbose boot mode -- I gave
> > > > up waiting for it after about 5 minutes. Works fine before your patch,
> > > > which leads me to suspect that, as I feared, MWAIT doesn't wake
> > > > immediately upon another VCPU writing to the MONITOR-ed memory location.
> > > > 
> > > > Tangentially, I remember back in the days of OS X 10.7, the
> > > > alternative to exiting guest mode and emulating MWAIT and MONITOR as
> > > > NOPs was to allow them both to run in guest mode.
> > > > 
> > > > While poorly documented by Intel at the time, MWAIT at L>0 effectively
> > > > behaves as a NOP (i.e., doesn't actually put the physical core into
> > > > low-power mode, because doing that would allow a guest to effectively
> > > > DOS the host hardware).
> > > 
> > > Thanks for the testing, interesting.
> > > Testing with Linux guest seems to show it works.
> > > This could be an interrupt thing not a monitor thing.
> > > Question: does your host CPU have this in its MWAIT leaf?
> > > 	Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled
> > 
> > How would I check for this (I'm sorry, haven't hacked on any KVM
> > related thing in a while, so I don't have it "cached") :)
> > 
> > > 
> > > We really should check that before enabling,
> > > I'll add that.
> > > 
> > > > 
> > > > Given how unusual it is for a guest to use MONITOR/MWAIT in the first
> > > > place, what's wrong with leaving it all as is (i.e., emulated as NOP)?
> > > > 
> > > 
> > > I'm really looking into ways to use mwait within Linux guests,
> > > this is just a building block that should help Mac OSX
> > > as a side effect (and we do not want it broken if at all possible).
> > 
> > A few years ago I tried really emulating MONITOR and MWAIT for a
> > project -- while not a total abject failure, the resulting patch
> > worked only intermittently (on OS X 10.7, which was the hot new thing
> > at the time, and hadn't started checking CPUID yet).
> > 
> > My collected wisdom on the topic from back then is here:
> > 
> >    http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html
> > 
> > The problem is that MWAIT is required to wake synchronously with
> > any other "thing" (either another (v)CPU, or DMA, or whatever) writing
> > to the memory location "marked" by the last preceding MONITOR. While
> > interrupts of any kind may also wake an MWAIT, it is strictly not allowed
> > to "miss" a write to the MONITOR-ed memory location. So unless we implement
> > some sort of condition queue that guarantees re-enabling the "parked" vcpu
> > on an intercepted write to a specific memory location by another vcpu,
> > we can't guarantee architecturally correct behavior.
> > 
> > If linux uses it in a very specific way that can be "faked" even
> > without ISA compliance, that's OK with me -- but other guest OSs might
> > take the x86 ISA more literally :)
> > 
> > Let me know if there's anything else you'd like me to test, now that I
> > have set up a 4.11.0-rc2+ (a.k.a. kvm git master) testing rig...
> > 
> > Regards,
> > --Gabe
> 
> Doing that correctly in software would be very hard.
> I suspect your host CPU has an issue, sent a patch to
> detect that. Let's see what happens.
> 
> -- 
> MST

* Re: [PATCH] kvm: better MWAIT emulation for guests
  2017-03-15 19:43                   ` Gabriel L. Somlo
@ 2017-03-15 20:13                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-03-15 20:13 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Radim Krčmář,
	linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Wed, Mar 15, 2017 at 03:43:03PM -0400, Gabriel L. Somlo wrote:
> Applies cleanly over git://git.kernel.org/pub/scm/virt/kvm/kvm.git,
> but then I get:
> 
>   CC [M]  arch/x86/kvm/x86.o
> In file included from arch/x86/kvm/x86.c:28:0:
> arch/x86/kvm/x86.h: In function ‘kvm_mwait_in_guest’:
> arch/x86/kvm/x86.h:231:34: error: ‘CPUID_MWAIT_LEAF’ undeclared (first use in this function)
>   if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
>                                   ^
> arch/x86/kvm/x86.h:231:34: note: each undeclared identifier is reported only once for each function it appears in
> arch/x86/kvm/x86.h:234:45: error: ‘mwait_substates’ undeclared (first use in this function)
>   cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
>                                              ^
> arch/x86/kvm/x86.h:236:14: error: ‘CPUID5_ECX_INTERRUPT_BREAK’ undeclared (first use in this function)
>   if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
>               ^
> arch/x86/kvm/x86.h:238:1: warning: control reaches end of non-void function [-Wreturn-type]
>  }
>  ^
> scripts/Makefile.build:294: recipe for target 'arch/x86/kvm/x86.o' failed
> make[2]: *** [arch/x86/kvm/x86.o] Error 1
> scripts/Makefile.build:553: recipe for target 'arch/x86/kvm' failed
> make[1]: *** [arch/x86/kvm] Error 2
> Makefile:1002: recipe for target 'arch/x86' failed
> make: *** [arch/x86] Error 2

forgot to commit :(
Will resend, sorry.

> 
> Did you accidentally leave out something that went into a .h file
> somewhere ?
> 
> Thx,
> --G
> 
> On Wed, Mar 15, 2017 at 09:29:57PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2017 at 03:01:12PM -0400, Gabriel L. Somlo wrote:
> > > On Wed, Mar 15, 2017 at 08:29:23PM +0200, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 15, 2017 at 02:14:26PM -0400, Gabriel L. Somlo wrote:
> > > > > Michael,
> > > > > 
> > > > > I tested this on OS X 10.7 (Lion), the last version that doesn't check
> > > > > CPUID for MWAIT support.
> > > > > 
> > > > > I used the latest kvm from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> > > > > first as-is, then with your v2 MWAIT patch applied.
> > > > > 
> > > > > Single-(V)CPU guest works as expected (but then again, single-vcpu
> > > > > guests worked even back when I tried emulating MWAIT the same as HLT).
> > > > > 
> > > > > When I try starting a SMP guest (with "-smp 4,cores=2"), the guest OS
> > > > > hangs after generating some output in text/verbose boot mode -- I gave
> > > > > up waiting for it after about 5 minutes. Works fine before your patch,
> > > > > which leads me to suspect that, as I feared, MWAIT doesn't wake
> > > > > immediately upon another VCPU writing to the MONITOR-ed memory location.
> > > > > 
> > > > > Tangentially, I remember back in the days of OS X 10.7, the
> > > > > alternative to exiting guest mode and emulating MWAIT and MONITOR as
> > > > > NOPs was to allow them both to run in guest mode.
> > > > > 
> > > > > While poorly documented by Intel at the time, MWAIT at L>0 effectively
> > > > > behaves as a NOP (i.e., doesn't actually put the physical core into
> > > > > low-power mode, because doing that would allow a guest to effectively
> > > > > DOS the host hardware).
> > > > 
> > > > Thanks for the testing, interesting.
> > > > Testing with Linux guest seems to show it works.
> > > > This could be an interrupt thing not a monitor thing.
> > > > Question: does your host CPU have this in its MWAIT leaf?
> > > > 	Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled
> > > 
> > > How would I check for this (I'm sorry, haven't hacked on any KVM
> > > related thing in a while, so I don't have it "cached") :)
> > > 
> > > > 
> > > > We really should check that before enabling,
> > > > I'll add that.
> > > > 
> > > > > 
> > > > > Given how unusual it is for a guest to use MONITOR/MWAIT in the first
> > > > > place, what's wrong with leaving it all as is (i.e., emulated as NOP)?
> > > > > 
> > > > 
> > > > I'm really looking into ways to use mwait within Linux guests,
> > > > this is just a building block that should help Mac OSX
> > > > as a side effect (and we do not want it broken if at all possible).
> > > 
> > > A few years ago I tried really emulating MONITOR and MWAIT for a
> > > project -- while not a total abject failure, the resulting patch
> > > worked only intermittently (on OS X 10.7, which was the hot new thing
> > > at the time, and hadn't started checking CPUID yet).
> > > 
> > > My collected wisdom on the topic from back then is here:
> > > 
> > >    http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html
> > > 
> > > The problem is that MWAIT is required to wake synchronously with
> > > any other "thing" (either another (v)CPU, or DMA, or whatever) writing
> > > to the memory location "marked" by the last preceding MONITOR. While
> > > interrupts of any kind may also wake an MWAIT, it is strictly not allowed
> > > to "miss" a write to the MONITOR-ed memory location. So unless we implement
> > > some sort of condition queue that guarantees re-enabling the "parked" vcpu
> > > on an intercepted write to a specific memory location by another vcpu,
> > > we can't guarantee architecturally correct behavior.
> > > 
> > > If linux uses it in a very specific way that can be "faked" even
> > > without ISA compliance, that's OK with me -- but other guest OSs might
> > > take the x86 ISA more literally :)
> > > 
> > > Let me know if there's anything else you'd like me to test, now that I
> > > have set up a 4.11.0-rc2+ (a.k.a. kvm git master) testing rig...
> > > 
> > > Regards,
> > > --Gabe
> > 
> > Doing that correctly in software would be very hard.
> > I suspect your host CPU has an issue, sent a patch to
> > detect that. Let's see what happens.
> > 
> > -- 
> > MST

Thread overview: 19+ messages
2017-03-09 22:29 [PATCH] kvm: better MWAIT emulation for guests Michael S. Tsirkin
2017-03-10  0:51 ` Gabriel L. Somlo
2017-03-10  1:12   ` Michael S. Tsirkin
2017-03-13  7:44     ` Wanpeng Li
2017-03-10 23:46 ` Jim Mattson
2017-03-12  0:01   ` Michael S. Tsirkin
2017-03-12 21:18     ` Gabriel L. Somlo
2017-03-13 15:46 ` Radim Krčmář
2017-03-13 16:08   ` Michael S. Tsirkin
2017-03-13 19:39     ` Radim Krčmář
2017-03-13 20:03       ` Michael S. Tsirkin
2017-03-13 21:43         ` Radim Krčmář
2017-03-15 18:14           ` Gabriel L. Somlo
2017-03-15 18:29             ` Michael S. Tsirkin
2017-03-15 19:01               ` Gabriel L. Somlo
2017-03-15 19:05                 ` Michael S. Tsirkin
2017-03-15 19:29                 ` Michael S. Tsirkin
2017-03-15 19:43                   ` Gabriel L. Somlo
2017-03-15 20:13                     ` Michael S. Tsirkin
