* [PATCH v5 untested] kvm: better MWAIT emulation for guests
From: Michael S. Tsirkin @ 2017-03-15 21:22 UTC
To: linux-kernel
Cc: Gabriel L. Somlo, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc

Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
unless explicitly provided with kernel command line argument
"idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
without checking CPUID.

We currently emulate that as a NOP but on VMX we can do better: let
guest stop the CPU until timer, IPI or memory change. CPU will be busy
but that isn't any worse than a NOP emulation.

Note that mwait within guests is not the same as on real hardware
because halt causes an exit while mwait doesn't. For this reason it
might not be a good idea to use the regular MWAIT flag in CPUID to
signal this capability. Add a flag in the hypervisor leaf instead.

Additionally, we add a capability for QEMU - e.g. if it knows there's an
isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
to improve guest behaviour.

Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

This is for Gabriel's testing only. A bit rushed so untested.

 Documentation/virtual/kvm/api.txt    |  9 +++++++++
 Documentation/virtual/kvm/cpuid.txt  |  6 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c                 |  3 +++
 arch/x86/kvm/svm.c                   |  2 --
 arch/x86/kvm/vmx.c                   |  6 ++++--
 arch/x86/kvm/x86.c                   |  3 +++
 arch/x86/kvm/x86.h                   | 28 ++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h             |  1 +
 9 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 3c248f7..6ee2e43 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4147,3 +4147,12 @@ This capability, if KVM_CHECK_EXTENSION indicates that it is
 available, means that that the kernel can support guests using the
 hashed page table MMU defined in Power ISA V3.00 (as implemented in
 the POWER9 processor), including in-memory segment tables.
+
+8.5 KVM_CAP_X86_GUEST_MWAIT
+
+Architectures: x86
+
+This capability indicates that guest using memory monitoring instructions
+(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. As such time
+spent while virtual CPU is halted in this way will then be accounted for as
+guest running time on the host (as opposed to e.g. HLT).
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..04c201c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_MWAIT                  ||     8 || guest can use monitor/mwait
+                                   ||       || to halt the VCPU without exits,
+                                   ||       || time spent while halted in this
+                                   ||       || way is accounted for on host as
+                                   ||       || VCPU run time.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..9cc77a7 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_MWAIT		8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc..5638102 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
 
+		if (kvm_mwait_in_guest())
+			entry->eax |= (1 << KVM_FEATURE_MWAIT);
+
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c..18e53bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
 	control->iopm_base_pa = iopm_base;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 98e82ee..ea0c96a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3547,11 +3547,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
+	if (!kvm_mwait_in_guest())
+		min |= CPU_BASED_MWAIT_EXITING |
+			CPU_BASED_MONITOR_EXITING;
+
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620..8c74fff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2684,6 +2684,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ADJUST_CLOCK:
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
+	case KVM_CAP_X86_GUEST_MWAIT:
+		r = kvm_mwait_in_guest();
+		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
 		 * and SMM handlers might indeed rely on 4G segment limits,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e8ff3e4..a2d8964 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -1,6 +1,8 @@
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
+#include <asm/processor.h>
+#include <asm/mwait.h>
 #include <linux/kvm_host.h>
 #include <asm/pvclock.h>
 #include "kvm_cache_regs.h"
@@ -212,4 +214,30 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 	__rem;						\
 })
 
+static inline bool kvm_mwait_in_guest(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
+		return false;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return false;
+
+	/*
+	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
+	 * they would allow guest to stop the CPU completely by disabling
+	 * interrupts then invoking MWAIT.
+	 */
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
+
+	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
+		return false;
+
+	return true;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f51d508..8b6bc06 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_MMU_RADIX 134
 #define KVM_CAP_PPC_MMU_HASH_V3 135
 #define KVM_CAP_IMMEDIATE_EXIT 136
+#define KVM_CAP_X86_GUEST_MWAIT 137
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
MST

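Editor's sketch: the gating logic of kvm_mwait_in_guest() in the patch above can be restated as a plain C predicate that is easy to exercise outside the kernel. This is only an illustration under stated assumptions, not the patch's code: the struct host_cpu_facts and the function mwait_in_guest_allowed() are hypothetical names, and the two constants simply mirror the leaf number (5) and the ECX "interrupt break" bit (bit 1) that the patch tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 5 (MONITOR/MWAIT) and the ECX bit tested by the patch. */
#define MWAIT_LEAF                 0x5u
#define LEAF5_ECX_INTERRUPT_BREAK  (1u << 1)

/* Hypothetical container for the host facts the patch consults. */
struct host_cpu_facts {
    bool     has_mwait;       /* host advertises MONITOR/MWAIT */
    bool     is_intel;        /* vendor is Intel */
    uint32_t max_basic_leaf;  /* highest basic CPUID leaf */
    uint32_t leaf5_ecx;       /* ECX returned by CPUID leaf 5 */
};

/*
 * Mirrors the order of checks in the patch: MWAIT present, Intel vendor,
 * leaf 5 available, and interrupts able to break MWAIT even when the
 * guest has them disabled.
 */
static bool mwait_in_guest_allowed(const struct host_cpu_facts *f)
{
    if (!f->has_mwait)
        return false;
    if (!f->is_intel)
        return false;
    if (f->max_basic_leaf < MWAIT_LEAF)
        return false;
    return (f->leaf5_ecx & LEAF5_ECX_INTERRUPT_BREAK) != 0;
}
```

A caller would fill host_cpu_facts from CPUID and pass it in; keeping the predicate free of CPUID calls makes each rejection path checkable on its own.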
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
From: Gabriel L. Somlo @ 2017-03-15 23:35 UTC
To: Michael S. Tsirkin
Cc: linux-kernel, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc

On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> unless explicitly provided with kernel command line argument
> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> without checking CPUID.
>
> We currently emulate that as a NOP but on VMX we can do better: let
> guest stop the CPU until timer, IPI or memory change. CPU will be busy
> but that isn't any worse than a NOP emulation.
>
> Note that mwait within guests is not the same as on real hardware
> because halt causes an exit while mwait doesn't. For this reason it
> might not be a good idea to use the regular MWAIT flag in CPUID to
> signal this capability. Add a flag in the hypervisor leaf instead.
>
> Additionally, we add a capability for QEMU - e.g. if it knows there's an
> isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> to improve guest behaviour.

Same behavior (on the mac pro 1,1 running F22 with custom-compiled
kernel from kvm git master, plus this patch on top).

The OS X 10.7 kernel hangs (or at least progresses extremely slowly)
on boot, does not bring up guest graphical interface within the first
10 minutes that I waited for it. That, in contrast with the default
nop-based emulation where the guest comes up within 30 seconds.

I will run another round of tests on a newer Mac (4-year-old macbook air) and report back tomorrow. Going off on a tangent, why would encouraging otherwise well-behaved guests (like linux ones, for example) to use MWAIT be desirable to begin with ? Is it a matter of minimizing the overhead associated with exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 and running guest-mode MWAIT in a tight loop will actually waste the host CPU without the opportunity to yield to some other L0 thread. Sorry if I fell into the middle of an ongoing conversation on this and missed most of the relevant context, in which case please feel free to ignore me... :) Thanks, --G > > Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > > This is for Gabriel's testing only. A bit rushed so untested. > > Documentation/virtual/kvm/api.txt | 9 +++++++++ > Documentation/virtual/kvm/cpuid.txt | 6 ++++++ > arch/x86/include/uapi/asm/kvm_para.h | 1 + > arch/x86/kvm/cpuid.c | 3 +++ > arch/x86/kvm/svm.c | 2 -- > arch/x86/kvm/vmx.c | 6 ++++-- > arch/x86/kvm/x86.c | 3 +++ > arch/x86/kvm/x86.h | 28 ++++++++++++++++++++++++++++ > include/uapi/linux/kvm.h | 1 + > 9 files changed, 55 insertions(+), 4 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index 3c248f7..6ee2e43 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -4147,3 +4147,12 @@ This capability, if KVM_CHECK_EXTENSION indicates that it is > available, means that that the kernel can support guests using the > hashed page table MMU defined in Power ISA V3.00 (as implemented in > the POWER9 processor), including in-memory segment tables. > + > +8.5 KVM_CAP_X86_GUEST_MWAIT > + > +Architectures: x86 > + > +This capability indicates that guest using memory monotoring instructions > +(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. 
As such time > +spent while virtual CPU is halted in this way will then be accounted for as > +guest running time on the host (as opposed to e.g. HLT). > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt > index 3c65feb..04c201c 100644 > --- a/Documentation/virtual/kvm/cpuid.txt > +++ b/Documentation/virtual/kvm/cpuid.txt > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit > || || before enabling paravirtualized > || || spinlock support. > ------------------------------------------------------------------------------ > +KVM_FEATURE_MWAIT || 8 || guest can use monitor/mwait > + || || to halt the VCPU without exits, > + || || time spent while halted in this > + || || way is accounted for on host as > + || || VCPU run time. > +------------------------------------------------------------------------------ > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side > || || per-cpu warps are expected in > || || kvmclock. > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h > index cff0bb6..9cc77a7 100644 > --- a/arch/x86/include/uapi/asm/kvm_para.h > +++ b/arch/x86/include/uapi/asm/kvm_para.h > @@ -24,6 +24,7 @@ > #define KVM_FEATURE_STEAL_TIME 5 > #define KVM_FEATURE_PV_EOI 6 > #define KVM_FEATURE_PV_UNHALT 7 > +#define KVM_FEATURE_MWAIT 8 > > /* The last 8 bits are used to indicate how to interpret the flags field > * in pvclock structure. If no bits are set, all flags are ignored. 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index efde6cc..5638102 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > if (sched_info_on()) > entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); > > + if (kvm_mwait_in_guest()) > + entry->eax |= (1 << KVM_FEATURE_MWAIT); > + > entry->ebx = 0; > entry->ecx = 0; > entry->edx = 0; > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index d1efe2c..18e53bc 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm) > set_intercept(svm, INTERCEPT_CLGI); > set_intercept(svm, INTERCEPT_SKINIT); > set_intercept(svm, INTERCEPT_WBINVD); > - set_intercept(svm, INTERCEPT_MONITOR); > - set_intercept(svm, INTERCEPT_MWAIT); > set_intercept(svm, INTERCEPT_XSETBV); > > control->iopm_base_pa = iopm_base; > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 98e82ee..ea0c96a 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -3547,11 +3547,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > CPU_BASED_USE_IO_BITMAPS | > CPU_BASED_MOV_DR_EXITING | > CPU_BASED_USE_TSC_OFFSETING | > - CPU_BASED_MWAIT_EXITING | > - CPU_BASED_MONITOR_EXITING | > CPU_BASED_INVLPG_EXITING | > CPU_BASED_RDPMC_EXITING; > > + if (!kvm_mwait_in_guest()) > + min |= CPU_BASED_MWAIT_EXITING | > + CPU_BASED_MONITOR_EXITING; > + > opt = CPU_BASED_TPR_SHADOW | > CPU_BASED_USE_MSR_BITMAPS | > CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 1faf620..8c74fff 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2684,6 +2684,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_ADJUST_CLOCK: > r = KVM_CLOCK_TSC_STABLE; > break; > + case KVM_CAP_X86_GUEST_MWAIT: > + r = kvm_mwait_in_guest(); > + break; > case KVM_CAP_X86_SMM: > /* SMBASE is 
usually relocated above 1M on modern chipsets, > * and SMM handlers might indeed rely on 4G segment limits, > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h > index e8ff3e4..a2d8964 100644 > --- a/arch/x86/kvm/x86.h > +++ b/arch/x86/kvm/x86.h > @@ -1,6 +1,8 @@ > #ifndef ARCH_X86_KVM_X86_H > #define ARCH_X86_KVM_X86_H > > +#include <asm/processor.h> > +#include <asm/mwait.h> > #include <linux/kvm_host.h> > #include <asm/pvclock.h> > #include "kvm_cache_regs.h" > @@ -212,4 +214,30 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec) > __rem; \ > }) > > +static inline bool kvm_mwait_in_guest(void) > +{ > + unsigned int eax, ebx, ecx, edx; > + > + if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT)) > + return false; > + > + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) > + return false; > + > + /* > + * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as > + * they would allow guest to stop the CPU completely by disabling > + * interrupts then invoking MWAIT. > + */ > + if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF) > + return false; > + > + cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx); > + > + if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK)) > + return false; > + > + return true; > +} > + > #endif > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index f51d508..8b6bc06 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_PPC_MMU_RADIX 134 > #define KVM_CAP_PPC_MMU_HASH_V3 135 > #define KVM_CAP_IMMEDIATE_EXIT 136 > +#define KVM_CAP_X86_GUEST_MWAIT 137 > > #ifdef KVM_CAP_IRQ_ROUTING > > -- > MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
From: Michael S. Tsirkin @ 2017-03-15 23:41 UTC
To: Gabriel L. Somlo
Cc: linux-kernel, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc

On Wed, Mar 15, 2017 at 07:35:34PM -0400, Gabriel L. Somlo wrote:
> On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> > unless explicitly provided with kernel command line argument
> > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> > without checking CPUID.
> >
> > We currently emulate that as a NOP but on VMX we can do better: let
> > guest stop the CPU until timer, IPI or memory change. CPU will be busy
> > but that isn't any worse than a NOP emulation.
> >
> > Note that mwait within guests is not the same as on real hardware
> > because halt causes an exit while mwait doesn't. For this reason it
> > might not be a good idea to use the regular MWAIT flag in CPUID to
> > signal this capability. Add a flag in the hypervisor leaf instead.
> >
> > Additionally, we add a capability for QEMU - e.g. if it knows there's an
> > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> > to improve guest behaviour.
>
> Same behavior (on the mac pro 1,1 running F22 with custom-compiled
> kernel from kvm git master, plus this patch on top).
>
> The OS X 10.7 kernel hangs (or at least progresses extremely slowly)
> on boot, does not bring up guest graphical interface within the first
> 10 minutes that I waited for it. That, in contrast with the default
> nop-based emulation where the guest comes up within 30 seconds.

Thanks a lot, meanwhile I'll try to write a unit-test and experiment
with various behaviours.

> I will run another round of tests on a newer Mac (4-year-old macbook
> air) and report back tomorrow.
>
> Going off on a tangent, why would encouraging otherwise well-behaved
> guests (like linux ones, for example) to use MWAIT be desirable to
> begin with ? Is it a matter of minimizing the overhead associated with
> exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 and
> running guest-mode MWAIT in a tight loop will actually waste the host
> CPU without the opportunity to yield to some other L0 thread. Sorry if
> I fell into the middle of an ongoing conversation on this and missed
> most of the relevant context, in which case please feel free to ignore
> me... :)
>
> Thanks,
> --G

It's just some experiments I'm running, I'm not ready to describe it
yet. I thought this part might be useful to at least some guests, so
trying to upstream it right now.

-- 
MST

* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
From: Gabriel L. Somlo @ 2017-03-16 13:24 UTC
To: Michael S. Tsirkin
Cc: linux-kernel, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc

On Thu, Mar 16, 2017 at 01:41:28AM +0200, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2017 at 07:35:34PM -0400, Gabriel L. Somlo wrote:
> > On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> > > unless explicitly provided with kernel command line argument
> > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> > > without checking CPUID.
> > >
> > > We currently emulate that as a NOP but on VMX we can do better: let
> > > guest stop the CPU until timer, IPI or memory change. CPU will be busy
> > > but that isn't any worse than a NOP emulation.
> > >
> > > Note that mwait within guests is not the same as on real hardware
> > > because halt causes an exit while mwait doesn't. For this reason it
> > > might not be a good idea to use the regular MWAIT flag in CPUID to
> > > signal this capability. Add a flag in the hypervisor leaf instead.
> > >
> > > Additionally, we add a capability for QEMU - e.g. if it knows there's an
> > > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> > > to improve guest behaviour.
> >
> > Same behavior (on the mac pro 1,1 running F22 with custom-compiled
> > kernel from kvm git master, plus this patch on top).
> >
> > The OS X 10.7 kernel hangs (or at least progresses extremely slowly)
> > on boot, does not bring up guest graphical interface within the first
> > 10 minutes that I waited for it.
That, in contrast with the default > > nop-based emulation where the guest comes up within 30 seconds. > > > Thanks a lot, meanwhile I'll try to write a unit-test and experiment > with various behaviours. > > > I will run another round of tests on a newer Mac (4-year-old macbook > > air) and report back tomorrow. > > > > Going off on a tangent, why would encouraging otherwise well-behaved > > guests (like linux ones, for example) to use MWAIT be desirable to > > begin with ? Is it a matter of minimizing the overhead associated with > > exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 and > > running guest-mode MWAIT in a tight loop will actually waste the host > > CPU without the opportunity to yield to some other L0 thread. Sorry if > > I fell into the middle of an ongoing conversation on this and missed > > most of the relevant context, in which case please feel free to ignore > > me... :) > > > > Thanks, > > --G > > It's just some experiments I'm running, I'm not ready to describe it > yet. I thought this part might be useful to at least some guests, so > trying to upstream it right now. OK, so on a macbook air running F25 and the latest kvm git master plus your v5 patch (4.11.0-rc2+), things appear to work. 
host-side cpuid output:
eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x021120

guest-side cpuid output:
eax=00000000 ebx=00000000 ecx=0x000003 edx=00000000

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
stepping	: 7
microcode	: 0x29
cpu MHz		: 1157.849
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 3604.68
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

After studying your patch a bit more carefully (sorry, it's crazy
around here right now :) ) I realized you're simply trying to
(selectively) decide when to exit L1 and emulate as NOP vs. when to
just allow L1 to execute MONITOR & MWAIT natively.

Is that right ? Because if so, the issues I saw on my MacPro1,1 are
weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
natively was one of the options Alex Graf and Rene Rebe used back in
the very early days of OS X on QEMU, at the time I got involved with
that project.
Here's part of an out of tree patch against 3.4 which did just that, and worked as far as I remember on *any* MWAIT capable intel chip I had access to back in 2010: ############################################################################## # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 ############################################################################## diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); /* cpuid 1.ecx */ const u32 kvm_supported_word4_x86_features = - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | 0 /* DS-CPL, VMX, SMX, EST */ | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | 0 /* Reserved, DCA */ | F(XMM4_1) | F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s set_intercept(svm, INTERCEPT_VMSAVE); set_intercept(svm, INTERCEPT_STGI); set_intercept(svm, INTERCEPT_CLGI); set_intercept(svm, INTERCEPT_SKINIT); set_intercept(svm, INTERCEPT_WBINVD); - set_intercept(svm, INTERCEPT_MONITOR); - set_intercept(svm, INTERCEPT_MWAIT); set_intercept(svm, INTERCEPT_XSETBV); control->iopm_base_pa = iopm_base; control->msrpm_base_pa = __pa(svm->msrpm); control->int_ctl = V_INTR_MASKING_MASK; diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c 
linux-3.4-mac/arch/x86/kvm/vmx.c --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); nested_vmx_procbased_ctls_low = 0; nested_vmx_procbased_ctls_high &= CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | + CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING | #ifdef CONFIG_X86_64 CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | #endif CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING | CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MOV_DR_EXITING | CPU_BASED_USE_TSC_OFFSETING | - CPU_BASED_MWAIT_EXITING | - CPU_BASED_MONITOR_EXITING | CPU_BASED_INVLPG_EXITING | CPU_BASED_RDPMC_EXITING; opt = CPU_BASED_TPR_SHADOW | CPU_BASED_USE_MSR_BITMAPS | If all you're trying to do is (selectively) revert to this behavior, that "shouldn't" mess it up for the MacPro either, so I'm thoroughly confused at this point :) Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, didn't power down the physical CPU, just immediately moved on to the next instruction. As such, there was no power saving and no opportunity to yield to another L0 thread either, unlike with NOP emulation at L0. Did that change on newer Intel chips (i.e., is guest-mode MWAIT now doing something smarter than just acting as a guest-mode NOP) ? Thanks, --Gabriel ^ permalink raw reply [flat|nested] 54+ messages in thread
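Editor's sketch: the host-side and guest-side register dumps in the message above look like the output of CPUID leaf 5 (MONITOR/MWAIT), although the message does not say so explicitly; that reading is an assumption. Under it, the values can be decoded with the field layout Intel documents for that leaf. The struct mwait_leaf and the function decode_mwait_leaf() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID leaf 5 (hypothetical helper type). */
struct mwait_leaf {
    uint16_t smallest_line;    /* EAX[15:0]: smallest monitor-line size, bytes */
    uint16_t largest_line;     /* EBX[15:0]: largest monitor-line size, bytes */
    bool     extensions;       /* ECX bit 0: MWAIT extensions enumerated */
    bool     interrupt_break;  /* ECX bit 1: interrupts break MWAIT even if disabled */
    uint8_t  substates[8];     /* EDX nibbles: MWAIT sub C-states for C0..C7 */
};

static struct mwait_leaf decode_mwait_leaf(uint32_t eax, uint32_t ebx,
                                           uint32_t ecx, uint32_t edx)
{
    struct mwait_leaf m;

    m.smallest_line = (uint16_t)(eax & 0xffffu);
    m.largest_line = (uint16_t)(ebx & 0xffffu);
    m.extensions = (ecx & 0x1u) != 0;
    m.interrupt_break = (ecx & 0x2u) != 0;

    /* Each C-state gets one 4-bit field, C0 in the lowest nibble. */
    for (int i = 0; i < 8; i++)
        m.substates[i] = (uint8_t)((edx >> (4 * i)) & 0xfu);

    return m;
}
```

With the host values (eax=0x40, ebx=0x40, ecx=0x3, edx=0x021120) this gives 64-byte monitor lines and both ECX bits set, so the interrupt-break condition tested by the v5 patch would be satisfied on that machine; the guest-side dump keeps ecx=0x3 while the other three registers are zero.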
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
From: Michael S. Tsirkin @ 2017-03-16 14:04 UTC
To: Gabriel L. Somlo
Cc: linux-kernel, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc

On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote:
> After studying your patch a bit more carefully (sorry, it's crazy
> around here right now :) ) I realized you're simply trying to
> (selectively) decide when to exit L1 and emulate as NOP vs. when to
> just allow L1 to execute MONITOR & MWAIT natively.
>
> Is that right ? Because if so, the issues I saw on my MacPro1,1 are
> weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
> natively was one of the options Alex Graf and Rene Rebe used back in
> the very early days of OS X on QEMU, at the time I got involved with
> that project.
Here's part of an out of tree patch against 3.4 which did > just that, and worked as far as I remember on *any* MWAIT capable > intel chip I had access to back in 2010: > > ############################################################################## > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > ############################################################################## > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > /* cpuid 1.ecx */ > const u32 kvm_supported_word4_x86_features = > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > 0 /* DS-CPL, VMX, SMX, EST */ | > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > 0 /* Reserved, DCA */ | F(XMM4_1) | > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > set_intercept(svm, INTERCEPT_VMSAVE); > set_intercept(svm, INTERCEPT_STGI); > set_intercept(svm, INTERCEPT_CLGI); > set_intercept(svm, INTERCEPT_SKINIT); > set_intercept(svm, INTERCEPT_WBINVD); > - set_intercept(svm, INTERCEPT_MONITOR); > - set_intercept(svm, INTERCEPT_MWAIT); > set_intercept(svm, INTERCEPT_XSETBV); > > control->iopm_base_pa = iopm_base; > control->msrpm_base_pa = __pa(svm->msrpm); > control->int_ctl = 
V_INTR_MASKING_MASK; > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > nested_vmx_procbased_ctls_low = 0; > nested_vmx_procbased_ctls_high &= > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > + CPU_BASED_CR3_LOAD_EXITING | > CPU_BASED_CR3_STORE_EXITING | > #ifdef CONFIG_X86_64 > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > #endif > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > CPU_BASED_CR3_LOAD_EXITING | > CPU_BASED_CR3_STORE_EXITING | > CPU_BASED_USE_IO_BITMAPS | > CPU_BASED_MOV_DR_EXITING | > CPU_BASED_USE_TSC_OFFSETING | > - CPU_BASED_MWAIT_EXITING | > - CPU_BASED_MONITOR_EXITING | > CPU_BASED_INVLPG_EXITING | > CPU_BASED_RDPMC_EXITING; > > opt = CPU_BASED_TPR_SHADOW | > CPU_BASED_USE_MSR_BITMAPS | > > If all you're trying to do is (selectively) revert to this behavior, > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > confused at this point :) Yes. Me too. Want to try that other patch and see what happens? > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > didn't power down the physical CPU, just immediately moved on to the > next instruction. As such, there was no power saving and no > opportunity to yield to another L0 thread either, unlike with NOP > emulation at L0. > > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > doing something smarter than just acting as a guest-mode NOP) ? > > Thanks, > --Gabriel Interesting. What it seems to say is this: MWAIT. 
Behavior of the MWAIT instruction (which always causes an
invalid-opcode exception—#UD—if CPL > 0) is determined by the setting of the
“MWAIT exiting” VM-execution control:
— If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit
(see Section 22.1.3).
— If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if
any of the following is true: (1) the “interrupt-window exiting” VM-execution
control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1.
— If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window
exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT
does not cause the processor to enter an implementation-dependent
optimized state; instead, control passes to the instruction following the
MWAIT instruction.


And since interrupt-window exiting is 0 most of the time for KVM,
I would expect MWAIT to behave normally.

-- 
MST

^ permalink raw reply [flat|nested] 54+ messages in thread
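The three cases in the manual text quoted above can be restated as a small
decision function. This is only a sketch of that quoted text, not code taken
from KVM or from the patch under discussion. The enum and function names are
hypothetical, and the CPL > 0 (#UD) case is left out on the assumption that
the caller runs at CPL 0.

```c
#include <stdbool.h>

/* Hypothetical names, used only to illustrate the quoted rules. */
enum mwait_outcome {
	MWAIT_VM_EXIT,           /* "MWAIT exiting" control is 1 */
	MWAIT_OPERATES_NORMALLY, /* any of the three listed conditions holds */
	MWAIT_FALLS_THROUGH,     /* no optimized state, next instruction runs */
};

/*
 * Restates the quoted rules for MWAIT in VMX non-root operation.
 * Assumes CPL 0; the #UD case for CPL > 0 is not modeled.
 */
enum mwait_outcome mwait_in_vmx_non_root(bool mwait_exiting,
					 bool intr_window_exiting,
					 unsigned int ecx,
					 bool rflags_if)
{
	if (mwait_exiting)
		return MWAIT_VM_EXIT;

	/* (1) interrupt-window exiting is 0, (2) ECX[0] is 0, (3) IF is 1 */
	if (!intr_window_exiting || !(ecx & 1u) || rflags_if)
		return MWAIT_OPERATES_NORMALLY;

	/* interrupt-window exiting = 1, ECX[0] = 1, RFLAGS.IF = 0 */
	return MWAIT_FALLS_THROUGH;
}
```

Read this way, the only combination that makes MWAIT fall through without a
VM exit is interrupt-window exiting = 1 together with ECX[0] = 1 and
RFLAGS.IF = 0.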
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 14:04 ` Michael S. Tsirkin @ 2017-03-16 14:58 ` Gabriel L. Somlo 2017-03-16 15:23 ` Michael S. Tsirkin 2017-03-16 15:35 ` Radim Krčmář 0 siblings, 2 replies; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 14:58 UTC (permalink / raw) To: Michael S. Tsirkin Cc: linux-kernel, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > After studying your patch a bit more carefully (sorry, it's crazy > > around here right now :) ) I realized you're simply trying to > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > just allow L1 to execute MONITOR & MWAIT natively. > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > natively was one of the options Alex Graf and Rene Rebe used back in > > the very early days of OS X on QEMU, at the time I got involved with > > that project. 
Here's part of an out of tree patch against 3.4 which did > > just that, and worked as far as I remember on *any* MWAIT capable > > intel chip I had access to back in 2010: > > > > ############################################################################## > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > ############################################################################## > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > /* cpuid 1.ecx */ > > const u32 kvm_supported_word4_x86_features = > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > 0 /* DS-CPL, VMX, SMX, EST */ | > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > set_intercept(svm, INTERCEPT_VMSAVE); > > set_intercept(svm, INTERCEPT_STGI); > > set_intercept(svm, INTERCEPT_CLGI); > > set_intercept(svm, INTERCEPT_SKINIT); > > set_intercept(svm, INTERCEPT_WBINVD); > > - set_intercept(svm, INTERCEPT_MONITOR); > > - set_intercept(svm, INTERCEPT_MWAIT); > > set_intercept(svm, INTERCEPT_XSETBV); > > > > control->iopm_base_pa = iopm_base; > > 
control->msrpm_base_pa = __pa(svm->msrpm); > > control->int_ctl = V_INTR_MASKING_MASK; > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > nested_vmx_procbased_ctls_low = 0; > > nested_vmx_procbased_ctls_high &= > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > + CPU_BASED_CR3_LOAD_EXITING | > > CPU_BASED_CR3_STORE_EXITING | > > #ifdef CONFIG_X86_64 > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > #endif > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > CPU_BASED_CR3_LOAD_EXITING | > > CPU_BASED_CR3_STORE_EXITING | > > CPU_BASED_USE_IO_BITMAPS | > > CPU_BASED_MOV_DR_EXITING | > > CPU_BASED_USE_TSC_OFFSETING | > > - CPU_BASED_MWAIT_EXITING | > > - CPU_BASED_MONITOR_EXITING | > > CPU_BASED_INVLPG_EXITING | > > CPU_BASED_RDPMC_EXITING; > > > > opt = CPU_BASED_TPR_SHADOW | > > CPU_BASED_USE_MSR_BITMAPS | > > > > If all you're trying to do is (selectively) revert to this behavior, > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > confused at this point :) > > Yes. Me too. Want to try that other patch and see what happens? You mean the old 3.4 patch against current KVM ? I'll try to do that, might take me a while :) > > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > > didn't power down the physical CPU, just immediately moved on to the > > next instruction. 
As such, there was no power saving and no > > opportunity to yield to another L0 thread either, unlike with NOP > > emulation at L0. > > > > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > > doing something smarter than just acting as a guest-mode NOP) ? > > > > Thanks, > > --Gabriel > > Interesting. What it seems to say is this: > > MWAIT. Behavior of the MWAIT instruction (which always causes an invalid- > opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT > exiting” VM-execution control: > — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit > (see Section 22.1.3). > — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if > any of the following is true: (1) the “interrupt-window exiting” VM-execution > control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1. > — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window > exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT > does not cause the processor to enter an implementation-dependent > optimized state; instead, control passes to the instruction following the > MWAIT instruction. > > > And since interrupt-window exiting is 0 most of the time for KVM, > I would expect MWAIT to behave normally. The intel manual said the same thing back in 2010 as well. However, regardless of how any flags were set, interrupt-window exiting or not, "normal" L1 MWAIT behavior was that it woke up immediately regardless. Remember, never going to sleep is still correct ("normal" ?) behavior per the ISA definition of MWAIT :) Also, when I tested your patch on the macbook air (where it worked), not only was the host reporting 400% CPU for qemu (which is to be expected), but the thermal fan/cooling thing also shifted up into high gear, which means the physical CPU got hot, which it shouldn't have if the guest-mode MWAIT actually did put the host CPU into low power. 
So at least on this 4-year-old core-I7 chip, the story Intel tells in its manual still doesn't check out. I could never get any clarification on what they mean by "operates normally" :) ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 14:58 ` Gabriel L. Somlo @ 2017-03-16 15:23 ` Michael S. Tsirkin 2017-03-16 15:35 ` Radim Krčmář 1 sibling, 0 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-16 15:23 UTC (permalink / raw) To: Gabriel L. Somlo Cc: linux-kernel, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 10:58:20AM -0400, Gabriel L. Somlo wrote: > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > After studying your patch a bit more carefully (sorry, it's crazy > > > around here right now :) ) I realized you're simply trying to > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > the very early days of OS X on QEMU, at the time I got involved with > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > just that, and worked as far as I remember on *any* MWAIT capable > > > intel chip I had access to back in 2010: > > > > > > ############################################################################## > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > ############################################################################## > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > /* cpuid 1.ecx */ > > > const u32 kvm_supported_word4_x86_features = > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > set_intercept(svm, INTERCEPT_STGI); > > > set_intercept(svm, INTERCEPT_CLGI); > > > set_intercept(svm, INTERCEPT_SKINIT); > > > set_intercept(svm, INTERCEPT_WBINVD); > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > set_intercept(svm, 
INTERCEPT_XSETBV); > > > > > > control->iopm_base_pa = iopm_base; > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > control->int_ctl = V_INTR_MASKING_MASK; > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > nested_vmx_procbased_ctls_low = 0; > > > nested_vmx_procbased_ctls_high &= > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > + CPU_BASED_CR3_LOAD_EXITING | > > > CPU_BASED_CR3_STORE_EXITING | > > > #ifdef CONFIG_X86_64 > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > #endif > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > CPU_BASED_CR3_LOAD_EXITING | > > > CPU_BASED_CR3_STORE_EXITING | > > > CPU_BASED_USE_IO_BITMAPS | > > > CPU_BASED_MOV_DR_EXITING | > > > CPU_BASED_USE_TSC_OFFSETING | > > > - CPU_BASED_MWAIT_EXITING | > > > - CPU_BASED_MONITOR_EXITING | > > > CPU_BASED_INVLPG_EXITING | > > > CPU_BASED_RDPMC_EXITING; > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > confused at this point :) > > > > Yes. Me too. Want to try that other patch and see what happens? > > You mean the old 3.4 patch against current KVM ? I'll try to do that, > might take me a while :) I can rebase them for you if you send me a link. 
> > > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > > > didn't power down the physical CPU, just immediately moved on to the > > > next instruction. As such, there was no power saving and no > > > opportunity to yield to another L0 thread either, unlike with NOP > > > emulation at L0. > > > > > > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > > > doing something smarter than just acting as a guest-mode NOP) ? > > > > > > Thanks, > > > --Gabriel > > > > Interesting. What it seems to say is this: > > > > MWAIT. Behavior of the MWAIT instruction (which always causes an invalid- > > opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT > > exiting” VM-execution control: > > — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit > > (see Section 22.1.3). > > — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if > > any of the following is true: (1) the “interrupt-window exiting” VM-execution > > control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1. > > — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window > > exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT > > does not cause the processor to enter an implementation-dependent > > optimized state; instead, control passes to the instruction following the > > MWAIT instruction. > > > > > > And since interrupt-window exiting is 0 most of the time for KVM, > > I would expect MWAIT to behave normally. > > The intel manual said the same thing back in 2010 as well. However, > regardless of how any flags were set, interrupt-window exiting or not, > "normal" L1 MWAIT behavior was that it woke up immediately regardless. > Remember, never going to sleep is still correct ("normal" ?) 
behavior
> per the ISA definition of MWAIT :)
>
> Also, when I tested your patch on the macbook air (where it worked),
> not only was the host reporting 400% CPU for qemu (which is to be
> expected), but the thermal fan/cooling thing also shifted up into high
> gear, which means the physical CPU got hot, which it shouldn't have if
> the guest-mode MWAIT actually did put the host CPU into low power.

Does the same happen with NOP, btw?

> So at least on this 4-year-old core-I7 chip, the story Intel tells in
> its manual still doesn't check out. I could never get any
> clarification on what they mean by "operates normally" :)

It could be that Mac OS sets ECX[0] = 1 and RFLAGS.IF = 0.

-- 
MST

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 14:58 ` Gabriel L. Somlo 2017-03-16 15:23 ` Michael S. Tsirkin @ 2017-03-16 15:35 ` Radim Krčmář 2017-03-16 16:01 ` Radim Krčmář 2017-03-16 16:16 ` Gabriel L. Somlo 1 sibling, 2 replies; 54+ messages in thread From: Radim Krčmář @ 2017-03-16 15:35 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-16 10:58-0400, Gabriel L. Somlo: > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > After studying your patch a bit more carefully (sorry, it's crazy > > > around here right now :) ) I realized you're simply trying to > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > the very early days of OS X on QEMU, at the time I got involved with > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > just that, and worked as far as I remember on *any* MWAIT capable > > > intel chip I had access to back in 2010: > > > > > > ############################################################################## > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > ############################################################################## > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > /* cpuid 1.ecx */ > > > const u32 kvm_supported_word4_x86_features = > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > set_intercept(svm, INTERCEPT_STGI); > > > set_intercept(svm, INTERCEPT_CLGI); > > > set_intercept(svm, INTERCEPT_SKINIT); > > > set_intercept(svm, INTERCEPT_WBINVD); > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > set_intercept(svm, 
INTERCEPT_XSETBV); > > > > > > control->iopm_base_pa = iopm_base; > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > control->int_ctl = V_INTR_MASKING_MASK; > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > nested_vmx_procbased_ctls_low = 0; > > > nested_vmx_procbased_ctls_high &= > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > + CPU_BASED_CR3_LOAD_EXITING | > > > CPU_BASED_CR3_STORE_EXITING | > > > #ifdef CONFIG_X86_64 > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > #endif > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > CPU_BASED_CR3_LOAD_EXITING | > > > CPU_BASED_CR3_STORE_EXITING | > > > CPU_BASED_USE_IO_BITMAPS | > > > CPU_BASED_MOV_DR_EXITING | > > > CPU_BASED_USE_TSC_OFFSETING | > > > - CPU_BASED_MWAIT_EXITING | > > > - CPU_BASED_MONITOR_EXITING | > > > CPU_BASED_INVLPG_EXITING | > > > CPU_BASED_RDPMC_EXITING; > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > confused at this point :) > > > > Yes. Me too. Want to try that other patch and see what happens? > > You mean the old 3.4 patch against current KVM ? 
I'll try to do that, > might take me a while :) Michael's patch already did most of that, you just need to add diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index efde6cc50875..b12f07d4ce17 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, const u32 kvm_cpuid_1_ecx_x86_features = /* NOTE: MONITOR (and MWAIT) are emulated as NOP, * but *not* advertised to guests via CPUID ! */ - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | 0 /* DS-CPL, VMX, SMX, EST */ | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | Note: this will never be upstream, because mwait isn't what we want by default. :) >> > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, >> > didn't power down the physical CPU, just immediately moved on to the >> > next instruction. As such, there was no power saving and no >> > opportunity to yield to another L0 thread either, unlike with NOP >> > emulation at L0. >> > >> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now >> > doing something smarter than just acting as a guest-mode NOP) ? >> > >> > Thanks, >> > --Gabriel >> >> Interesting. What it seems to say is this: >> >> MWAIT. Behavior of the MWAIT instruction (which always causes an invalid- >> opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT >> exiting” VM-execution control: >> — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit >> (see Section 22.1.3). >> — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if >> any of the following is true: (1) the “interrupt-window exiting” VM-execution >> control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1. 
>> — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window
>> exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT
>> does not cause the processor to enter an implementation-dependent
>> optimized state; instead, control passes to the instruction following the
>> MWAIT instruction.
>>
>>
>> And since interrupt-window exiting is 0 most of the time for KVM,
>> I would expect MWAIT to behave normally.
>
> The intel manual said the same thing back in 2010 as well. However,
> regardless of how any flags were set, interrupt-window exiting or not,
> "normal" L1 MWAIT behavior was that it woke up immediately regardless.
> Remember, never going to sleep is still correct ("normal" ?) behavior
> per the ISA definition of MWAIT :)

I'll write a simple kvm-unit-test to better understand why it is broken
for you ...

> Also, when I tested your patch on the macbook air (where it worked),
> not only was the host reporting 400% CPU for qemu (which is to be
> expected), but the thermal fan/cooling thing also shifted up into high
> gear, which means the physical CPU got hot, which it shouldn't have if
> the guest-mode MWAIT actually did put the host CPU into low power.

I tested MWAIT with basically the same kernel patch and the qemu patch
with a Linux guest on Haswell and Nehalem. Running the guest took 100% of
the host CPUs, but it still had the same temperature as when the host was
idle.

That reminds me that you need to pass '-cpu host' for QEMU reasons.

^ permalink raw reply related [flat|nested] 54+ messages in thread
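F(MWAIT) in the hunk above refers to the MONITOR/MWAIT feature flag, which a
guest reads from CPUID leaf 1, ECX bit 3. A guest that does check CPUID
(unlike the Mac OS versions this patch is about) would test that bit along
these lines. A minimal sketch: the macro and function names are made up for
illustration, and the caller is assumed to have already executed CPUID with
EAX = 1 and to pass in the resulting ECX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 3 is the MONITOR/MWAIT feature flag. */
#define CPUID_1_ECX_MONITOR (1u << 3) /* hypothetical macro name */

/* cpuid_1_ecx: the ECX value returned by CPUID with EAX = 1. */
bool guest_sees_monitor_mwait(uint32_t cpuid_1_ecx)
{
	return (cpuid_1_ecx & CPUID_1_ECX_MONITOR) != 0;
}
```

On GCC or Clang the ECX value could be obtained with __get_cpuid() from
<cpuid.h>; that part is left out here so the sketch stays portable.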
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 15:35 ` Radim Krčmář @ 2017-03-16 16:01 ` Radim Krčmář 2017-03-16 16:47 ` Gabriel L. Somlo 2017-03-16 16:16 ` Gabriel L. Somlo 1 sibling, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-03-16 16:01 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-16 16:35+0100, Radim Krčmář: > 2017-03-16 10:58-0400, Gabriel L. Somlo: >> The intel manual said the same thing back in 2010 as well. However, >> regardless of how any flags were set, interrupt-window exiting or not, >> "normal" L1 MWAIT behavior was that it woke up immediately regardless. >> Remember, never going to sleep is still correct ("normal" ?) behavior >> per the ISA definition of MWAIT :) > > I'll write a simple kvm-unit-test to better understand why it is broken > for you ... Please get git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git and try this, thanks! ---8<--- x86/mwait: crappy test `./configure && make` to build it, then follow the comment in code to try few cases. 
---
 x86/Makefile.common |  1 +
 x86/mwait.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 x86/mwait.c

diff --git a/x86/Makefile.common b/x86/Makefile.common
index 1dad18ba26e1..1e708a6acd39 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -46,6 +46,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
                $(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \
                $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
                $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \
+               $(TEST_DIR)/mwait.flat \

 ifdef API
 tests-common += api/api-sample
diff --git a/x86/mwait.c b/x86/mwait.c
new file mode 100644
index 000000000000..c21dab5cc97d
--- /dev/null
+++ b/x86/mwait.c
@@ -0,0 +1,41 @@
+#include "vm.h"
+
+#define TARGET_RESUMES 10000
+volatile unsigned page[4096 / 4];
+
+/*
+ * Execute
+ *   time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1'
+ * (first two arguments are eax and ecx for MWAIT, the third is FLAGS.IF bit)
+ * I assume you have 1000 Hz scheduler, so the test should take about 10
+ * seconds to run if mwait works (host timer interrupts will kick mwait).
+ *
+ * If you get far less, then mwait is just nop, as in the case of
+ *
+ *   time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0'
+ *
+ * All other combinations of arguments should take 10 seconds.
+ * Getting killed by the TIMEOUT most likely means that you have different HZ,
+ * but could also be a bug ...
+ */
+int main(int argc, char **argv)
+{
+	uint32_t eax = atol(argv[1]);
+	uint32_t ecx = atol(argv[2]);
+	bool sti = atol(argv[3]);
+	unsigned resumes = 0;
+
+	if (sti)
+		asm volatile ("sti");
+	else
+		asm volatile ("cli");
+
+	while (resumes < TARGET_RESUMES) {
+		asm volatile("monitor" :: "a" (page), "c" (0), "d" (0));
+		asm volatile("mwait" :: "a" (eax), "c" (ecx));
+		resumes++;
+	}
+
+	report("resumed from mwait %u times", resumes == TARGET_RESUMES, resumes);
+	return report_summary();
+}
-- 
2.11.0

^ permalink raw reply related [flat|nested] 54+ messages in thread
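The timing estimate in the test's comment is simple arithmetic: if each MWAIT
is woken only by the host timer tick, TARGET_RESUMES resumes at HZ ticks per
second take TARGET_RESUMES / HZ seconds. A small sketch of that estimate,
under the same assumption the comment makes (one wakeup per tick, nothing
else waking the CPU). The function names are hypothetical and are not part of
kvm-unit-tests.

```c
#include <stdbool.h>

/* Seconds the test should run if every MWAIT sleeps until the next tick. */
double expected_runtime_s(unsigned int resumes, unsigned int host_hz)
{
	return (double)resumes / (double)host_hz;
}

/* True if that estimate would exceed the TIMEOUT passed to x86-run. */
bool would_hit_timeout(unsigned int resumes, unsigned int host_hz,
		       unsigned int timeout_s)
{
	return expected_runtime_s(resumes, host_hz) > (double)timeout_s;
}
```

With 10000 resumes this gives 10 seconds at HZ=1000 but 40 seconds at HZ=250,
which TIMEOUT=20 would cut short. That matches the comment's remark that
getting killed by the TIMEOUT most likely means a different HZ.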
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 16:01 ` Radim Krčmář @ 2017-03-16 16:47 ` Gabriel L. Somlo 2017-03-16 17:22 ` Radim Krčmář 2017-03-16 17:27 ` Michael S. Tsirkin 0 siblings, 2 replies; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 16:47 UTC (permalink / raw) To: Radim Krčmář Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 05:01:58PM +0100, Radim Krčmář wrote: > 2017-03-16 16:35+0100, Radim Krčmář: > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > >> The intel manual said the same thing back in 2010 as well. However, > >> regardless of how any flags were set, interrupt-window exiting or not, > >> "normal" L1 MWAIT behavior was that it woke up immediately regardless. > >> Remember, never going to sleep is still correct ("normal" ?) behavior > >> per the ISA definition of MWAIT :) > > > > I'll write a simple kvm-unit-test to better understand why it is broken > > for you ... > > Please get git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git > > and try this, thanks! > > ---8<--- > x86/mwait: crappy test > > `./configure && make` to build it, then follow the comment in code to > try few cases. 
kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1'
timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 1
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m10.564s
user    0m10.339s
sys     0m0.225s


and

kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0'
timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 0
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m0.746s
user    0m0.555s
sys     0m0.200s

Both of these with Michael's v5 patch applied, on the MacPro1,1.

Similar behavior (0 1 1 takes 10 seconds, 0 1 0 returns immediately)
on the macbook air.

If I revert to the original (nop-emulated MWAIT) kvm source, I get
both versions to return immediately.

HTH, --Gabriel > > --- > x86/Makefile.common | 1 + > x86/mwait.c | 41 +++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 42 insertions(+) > create mode 100644 x86/mwait.c > > diff --git a/x86/Makefile.common b/x86/Makefile.common > index 1dad18ba26e1..1e708a6acd39 100644 > --- a/x86/Makefile.common > +++ b/x86/Makefile.common > @@ -46,6 +46,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ > $(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \ > $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \ > $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \ > + $(TEST_DIR)/mwait.flat \ > > ifdef API > tests-common += api/api-sample > diff --git a/x86/mwait.c b/x86/mwait.c > new file mode 100644 > index 000000000000..c21dab5cc97d > --- /dev/null > +++ b/x86/mwait.c > @@ -0,0 +1,41 @@ > +#include "vm.h" > + > +#define TARGET_RESUMES 10000 > +volatile unsigned page[4096 / 4]; > + > +/* > + * Execute > + * time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' > + * (first two arguments are eax and ecx for MWAIT, the third is FLAGS.IF bit) > + * I assume you have 1000 Hz scheduler, so the test should take about 10 > + * seconds to run if mwait works (host timer interrupts will kick mwait). > + * > + * If you get far less, then mwait is just nop, as in the case of > + * > + * time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' > + * > + * All other combinations of arguments should take 10 seconds. > + * Getting killed by the TIMEOUT most likely means that you have different HZ, > + * but could also be a bug ... 
> + */ > +int main(int argc, char **argv) > +{ > + uint32_t eax = atol(argv[1]); > + uint32_t ecx = atol(argv[2]); > + bool sti = atol(argv[3]); > + unsigned resumes = 0; > + > + if (sti) > + asm volatile ("sti"); > + else > + asm volatile ("cli"); > + > + while (resumes < TARGET_RESUMES) { > + asm volatile("monitor" :: "a" (page), "c" (0), "d" (0)); > + asm volatile("mwait" :: "a" (eax), "c" (ecx)); > + resumes++; > + } > + > + report("resumed from mwait %u times", resumes == TARGET_RESUMES, resumes); > + return report_summary(); > +} > -- > 2.11.0 > ^ permalink raw reply [flat|nested] 54+ messages in thread
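One way to read the two timings reported in this message is as an average
wake-up rate: resumes divided by wall-clock seconds. A sketch with a
hypothetical function name; it is not part of the test, and the wall-clock
figure includes QEMU start-up, so the result only bounds the real rate from
below.

```c
/* Average number of MWAIT resumes per second of wall-clock time. */
double resumes_per_second(unsigned int resumes, double real_seconds)
{
	return (double)resumes / real_seconds;
}
```

For the '0 1 1' run, 10000 resumes in 10.564 s is roughly 947 per second,
which would be consistent with wakeups driven by a 1000 Hz host tick. For
the '0 1 0' run, 10000 resumes in 0.746 s is more than 13000 per second, far
above any timer tick rate.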
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 16:47 ` Gabriel L. Somlo
@ 2017-03-16 17:22 ` Radim Krčmář
  2017-03-16 17:39 ` Gabriel L. Somlo
  2017-03-16 17:27 ` Michael S. Tsirkin
  1 sibling, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2017-03-16 17:22 UTC
  To: Gabriel L. Somlo
  Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet,
      Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel,
      kvm, linux-doc

2017-03-16 12:47-0400, Gabriel L. Somlo:
> On Thu, Mar 16, 2017 at 05:01:58PM +0100, Radim Krčmář wrote:
> > 2017-03-16 16:35+0100, Radim Krčmář:
> > > 2017-03-16 10:58-0400, Gabriel L. Somlo:
> > >> The intel manual said the same thing back in 2010 as well. However,
> > >> regardless of how any flags were set, interrupt-window exiting or not,
> > >> "normal" L1 MWAIT behavior was that it woke up immediately regardless.
> > >> Remember, never going to sleep is still correct ("normal" ?) behavior
> > >> per the ISA definition of MWAIT :)
> > >
> > > I'll write a simple kvm-unit-test to better understand why it is broken
> > > for you ...
> >
> > Please get git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
> >
> > and try this, thanks!
> >
> > ---8<---
> > x86/mwait: crappy test
> >
> > `./configure && make` to build it, then follow the comment in code to
> > try few cases.
>
> kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1'
> timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 1
> enabling apic
> PASS: resumed from mwait 10000 times
> SUMMARY: 1 tests
>
> real    0m10.564s
> user    0m10.339s
> sys     0m0.225s
>
>
> and
>
> kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0'
> timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 0
> enabling apic
> PASS: resumed from mwait 10000 times
> SUMMARY: 1 tests
>
> real    0m0.746s
> user    0m0.555s
> sys     0m0.200s
>
> Both of these with Michael's v5 patch applied, on the MacPro1,1.
>
> Similar behavior (0 1 1 takes 10 seconds, 0 1 0 returns immediately)
> on the macbook air.
>
> If I revert to the original (nop-emulated MWAIT) kvm source, I get
> both versions to return immediately.

Those look normal ... maybe MWAIT just ignores writes to the monitored
area?

Please apply the patch below and following and try:

  time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' -smp 2
  time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 0 1' -smp 2
  time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 0 0' -smp 2

All of them should take roughly the same time as the NOP one,

  time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' -smp 2

Thanks.

---8<---
diff --git a/x86/mwait.c b/x86/mwait.c
index c21dab5cc97d..ca38e7223596 100644
--- a/x86/mwait.c
+++ b/x86/mwait.c
@@ -1,7 +1,9 @@
 #include "vm.h"
+#include "smp.h"
 
 #define TARGET_RESUMES 10000
 volatile unsigned page[4096 / 4];
+volatile unsigned resumes;
 
 /*
  * Execute
@@ -18,19 +20,39 @@ volatile unsigned page[4096 / 4];
  * Getting killed by the TIMEOUT most likely means that you have different HZ,
  * but could also be a bug ...
  */
+void writer(void *null)
+{
+	int i;
+	unsigned old_resumes = 0, new_resumes;
+
+	for (i = 0; i < TARGET_RESUMES; i++) {
+		(*page)++;
+
+		while (old_resumes == (new_resumes = resumes))
+			pause();
+		old_resumes = new_resumes;
+	}
+}
+
 int main(int argc, char **argv)
 {
 	uint32_t eax = atol(argv[1]);
 	uint32_t ecx = atol(argv[2]);
 	bool sti = atol(argv[3]);
-	unsigned resumes = 0;
+	bool smp;
+
+	smp_init();
+	smp = cpu_count() > 1;
+
+	if (smp)
+		on_cpu_async(1, writer, NULL);
 
 	if (sti)
 		asm volatile ("sti");
 	else
 		asm volatile ("cli");
 
-	while (resumes < TARGET_RESUMES) {
+	while ((smp ? *page : resumes) < TARGET_RESUMES) {
 		asm volatile("monitor" :: "a" (page), "c" (0), "d" (0));
 		asm volatile("mwait" :: "a" (eax), "c" (ecx));
 		resumes++;

* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 17:22 ` Radim Krčmář @ 2017-03-16 17:39 ` Gabriel L. Somlo 0 siblings, 0 replies; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 17:39 UTC (permalink / raw) To: Radim Krčmář Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 06:22:44PM +0100, Radim Krčmář wrote: > 2017-03-16 12:47-0400, Gabriel L. Somlo: > > On Thu, Mar 16, 2017 at 05:01:58PM +0100, Radim Krčmář wrote: > > > 2017-03-16 16:35+0100, Radim Krčmář: > > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > >> The intel manual said the same thing back in 2010 as well. However, > > > >> regardless of how any flags were set, interrupt-window exiting or not, > > > >> "normal" L1 MWAIT behavior was that it woke up immediately regardless. > > > >> Remember, never going to sleep is still correct ("normal" ?) behavior > > > >> per the ISA definition of MWAIT :) > > > > > > > > I'll write a simple kvm-unit-test to better understand why it is broken > > > > for you ... > > > > > > Please get git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git > > > > > > and try this, thanks! > > > > > > ---8<--- > > > x86/mwait: crappy test > > > > > > `./configure && make` to build it, then follow the comment in code to > > > try few cases. 
> > > > kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' > > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 1 > > enabling apic > > PASS: resumed from mwait 10000 times > > SUMMARY: 1 tests > > > > real 0m10.564s > > user 0m10.339s > > sys 0m0.225s > > > > > > and > > > > kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' > > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 0 > > enabling apic > > PASS: resumed from mwait 10000 times > > SUMMARY: 1 tests > > > > real 0m0.746s > > user 0m0.555s > > sys 0m0.200s > > > > Both of these with Michael's v5 patch applied, on the MacPro1,1. > > > > Similar behavior (0 1 1 takes 10 seconds, 0 1 0 returns immediately) > > on the macbook air. > > > > If I revert to the original (nop-emulated MWAIT) kvm source, I get > > both versions to return immediately. > > Those look normal ... maybe MWAIT just ignores writes to the monitored > area? 
>
> Please apply the patch below and following and try:
>
>   time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' -smp 2

timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 1 -smp 2
enabling apic
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m0.758s
user    0m0.557s
sys     0m0.220s

>   time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 0 1' -smp 2

timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 0 1 -smp 2
enabling apic
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m0.748s
user    0m0.550s
sys     0m0.210s

>   time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 0 0' -smp 2

timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 0 0 -smp 2
enabling apic
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m0.745s
user    0m0.558s
sys     0m0.203s

>
> All of them should take roughly the same time as the NOP one,
>
>   time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' -smp 2

They all *did* return fast, as you expected.

> ---8<---
> diff --git a/x86/mwait.c b/x86/mwait.c
> index c21dab5cc97d..ca38e7223596 100644
> --- a/x86/mwait.c
> +++ b/x86/mwait.c
> @@ -1,7 +1,9 @@
>  #include "vm.h"
> +#include "smp.h"
>
>  #define TARGET_RESUMES 10000
>  volatile unsigned page[4096 / 4];
> +volatile unsigned resumes;
>
>  /*
>   * Execute
> @@ -18,19 +20,39 @@ volatile unsigned page[4096 / 4];
>   * Getting killed by the TIMEOUT most likely means that you have different HZ,
>   * but could also be a bug ...
> */ > +void writer(void *null) > +{ > + int i; > + unsigned old_resumes = 0, new_resumes; > + > + for (i = 0; i < TARGET_RESUMES; i++) { > + (*page)++; > + > + while (old_resumes == (new_resumes = resumes)) > + pause(); > + old_resumes = new_resumes; > + } > +} > + > int main(int argc, char **argv) > { > uint32_t eax = atol(argv[1]); > uint32_t ecx = atol(argv[2]); > bool sti = atol(argv[3]); > - unsigned resumes = 0; > + bool smp; > + > + smp_init(); > + smp = cpu_count() > 1; > + > + if (smp) > + on_cpu_async(1, writer, NULL); > > if (sti) > asm volatile ("sti"); > else > asm volatile ("cli"); > > - while (resumes < TARGET_RESUMES) { > + while ((smp ? *page : resumes) < TARGET_RESUMES) { > asm volatile("monitor" :: "a" (page), "c" (0), "d" (0)); > asm volatile("mwait" :: "a" (eax), "c" (ecx)); > resumes++; ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 16:47 ` Gabriel L. Somlo 2017-03-16 17:22 ` Radim Krčmář @ 2017-03-16 17:27 ` Michael S. Tsirkin 2017-03-16 17:41 ` Gabriel L. Somlo 1 sibling, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-16 17:27 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 12:47:50PM -0400, Gabriel L. Somlo wrote: > On Thu, Mar 16, 2017 at 05:01:58PM +0100, Radim Krčmář wrote: > > 2017-03-16 16:35+0100, Radim Krčmář: > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > >> The intel manual said the same thing back in 2010 as well. However, > > >> regardless of how any flags were set, interrupt-window exiting or not, > > >> "normal" L1 MWAIT behavior was that it woke up immediately regardless. > > >> Remember, never going to sleep is still correct ("normal" ?) behavior > > >> per the ISA definition of MWAIT :) > > > > > > I'll write a simple kvm-unit-test to better understand why it is broken > > > for you ... > > > > Please get git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git > > > > and try this, thanks! > > > > ---8<--- > > x86/mwait: crappy test > > > > `./configure && make` to build it, then follow the comment in code to > > try few cases. 
> > kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 1 > enabling apic > PASS: resumed from mwait 10000 times > SUMMARY: 1 tests > > real 0m10.564s > user 0m10.339s > sys 0m0.225s > > > and > > kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 0 > enabling apic > PASS: resumed from mwait 10000 times > SUMMARY: 1 tests > > real 0m0.746s > user 0m0.555s > sys 0m0.200s > > Both of these with Michael's v5 patch applied, on the MacPro1,1. Would it make sense to try to set ECX to 0? 0 0 1 and 0 0 0. > Similar behavior (0 1 1 takes 10 seconds, 0 1 0 returns immediately) > on the macbook air. > > If I revert to the original (nop-emulated MWAIT) kvm source, I get > both versions to return immediately. 
> > HTH, > --Gabriel > > > > > > > --- > > x86/Makefile.common | 1 + > > x86/mwait.c | 41 +++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 42 insertions(+) > > create mode 100644 x86/mwait.c > > > > diff --git a/x86/Makefile.common b/x86/Makefile.common > > index 1dad18ba26e1..1e708a6acd39 100644 > > --- a/x86/Makefile.common > > +++ b/x86/Makefile.common > > @@ -46,6 +46,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ > > $(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \ > > $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \ > > $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \ > > + $(TEST_DIR)/mwait.flat \ > > > > ifdef API > > tests-common += api/api-sample > > diff --git a/x86/mwait.c b/x86/mwait.c > > new file mode 100644 > > index 000000000000..c21dab5cc97d > > --- /dev/null > > +++ b/x86/mwait.c > > @@ -0,0 +1,41 @@ > > +#include "vm.h" > > + > > +#define TARGET_RESUMES 10000 > > +volatile unsigned page[4096 / 4]; > > + > > +/* > > + * Execute > > + * time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' > > + * (first two arguments are eax and ecx for MWAIT, the third is FLAGS.IF bit) > > + * I assume you have 1000 Hz scheduler, so the test should take about 10 > > + * seconds to run if mwait works (host timer interrupts will kick mwait). > > + * > > + * If you get far less, then mwait is just nop, as in the case of > > + * > > + * time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' > > + * > > + * All other combinations of arguments should take 10 seconds. > > + * Getting killed by the TIMEOUT most likely means that you have different HZ, > > + * but could also be a bug ... 
> > + */ > > +int main(int argc, char **argv) > > +{ > > + uint32_t eax = atol(argv[1]); > > + uint32_t ecx = atol(argv[2]); > > + bool sti = atol(argv[3]); > > + unsigned resumes = 0; > > + > > + if (sti) > > + asm volatile ("sti"); > > + else > > + asm volatile ("cli"); > > + > > + while (resumes < TARGET_RESUMES) { > > + asm volatile("monitor" :: "a" (page), "c" (0), "d" (0)); > > + asm volatile("mwait" :: "a" (eax), "c" (ecx)); > > + resumes++; > > + } > > + > > + report("resumed from mwait %u times", resumes == TARGET_RESUMES, resumes); > > + return report_summary(); > > +} > > -- > > 2.11.0 > > ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 17:27 ` Michael S. Tsirkin @ 2017-03-16 17:41 ` Gabriel L. Somlo 2017-03-16 18:29 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 17:41 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 07:27:34PM +0200, Michael S. Tsirkin wrote: > On Thu, Mar 16, 2017 at 12:47:50PM -0400, Gabriel L. Somlo wrote: > > On Thu, Mar 16, 2017 at 05:01:58PM +0100, Radim Krčmář wrote: > > > 2017-03-16 16:35+0100, Radim Krčmář: > > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > >> The intel manual said the same thing back in 2010 as well. However, > > > >> regardless of how any flags were set, interrupt-window exiting or not, > > > >> "normal" L1 MWAIT behavior was that it woke up immediately regardless. > > > >> Remember, never going to sleep is still correct ("normal" ?) behavior > > > >> per the ISA definition of MWAIT :) > > > > > > > > I'll write a simple kvm-unit-test to better understand why it is broken > > > > for you ... > > > > > > Please get git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git > > > > > > and try this, thanks! > > > > > > ---8<--- > > > x86/mwait: crappy test > > > > > > `./configure && make` to build it, then follow the comment in code to > > > try few cases. 
> >
> > kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1'
> > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 1
> > enabling apic
> > PASS: resumed from mwait 10000 times
> > SUMMARY: 1 tests
> >
> > real    0m10.564s
> > user    0m10.339s
> > sys     0m0.225s
> >
> >
> > and
> >
> > kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0'
> > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 1 0
> > enabling apic
> > PASS: resumed from mwait 10000 times
> > SUMMARY: 1 tests
> >
> > real    0m0.746s
> > user    0m0.555s
> > sys     0m0.200s
> >
> > Both of these with Michael's v5 patch applied, on the MacPro1,1.
>
> Would it make sense to try to set ECX to 0? 0 0 1 and 0 0 0.

$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 0 1'
timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 0 1
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m10.567s
user    0m10.367s
sys     0m0.210s

$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 0 0'
timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 0 0 0
enabling apic
PASS: resumed from mwait 10000 times
SUMMARY: 1 tests

real    0m10.549s
user    0m10.352s
sys     0m0.206s

Both took 10 seconds.

>
> > Similar behavior (0 1 1 takes 10 seconds, 0 1 0 returns immediately)
> > on the macbook air.
> > > > If I revert to the original (nop-emulated MWAIT) kvm source, I get > > both versions to return immediately. > > > > HTH, > > --Gabriel > > > > > > > > > > > > --- > > > x86/Makefile.common | 1 + > > > x86/mwait.c | 41 +++++++++++++++++++++++++++++++++++++++++ > > > 2 files changed, 42 insertions(+) > > > create mode 100644 x86/mwait.c > > > > > > diff --git a/x86/Makefile.common b/x86/Makefile.common > > > index 1dad18ba26e1..1e708a6acd39 100644 > > > --- a/x86/Makefile.common > > > +++ b/x86/Makefile.common > > > @@ -46,6 +46,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ > > > $(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \ > > > $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \ > > > $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \ > > > + $(TEST_DIR)/mwait.flat \ > > > > > > ifdef API > > > tests-common += api/api-sample > > > diff --git a/x86/mwait.c b/x86/mwait.c > > > new file mode 100644 > > > index 000000000000..c21dab5cc97d > > > --- /dev/null > > > +++ b/x86/mwait.c > > > @@ -0,0 +1,41 @@ > > > +#include "vm.h" > > > + > > > +#define TARGET_RESUMES 10000 > > > +volatile unsigned page[4096 / 4]; > > > + > > > +/* > > > + * Execute > > > + * time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 1' > > > + * (first two arguments are eax and ecx for MWAIT, the third is FLAGS.IF bit) > > > + * I assume you have 1000 Hz scheduler, so the test should take about 10 > > > + * seconds to run if mwait works (host timer interrupts will kick mwait). > > > + * > > > + * If you get far less, then mwait is just nop, as in the case of > > > + * > > > + * time TIMEOUT=20 ./x86-run x86/mwait.flat -append '0 1 0' > > > + * > > > + * All other combinations of arguments should take 10 seconds. > > > + * Getting killed by the TIMEOUT most likely means that you have different HZ, > > > + * but could also be a bug ... 
> > > + */ > > > +int main(int argc, char **argv) > > > +{ > > > + uint32_t eax = atol(argv[1]); > > > + uint32_t ecx = atol(argv[2]); > > > + bool sti = atol(argv[3]); > > > + unsigned resumes = 0; > > > + > > > + if (sti) > > > + asm volatile ("sti"); > > > + else > > > + asm volatile ("cli"); > > > + > > > + while (resumes < TARGET_RESUMES) { > > > + asm volatile("monitor" :: "a" (page), "c" (0), "d" (0)); > > > + asm volatile("mwait" :: "a" (eax), "c" (ecx)); > > > + resumes++; > > > + } > > > + > > > + report("resumed from mwait %u times", resumes == TARGET_RESUMES, resumes); > > > + return report_summary(); > > > +} > > > -- > > > 2.11.0 > > > ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 17:41 ` Gabriel L. Somlo
@ 2017-03-16 18:29 ` Michael S. Tsirkin
  2017-03-16 19:24 ` Gabriel L. Somlo
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2017-03-16 18:29 UTC
  To: Gabriel L. Somlo
  Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet,
      Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel,
      kvm, linux-doc

Let's take a step back and try to figure out how is
mwait called. How about dumping code of VCPUs
around mwait? gdb disa command will do this.

--
MST

* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 18:29 ` Michael S. Tsirkin
@ 2017-03-16 19:24 ` Gabriel L. Somlo
  2017-03-16 19:27 ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Gabriel L. Somlo @ 2017-03-16 19:24 UTC
  To: Michael S. Tsirkin
  Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet,
      Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel,
      kvm, linux-doc

On Thu, Mar 16, 2017 at 08:29:32PM +0200, Michael S. Tsirkin wrote:
> Let's take a step back and try to figure out how is
> mwait called. How about dumping code of VCPUs
> around mwait? gdb disa command will do this.

Started guest with '-s', tried to attach from gdb with
"target remote localhost:1234", got
"remote 'g' packet reply is too long: <lengthy string of numbers>"

Tried typing 'cont' in the qemu monitor, got os x to crash:

panic (cpu 1 caller 0xffffff7f813ff488): pmLock: waited too long, held
by 0xffffff7f813eff65

Hmm, maybe that's where it keeps its monitor/mwait idle loop.
Restarted the guest, tried this from monitor:

dump-guest-memory foobar 0xffffff7f813e0000 0x20000

Got "'dump-guest-memory' has failed: integer is for 32-bit values"

Hmmm... I have no idea what I'm doing anymore at this point... :)

--G

* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 19:24 ` Gabriel L. Somlo
@ 2017-03-16 19:27 ` Michael S. Tsirkin
  2017-03-16 20:17 ` Gabriel L. Somlo
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2017-03-16 19:27 UTC
  To: Gabriel L. Somlo
  Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet,
      Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel,
      kvm, linux-doc

On Thu, Mar 16, 2017 at 03:24:41PM -0400, Gabriel L. Somlo wrote:
> On Thu, Mar 16, 2017 at 08:29:32PM +0200, Michael S. Tsirkin wrote:
> > Let's take a step back and try to figure out how is
> > mwait called. How about dumping code of VCPUs
> > around mwait? gdb disa command will do this.
>
> Started guest with '-s', tried to attach from gdb with
> "target remote localhost:1234", got
> "remote 'g' packet reply is too long: <lengthy string of numbers>"

Try

set arch x86-64:x86-64

> Tried typing 'cont' in the qemu monitor, got os x to crash:
>
> panic (cpu 1 caller 0xffffff7f813ff488): pmLock: waited too long, held
> by 0xffffff7f813eff65
>
> Hmm, maybe that's where it keeps its monitor/mwait idle loop.
> Restarted the guest, tried this from monitor:
>
> dump-guest-memory foobar 0xffffff7f813e0000 0x20000
>
> Got "'dump-guest-memory' has failed: integer is for 32-bit values"
>
> Hmmm... I have no idea what I'm doing anymore at this point... :)
>
> --G

I think 0xffffff7f813ff488 is a PC.

--
MST

* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 19:27 ` Michael S. Tsirkin
@ 2017-03-16 20:17 ` Gabriel L. Somlo
  2017-03-16 21:14 ` Gabriel L. Somlo
  0 siblings, 1 reply; 54+ messages in thread
From: Gabriel L. Somlo @ 2017-03-16 20:17 UTC
  To: Michael S. Tsirkin
  Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet,
      Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel,
      kvm, linux-doc

On Thu, Mar 16, 2017 at 09:27:56PM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 16, 2017 at 03:24:41PM -0400, Gabriel L. Somlo wrote:
> > On Thu, Mar 16, 2017 at 08:29:32PM +0200, Michael S. Tsirkin wrote:
> > > Let's take a step back and try to figure out how is
> > > mwait called. How about dumping code of VCPUs
> > > around mwait? gdb disa command will do this.
> >
> > Started guest with '-s', tried to attach from gdb with
> > "target remote localhost:1234", got
> > "remote 'g' packet reply is too long: <lengthy string of numbers>"
>
> Try
>
> set arch x86-64:x86-64

'set architecture i386:x86-64:intel' is what worked for me;

Been rooting around for a while, can't find mwait or monitor :(

Guess I'll have to recompile KVM to actually issue an invalid opcode,
so OS X will print a panic message with the exact address :)

Stay tuned...

>
> > Tried typing 'cont' in the qemu monitor, got os x to crash:
> >
> > panic (cpu 1 caller 0xffffff7f813ff488): pmLock: waited too long, held
> > by 0xffffff7f813eff65
> >
> > Hmm, maybe that's where it keeps its monitor/mwait idle loop.
> > Restarted the guest, tried this from monitor:
> >
> > dump-guest-memory foobar 0xffffff7f813e0000 0x20000
> >
> > Got "'dump-guest-memory' has failed: integer is for 32-bit values"
> >
> > Hmmm... I have no idea what I'm doing anymore at this point... :)
> >
> > --G
>
> I think 0xffffff7f813ff488 is a PC.
>
> --
> MST

* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 20:17 ` Gabriel L. Somlo
@ 2017-03-16 21:14 ` Gabriel L. Somlo
  2017-03-17  2:03 ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Gabriel L. Somlo @ 2017-03-16 21:14 UTC
  To: Michael S. Tsirkin
  Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet,
      Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel,
      kvm, linux-doc

On Thu, Mar 16, 2017 at 04:17:11PM -0400, Gabriel L. Somlo wrote:
> On Thu, Mar 16, 2017 at 09:27:56PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 16, 2017 at 03:24:41PM -0400, Gabriel L. Somlo wrote:
> > > On Thu, Mar 16, 2017 at 08:29:32PM +0200, Michael S. Tsirkin wrote:
> > > > Let's take a step back and try to figure out how is
> > > > mwait called. How about dumping code of VCPUs
> > > > around mwait? gdb disa command will do this.
> > >
> > > Started guest with '-s', tried to attach from gdb with
> > > "target remote localhost:1234", got
> > > "remote 'g' packet reply is too long: <lengthy string of numbers>"
> >
> > Try
> >
> > set arch x86-64:x86-64
>
> 'set architecture i386:x86-64:intel' is what worked for me;
>
> Been rooting around for a while, can't find mwait or monitor :(
>
> Guess I'll have to recompile KVM to actually issue an invalid opcode,
> so OS X will print a panic message with the exact address :)
>
> Stay tuned...

OK, so I found a few instances. The one closest to where a random
interrupt from gdb landed, was this one:

...
   0xffffff7f813ff379:  mov    0x90(%r15),%rax
   0xffffff7f813ff380:  mov    0x18(%rax),%rsi
   0xffffff7f813ff384:  xor    %ecx,%ecx
   0xffffff7f813ff386:  mov    %rsi,%rax
   0xffffff7f813ff389:  xor    %edx,%edx
   0xffffff7f813ff38b:  monitor %rax,%rcx,%rdx
   0xffffff7f813ff38e:  test   %r14,%r14
   0xffffff7f813ff391:  je     0xffffff7f813ff3ad
   0xffffff7f813ff393:  movq   $0x0,0x8(%r14)
   0xffffff7f813ff39b:  movl   $0x0,(%r14)
   0xffffff7f813ff3a2:  test   %ebx,%ebx
   0xffffff7f813ff3a4:  je     0xffffff7f813ff3b2
   0xffffff7f813ff3a6:  mfence
   0xffffff7f813ff3a9:  wbinvd
   0xffffff7f813ff3ab:  jmp    0xffffff7f813ff3b2
   0xffffff7f813ff3ad:  cmpl   $0x0,(%rsi)
   0xffffff7f813ff3b0:  jne    0xffffff7f813ff3d6
   0xffffff7f813ff3b2:  mov    %r12d,%eax
   0xffffff7f813ff3b5:  imul   $0x148,%rax,%rax
   0xffffff7f813ff3bc:  lea    0x153bd(%rip),%rcx        # 0xffffff7f81414780
   0xffffff7f813ff3c3:  mov    (%rcx),%rcx
   0xffffff7f813ff3c6:  mov    0x20(%rcx),%rcx
   0xffffff7f813ff3ca:  mov    0xc(%rcx,%rax,1),%eax
   0xffffff7f813ff3ce:  mov    $0x1,%ecx
   0xffffff7f813ff3d3:  mwait  %rax,%rcx
=> 0xffffff7f813ff3d6:  lfence
   0xffffff7f813ff3d9:  rdtsc
   0xffffff7f813ff3db:  lfence
   0xffffff7f813ff3de:  mov    %rax,%rbx
   0xffffff7f813ff3e1:  mov    %rdx,%r15
...

Also, there were a few more within the range occupied by
AppleIntelCPUPowerManagement.kext (which provides the "smart"
idle loop used by OS X):

...
   0xffffff7f813f799a:  mov    0x90(%r15),%rax
   0xffffff7f813f79a1:  mov    0x18(%rax),%r15
   0xffffff7f813f79a5:  xor    %ecx,%ecx
   0xffffff7f813f79a7:  mov    %r15,%rax
   0xffffff7f813f79aa:  xor    %edx,%edx
   0xffffff7f813f79ac:  monitor %rax,%rcx,%rdx
   0xffffff7f813f79af:  mov    %r12d,%r12d
   0xffffff7f813f79b2:  imul   $0x148,%r12,%r13
   0xffffff7f813f79b9:  lea    0x1cdc0(%rip),%rax        # 0xffffff7f81414780
   0xffffff7f813f79c0:  mov    (%rax),%rax
   0xffffff7f813f79c3:  mov    0x20(%rax),%rcx
   0xffffff7f813f79c7:  testb  $0x10,0x2(%rcx,%r13,1)
   0xffffff7f813f79cd:  je     0xffffff7f813f79d5
   0xffffff7f813f79cf:  callq  *0x80(%rax)
   0xffffff7f813f79d5:  test   %r14,%r14
   0xffffff7f813f79d8:  je     0xffffff7f813f79f4
   0xffffff7f813f79da:  movq   $0x0,0x8(%r14)
   0xffffff7f813f79e2:  movl   $0x0,(%r14)
   0xffffff7f813f79e9:  test   %ebx,%ebx
   0xffffff7f813f79eb:  je     0xffffff7f813f79fa
   0xffffff7f813f79ed:  mfence
   0xffffff7f813f79f0:  wbinvd
   0xffffff7f813f79f2:  jmp    0xffffff7f813f79fa
   0xffffff7f813f79f4:  cmpl   $0x0,(%r15)
   0xffffff7f813f79f8:  jne    0xffffff7f813f7a15
   0xffffff7f813f79fa:  lea    0x1cd7f(%rip),%rax        # 0xffffff7f81414780
   0xffffff7f813f7a01:  mov    (%rax),%rax
   0xffffff7f813f7a04:  mov    0x20(%rax),%rax
   0xffffff7f813f7a08:  mov    0xc(%rax,%r13,1),%eax
   0xffffff7f813f7a0d:  mov    $0x1,%ecx
   0xffffff7f813f7a12:  mwait  %rax,%rcx
   0xffffff7f813f7a15:  lfence
   0xffffff7f813f7a18:  rdtsc
   0xffffff7f813f7a1a:  lfence
   0xffffff7f813f7a1d:  mov    %rax,%rbx
   0xffffff7f813f7a20:  mov    %rdx,%r15
...

...
   0xffffff7f813f89c9:  xor    %ecx,%ecx
   0xffffff7f813f89cb:  mov    %r13,%rax
   0xffffff7f813f89ce:  xor    %edx,%edx
   0xffffff7f813f89d0:  monitor %rax,%rcx,%rdx
   0xffffff7f813f89d3:  mov    %r12d,%r15d
   0xffffff7f813f89d6:  imul   $0x148,%r15,%r12
   0xffffff7f813f89dd:  lea    0x1bd9c(%rip),%rax        # 0xffffff7f81414780
   0xffffff7f813f89e4:  mov    (%rax),%rax
   0xffffff7f813f89e7:  mov    0x20(%rax),%rcx
   0xffffff7f813f89eb:  testb  $0x10,0x2(%rcx,%r12,1)
   0xffffff7f813f89f1:  je     0xffffff7f813f89f9
   0xffffff7f813f89f3:  callq  *0x80(%rax)
   0xffffff7f813f89f9:  test   %r14,%r14
   0xffffff7f813f89fc:  je     0xffffff7f813f8a18
   0xffffff7f813f89fe:  movq   $0x0,0x8(%r14)
   0xffffff7f813f8a06:  movl   $0x0,(%r14)
   0xffffff7f813f8a0d:  test   %ebx,%ebx
   0xffffff7f813f8a0f:  je     0xffffff7f813f8a1f
   0xffffff7f813f8a11:  mfence
   0xffffff7f813f8a14:  wbinvd
   0xffffff7f813f8a16:  jmp    0xffffff7f813f8a1f
   0xffffff7f813f8a18:  cmpl   $0x0,0x0(%r13)
   0xffffff7f813f8a1d:  jne    0xffffff7f813f8a3a
   0xffffff7f813f8a1f:  lea    0x1bd5a(%rip),%rax        # 0xffffff7f81414780
   0xffffff7f813f8a26:  mov    (%rax),%rax
   0xffffff7f813f8a29:  mov    0x20(%rax),%rax
   0xffffff7f813f8a2d:  mov    0xc(%rax,%r12,1),%eax
   0xffffff7f813f8a32:  mov    $0x1,%ecx
   0xffffff7f813f8a37:  mwait  %rax,%rcx
   0xffffff7f813f8a3a:  lfence
   0xffffff7f813f8a3d:  rdtsc
   0xffffff7f813f8a3f:  lfence
   0xffffff7f813f8a42:  mov    %rax,%rbx
   0xffffff7f813f8a45:  mov    %rdx,%r12
   0xffffff7f813f8a48:  shl    $0x20,%r12
...

...
   0xffffff7f81401c10:  mov    %r13,%rax
   0xffffff7f81401c13:  xor    %edx,%edx
   0xffffff7f81401c15:  monitor %rax,%rcx,%rdx
   0xffffff7f81401c18:  mov    %r12d,%r15d
   0xffffff7f81401c1b:  imul   $0x148,%r15,%r12
   0xffffff7f81401c22:  lea    0x12b57(%rip),%rax        # 0xffffff7f81414780
   0xffffff7f81401c29:  mov    (%rax),%rax
   0xffffff7f81401c2c:  mov    0x20(%rax),%rcx
   0xffffff7f81401c30:  testb  $0x10,0x2(%rcx,%r12,1)
   0xffffff7f81401c36:  je     0xffffff7f81401c3e
   0xffffff7f81401c38:  callq  *0x80(%rax)
   0xffffff7f81401c3e:  test   %r14,%r14
   0xffffff7f81401c41:  je     0xffffff7f81401c5d
   0xffffff7f81401c43:  movq   $0x0,0x8(%r14)
   0xffffff7f81401c4b:  movl   $0x0,(%r14)
   0xffffff7f81401c52:  test   %ebx,%ebx
   0xffffff7f81401c54:  je     0xffffff7f81401c64
   0xffffff7f81401c56:  mfence
   0xffffff7f81401c59:  wbinvd
   0xffffff7f81401c5b:  jmp    0xffffff7f81401c64
   0xffffff7f81401c5d:  cmpl   $0x0,0x0(%r13)
   0xffffff7f81401c62:  jne    0xffffff7f81401c7f
   0xffffff7f81401c64:  lea    0x12b15(%rip),%rax        # 0xffffff7f81414780
   0xffffff7f81401c6b:  mov    (%rax),%rax
   0xffffff7f81401c6e:  mov    0x20(%rax),%rax
   0xffffff7f81401c72:  mov    0xc(%rax,%r12,1),%eax
   0xffffff7f81401c77:  mov    $0x1,%ecx
   0xffffff7f81401c7c:  mwait  %rax,%rcx
   0xffffff7f81401c7f:  lfence
   0xffffff7f81401c82:  rdtsc
   0xffffff7f81401c84:  lfence
   0xffffff7f81401c87:  mov    %rax,%rbx
   0xffffff7f81401c8a:  mov    %rdx,%r12
   0xffffff7f81401c8d:  shl    $0x20,%r12
   0xffffff7f81401c91:  lea    0xaf1c(%rip),%rax        # 0xffffff7f8140cbb4
   0xffffff7f81401c98:  testb  $0x1,(%rax)
...

If that's not enough context, I can email you the whole 'script'
output I collected...

HTH,
--Gabriel

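
One way to read the instructions leading up to each mwait in the listings: the EAX hint is not a constant but appears to be loaded from a table entry, indexed by a per-call value and scaled by 0x148 bytes. The C sketch below restates that address computation only. The structure and function names (`pm_state`, `pm_root`, `load_mwait_eax`) are hypothetical, and nothing is known about the fields other than the three offsets taken from the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_STRIDE 0x148 /* from: imul $0x148,%rax,%rax */
#define HINT_OFFSET  0xc   /* from: mov 0xc(%rcx,%rax,1),%eax */
#define TABLE_OFFSET 0x20  /* from: mov 0x20(%rcx),%rcx */

/* Hypothetical layout: only the three offsets above come from the listing. */
struct pm_state {
	uint8_t  unknown_head[HINT_OFFSET];
	uint32_t mwait_eax;
	uint8_t  unknown_tail[ENTRY_STRIDE - HINT_OFFSET - sizeof(uint32_t)];
};

struct pm_root {
	uint8_t          unknown_head[TABLE_OFFSET];
	struct pm_state *states;
};

_Static_assert(sizeof(struct pm_state) == ENTRY_STRIDE, "entry stride");
_Static_assert(offsetof(struct pm_state, mwait_eax) == HINT_OFFSET, "hint offset");
_Static_assert(offsetof(struct pm_root, states) == TABLE_OFFSET, "table offset");

/* 'global' stands in for the pointer stored at 0xffffff7f81414780. */
uint32_t load_mwait_eax(struct pm_root *const *global, uint32_t index)
{
	const struct pm_root *root = *global;          /* mov (%rcx),%rcx */
	const struct pm_state *states = root->states;  /* mov 0x20(%rcx),%rcx */

	assert(states != NULL);
	return states[index].mwait_eax;                /* mov 0xc(%rcx,%rax,1),%eax */
}
```

If this reading is right, the value passed in EAX depends on which table entry the guest selects at run time, so it has to be read from a live guest (for example from the register just before mwait) rather than from the code.
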
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 21:14 ` Gabriel L. Somlo @ 2017-03-17 2:03 ` Michael S. Tsirkin 2017-03-17 13:23 ` Gabriel L. Somlo 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-17 2:03 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 05:14:15PM -0400, Gabriel L. Somlo wrote: > On Thu, Mar 16, 2017 at 04:17:11PM -0400, Gabriel L. Somlo wrote: > > On Thu, Mar 16, 2017 at 09:27:56PM +0200, Michael S. Tsirkin wrote: > > > On Thu, Mar 16, 2017 at 03:24:41PM -0400, Gabriel L. Somlo wrote: > > > > On Thu, Mar 16, 2017 at 08:29:32PM +0200, Michael S. Tsirkin wrote: > > > > > Let's take a step back and try to figure out how is > > > > > mwait called. How about dumping code of VCPUs > > > > > around mwait? gdb disa command will do this. > > > > > > > > Started guest with '-s', tried to attach from gdb with > > > > "target remote localhost:1234", got > > > > "remote 'g' packet reply is too long: <lengthy string of numbers>" > > > > > > Try > > > > > > set arch x86-64:x86-64 > > > > 'set architecture i386:x86-64:intel' is what worked for me; > > > > Been rooting around for a while, can't find mwait or monitor :( > > > > Guess I'll have to recompile KVM to actually issue an invalid opcode, > > so OS X will print a panic message with the exact address :) > > > > Stay tuned... > > OK, so I found a few instances. The one closest to where a random > interrupt from gdb landed, was this one: > > ... 
> 0xffffff7f813ff379: mov 0x90(%r15),%rax > 0xffffff7f813ff380: mov 0x18(%rax),%rsi > 0xffffff7f813ff384: xor %ecx,%ecx > 0xffffff7f813ff386: mov %rsi,%rax > 0xffffff7f813ff389: xor %edx,%edx > 0xffffff7f813ff38b: monitor %rax,%rcx,%rdx > 0xffffff7f813ff38e: test %r14,%r14 > 0xffffff7f813ff391: je 0xffffff7f813ff3ad > 0xffffff7f813ff393: movq $0x0,0x8(%r14) > 0xffffff7f813ff39b: movl $0x0,(%r14) > 0xffffff7f813ff3a2: test %ebx,%ebx > 0xffffff7f813ff3a4: je 0xffffff7f813ff3b2 > 0xffffff7f813ff3a6: mfence > 0xffffff7f813ff3a9: wbinvd > 0xffffff7f813ff3ab: jmp 0xffffff7f813ff3b2 > 0xffffff7f813ff3ad: cmpl $0x0,(%rsi) Seems to do cmpl - could indicate it uses different bytes for signalling? Radim's test monitors and modifies the same byte... > 0xffffff7f813ff3b0: jne 0xffffff7f813ff3d6 > 0xffffff7f813ff3b2: mov %r12d,%eax > 0xffffff7f813ff3b5: imul $0x148,%rax,%rax > 0xffffff7f813ff3bc: lea 0x153bd(%rip),%rcx # 0xffffff7f81414780 > 0xffffff7f813ff3c3: mov (%rcx),%rcx > 0xffffff7f813ff3c6: mov 0x20(%rcx),%rcx > 0xffffff7f813ff3ca: mov 0xc(%rcx,%rax,1),%eax > 0xffffff7f813ff3ce: mov $0x1,%ecx > 0xffffff7f813ff3d3: mwait %rax,%rcx > => 0xffffff7f813ff3d6: lfence > 0xffffff7f813ff3d9: rdtsc > 0xffffff7f813ff3db: lfence > 0xffffff7f813ff3de: mov %rax,%rbx > 0xffffff7f813ff3e1: mov %rdx,%r15 > ... OK nice, so it's actually using 1 for ECX. Now what's rax? Can you check that with gdb pls, then try that value with Radim's test? -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-17 2:03 ` Michael S. Tsirkin @ 2017-03-17 13:23 ` Gabriel L. Somlo 2017-03-21 3:22 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-17 13:23 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Fri, Mar 17, 2017 at 04:03:59AM +0200, Michael S. Tsirkin wrote: > On Thu, Mar 16, 2017 at 05:14:15PM -0400, Gabriel L. Somlo wrote: > > On Thu, Mar 16, 2017 at 04:17:11PM -0400, Gabriel L. Somlo wrote: > > > On Thu, Mar 16, 2017 at 09:27:56PM +0200, Michael S. Tsirkin wrote: > > > > On Thu, Mar 16, 2017 at 03:24:41PM -0400, Gabriel L. Somlo wrote: > > > > > On Thu, Mar 16, 2017 at 08:29:32PM +0200, Michael S. Tsirkin wrote: > > > > > > Let's take a step back and try to figure out how is > > > > > > mwait called. How about dumping code of VCPUs > > > > > > around mwait? gdb disa command will do this. > > > > > > > > > > Started guest with '-s', tried to attach from gdb with > > > > > "target remote localhost:1234", got > > > > > "remote 'g' packet reply is too long: <lengthy string of numbers>" > > > > > > > > Try > > > > > > > > set arch x86-64:x86-64 > > > > > > 'set architecture i386:x86-64:intel' is what worked for me; > > > > > > Been rooting around for a while, can't find mwait or monitor :( > > > > > > Guess I'll have to recompile KVM to actually issue an invalid opcode, > > > so OS X will print a panic message with the exact address :) > > > > > > Stay tuned... > > > > OK, so I found a few instances. The one closest to where a random > > interrupt from gdb landed, was this one: > > > > ... 
> > 0xffffff7f813ff379: mov 0x90(%r15),%rax > > 0xffffff7f813ff380: mov 0x18(%rax),%rsi > > 0xffffff7f813ff384: xor %ecx,%ecx > > 0xffffff7f813ff386: mov %rsi,%rax > > 0xffffff7f813ff389: xor %edx,%edx > > 0xffffff7f813ff38b: monitor %rax,%rcx,%rdx > > 0xffffff7f813ff38e: test %r14,%r14 > > 0xffffff7f813ff391: je 0xffffff7f813ff3ad > > 0xffffff7f813ff393: movq $0x0,0x8(%r14) > > 0xffffff7f813ff39b: movl $0x0,(%r14) > > 0xffffff7f813ff3a2: test %ebx,%ebx > > 0xffffff7f813ff3a4: je 0xffffff7f813ff3b2 > > 0xffffff7f813ff3a6: mfence > > 0xffffff7f813ff3a9: wbinvd > > 0xffffff7f813ff3ab: jmp 0xffffff7f813ff3b2 > > 0xffffff7f813ff3ad: cmpl $0x0,(%rsi) > > Seems to do cmpl - could indicate it uses different bytes > for signalling? Radim's test monitors and > modifies the same byte... > > > 0xffffff7f813ff3b0: jne 0xffffff7f813ff3d6 > > 0xffffff7f813ff3b2: mov %r12d,%eax > > 0xffffff7f813ff3b5: imul $0x148,%rax,%rax > > 0xffffff7f813ff3bc: lea 0x153bd(%rip),%rcx # 0xffffff7f81414780 > > 0xffffff7f813ff3c3: mov (%rcx),%rcx > > 0xffffff7f813ff3c6: mov 0x20(%rcx),%rcx > > 0xffffff7f813ff3ca: mov 0xc(%rcx,%rax,1),%eax > > 0xffffff7f813ff3ce: mov $0x1,%ecx > > 0xffffff7f813ff3d3: mwait %rax,%rcx > > => 0xffffff7f813ff3d6: lfence > > 0xffffff7f813ff3d9: rdtsc > > 0xffffff7f813ff3db: lfence > > 0xffffff7f813ff3de: mov %rax,%rbx > > 0xffffff7f813ff3e1: mov %rdx,%r15 > > ... > > OK nice, so it's actually using 1 for ECX. Now what's rax? > Can you check that with gdb pls, then try that value with > Radim's test? Thread 1 received signal SIGINT, Interrupt. 0xffffff80002c8991 in ?? () (gdb) break *0xffffff7f813ff3ce Breakpoint 1 at 0xffffff7f813ff3ce (gdb) continue Continuing. Thread 3 hit Breakpoint 1, 0xffffff7f813ff3ce in ?? () (gdb) p $rax $1 = 240 (gdb) cont Continuing. [Switching to Thread 1] Thread 1 hit Breakpoint 1, 0xffffff7f813ff3ce in ?? () (gdb) p $rax $2 = 240 (gdb) cont Continuing. [Switching to Thread 4] Thread 4 hit Breakpoint 1, 0xffffff7f813ff3ce in ?? 
() (gdb) p $rax $3 = 240 (gdb) cont Continuing. Thread 4 hit Breakpoint 1, 0xffffff7f813ff3ce in ?? () (gdb) p $rax $4 = 240 (gdb) So, 240 or 0xf0 OK, now on to Radim's test, on the MacPro1,1: [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m0.746s user 0m0.542s sys 0m0.215s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m0.743s user 0m0.528s sys 0m0.226s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' -smp 2 timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 -smp 2 enabling apic enabling apic FAIL: resumed from mwait 10150 times SUMMARY: 1 tests, 1 unexpected failures real 0m0.745s user 0m0.545s sys 0m0.214s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' -smp 2 timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 -smp 2 enabling apic enabling apic FAIL: resumed from mwait 10116 times SUMMARY: 1 tests, 1 unexpected failures real 0m0.744s user 0m0.541s sys 0m0.217s HTH, --Gabriel ^ permalink raw reply [flat|nested] 54+ messages in thread
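Editorial aside, not part of the original thread: a minimal sketch of how the register values captured above (EAX = 240 = 0xf0, ECX = 1) decode under the MWAIT hint layout described in the Intel SDM. There, EAX[7:4] selects the target C-state (0 means C1, 1 means C2, and so on, with 0xF reserved for C0), EAX[3:0] is the sub-state, and ECX bit 0 asks for interrupts to be treated as break events even when masked. The struct and function names are made up for illustration, and nothing here says how a particular CPU or hypervisor actually honours the hint.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; names are ours. */
struct mwait_hint {
    unsigned cstate_field;    /* raw EAX[7:4] */
    unsigned sub_state;       /* raw EAX[3:0] */
    int target_cstate;        /* 0 for C0, otherwise N for "CN" */
    int break_on_masked_irq;  /* ECX[0] */
};

/* Split the MWAIT EAX/ECX arguments into their documented fields. */
static struct mwait_hint decode_mwait(uint32_t eax, uint32_t ecx)
{
    struct mwait_hint h;

    h.cstate_field = (eax >> 4) & 0xf;
    h.sub_state = eax & 0xf;
    /* Field value 0 means C1, 1 means C2, ...; 0xF is the C0 encoding. */
    h.target_cstate = (h.cstate_field == 0xf) ? 0 : (int)h.cstate_field + 1;
    h.break_on_masked_irq = (int)(ecx & 1);
    return h;
}

/* Usage example: the values seen at the breakpoint before mwait. */
static void check_guest_hint(void)
{
    struct mwait_hint h = decode_mwait(240, 1);

    assert(h.cstate_field == 0xf);      /* top nibble of 0xf0 */
    assert(h.sub_state == 0);
    assert(h.target_cstate == 0);       /* the C0 encoding */
    assert(h.break_on_masked_irq == 1);
}
```

Calling check_guest_hint() from a small main() is enough to exercise the decode. The sketch only restates the encoding and does not explain the guest's behaviour on any given machine.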
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-17 13:23 ` Gabriel L. Somlo @ 2017-03-21 3:22 ` Michael S. Tsirkin 2017-03-21 16:58 ` Radim Krčmář 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-21 3:22 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Fri, Mar 17, 2017 at 09:23:56AM -0400, Gabriel L. Somlo wrote: > OK, now on to Radim's test, on the MacPro1,1: > > [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 > enabling apic > PASS: resumed from mwait 10000 times > SUMMARY: 1 tests > > real 0m0.746s > user 0m0.542s > sys 0m0.215s > [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 > enabling apic > PASS: resumed from mwait 10000 times > SUMMARY: 1 tests > > real 0m0.743s > user 0m0.528s > sys 0m0.226s > [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' -smp 2 > timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 -smp 2 > enabling apic > enabling apic > FAIL: resumed from mwait 10150 times > SUMMARY: 1 tests, 1 unexpected failures > > real 0m0.745s > user 0m0.545s > sys 0m0.214s > [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' -smp 2 > timeout -k 1s --foreground 20 qemu-kvm -nodefaults 
-enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 -smp 2 > enabling apic > enabling apic > FAIL: resumed from mwait 10116 times > SUMMARY: 1 tests, 1 unexpected failures > > real 0m0.744s > user 0m0.541s > sys 0m0.217s > > HTH, > --Gabriel Weird. How can it go above 10000? Radim - any idea? -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-21 3:22 ` Michael S. Tsirkin @ 2017-03-21 16:58 ` Radim Krčmář 2017-03-21 17:29 ` Nadav Amit 0 siblings, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-03-21 16:58 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Gabriel L. Somlo, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-21 05:22+0200, Michael S. Tsirkin: > On Fri, Mar 17, 2017 at 09:23:56AM -0400, Gabriel L. Somlo wrote: >> OK, now on to Radim's test, on the MacPro1,1: >> >> [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' >> timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 >> enabling apic >> PASS: resumed from mwait 10000 times >> SUMMARY: 1 tests >> >> real 0m0.746s >> user 0m0.542s >> sys 0m0.215s >> [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' >> timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 >> enabling apic >> PASS: resumed from mwait 10000 times >> SUMMARY: 1 tests >> >> real 0m0.743s >> user 0m0.528s >> sys 0m0.226s >> [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' -smp 2 >> timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 -smp 2 >> enabling apic >> enabling apic >> FAIL: resumed from mwait 10150 times >> SUMMARY: 1 tests, 1 unexpected failures >> >> real 0m0.745s >> user 0m0.545s >> sys 0m0.214s >> [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append 
'240 1 0' -smp 2 >> timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 -smp 2 >> enabling apic >> enabling apic >> FAIL: resumed from mwait 10116 times >> SUMMARY: 1 tests, 1 unexpected failures >> >> real 0m0.744s >> user 0m0.541s >> sys 0m0.217s >> >> HTH, >> --Gabriel > > Weird. How can it go above 10000? Radim - any idea? In '-smp 2', the writing VCPU always does 10000 wakeups by writing into monitored memory, but the mwaiting VCPU can also be woken up by host interrupts, which might add a few exits depending on timing. I didn't spend much time on making the PASS/FAIL mean much, or ensuring that we only get 10000 wakeups ... it is nothing to be worried about. Hint 240 behaves as nop even on my system, so I still don't find anything insane on that machine (if OS X is excluded) ... ^ permalink raw reply [flat|nested] 54+ messages in thread
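Editorial aside, not part of the original thread: a toy model of why the resume count can exceed 10000 in the '-smp 2' runs. In that mode the test loop (visible in the x86/mwait.c diff later in the thread) stops once the monitored word reaches TARGET_RESUMES, while resumes counts every return from mwait. The model below executes no MONITOR/MWAIT at all. The function name and the spurious_every parameter are hypothetical, standing in for wakeups caused by anything other than the writer's store.

```c
#include <assert.h>

#define TARGET_RESUMES 10000u

/*
 * Each loop iteration stands for one return from mwait. If spurious_every
 * is non-zero, every spurious_every-th return is treated as a wakeup that
 * was NOT caused by the writer (e.g. a host interrupt), so the monitored
 * word does not advance on that iteration.
 */
static unsigned model_resumes(unsigned spurious_every)
{
    unsigned page = 0;     /* stands in for the monitored word (*page) */
    unsigned resumes = 0;  /* what the test prints in its PASS/FAIL line */

    /* With spurious_every == 1 no store would ever land: reject it. */
    assert(spurious_every != 1);

    while (page < TARGET_RESUMES) {
        resumes++;
        if (spurious_every != 0 && resumes % spurious_every == 0)
            continue;      /* woken, but the monitored word is unchanged */
        page++;            /* woken by the writer's store */
    }
    return resumes;
}
```

With no spurious wakeups the model returns exactly TARGET_RESUMES. Any non-zero rate pushes the count above it, which is the shape of the 10116 and 10150 figures reported above. The real distribution of extra wakeups depends on host timing and is not modelled here.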
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-21 16:58 ` Radim Krčmář @ 2017-03-21 17:29 ` Nadav Amit 0 siblings, 0 replies; 54+ messages in thread From: Nadav Amit @ 2017-03-21 17:29 UTC (permalink / raw) To: Radim Krčmář Cc: Michael S. Tsirkin, Gabriel L. Somlo, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc > On Mar 21, 2017, at 9:58 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > In '-smp 2', the writing VCPU always does 10000 wakeups by writing into > monitored memory, but the mwaiting VCPU can be also woken up by host > interrupts, which might add a few exits depending on timing. > > I didn't spend much time in making the PASS/FAIL mean much, or ensuring > that we only get 10000 wakeups ... it is nothing to be worried about. > > Hint 240 behaves as nop even on my system, so I still don't find > anything insane on that machine (if OS X is excluded) ... From my days at Intel (10 years ago), I can say that MWAIT wakes for many microarchitectural events besides interrupts. Out of curiosity, aren’t you worried that on OS X the wbinvd causes an exit after the monitor and before the mwait? ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-21 17:29 ` Nadav Amit (?) @ 2017-03-21 19:22 ` Radim Krčmář 2017-03-21 22:51 ` Gabriel Somlo -1 siblings, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-03-21 19:22 UTC (permalink / raw) To: Nadav Amit Cc: Michael S. Tsirkin, Gabriel L. Somlo, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-21 10:29-0700, Nadav Amit: > > > On Mar 21, 2017, at 9:58 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > > > In '-smp 2', the writing VCPU always does 10000 wakeups by writing into > > monitored memory, but the mwaiting VCPU can be also woken up by host > > interrupts, which might add a few exits depending on timing. > > > > I didn't spend much time in making the PASS/FAIL mean much, or ensuring > > that we only get 10000 wakeups ... it is nothing to be worried about. > > > > Hint 240 behaves as nop even on my system, so I still don't find > > anything insane on that machine (if OS X is exluded) ... > > From my days in Intel (10 years ago), I can say that MWAIT wakes for many > microarchitecural events beside interrupts. > > Out of curiosity, aren’t you worried that on OS X the wbinvd causes an exit > after the monitor and before the mwait? VM entry clears the monitoring, so it should behave just like an MWAIT without MONITOR, which is NOP according to the spec. It does so on modern hardware, but it definitely is a good thing to try ... (I am worried about disabling MWAIT exits by default and it's a no-go until we understand why OS X doesn't work.) Gabriel, how does testing with this change behave on the old machine? Thanks. ---8<--- This should be the same as "wbinvd", because "wbinvd" does nothing without non-coherent vfio. Simply replacing "vmcall" with "wbinvd" is an option if the "vmcall" version works as expected. 
--- diff --git a/x86/mwait.c b/x86/mwait.c index 20f4dcbff8ae..19f988b94541 100644 --- a/x86/mwait.c +++ b/x86/mwait.c @@ -54,6 +54,7 @@ int main(int argc, char **argv) while ((smp ? *page : resumes) < TARGET_RESUMES) { asm volatile("monitor" :: "a" (page), "c" (0), "d" (0)); + asm volatile("vmcall" :: "a"(-1)); asm volatile("mwait" :: "a" (eax), "c" (ecx)); resumes++; } ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-21 19:22 ` Radim Krčmář @ 2017-03-21 22:51 ` Gabriel Somlo 2017-03-22 0:02 ` Nadav Amit 0 siblings, 1 reply; 54+ messages in thread From: Gabriel Somlo @ 2017-03-21 22:51 UTC (permalink / raw) To: Radim Krčmář Cc: Nadav Amit, Michael S. Tsirkin, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Tue, Mar 21, 2017 at 08:22:39PM +0100, Radim Krčmář wrote: > 2017-03-21 10:29-0700, Nadav Amit: > > > > > On Mar 21, 2017, at 9:58 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > > > > > In '-smp 2', the writing VCPU always does 10000 wakeups by writing into > > > monitored memory, but the mwaiting VCPU can be also woken up by host > > > interrupts, which might add a few exits depending on timing. > > > > > > I didn't spend much time in making the PASS/FAIL mean much, or ensuring > > > that we only get 10000 wakeups ... it is nothing to be worried about. > > > > > > Hint 240 behaves as nop even on my system, so I still don't find > > > anything insane on that machine (if OS X is exluded) ... And I get the exact same results on the MacBookAir4,2 (which exhibits no freezing or extreme sluggishness when running OS X 10.7 smp with Michael's KVM MWAIT-in-L1 patch)... > > > > From my days in Intel (10 years ago), I can say that MWAIT wakes for many > > microarchitecural events beside interrupts. > > > > Out of curiosity, aren’t you worried that on OS X the wbinvd causes an exit > > after the monitor and before the mwait? > > VM entry clears the monitoring, so it should behave just like an MWAIT > without MONITOR, which is NOP according to the spec. It does so on > modern hardware, but it definitely is a good thing to try ... > (I am worried about disabling MWAIT exits by default and it's a no-go > until we understand why OS X doesn't work.) > > Gabriel, how does testing with this change behave on the old machine? > > Thanks. 
> > ---8<--- > This should be the same as "wbinvd", because "wbinvd" does nothing > without non-coherent vfio. > Simply replacing "vmcall" with "wbinvd" is an option if the "vmcall" > version works as expected. > --- > diff --git a/x86/mwait.c b/x86/mwait.c > index 20f4dcbff8ae..19f988b94541 100644 > --- a/x86/mwait.c > +++ b/x86/mwait.c > @@ -54,6 +54,7 @@ int main(int argc, char **argv) > > while ((smp ? *page : resumes) < TARGET_RESUMES) { > asm volatile("monitor" :: "a" (page), "c" (0), "d" (0)); > + asm volatile("vmcall" :: "a"(-1)); > asm volatile("mwait" :: "a" (eax), "c" (ecx)); > resumes++; > } Sure thing, here's the MacPro1,1 results: [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m1.709s user 0m0.547s sys 0m0.243s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m0.752s user 0m0.545s sys 0m0.218s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' -smp 2 timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 -smp 2 enabling apic enabling apic FAIL: resumed from mwait 10004 times SUMMARY: 1 tests, 1 unexpected failures real 0m0.753s user 0m0.554s sys 0m0.227s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' -smp 2 timeout -k 1s 
--foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 -smp 2 enabling apic enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m0.755s user 0m0.562s sys 0m0.221s For comparison, the results including 'vmcall' on the MacBookAir4,2 (interesting, the results for the last test, "-append '240 1 1' -smp 2", are different): [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m0.622s user 0m0.501s sys 0m0.130s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 1 enabling apic PASS: resumed from mwait 10000 times SUMMARY: 1 tests real 0m0.624s user 0m0.504s sys 0m0.127s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 0' -smp 2 timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/mwait.flat -append 240 1 0 -smp 2 enabling apic enabling apic FAIL: resumed from mwait 10023 times SUMMARY: 1 tests, 1 unexpected failures real 0m0.623s user 0m0.544s sys 0m0.110s [kvm-unit-tests]$ time TIMEOUT=20 ./x86-run x86/mwait.flat -append '240 1 1' -smp 2 timeout -k 1s --foreground 20 qemu-kvm -nodefaults -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel
x86/mwait.flat -append 240 1 1 -smp 2 enabling apic enabling apic FAIL: resumed from mwait 10006 times SUMMARY: 1 tests, 1 unexpected failures real 0m0.618s user 0m0.527s sys 0m0.121s HTH, --Gabriel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-21 22:51 ` Gabriel Somlo @ 2017-03-22 0:02 ` Nadav Amit 2017-03-22 13:35 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Nadav Amit @ 2017-03-22 0:02 UTC (permalink / raw) To: Gabriel Somlo Cc: Radim Krčmář, Michael S. Tsirkin, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML, Joerg Roedel, KVM list, linux-doc > On Mar 21, 2017, at 3:51 PM, Gabriel Somlo <gsomlo@gmail.com> wrote: > > And I get the exact same results on the MacBookAir4,2 (which exhibits > no freezing or extreme sluggishness when running OS X 10.7 smp with > Michael's KVM MWAIT-in-L1 patch)... Sorry for my confusion. I didn’t read the entire thread and thought that the problem is spurious wake-ups. Since that is not the case, I would just suggest two things that you can freely ignore: 1. According to the SDM, when an interrupt is delivered, the interrupt is only delivered on the following instruction, so you may consider skipping the MWAIT first. 2. Perhaps the CPU changes for some reason GUEST_ACTIVITY_STATE (which is not according to the SDM). That is it. No more BS from me. Nadav ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-22 0:02 ` Nadav Amit @ 2017-03-22 13:35 ` Michael S. Tsirkin 2017-03-22 14:10 ` Gabriel L. Somlo 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-22 13:35 UTC (permalink / raw) To: Nadav Amit Cc: Gabriel Somlo, Radim Krčmář, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML, Joerg Roedel, KVM list, linux-doc On Tue, Mar 21, 2017 at 05:02:25PM -0700, Nadav Amit wrote: > > > On Mar 21, 2017, at 3:51 PM, Gabriel Somlo <gsomlo@gmail.com> wrote: > > > > And I get the exact same results on the MacBookAir4,2 (which exhibits > > no freezing or extreme sluggishness when running OS X 10.7 smp with > > Michael's KVM MWAIT-in-L1 patch)... > > Sorry for my confusion. I didn’t read the entire thread and thought that > the problem is spurious wake-ups. > > Since that is not the case, I would just suggest two things that you can > freely ignore: > > 1. According to the SDM, when an interrupt is delivered, the interrupt > is only delivered on the following instruction, so you may consider > skipping the MWAIT first. > > 2. Perhaps the CPU changes for some reason GUEST_ACTIVITY_STATE (which > is not according to the SDM). > > That is it. No more BS from me. > > Nadav Interesting. I found this erratum: A REP STOS/MOVS to a MONITOR/MWAIT Address Range May Prevent Triggering of the Monitoring Hardware Could the macbook CPU be affected? -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-22 13:35 ` Michael S. Tsirkin @ 2017-03-22 14:10 ` Gabriel L. Somlo 2017-03-22 14:15 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-22 14:10 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Nadav Amit, Radim Krčmář, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML, Joerg Roedel, KVM list, linux-doc On Wed, Mar 22, 2017 at 03:35:18PM +0200, Michael S. Tsirkin wrote: > On Tue, Mar 21, 2017 at 05:02:25PM -0700, Nadav Amit wrote: > > > > > On Mar 21, 2017, at 3:51 PM, Gabriel Somlo <gsomlo@gmail.com> wrote: > > > > > > And I get the exact same results on the MacBookAir4,2 (which exhibits > > > no freezing or extreme sluggishness when running OS X 10.7 smp with > > > Michael's KVM MWAIT-in-L1 patch)... > > > > Sorry for my confusion. I didn’t read the entire thread and thought that > > the problem is spurious wake-ups. > > > > Since that is not the case, I would just suggest two things that you can > > freely ignore: > > > > 1. According to the SDM, when an interrupt is delivered, the interrupt > > is only delivered on the following instruction, so you may consider > > skipping the MWAIT first. > > > > 2. Perhaps the CPU changes for some reason GUEST_ACTIVITY_STATE (which > > is not according to the SDM). > > > > That is it. No more BS from me. > > > > Nadav > > Intersting. I found this errata: > A REP STOS/MOVS to a MONITOR/MWAIT Address Range May Prevent Triggering of > the Monitoring Hardware Any way to tell if they mean that for L0, or L>=1, or all of them? > Could the macbook CPU be affected? I ran a grep on the log file I collected when disassembling AppleIntelCPUPowerManagement.kext (where the MWAIT-based idle thread lives) a few days ago, and didn't find any "rep stos" or "rep movs" instances. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-22 14:10 ` Gabriel L. Somlo @ 2017-03-22 14:15 ` Michael S. Tsirkin 0 siblings, 0 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-22 14:15 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Nadav Amit, Radim Krčmář, LKML, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML, Joerg Roedel, KVM list, linux-doc On Wed, Mar 22, 2017 at 10:10:05AM -0400, Gabriel L. Somlo wrote: > On Wed, Mar 22, 2017 at 03:35:18PM +0200, Michael S. Tsirkin wrote: > > On Tue, Mar 21, 2017 at 05:02:25PM -0700, Nadav Amit wrote: > > > > > > > On Mar 21, 2017, at 3:51 PM, Gabriel Somlo <gsomlo@gmail.com> wrote: > > > > > > > > And I get the exact same results on the MacBookAir4,2 (which exhibits > > > > no freezing or extreme sluggishness when running OS X 10.7 smp with > > > > Michael's KVM MWAIT-in-L1 patch)... > > > > > > Sorry for my confusion. I didn’t read the entire thread and thought that > > > the problem is spurious wake-ups. > > > > > > Since that is not the case, I would just suggest two things that you can > > > freely ignore: > > > > > > 1. According to the SDM, when an interrupt is delivered, the interrupt > > > is only delivered on the following instruction, so you may consider > > > skipping the MWAIT first. > > > > > > 2. Perhaps the CPU changes for some reason GUEST_ACTIVITY_STATE (which > > > is not according to the SDM). > > > > > > That is it. No more BS from me. > > > > > > Nadav > > > > Intersting. I found this errata: > > A REP STOS/MOVS to a MONITOR/MWAIT Address Range May Prevent Triggering of > > the Monitoring Hardware > > Any way to tell if they mean that for L0, or L>=1, or all of them? > > > Could the macbook CPU be affected? > > I ran a grep on the log file I collected when disassembling > AppleIntelCPUPowerManagement.kext (where the MWAIT-based idle > thread lives) a few days ago, and didn't find any "rep stos" or > "rep movs" instances. 
>

Right, but that would be on the waking side, not the one that does mwait.

-- 
MST

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 15:35 ` Radim Krčmář 2017-03-16 16:01 ` Radim Krčmář @ 2017-03-16 16:16 ` Gabriel L. Somlo 2017-03-16 16:45 ` Michael S. Tsirkin 1 sibling, 1 reply; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 16:16 UTC (permalink / raw) To: Radim Krčmář Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 04:35:18PM +0100, Radim Krčmář wrote: > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > > After studying your patch a bit more carefully (sorry, it's crazy > > > > around here right now :) ) I realized you're simply trying to > > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > > the very early days of OS X on QEMU, at the time I got involved with > > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > > just that, and worked as far as I remember on *any* MWAIT capable > > > > intel chip I had access to back in 2010: > > > > > > > > ############################################################################## > > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > > ############################################################################## > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > > /* cpuid 1.ecx */ > > > > const u32 kvm_supported_word4_x86_features = > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > > set_intercept(svm, INTERCEPT_STGI); > > > > set_intercept(svm, INTERCEPT_CLGI); > > > > set_intercept(svm, INTERCEPT_SKINIT); > > > > set_intercept(svm, INTERCEPT_WBINVD); > > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > > - 
set_intercept(svm, INTERCEPT_MWAIT); > > > > set_intercept(svm, INTERCEPT_XSETBV); > > > > > > > > control->iopm_base_pa = iopm_base; > > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > > control->int_ctl = V_INTR_MASKING_MASK; > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > > nested_vmx_procbased_ctls_low = 0; > > > > nested_vmx_procbased_ctls_high &= > > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > > + CPU_BASED_CR3_LOAD_EXITING | > > > > CPU_BASED_CR3_STORE_EXITING | > > > > #ifdef CONFIG_X86_64 > > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > > #endif > > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > > CPU_BASED_CR3_LOAD_EXITING | > > > > CPU_BASED_CR3_STORE_EXITING | > > > > CPU_BASED_USE_IO_BITMAPS | > > > > CPU_BASED_MOV_DR_EXITING | > > > > CPU_BASED_USE_TSC_OFFSETING | > > > > - CPU_BASED_MWAIT_EXITING | > > > > - CPU_BASED_MONITOR_EXITING | > > > > CPU_BASED_INVLPG_EXITING | > > > > CPU_BASED_RDPMC_EXITING; > > > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > > confused at this point :) > > > > > > Yes. Me too. Want to try that other patch and see what happens? > > > > You mean the old 3.4 patch against current KVM ? 
I'll try to do that, > > might take me a while :) > > Michael's patch already did most of that, you just need to add > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index efde6cc50875..b12f07d4ce17 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > const u32 kvm_cpuid_1_ecx_x86_features = > /* NOTE: MONITOR (and MWAIT) are emulated as NOP, > * but *not* advertised to guests via CPUID ! */ > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > 0 /* DS-CPL, VMX, SMX, EST */ | > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > Note: this will never be upstream, because mwait isn't what we want by > default. :) But since OS X doesn't check CPUID and simply runs MONITOR & MWAIT assuming they're present, the above one-liner would make no difference. If everything else in the old patch I quoted is identical to what Michael does, then I don't know -- maybe the MacPro1,1 has really broken L>=1 MWAIT, and it only ever worked with vmexit and emulation on the host side. > >> > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > >> > didn't power down the physical CPU, just immediately moved on to the > >> > next instruction. As such, there was no power saving and no > >> > opportunity to yield to another L0 thread either, unlike with NOP > >> > emulation at L0. > >> > > >> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > >> > doing something smarter than just acting as a guest-mode NOP) ? > >> > > >> > Thanks, > >> > --Gabriel > >> > >> Interesting. What it seems to say is this: > >> > >> MWAIT. 
Behavior of the MWAIT instruction (which always causes an invalid- > >> opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT > >> exiting” VM-execution control: > >> — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit > >> (see Section 22.1.3). > >> — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if > >> any of the following is true: (1) the “interrupt-window exiting” VM-execution > >> control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1. > >> — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window > >> exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT > >> does not cause the processor to enter an implementation-dependent > >> optimized state; instead, control passes to the instruction following the > >> MWAIT instruction. > >> > >> > >> And since interrupt-window exiting is 0 most of the time for KVM, > >> I would expect MWAIT to behave normally. > > > > The intel manual said the same thing back in 2010 as well. However, > > regardless of how any flags were set, interrupt-window exiting or not, > > "normal" L1 MWAIT behavior was that it woke up immediately regardless. > > Remember, never going to sleep is still correct ("normal" ?) behavior > > per the ISA definition of MWAIT :) > > I'll write a simple kvm-unit-test to better understand why it is broken > for you ... > > > Also, when I tested your patch on the macbook air (where it worked), > > not only was the host reporting 400% CPU for qemu (which is to be > > expected), but the thermal fan/cooling thing also shifted up into high > > gear, which means the physical CPU got hot, which it shouldn't have if > > the guest-mode MWAIT actually did put the host CPU into low power. > > I tested MWAIT with basically the same kernel patch and the qemu patch > with Linux guest on Haswell and Nehalem. 
Running the guest took 100% of
> the host CPUs, but it still had the same temperature as when the host
> was idle.
>
> That reminds me that you need to pass '-cpu host' for QEMU reasons.

For OS X to boot, one needs '-cpu core2duo' for <= 10.11, and
'-cpu Penryn' for 10.12. I never managed to get it working with any
other settings.

So I'm ready to write off the MacPro1,1 (unless you want me to run more
tests and report back for you, which I'm happy to do in any case).

But please please, so at least I walk away from this having learned
something :) help me understand the use case:

        - By careful setting of vmx flags, and/or on newer, sanely
          built Intel hardware, L1 MWAIT actually powers down the
          physical host core (while I couldn't get it to stay cool
          on my end, I totally believe you managed to pull it off)

        - We never admit to supporting MWAIT to guests, but when they
          use it anyway (either because they're old/grumpy/careless OS X
          versions, or some newfangled custom-built Linux kernel which
          is hacked to ignore CPUID on purpose), we now allow the
          guest to:
                - keep its allotted time slice
                - but "waste" it by powering down the host CPU
          instead of
                - vmexit to the host OS at L0
                - yield the host core to another L0 runnable thread

Since newer OS X actually checks CPUID, I don't have a major stake in
one way vs. the other, but I'm really really curious:

Are we trying to save power assuming the host is unlikely to have
enough runnable L0 threads for when the L0-emulated NOP yields? So
we're better off letting the guest keep the CPU but also keep it cool
while at it (assuming the guest isn't totally hostile and didn't pick
a setting where L1 MWAIT actually works as L1 NOP, in which case we
don't even get to stay cool)?

Man, I wish I had the cycles to resurrect my attempt at actually emulating
MWAIT with something like a condition queue (below, just for reference).
Thanks much, --Gabriel ############################################################################## # kvm-mwait-emu.patch (Gabriel Somlo <somlo@cmu.edu> 2014/02/05) # -- based on an idea suggested by Alex Graf -- # GLS: emulate MONITOR and MWAIT at page-level granularity by write-protecting # the page containing a monitored location and appropriately handling # subsequent write faults. # After debugging the SMP issue, we'll need a way to trigger a # periodic cleanup that will switch write-protected monitored pages # back to read-write, once they've stayed unused for "long enough" ############################################################################## diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index fdf83af..7ca9b51 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -337,6 +337,16 @@ struct kvm_pmu { u64 reprogram_pmi; }; +/* + * mwait-monitored page list element type + */ +struct kvm_mwait_pg { + gpa_t gpa; + struct list_head vcpu_list; /* VCPUs monitoring (armed on) this page */ + struct list_head link; /* links mwait-pages within a KVM */ + unsigned accessed; +}; + struct kvm_vcpu_arch { /* * rip and regs accesses must go through @@ -528,6 +538,10 @@ struct kvm_vcpu_arch { struct { bool pv_unhalted; } pv; + + /* MONITOR/MWAIT support */ + struct kvm_mwait_pg *mwp; /* page monitored by this VCPU */ + struct list_head mw_link; /* all VCPUs monitoring the same page */ }; struct kvm_lpage_info { @@ -607,6 +621,10 @@ struct kvm_arch { u64 hv_hypercall; u64 hv_tsc_page; + /* MONITOR/MWAIT support */ + struct mutex mwait_lock; + struct list_head mwait_pg_list; /* monitored pages within this KVM */ + #ifdef CONFIG_KVM_MMU_AUDIT int audit_point; #endif @@ -854,6 +872,8 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu); +int 
kvm_emulate_monitor(struct kvm_vcpu *vcpu); +int kvm_emulate_mwait(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); @@ -915,6 +935,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new, int bytes); int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn); int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva); +int kvm_mmu_protect_page(struct kvm *kvm, gfn_t gfn); void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu); int kvm_mmu_load(struct kvm_vcpu *vcpu); void kvm_mmu_unload(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c697625..7d4f1ca 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -279,6 +279,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); /* cpuid 1.ecx */ const u32 kvm_supported_word4_x86_features = + /* OS X does not check CPUID before using MONITOR/MWAIT from its + * power-optimized idle loop (AppleIntelPowerManagement.kext). + * For now, we don't advertise MWAIT support below, but attempt + * to emulate them instead of issuing an invalid opcode fault + * if a misbehaving guest calls them anyway. Removing the above + * mentioned kext from OS X will cause it to fall back to a + * HLT-based idle loop, as an optional guest optimization step. 
+ */ F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | 0 /* DS-CPL, VMX, SMX, EST */ | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index e50425d..bc02ebd 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2283,6 +2283,20 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page); +int kvm_mmu_protect_page(struct kvm *kvm, gfn_t gfn) +{ + int r; + + spin_lock(&kvm->mmu_lock); + r = rmap_write_protect(kvm, gfn); + if (r) + kvm_flush_remote_tlbs(kvm); + spin_unlock(&kvm->mmu_lock); + + return r; +} +EXPORT_SYMBOL_GPL(kvm_mmu_protect_page); + /* * The function is based on mtrr_type_lookup() in * arch/x86/kernel/cpu/mtrr/generic.c @@ -4146,12 +4160,68 @@ static bool is_mmio_page_fault(struct kvm_vcpu *vcpu, gva_t addr) return vcpu_match_mmio_gva(vcpu, addr); } +// try to handle fault caused by write to monitored (mwait) page +// FIXME: aim for better integration between this and FNAME(page_fault)() and +// kvm_mmu_page_fault() below. For now, this is proof-of-concept code. +static bool handle_mwait_write_fault(struct kvm_vcpu *vcpu, gva_t gva, + void *in, int in_len) +{ + gpa_t gpa; + struct kvm_mwait_pg *p, *mwp = NULL; + struct kvm_vcpu_arch *v, *u; + bool r = false; + + gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); + if (gpa == UNMAPPED_GVA) + goto ul_out; + + mutex_lock(&vcpu->kvm->arch.mwait_lock); + + /* is gpa matching a monitored (mwait) page? 
*/ + list_for_each_entry(p, &vcpu->kvm->arch.mwait_pg_list, link) + if (p->gpa == gpa) { + mwp = p; + break; + } + if (mwp == NULL) + goto out; + + mwp->accessed = 1; + + if (x86_emulate_instruction(vcpu, gva, + EMULTYPE_RETRY, in, in_len) != EMULATE_DONE) + goto out; + + /* disarm all VCPUs monitoring this page, waking them if needed */ + list_for_each_entry_safe(v, u, &mwp->vcpu_list, mw_link) { + list_del(&v->mw_link); + v->mwp = NULL; + if (v->mp_state == KVM_MP_STATE_MWAIT) + v->mp_state = KVM_MP_STATE_RUNNABLE; + } + + // What if the mwait is woken up by an interrupt instead of a write ? + // It might remain "armed" on its old mwait page, but any subsequent + // MONITOR instruction would replace that, so I don't think we need + // to worry about it... + + r = true; +out: + mutex_unlock(&vcpu->kvm->arch.mwait_lock); +ul_out: + return r; +} + int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code, void *insn, int insn_len) { int r, emulation_type = EMULTYPE_RETRY; enum emulation_result er; + /* writing to MONITORed memory area ? 
*/ + if (handle_mwait_write_fault(vcpu, cr2, insn, insn_len)) + return 1; + r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false); if (r < 0) goto out; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index e81df8f..638704c 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3262,6 +3262,18 @@ static int pause_interception(struct vcpu_svm *svm) return 1; } +static int monitor_interception(struct vcpu_svm *svm) +{ + skip_emulated_instruction(&(svm->vcpu)); + return kvm_emulate_monitor(&(svm->vcpu)); +} + +static int mwait_interception(struct vcpu_svm *svm) +{ + skip_emulated_instruction(&(svm->vcpu)); + return kvm_emulate_mwait(&(svm->vcpu)); +} + static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_READ_CR0] = cr_interception, [SVM_EXIT_READ_CR3] = cr_interception, @@ -3319,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_CLGI] = clgi_interception, [SVM_EXIT_SKINIT] = skinit_interception, [SVM_EXIT_WBINVD] = emulate_on_interception, - [SVM_EXIT_MONITOR] = invalid_op_interception, - [SVM_EXIT_MWAIT] = invalid_op_interception, + [SVM_EXIT_MONITOR] = monitor_interception, + [SVM_EXIT_MWAIT] = mwait_interception, [SVM_EXIT_XSETBV] = xsetbv_interception, [SVM_EXIT_NPF] = pf_interception, }; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a06f101..a7382e1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5603,6 +5603,18 @@ static int handle_invalid_op(struct kvm_vcpu *vcpu) return 1; } +static int handle_monitor(struct kvm_vcpu *vcpu) +{ + skip_emulated_instruction(vcpu); + return kvm_emulate_monitor(vcpu); +} + +static int handle_mwait(struct kvm_vcpu *vcpu) +{ + skip_emulated_instruction(vcpu); + return kvm_emulate_mwait(vcpu); +} + /* * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. 
* We could reuse a single VMCS for all the L2 guests, but we also want the @@ -6483,8 +6495,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig, [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause, - [EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op, - [EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op, + [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait, + [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor, [EXIT_REASON_INVEPT] = handle_invept, }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 39c28f09..8edc1be 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5592,6 +5592,70 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_emulate_halt); +int kvm_emulate_monitor(struct kvm_vcpu *vcpu) +{ + gva_t gva; + gpa_t gpa; + struct kvm_mwait_pg *p; + + /* emulate as NOP if no-kvm-irqchip */ + if (!irqchip_in_kernel(vcpu->kvm)) + return 1; + + mutex_lock(&vcpu->kvm->arch.mwait_lock); + + /* relinguish any previously monitored mwait page */ + if (vcpu->arch.mwp != NULL) { + list_del(&vcpu->arch.mw_link); + vcpu->arch.mwp->accessed = 1; + vcpu->arch.mwp = NULL; + } + + gva = kvm_register_read(vcpu, VCPU_REGS_RAX); + gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); + if (gpa == UNMAPPED_GVA) + goto out; /* let some write op map the page first */ + + /* does the mwait page we're looking for already exist? 
*/ + list_for_each_entry(p, &vcpu->kvm->arch.mwait_pg_list, link) + if (p->gpa == gpa) { + vcpu->arch.mwp = p; + break; + } + if (vcpu->arch.mwp == NULL) { /* no, add new mwait page */ + if (!kvm_mmu_protect_page(vcpu->kvm, gpa_to_gfn(gpa))) + goto out; + p = kmalloc(sizeof(struct kvm_mwait_pg), GFP_KERNEL); + p->gpa = gpa; + INIT_LIST_HEAD(&p->vcpu_list); + list_add(&p->link, &vcpu->kvm->arch.mwait_pg_list); + + vcpu->arch.mwp = p; + } + + /* link this VCPU into list of VCPUs monitoring this mwait page */ + list_add(&vcpu->arch.mw_link, &vcpu->arch.mwp->vcpu_list); + +out: + mutex_unlock(&vcpu->kvm->arch.mwait_lock); + return 1; +} +EXPORT_SYMBOL_GPL(kvm_emulate_monitor); + +int kvm_emulate_mwait(struct kvm_vcpu *vcpu) +{ + /* emulate as NOP if no-kvm-irqchip */ + if (!irqchip_in_kernel(vcpu->kvm)) + return 1; + + mutex_lock(&vcpu->kvm->arch.mwait_lock); + if (vcpu->arch.mwp != NULL) + vcpu->arch.mp_state = KVM_MP_STATE_MWAIT; + mutex_unlock(&vcpu->kvm->arch.mwait_lock); + return 1; +} +EXPORT_SYMBOL_GPL(kvm_emulate_mwait); + int kvm_hv_hypercall(struct kvm_vcpu *vcpu) { u64 param, ingpa, outgpa, ret; @@ -6077,6 +6141,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) { kvm_apic_accept_events(vcpu); switch(vcpu->arch.mp_state) { + case KVM_MP_STATE_MWAIT: case KVM_MP_STATE_HALTED: vcpu->arch.pv.pv_unhalted = false; vcpu->arch.mp_state = @@ -6961,6 +7026,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) kvm_async_pf_hash_reset(vcpu); kvm_pmu_init(vcpu); + vcpu->arch.mwp = NULL; + return 0; fail_free_wbinvd_dirty_mask: free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); @@ -7013,6 +7080,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) pvclock_update_vm_gtod_copy(kvm); + mutex_init(&kvm->arch.mwait_lock); + INIT_LIST_HEAD(&kvm->arch.mwait_pg_list); + return 0; } @@ -7254,8 +7324,10 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) || kvm_apic_has_events(vcpu) || vcpu->arch.pv.pv_unhalted || 
atomic_read(&vcpu->arch.nmi_queued) || - (kvm_arch_interrupt_allowed(vcpu) && - kvm_cpu_has_interrupt(vcpu)); + (kvm_cpu_has_interrupt(vcpu) && + (kvm_arch_interrupt_allowed(vcpu) || + (vcpu->arch.mp_state == KVM_MP_STATE_MWAIT && + kvm_register_read(vcpu, VCPU_REGS_RCX) & 0x01))); } int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 932d7f2..a4925fc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -398,6 +398,7 @@ struct kvm_vapic_addr { #define KVM_MP_STATE_INIT_RECEIVED 2 #define KVM_MP_STATE_HALTED 3 #define KVM_MP_STATE_SIPI_RECEIVED 4 +#define KVM_MP_STATE_MWAIT 5 struct kvm_mp_state { __u32 mp_state; ^ permalink raw reply related [flat|nested] 54+ messages in thread
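For reference, the wake-up rule that the kvm_arch_vcpu_runnable() hunk above adds can be restated as a small user-space function. This is only a sketch of the condition, not kernel code: the `sketch_*` names, the struct layout and the enum values are made up for illustration, and the two booleans merely stand in for kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed(). The idea is that a pending interrupt wakes the VCPU either when interrupts are allowed, or when the VCPU is parked in the MWAIT state and the guest set ECX bit 0 (break on interrupt even if masked).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mp_state values used by the patch. */
enum sketch_mp_state {
	SKETCH_RUNNABLE,
	SKETCH_HALTED,
	SKETCH_MWAIT,
};

/* Hypothetical, heavily reduced view of the VCPU state the check reads. */
struct sketch_vcpu {
	enum sketch_mp_state mp_state;
	bool irq_pending;	/* stands in for kvm_cpu_has_interrupt() */
	bool irq_allowed;	/* stands in for kvm_arch_interrupt_allowed() */
	uint64_t rcx;		/* guest RCX at MWAIT time; bit 0 is the hint */
};

/*
 * Returns true if a pending interrupt should make the VCPU runnable.
 * Same shape as the patched condition:
 *   has_interrupt && (allowed || (state == MWAIT && (RCX & 1)))
 */
bool sketch_irq_wakes_vcpu(const struct sketch_vcpu *v)
{
	if (!v->irq_pending)
		return false;
	if (v->irq_allowed)
		return true;
	return v->mp_state == SKETCH_MWAIT && (v->rcx & 0x01) != 0;
}
```

Note that a VCPU sitting in the HLT state with interrupts masked is not woken by this rule, which matches the pre-patch behaviour; only the MWAIT state gets the extra ECX[0] escape hatch.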
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 16:16 ` Gabriel L. Somlo @ 2017-03-16 16:45 ` Michael S. Tsirkin 2017-03-16 16:52 ` Gabriel L. Somlo 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-16 16:45 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 12:16:13PM -0400, Gabriel L. Somlo wrote: > On Thu, Mar 16, 2017 at 04:35:18PM +0100, Radim Krčmář wrote: > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > > > After studying your patch a bit more carefully (sorry, it's crazy > > > > > around here right now :) ) I realized you're simply trying to > > > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > > > the very early days of OS X on QEMU, at the time I got involved with > > > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > > > just that, and worked as far as I remember on *any* MWAIT capable > > > > > intel chip I had access to back in 2010: > > > > > > > > > > ############################################################################## > > > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > > > ############################################################################## > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > > > /* cpuid 1.ecx */ > > > > > const u32 kvm_supported_word4_x86_features = > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > > > set_intercept(svm, INTERCEPT_STGI); > > > > > set_intercept(svm, INTERCEPT_CLGI); > > > > > set_intercept(svm, INTERCEPT_SKINIT); > > > > > set_intercept(svm, 
INTERCEPT_WBINVD); > > > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > > > set_intercept(svm, INTERCEPT_XSETBV); > > > > > > > > > > control->iopm_base_pa = iopm_base; > > > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > > > control->int_ctl = V_INTR_MASKING_MASK; > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > > > nested_vmx_procbased_ctls_low = 0; > > > > > nested_vmx_procbased_ctls_high &= > > > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > > > + CPU_BASED_CR3_LOAD_EXITING | > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > #ifdef CONFIG_X86_64 > > > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > > > #endif > > > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > > > CPU_BASED_CR3_LOAD_EXITING | > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > CPU_BASED_USE_IO_BITMAPS | > > > > > CPU_BASED_MOV_DR_EXITING | > > > > > CPU_BASED_USE_TSC_OFFSETING | > > > > > - CPU_BASED_MWAIT_EXITING | > > > > > - CPU_BASED_MONITOR_EXITING | > > > > > CPU_BASED_INVLPG_EXITING | > > > > > CPU_BASED_RDPMC_EXITING; > > > > > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > > > confused at this point :) > > > > > > > 
> Yes. Me too. Want to try that other patch and see what happens? > > > > > > You mean the old 3.4 patch against current KVM ? I'll try to do that, > > > might take me a while :) > > > > Michael's patch already did most of that, you just need to add > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > index efde6cc50875..b12f07d4ce17 100644 > > --- a/arch/x86/kvm/cpuid.c > > +++ b/arch/x86/kvm/cpuid.c > > @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > > const u32 kvm_cpuid_1_ecx_x86_features = > > /* NOTE: MONITOR (and MWAIT) are emulated as NOP, > > * but *not* advertised to guests via CPUID ! */ > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > 0 /* DS-CPL, VMX, SMX, EST */ | > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > Note: this will never be upstream, because mwait isn't what we want by > > default. :) > > But since OS X doesn't check CPUID and simply runs MONITOR & MWAIT > assuming they're present, the above one-liner would make no > difference. If everything else in the old patch I quoted is identical > to what Michael does, then I don't know -- maybe the MacPro1,1 has > really broken L>=1 MWAIT, and it only ever worked with vmexit and > emulation on the host side. I think I have an idea. It is probably one of the monitor bugs on this host. X86_BUG_CLFLUSH_MONITOR or X86_BUG_MONITOR. If you tell guest you have a CPU that does not need it but host does need it, then mwait will not work. if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_CLFLUSH) && (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) && ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT))) set_cpu_bug(c, X86_BUG_MONITOR); what did you say your host model is? 
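In case it helps with answering the model question: below is a minimal user-space sketch, not kernel code, of how a raw CPUID leaf 1 EAX value decodes into family/model and how that compares against the X86_BUG_CLFLUSH_MONITOR model list quoted above. The names `cpu_sig`, `decode_cpu_sig` and `needs_clflush_before_monitor` are made up for illustration, and the CLFLUSH feature test is reduced to a plain boolean argument. On a Linux host the same family and model numbers should also show up in /proc/cpuinfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Family/model pair decoded from CPUID leaf 1 EAX (hypothetical type). */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
};

/*
 * Decode the display family/model from a raw CPUID.1:EAX value.
 * The extended model bits (19:16) are folded in for family 6 and 15,
 * the extended family bits (27:20) only for family 15.
 */
void decode_cpu_sig(uint32_t eax, struct cpu_sig *out)
{
	unsigned int family = (eax >> 8) & 0xf;
	unsigned int model = (eax >> 4) & 0xf;

	assert(out != NULL);

	if (family == 6 || family == 15)
		model |= ((eax >> 16) & 0xf) << 4;
	if (family == 15)
		family += (eax >> 20) & 0xff;

	out->family = family;
	out->model = model;
}

/*
 * Mirrors the model list of the X86_BUG_CLFLUSH_MONITOR check quoted
 * above: family 6, CLFLUSH present, model 29, 46 or 47.
 */
bool needs_clflush_before_monitor(const struct cpu_sig *sig, bool has_clflush)
{
	return sig->family == 6 && has_clflush &&
	       (sig->model == 29 || sig->model == 46 || sig->model == 47);
}
```

For example, a signature of 0x000106D1 decodes to family 6, model 29 and matches the list, while 0x000006F6 decodes to family 6, model 15 and does not.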
> > >> > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > > >> > didn't power down the physical CPU, just immediately moved on to the > > >> > next instruction. As such, there was no power saving and no > > >> > opportunity to yield to another L0 thread either, unlike with NOP > > >> > emulation at L0. > > >> > > > >> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > > >> > doing something smarter than just acting as a guest-mode NOP) ? > > >> > > > >> > Thanks, > > >> > --Gabriel > > >> > > >> Interesting. What it seems to say is this: > > >> > > >> MWAIT. Behavior of the MWAIT instruction (which always causes an invalid- > > >> opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT > > >> exiting” VM-execution control: > > >> — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit > > >> (see Section 22.1.3). > > >> — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if > > >> any of the following is true: (1) the “interrupt-window exiting” VM-execution > > >> control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1. > > >> — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window > > >> exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT > > >> does not cause the processor to enter an implementation-dependent > > >> optimized state; instead, control passes to the instruction following the > > >> MWAIT instruction. > > >> > > >> > > >> And since interrupt-window exiting is 0 most of the time for KVM, > > >> I would expect MWAIT to behave normally. > > > > > > The intel manual said the same thing back in 2010 as well. However, > > > regardless of how any flags were set, interrupt-window exiting or not, > > > "normal" L1 MWAIT behavior was that it woke up immediately regardless. > > > Remember, never going to sleep is still correct ("normal" ?) 
behavior
> > > per the ISA definition of MWAIT :)
> >
> > I'll write a simple kvm-unit-test to better understand why it is broken
> > for you ...
> >
> > > Also, when I tested your patch on the macbook air (where it worked),
> > > not only was the host reporting 400% CPU for qemu (which is to be
> > > expected), but the thermal fan/cooling thing also shifted up into high
> > > gear, which means the physical CPU got hot, which it shouldn't have if
> > > the guest-mode MWAIT actually did put the host CPU into low power.
> >
> > I tested MWAIT with basically the same kernel patch and the qemu patch
> > with Linux guest on Haswell and Nehalem. Running the guest took 100% of
> > the host CPUs, but it still had the same temperature as when the host
> > was idle.
> >
> > That reminds me that you need to pass '-cpu host' for QEMU reasons.
>
> For OS X to boot, one needs '-cpu core2duo' for <= 10.11, and
> '-cpu Penryn' for 10.12. I never managed to get it working with any
> other settings.
>
> So I'm ready to write off the MacPro1,1 (unless you want me to run more
> tests and report back for you, which I'm happy to do in any case).
>
> But please please, so at least I walk away from this having learned
> something :) help me understand the use case:
>
>   - By careful setting of vmx flags, and/or on newer, sanely
>     built Intel hardware, L1 MWAIT actually powers down the
>     physical host core (while I couldn't get it to stay cool
>     on my end, I totally believe you managed to pull it off)
>
>   - We never admit to supporting MWAIT to guests, but when they
>     do anyway (either because they're old/grumpy/careless OS X
>     versions, or some newfangled custom-built Linux kernel which
>     is hacked to ignore CPUID on purpose), we now allow the
>     guest to:
>         - keep its allotted time slice
>         - but "waste" it by powering down the host CPU
>       instead of
>         - vmexit to the host OS at L0
>         - yield the host core to another L0 runnable thread

NOP doesn't yield automatically, does it? CPU stays runnable, it just
makes it a bit cheaper to switch to another thread as you don't need
to exit.

> Since newer OS X actually checks CPUID, I don't have a major stake in
> one way vs. the other, but I'm really really curious:
>
> Are we trying to save power assuming the host is unlikely to have
> enough runnable L0 threads for when the L0-emulated NOP yields? So
> we're better off letting the guest keep the CPU but also keep it cool
> while at it (assuming the guest isn't totally hostile and didn't pick
> a setting where L1 MWAIT actually works as L1 NOP, in which case we
> don't even get to stay cool)?
>
> Man, I wish I had the cycles to resurrect my attempt at actually emulating
> MWAIT with something like a condition queue (below, just for reference).
>
> Thanks much,
> --Gabriel
>
>
> ##############################################################################
> # kvm-mwait-emu.patch (Gabriel Somlo <somlo@cmu.edu> 2014/02/05)
> #      -- based on an idea suggested by Alex Graf --
> # GLS: emulate MONITOR and MWAIT at page-level granularity by write-protecting
> #      the page containing a monitored location and appropriately handling
> #      subsequent write faults.
> # After debugging the SMP issue, we'll need a way to trigger a > # periodic cleanup that will switch write-protected monitored pages > # back to read-write, once they've stayed unused for "long enough" > ############################################################################## > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index fdf83af..7ca9b51 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -337,6 +337,16 @@ struct kvm_pmu { > u64 reprogram_pmi; > }; > > +/* > + * mwait-monitored page list element type > + */ > +struct kvm_mwait_pg { > + gpa_t gpa; > + struct list_head vcpu_list; /* VCPUs monitoring (armed on) this page */ > + struct list_head link; /* links mwait-pages within a KVM */ > + unsigned accessed; > +}; > + > struct kvm_vcpu_arch { > /* > * rip and regs accesses must go through > @@ -528,6 +538,10 @@ struct kvm_vcpu_arch { > struct { > bool pv_unhalted; > } pv; > + > + /* MONITOR/MWAIT support */ > + struct kvm_mwait_pg *mwp; /* page monitored by this VCPU */ > + struct list_head mw_link; /* all VCPUs monitoring the same page */ > }; > > struct kvm_lpage_info { > @@ -607,6 +621,10 @@ struct kvm_arch { > u64 hv_hypercall; > u64 hv_tsc_page; > > + /* MONITOR/MWAIT support */ > + struct mutex mwait_lock; > + struct list_head mwait_pg_list; /* monitored pages within this KVM */ > + > #ifdef CONFIG_KVM_MMU_AUDIT > int audit_point; > #endif > @@ -854,6 +872,8 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); > void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); > int kvm_emulate_halt(struct kvm_vcpu *vcpu); > int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu); > +int kvm_emulate_monitor(struct kvm_vcpu *vcpu); > +int kvm_emulate_mwait(struct kvm_vcpu *vcpu); > > void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); > int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); > @@ -915,6 +935,7 @@ void 
kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > const u8 *new, int bytes); > int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn); > int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva); > +int kvm_mmu_protect_page(struct kvm *kvm, gfn_t gfn); > void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu); > int kvm_mmu_load(struct kvm_vcpu *vcpu); > void kvm_mmu_unload(struct kvm_vcpu *vcpu); > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index c697625..7d4f1ca 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -279,6 +279,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > /* cpuid 1.ecx */ > const u32 kvm_supported_word4_x86_features = > + /* OS X does not check CPUID before using MONITOR/MWAIT from its > + * power-optimized idle loop (AppleIntelPowerManagement.kext). > + * For now, we don't advertise MWAIT support below, but attempt > + * to emulate them instead of issuing an invalid opcode fault > + * if a misbehaving guest calls them anyway. Removing the above > + * mentioned kext from OS X will cause it to fall back to a > + * HLT-based idle loop, as an optional guest optimization step. 
> + */ > F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > 0 /* DS-CPL, VMX, SMX, EST */ | > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index e50425d..bc02ebd 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -2283,6 +2283,20 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) > } > EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page); > > +int kvm_mmu_protect_page(struct kvm *kvm, gfn_t gfn) > +{ > + int r; > + > + spin_lock(&kvm->mmu_lock); > + r = rmap_write_protect(kvm, gfn); > + if (r) > + kvm_flush_remote_tlbs(kvm); > + spin_unlock(&kvm->mmu_lock); > + > + return r; > +} > +EXPORT_SYMBOL_GPL(kvm_mmu_protect_page); > + > /* > * The function is based on mtrr_type_lookup() in > * arch/x86/kernel/cpu/mtrr/generic.c > @@ -4146,12 +4160,68 @@ static bool is_mmio_page_fault(struct kvm_vcpu *vcpu, gva_t addr) > return vcpu_match_mmio_gva(vcpu, addr); > } > > +// try to handle fault caused by write to monitored (mwait) page > +// FIXME: aim for better integration between this and FNAME(page_fault)() and > +// kvm_mmu_page_fault() below. For now, this is proof-of-concept code. > +static bool handle_mwait_write_fault(struct kvm_vcpu *vcpu, gva_t gva, > + void *in, int in_len) > +{ > + gpa_t gpa; > + struct kvm_mwait_pg *p, *mwp = NULL; > + struct kvm_vcpu_arch *v, *u; > + bool r = false; > + > + gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); > + if (gpa == UNMAPPED_GVA) > + goto ul_out; > + > + mutex_lock(&vcpu->kvm->arch.mwait_lock); > + > + /* is gpa matching a monitored (mwait) page? 
*/ > + list_for_each_entry(p, &vcpu->kvm->arch.mwait_pg_list, link) > + if (p->gpa == gpa) { > + mwp = p; > + break; > + } > + if (mwp == NULL) > + goto out; > + > + mwp->accessed = 1; > + > + if (x86_emulate_instruction(vcpu, gva, > + EMULTYPE_RETRY, in, in_len) != EMULATE_DONE) > + goto out; > + > + /* disarm all VCPUs monitoring this page, waking them if needed */ > + list_for_each_entry_safe(v, u, &mwp->vcpu_list, mw_link) { > + list_del(&v->mw_link); > + v->mwp = NULL; > + if (v->mp_state == KVM_MP_STATE_MWAIT) > + v->mp_state = KVM_MP_STATE_RUNNABLE; > + } > + > + // What if the mwait is woken up by an interrupt instead of a write ? > + // It might remain "armed" on its old mwait page, but any subsequent > + // MONITOR instruction would replace that, so I don't think we need > + // to worry about it... > + > + r = true; > +out: > + mutex_unlock(&vcpu->kvm->arch.mwait_lock); > +ul_out: > + return r; > +} > + > int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code, > void *insn, int insn_len) > { > int r, emulation_type = EMULTYPE_RETRY; > enum emulation_result er; > > + /* writing to MONITORed memory area ? 
*/ > + if (handle_mwait_write_fault(vcpu, cr2, insn, insn_len)) > + return 1; > + > r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false); > if (r < 0) > goto out; > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index e81df8f..638704c 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3262,6 +3262,18 @@ static int pause_interception(struct vcpu_svm *svm) > return 1; > } > > +static int monitor_interception(struct vcpu_svm *svm) > +{ > + skip_emulated_instruction(&(svm->vcpu)); > + return kvm_emulate_monitor(&(svm->vcpu)); > +} > + > +static int mwait_interception(struct vcpu_svm *svm) > +{ > + skip_emulated_instruction(&(svm->vcpu)); > + return kvm_emulate_mwait(&(svm->vcpu)); > +} > + > static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > [SVM_EXIT_READ_CR0] = cr_interception, > [SVM_EXIT_READ_CR3] = cr_interception, > @@ -3319,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > [SVM_EXIT_CLGI] = clgi_interception, > [SVM_EXIT_SKINIT] = skinit_interception, > [SVM_EXIT_WBINVD] = emulate_on_interception, > - [SVM_EXIT_MONITOR] = invalid_op_interception, > - [SVM_EXIT_MWAIT] = invalid_op_interception, > + [SVM_EXIT_MONITOR] = monitor_interception, > + [SVM_EXIT_MWAIT] = mwait_interception, > [SVM_EXIT_XSETBV] = xsetbv_interception, > [SVM_EXIT_NPF] = pf_interception, > }; > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index a06f101..a7382e1 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -5603,6 +5603,18 @@ static int handle_invalid_op(struct kvm_vcpu *vcpu) > return 1; > } > > +static int handle_monitor(struct kvm_vcpu *vcpu) > +{ > + skip_emulated_instruction(vcpu); > + return kvm_emulate_monitor(vcpu); > +} > + > +static int handle_mwait(struct kvm_vcpu *vcpu) > +{ > + skip_emulated_instruction(vcpu); > + return kvm_emulate_mwait(vcpu); > +} > + > /* > * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. 
> * We could reuse a single VMCS for all the L2 guests, but we also want the > @@ -6483,8 +6495,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { > [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, > [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig, > [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause, > - [EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op, > - [EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op, > + [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait, > + [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor, > [EXIT_REASON_INVEPT] = handle_invept, > }; > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 39c28f09..8edc1be 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5592,6 +5592,70 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu) > } > EXPORT_SYMBOL_GPL(kvm_emulate_halt); > > +int kvm_emulate_monitor(struct kvm_vcpu *vcpu) > +{ > + gva_t gva; > + gpa_t gpa; > + struct kvm_mwait_pg *p; > + > + /* emulate as NOP if no-kvm-irqchip */ > + if (!irqchip_in_kernel(vcpu->kvm)) > + return 1; > + > + mutex_lock(&vcpu->kvm->arch.mwait_lock); > + > + /* relinguish any previously monitored mwait page */ > + if (vcpu->arch.mwp != NULL) { > + list_del(&vcpu->arch.mw_link); > + vcpu->arch.mwp->accessed = 1; > + vcpu->arch.mwp = NULL; > + } > + > + gva = kvm_register_read(vcpu, VCPU_REGS_RAX); > + gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); > + if (gpa == UNMAPPED_GVA) > + goto out; /* let some write op map the page first */ > + > + /* does the mwait page we're looking for already exist? 
*/ > + list_for_each_entry(p, &vcpu->kvm->arch.mwait_pg_list, link) > + if (p->gpa == gpa) { > + vcpu->arch.mwp = p; > + break; > + } > + if (vcpu->arch.mwp == NULL) { /* no, add new mwait page */ > + if (!kvm_mmu_protect_page(vcpu->kvm, gpa_to_gfn(gpa))) > + goto out; > + p = kmalloc(sizeof(struct kvm_mwait_pg), GFP_KERNEL); > + p->gpa = gpa; > + INIT_LIST_HEAD(&p->vcpu_list); > + list_add(&p->link, &vcpu->kvm->arch.mwait_pg_list); > + > + vcpu->arch.mwp = p; > + } > + > + /* link this VCPU into list of VCPUs monitoring this mwait page */ > + list_add(&vcpu->arch.mw_link, &vcpu->arch.mwp->vcpu_list); > + > +out: > + mutex_unlock(&vcpu->kvm->arch.mwait_lock); > + return 1; > +} > +EXPORT_SYMBOL_GPL(kvm_emulate_monitor); > + > +int kvm_emulate_mwait(struct kvm_vcpu *vcpu) > +{ > + /* emulate as NOP if no-kvm-irqchip */ > + if (!irqchip_in_kernel(vcpu->kvm)) > + return 1; > + > + mutex_lock(&vcpu->kvm->arch.mwait_lock); > + if (vcpu->arch.mwp != NULL) > + vcpu->arch.mp_state = KVM_MP_STATE_MWAIT; > + mutex_unlock(&vcpu->kvm->arch.mwait_lock); > + return 1; > +} > +EXPORT_SYMBOL_GPL(kvm_emulate_mwait); > + > int kvm_hv_hypercall(struct kvm_vcpu *vcpu) > { > u64 param, ingpa, outgpa, ret; > @@ -6077,6 +6141,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) > if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) { > kvm_apic_accept_events(vcpu); > switch(vcpu->arch.mp_state) { > + case KVM_MP_STATE_MWAIT: > case KVM_MP_STATE_HALTED: > vcpu->arch.pv.pv_unhalted = false; > vcpu->arch.mp_state = > @@ -6961,6 +7026,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) > kvm_async_pf_hash_reset(vcpu); > kvm_pmu_init(vcpu); > > + vcpu->arch.mwp = NULL; > + > return 0; > fail_free_wbinvd_dirty_mask: > free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); > @@ -7013,6 +7080,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > pvclock_update_vm_gtod_copy(kvm); > > + mutex_init(&kvm->arch.mwait_lock); > + INIT_LIST_HEAD(&kvm->arch.mwait_pg_list); > + > return 0; > } > > @@ -7254,8 
+7324,10 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) > || kvm_apic_has_events(vcpu) > || vcpu->arch.pv.pv_unhalted > || atomic_read(&vcpu->arch.nmi_queued) || > - (kvm_arch_interrupt_allowed(vcpu) && > - kvm_cpu_has_interrupt(vcpu)); > + (kvm_cpu_has_interrupt(vcpu) && > + (kvm_arch_interrupt_allowed(vcpu) || > + (vcpu->arch.mp_state == KVM_MP_STATE_MWAIT && > + kvm_register_read(vcpu, VCPU_REGS_RCX) & 0x01))); > } > > int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 932d7f2..a4925fc 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -398,6 +398,7 @@ struct kvm_vapic_addr { > #define KVM_MP_STATE_INIT_RECEIVED 2 > #define KVM_MP_STATE_HALTED 3 > #define KVM_MP_STATE_SIPI_RECEIVED 4 > +#define KVM_MP_STATE_MWAIT 5 > > struct kvm_mp_state { > __u32 mp_state; ^ permalink raw reply [flat|nested] 54+ messages in thread
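For reference, the SDM rules for MWAIT in VMX non-root operation that were quoted earlier in this message can be written down as a small decision function. This is only a userspace sketch of the quoted text, not KVM code: `struct mwait_inputs`, `enum mwait_outcome` and `classify_mwait()` are hypothetical names made up for illustration, and checking CPL before the "MWAIT exiting" control is an assumption about ordering that the quoted paragraph does not spell out.

```c
#include <stdbool.h>

/* Possible outcomes of a guest MWAIT, following the quoted SDM text. */
enum mwait_outcome {
    MWAIT_UD,          /* CPL > 0: invalid-opcode exception (#UD) */
    MWAIT_VM_EXIT,     /* "MWAIT exiting" control is 1 */
    MWAIT_NORMAL,      /* MWAIT operates normally (may enter optimized state) */
    MWAIT_FALLTHROUGH  /* no optimized state: continue at next instruction */
};

/* Hypothetical container for the inputs the quoted rules depend on. */
struct mwait_inputs {
    unsigned cpl;              /* current privilege level */
    bool mwait_exiting;        /* "MWAIT exiting" VM-execution control */
    bool intr_window_exiting;  /* "interrupt-window exiting" control */
    bool ecx_bit0;             /* ECX[0]: interrupts break MWAIT even if IF=0 */
    bool rflags_if;            /* RFLAGS.IF */
};

enum mwait_outcome classify_mwait(const struct mwait_inputs *in)
{
    if (in->cpl > 0)
        return MWAIT_UD;
    if (in->mwait_exiting)
        return MWAIT_VM_EXIT;
    /* Normal operation if any one of the three listed conditions holds. */
    if (!in->intr_window_exiting || !in->ecx_bit0 || in->rflags_if)
        return MWAIT_NORMAL;
    /* Remaining case: window exiting = 1, ECX[0] = 1, IF = 0. */
    return MWAIT_FALLTHROUGH;
}
```

Under these rules, with "MWAIT exiting" cleared and "interrupt-window exiting" at 0, the function returns MWAIT_NORMAL regardless of ECX[0] and IF, which matches the expectation stated in the quoted discussion. It says nothing about whether a given CPU actually enters a low-power state in that case.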
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 16:45 ` Michael S. Tsirkin @ 2017-03-16 16:52 ` Gabriel L. Somlo 2017-03-16 16:54 ` Gabriel L. Somlo 0 siblings, 1 reply; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 16:52 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 06:45:02PM +0200, Michael S. Tsirkin wrote: > On Thu, Mar 16, 2017 at 12:16:13PM -0400, Gabriel L. Somlo wrote: > > On Thu, Mar 16, 2017 at 04:35:18PM +0100, Radim Krčmář wrote: > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > > > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > > > > After studying your patch a bit more carefully (sorry, it's crazy > > > > > > around here right now :) ) I realized you're simply trying to > > > > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > > > > the very early days of OS X on QEMU, at the time I got involved with > > > > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > > > > just that, and worked as far as I remember on *any* MWAIT capable > > > > > > intel chip I had access to back in 2010: > > > > > > > > > > > > ############################################################################## > > > > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > > > > ############################################################################## > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > > > > /* cpuid 1.ecx */ > > > > > > const u32 kvm_supported_word4_x86_features = > > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > > > > set_intercept(svm, INTERCEPT_STGI); > > > > > > set_intercept(svm, INTERCEPT_CLGI); > > > > > > set_intercept(svm, 
INTERCEPT_SKINIT); > > > > > > set_intercept(svm, INTERCEPT_WBINVD); > > > > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > > > > set_intercept(svm, INTERCEPT_XSETBV); > > > > > > > > > > > > control->iopm_base_pa = iopm_base; > > > > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > > > > control->int_ctl = V_INTR_MASKING_MASK; > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > > > > nested_vmx_procbased_ctls_low = 0; > > > > > > nested_vmx_procbased_ctls_high &= > > > > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > > > > + CPU_BASED_CR3_LOAD_EXITING | > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > #ifdef CONFIG_X86_64 > > > > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > > > > #endif > > > > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > > > > CPU_BASED_CR3_LOAD_EXITING | > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > CPU_BASED_USE_IO_BITMAPS | > > > > > > CPU_BASED_MOV_DR_EXITING | > > > > > > CPU_BASED_USE_TSC_OFFSETING | > > > > > > - CPU_BASED_MWAIT_EXITING | > > > > > > - CPU_BASED_MONITOR_EXITING | > > > > > > CPU_BASED_INVLPG_EXITING | > > > > > > CPU_BASED_RDPMC_EXITING; > > > > > > > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > 
> > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > > > > confused at this point :) > > > > > > > > > > Yes. Me too. Want to try that other patch and see what happens? > > > > > > > > You mean the old 3.4 patch against current KVM ? I'll try to do that, > > > > might take me a while :) > > > > > > Michael's patch already did most of that, you just need to add > > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > > index efde6cc50875..b12f07d4ce17 100644 > > > --- a/arch/x86/kvm/cpuid.c > > > +++ b/arch/x86/kvm/cpuid.c > > > @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > > > const u32 kvm_cpuid_1_ecx_x86_features = > > > /* NOTE: MONITOR (and MWAIT) are emulated as NOP, > > > * but *not* advertised to guests via CPUID ! */ > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > Note: this will never be upstream, because mwait isn't what we want by > > > default. :) > > > > But since OS X doesn't check CPUID and simply runs MONITOR & MWAIT > > assuming they're present, the above one-liner would make no > > difference. If everything else in the old patch I quoted is identical > > to what Michael does, then I don't know -- maybe the MacPro1,1 has > > really broken L>=1 MWAIT, and it only ever worked with vmexit and > > emulation on the host side. > > > I think I have an idea. It is probably one of the monitor bugs > on this host. > > X86_BUG_CLFLUSH_MONITOR or X86_BUG_MONITOR. > > If you tell guest you have a CPU that does not need it > but host does need it, then mwait will not work. 
>
>     if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_CLFLUSH) &&
>         (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
>             set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR);
>
>
>     if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) &&
>         ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
>             set_cpu_bug(c, X86_BUG_MONITOR);
>
> what did you say your host model is?

# dmidecode -t1
# dmidecode 2.12
SMBIOS 2.4 present.

Handle 0x0021, DMI type 1, 27 bytes
System Information
        Manufacturer: Apple Computer, Inc.
        Product Name: MacPro1,1
        Version: 1.0
        Serial Number: G87030UEUPZ
        UUID: 9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97
        Wake-up Type: Power Switch
        SKU Number: System SKU#
        Family: MacPro

Thx,
--G

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 16:52 ` Gabriel L. Somlo @ 2017-03-16 16:54 ` Gabriel L. Somlo 2017-03-16 17:14 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Gabriel L. Somlo @ 2017-03-16 16:54 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 12:52:32PM -0400, Gabriel L. Somlo wrote: > On Thu, Mar 16, 2017 at 06:45:02PM +0200, Michael S. Tsirkin wrote: > > On Thu, Mar 16, 2017 at 12:16:13PM -0400, Gabriel L. Somlo wrote: > > > On Thu, Mar 16, 2017 at 04:35:18PM +0100, Radim Krčmář wrote: > > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > > > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > > > > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > > > > > After studying your patch a bit more carefully (sorry, it's crazy > > > > > > > around here right now :) ) I realized you're simply trying to > > > > > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > > > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > > > > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > > > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > > > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > > > > > the very early days of OS X on QEMU, at the time I got involved with > > > > > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > > > > > just that, and worked as far as I remember on *any* MWAIT capable > > > > > > > intel chip I had access to back in 2010: > > > > > > > > > > > > > > ############################################################################## > > > > > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > > > > > ############################################################################## > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > > > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > > > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > > > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > > > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > > > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > > > > > /* cpuid 1.ecx */ > > > > > > > const u32 kvm_supported_word4_x86_features = > > > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > > > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > > > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > > > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > > > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > > > > > set_intercept(svm, INTERCEPT_STGI); > > > > > > > 
set_intercept(svm, INTERCEPT_CLGI); > > > > > > > set_intercept(svm, INTERCEPT_SKINIT); > > > > > > > set_intercept(svm, INTERCEPT_WBINVD); > > > > > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > > > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > > > > > set_intercept(svm, INTERCEPT_XSETBV); > > > > > > > > > > > > > > control->iopm_base_pa = iopm_base; > > > > > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > > > > > control->int_ctl = V_INTR_MASKING_MASK; > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > > > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > > > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > > > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > > > > > nested_vmx_procbased_ctls_low = 0; > > > > > > > nested_vmx_procbased_ctls_high &= > > > > > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > > > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > > > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > > > > > + CPU_BASED_CR3_LOAD_EXITING | > > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > > #ifdef CONFIG_X86_64 > > > > > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > > > > > #endif > > > > > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > > > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > > > > > CPU_BASED_CR3_LOAD_EXITING | > > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > > CPU_BASED_USE_IO_BITMAPS | > > > > > > > CPU_BASED_MOV_DR_EXITING | > > > > > > > CPU_BASED_USE_TSC_OFFSETING | > > > > > > > - CPU_BASED_MWAIT_EXITING | > > > > > > > - CPU_BASED_MONITOR_EXITING | > > > > > > > CPU_BASED_INVLPG_EXITING | > > > > > > > CPU_BASED_RDPMC_EXITING; > > > > > > > > > > > > > > opt = 
CPU_BASED_TPR_SHADOW | > > > > > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > > > > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > > > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > > > > > confused at this point :) > > > > > > > > > > > > Yes. Me too. Want to try that other patch and see what happens? > > > > > > > > > > You mean the old 3.4 patch against current KVM ? I'll try to do that, > > > > > might take me a while :) > > > > > > > > Michael's patch already did most of that, you just need to add > > > > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > > > index efde6cc50875..b12f07d4ce17 100644 > > > > --- a/arch/x86/kvm/cpuid.c > > > > +++ b/arch/x86/kvm/cpuid.c > > > > @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > > > > const u32 kvm_cpuid_1_ecx_x86_features = > > > > /* NOTE: MONITOR (and MWAIT) are emulated as NOP, > > > > * but *not* advertised to guests via CPUID ! */ > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > > > Note: this will never be upstream, because mwait isn't what we want by > > > > default. :) > > > > > > But since OS X doesn't check CPUID and simply runs MONITOR & MWAIT > > > assuming they're present, the above one-liner would make no > > > difference. If everything else in the old patch I quoted is identical > > > to what Michael does, then I don't know -- maybe the MacPro1,1 has > > > really broken L>=1 MWAIT, and it only ever worked with vmexit and > > > emulation on the host side. > > > > > > I think I have an idea. It is probably one of the monitor bugs > > on this host. > > > > X86_BUG_CLFLUSH_MONITOR or X86_BUG_MONITOR. 
> > > > If you tell guest you have a CPU that does not need it > > but host does need it, then mwait will not work. > > > > if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_CLFLUSH) && > > (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) > > set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); > > > > > > if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) && > > ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT))) > > set_cpu_bug(c, X86_BUG_MONITOR); > > > > what did you say your host model is? > > # dmidecode -t1 > # dmidecode 2.12 > SMBIOS 2.4 present. > > Handle 0x0021, DMI type 1, 27 bytes > System Information > Manufacturer: Apple Computer, Inc. > Product Name: MacPro1,1 > Version: 1.0 > Serial Number: G87030UEUPZ > UUID: 9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97 > Wake-up Type: Power Switch > SKU Number: System SKU# > Family: MacPro And, probably more usefully: [... ommitting 0,1,2 ...] processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping : 6 microcode : 0xd2 cpu MHz : 2659.998 cache size : 4096 KB physical id : 3 siblings : 2 core id : 0 cpu cores : 2 apicid : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow dtherm bugs : bogomips : 5320.05 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: ^ permalink raw reply [flat|nested] 54+ messages in thread
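As a quick cross-check of the family/model above against the two conditions Michael quoted, here is a small userspace sketch. It is not the kernel code: `monitor_bug_flags()` and the `BUG_*` constants are hypothetical names for illustration, and the value 0x5C used for the Goldmont model is an assumption about what `INTEL_FAM6_ATOM_GOLDMONT` expands to.

```c
#include <stdbool.h>

/* Assumed numeric value of INTEL_FAM6_ATOM_GOLDMONT (not taken from this thread). */
#define MODEL_ATOM_GOLDMONT 0x5C

/* Hypothetical flag bits standing in for the two X86_BUG_* bits. */
#define BUG_CLFLUSH_MONITOR 0x1u
#define BUG_MONITOR         0x2u

/*
 * Mirror the two quoted conditions: return a bitmask of the monitor-related
 * bug flags that would be set for the given family/model/features.
 */
unsigned monitor_bug_flags(unsigned family, unsigned model,
                           bool has_clflush, bool has_mwait)
{
    unsigned bugs = 0;

    if (family == 6 && has_clflush &&
        (model == 29 || model == 46 || model == 47))
        bugs |= BUG_CLFLUSH_MONITOR;

    if (family == 6 && has_mwait && model == MODEL_ATOM_GOLDMONT)
        bugs |= BUG_MONITOR;

    return bugs;
}
```

With family 6, model 15 and both clflush and monitor present in the flags line, this sketch returns 0, i.e. neither condition matches. That is consistent with the empty "bugs" line in the cpuinfo output above, so these two particular flags do not appear to apply to this host.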
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 16:54 ` Gabriel L. Somlo @ 2017-03-16 17:14 ` Michael S. Tsirkin 2017-03-16 17:38 ` Radim Krčmář 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2017-03-16 17:14 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Radim Krčmář, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On Thu, Mar 16, 2017 at 12:54:50PM -0400, Gabriel L. Somlo wrote: > On Thu, Mar 16, 2017 at 12:52:32PM -0400, Gabriel L. Somlo wrote: > > On Thu, Mar 16, 2017 at 06:45:02PM +0200, Michael S. Tsirkin wrote: > > > On Thu, Mar 16, 2017 at 12:16:13PM -0400, Gabriel L. Somlo wrote: > > > > On Thu, Mar 16, 2017 at 04:35:18PM +0100, Radim Krčmář wrote: > > > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > > > > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > > > > > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > > > > > > After studying your patch a bit more carefully (sorry, it's crazy > > > > > > > > around here right now :) ) I realized you're simply trying to > > > > > > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > > > > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > > > > > > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > > > > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > > > > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > > > > > > the very early days of OS X on QEMU, at the time I got involved with > > > > > > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > > > > > > just that, and worked as far as I remember on *any* MWAIT capable > > > > > > > > intel chip I had access to back in 2010: > > > > > > > > > > > > > > > > ############################################################################## > > > > > > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > > > > > > ############################################################################## > > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > > > > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > > > > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > > > > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > > > > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > > > > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > > > > > > /* cpuid 1.ecx */ > > > > > > > > const u32 kvm_supported_word4_x86_features = > > > > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > > > > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > > > > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > > > > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > > > > > > set_intercept(svm, INTERCEPT_VMSAVE); > > > > > > > > 
set_intercept(svm, INTERCEPT_STGI); > > > > > > > > set_intercept(svm, INTERCEPT_CLGI); > > > > > > > > set_intercept(svm, INTERCEPT_SKINIT); > > > > > > > > set_intercept(svm, INTERCEPT_WBINVD); > > > > > > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > > > > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > > > > > > set_intercept(svm, INTERCEPT_XSETBV); > > > > > > > > > > > > > > > > control->iopm_base_pa = iopm_base; > > > > > > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > > > > > > control->int_ctl = V_INTR_MASKING_MASK; > > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > > > > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > > > > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > > > > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > > > > > > nested_vmx_procbased_ctls_low = 0; > > > > > > > > nested_vmx_procbased_ctls_high &= > > > > > > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > > > > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > > > > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > > > > > > + CPU_BASED_CR3_LOAD_EXITING | > > > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > > > #ifdef CONFIG_X86_64 > > > > > > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > > > > > > #endif > > > > > > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > > > > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > > > > > > CPU_BASED_CR3_LOAD_EXITING | > > > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > > > CPU_BASED_USE_IO_BITMAPS | > > > > > > > > CPU_BASED_MOV_DR_EXITING | > > > > > > > > CPU_BASED_USE_TSC_OFFSETING | > > > > > > > > - CPU_BASED_MWAIT_EXITING | > > > > > > > > - CPU_BASED_MONITOR_EXITING | > > > > > > > 
> CPU_BASED_INVLPG_EXITING | > > > > > > > > CPU_BASED_RDPMC_EXITING; > > > > > > > > > > > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > > > > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > > > > > > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > > > > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > > > > > > confused at this point :) > > > > > > > > > > > > > > Yes. Me too. Want to try that other patch and see what happens? > > > > > > > > > > > > You mean the old 3.4 patch against current KVM ? I'll try to do that, > > > > > > might take me a while :) > > > > > > > > > > Michael's patch already did most of that, you just need to add > > > > > > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > > > > index efde6cc50875..b12f07d4ce17 100644 > > > > > --- a/arch/x86/kvm/cpuid.c > > > > > +++ b/arch/x86/kvm/cpuid.c > > > > > @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > > > > > const u32 kvm_cpuid_1_ecx_x86_features = > > > > > /* NOTE: MONITOR (and MWAIT) are emulated as NOP, > > > > > * but *not* advertised to guests via CPUID ! */ > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > > > > > Note: this will never be upstream, because mwait isn't what we want by > > > > > default. :) > > > > > > > > But since OS X doesn't check CPUID and simply runs MONITOR & MWAIT > > > > assuming they're present, the above one-liner would make no > > > > difference. 
If everything else in the old patch I quoted is identical > > > > to what Michael does, then I don't know -- maybe the MacPro1,1 has > > > > really broken L>=1 MWAIT, and it only ever worked with vmexit and > > > > emulation on the host side. > > > > > > > > > I think I have an idea. It is probably one of the monitor bugs > > > on this host. > > > > > > X86_BUG_CLFLUSH_MONITOR or X86_BUG_MONITOR. > > > > > > If you tell guest you have a CPU that does not need it > > > but host does need it, then mwait will not work. > > > > > > if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_CLFLUSH) && > > > (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) > > > set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); > > > > > > > > > if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) && > > > ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT))) > > > set_cpu_bug(c, X86_BUG_MONITOR); > > > > > > what did you say your host model is? > > > > # dmidecode -t1 > > # dmidecode 2.12 > > SMBIOS 2.4 present. > > > > Handle 0x0021, DMI type 1, 27 bytes > > System Information > > Manufacturer: Apple Computer, Inc. > > Product Name: MacPro1,1 > > Version: 1.0 > > Serial Number: G87030UEUPZ > > UUID: 9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97 > > Wake-up Type: Power Switch > > SKU Number: System SKU# > > Family: MacPro > > And, probably more usefully: > > [... ommitting 0,1,2 ...] 
> > processor : 3 > vendor_id : GenuineIntel > cpu family : 6 > model : 15 > model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > stepping : 6 > microcode : 0xd2 > cpu MHz : 2659.998 > cache size : 4096 KB > physical id : 3 > siblings : 2 > core id : 0 > cpu cores : 2 > apicid : 6 > initial apicid : 6 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow dtherm > bugs : > bogomips : 5320.05 > clflush size : 64 > cache_alignment : 64 > address sizes : 36 bits physical, 48 bits virtual > power management: Hmm nope not one of these. Need to poke at errata some more. -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
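For readers following along, the "not one of these" conclusion can be checked mechanically. The following is a minimal user-space sketch, not kernel code: the struct and function names are hypothetical, and the Goldmont model number (0x5C) is an assumption taken from the kernel's intel-family.h rather than from this thread. It only mirrors the two conditions quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of INTEL_FAM6_ATOM_GOLDMONT (not stated in the thread). */
#define SKETCH_FAM6_ATOM_GOLDMONT 0x5C

/* Hypothetical stand-in for the few cpuinfo fields the checks need. */
struct sketch_cpu {
	uint8_t family;
	uint8_t model;
	bool has_clflush;
	bool has_mwait;
};

/* Mirrors the quoted X86_BUG_CLFLUSH_MONITOR condition. */
bool sketch_has_clflush_monitor_bug(const struct sketch_cpu *c)
{
	return c->family == 6 && c->has_clflush &&
	       (c->model == 29 || c->model == 46 || c->model == 47);
}

/* Mirrors the quoted X86_BUG_MONITOR condition. */
bool sketch_has_monitor_bug(const struct sketch_cpu *c)
{
	return c->family == 6 && c->has_mwait &&
	       c->model == SKETCH_FAM6_ATOM_GOLDMONT;
}
```

Filling the struct with the MacPro1,1 values from /proc/cpuinfo (family 6, model 15, clflush and monitor both present) makes both helpers return false, which matches the empty "bugs :" line in the output above.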
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 17:14 ` Michael S. Tsirkin @ 2017-03-16 17:38 ` Radim Krčmář 0 siblings, 0 replies; 54+ messages in thread From: Radim Krčmář @ 2017-03-16 17:38 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Gabriel L. Somlo, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-16 19:14+0200, Michael S. Tsirkin: > On Thu, Mar 16, 2017 at 12:54:50PM -0400, Gabriel L. Somlo wrote: > > On Thu, Mar 16, 2017 at 12:52:32PM -0400, Gabriel L. Somlo wrote: > > > On Thu, Mar 16, 2017 at 06:45:02PM +0200, Michael S. Tsirkin wrote: > > > > On Thu, Mar 16, 2017 at 12:16:13PM -0400, Gabriel L. Somlo wrote: > > > > > On Thu, Mar 16, 2017 at 04:35:18PM +0100, Radim Krčmář wrote: > > > > > > 2017-03-16 10:58-0400, Gabriel L. Somlo: > > > > > > > On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > > > > > > > > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > > > > > > > > After studying your patch a bit more carefully (sorry, it's crazy > > > > > > > > > around here right now :) ) I realized you're simply trying to > > > > > > > > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > > > > > > > > just allow L1 to execute MONITOR & MWAIT natively. > > > > > > > > > > > > > > > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > > > > > > > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > > > > > > > > natively was one of the options Alex Graf and Rene Rebe used back in > > > > > > > > > the very early days of OS X on QEMU, at the time I got involved with > > > > > > > > > that project. 
Here's part of an out of tree patch against 3.4 which did > > > > > > > > > just that, and worked as far as I remember on *any* MWAIT capable > > > > > > > > > intel chip I had access to back in 2010: > > > > > > > > > > > > > > > > > > ############################################################################## > > > > > > > > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > > > > > > > > > ############################################################################## > > > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > > > > > > > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > > > > > > > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > > > > > > > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > > > > > > > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > > > > > > > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > > > > > > > > /* cpuid 1.ecx */ > > > > > > > > > const u32 kvm_supported_word4_x86_features = > > > > > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > > > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > > > > > > > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > > > > > > > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > > > > > > > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > > > > > > > > 
set_intercept(svm, INTERCEPT_VMSAVE); > > > > > > > > > set_intercept(svm, INTERCEPT_STGI); > > > > > > > > > set_intercept(svm, INTERCEPT_CLGI); > > > > > > > > > set_intercept(svm, INTERCEPT_SKINIT); > > > > > > > > > set_intercept(svm, INTERCEPT_WBINVD); > > > > > > > > > - set_intercept(svm, INTERCEPT_MONITOR); > > > > > > > > > - set_intercept(svm, INTERCEPT_MWAIT); > > > > > > > > > set_intercept(svm, INTERCEPT_XSETBV); > > > > > > > > > > > > > > > > > > control->iopm_base_pa = iopm_base; > > > > > > > > > control->msrpm_base_pa = __pa(svm->msrpm); > > > > > > > > > control->int_ctl = V_INTR_MASKING_MASK; > > > > > > > > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > > > > > > > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > > > > > > > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > > > > > > > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > > > > > > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > > > > > > > nested_vmx_procbased_ctls_low = 0; > > > > > > > > > nested_vmx_procbased_ctls_high &= > > > > > > > > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > > > > > > > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > > > > > > > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > > > > > > > > + CPU_BASED_CR3_LOAD_EXITING | > > > > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > > > > #ifdef CONFIG_X86_64 > > > > > > > > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > > > > > > > > #endif > > > > > > > > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > > > > > > > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > > > > > > > > CPU_BASED_CR3_LOAD_EXITING | > > > > > > > > > CPU_BASED_CR3_STORE_EXITING | > > > > > > > > > CPU_BASED_USE_IO_BITMAPS | > > > > > > > > > CPU_BASED_MOV_DR_EXITING | > > > > > > > > > 
CPU_BASED_USE_TSC_OFFSETING | > > > > > > > > > - CPU_BASED_MWAIT_EXITING | > > > > > > > > > - CPU_BASED_MONITOR_EXITING | > > > > > > > > > CPU_BASED_INVLPG_EXITING | > > > > > > > > > CPU_BASED_RDPMC_EXITING; > > > > > > > > > > > > > > > > > > opt = CPU_BASED_TPR_SHADOW | > > > > > > > > > CPU_BASED_USE_MSR_BITMAPS | > > > > > > > > > > > > > > > > > > If all you're trying to do is (selectively) revert to this behavior, > > > > > > > > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > > > > > > > > confused at this point :) > > > > > > > > > > > > > > > > Yes. Me too. Want to try that other patch and see what happens? > > > > > > > > > > > > > > You mean the old 3.4 patch against current KVM ? I'll try to do that, > > > > > > > might take me a while :) > > > > > > > > > > > > Michael's patch already did most of that, you just need to add > > > > > > > > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > > > > > index efde6cc50875..b12f07d4ce17 100644 > > > > > > --- a/arch/x86/kvm/cpuid.c > > > > > > +++ b/arch/x86/kvm/cpuid.c > > > > > > @@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > > > > > > const u32 kvm_cpuid_1_ecx_x86_features = > > > > > > /* NOTE: MONITOR (and MWAIT) are emulated as NOP, > > > > > > * but *not* advertised to guests via CPUID ! */ > > > > > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > > > > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > > > > > 0 /* DS-CPL, VMX, SMX, EST */ | > > > > > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > > > > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > > > > > > > > > > > Note: this will never be upstream, because mwait isn't what we want by > > > > > > default. :) > > > > > > > > > > But since OS X doesn't check CPUID and simply runs MONITOR & MWAIT > > > > > assuming they're present, the above one-liner would make no > > > > > difference. 
If everything else in the old patch I quoted is identical > > > > > to what Michael does, then I don't know -- maybe the MacPro1,1 has > > > > > really broken L>=1 MWAIT, and it only ever worked with vmexit and > > > > > emulation on the host side. > > > > > > > > > > > > I think I have an idea. It is probably one of the monitor bugs > > > > on this host. > > > > > > > > X86_BUG_CLFLUSH_MONITOR or X86_BUG_MONITOR. > > > > > > > > If you tell guest you have a CPU that does not need it > > > > but host does need it, then mwait will not work. > > > > > > > > if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_CLFLUSH) && > > > > (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) > > > > set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); > > > > > > > > > > > > if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) && > > > > ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT))) > > > > set_cpu_bug(c, X86_BUG_MONITOR); > > > > > > > > what did you say your host model is? > > > > > > # dmidecode -t1 > > > # dmidecode 2.12 > > > SMBIOS 2.4 present. > > > > > > Handle 0x0021, DMI type 1, 27 bytes > > > System Information > > > Manufacturer: Apple Computer, Inc. > > > Product Name: MacPro1,1 > > > Version: 1.0 > > > Serial Number: G87030UEUPZ > > > UUID: 9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97 > > > Wake-up Type: Power Switch > > > SKU Number: System SKU# > > > Family: MacPro > > > > And, probably more usefully: > > > > [... ommitting 0,1,2 ...] 
> > > > processor : 3 > > vendor_id : GenuineIntel > > cpu family : 6 > > model : 15 > > model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > > stepping : 6 > > microcode : 0xd2 > > cpu MHz : 2659.998 > > cache size : 4096 KB > > physical id : 3 > > siblings : 2 > > core id : 0 > > cpu cores : 2 > > apicid : 6 > > initial apicid : 6 > > fpu : yes > > fpu_exception : yes > > cpuid level : 10 > > wp : yes > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow dtherm > > bugs : > > bogomips : 5320.05 > > clflush size : 64 > > cache_alignment : 64 > > address sizes : 36 bits physical, 48 bits virtual > > power management: > > > Hmm nope not one of these. > Need to poke at errata some more. Intel lists two bugs with MWAIT on that model: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5100-spec-update.pdf AG36. Split Locked Stores May not Trigger the Monitoring Hardware AG106. A REP STOS/MOVS to a MONITOR/MWAIT Address Range May Prevent Triggering of the Monitoring Hardware The latter can be dismissed as it should have been hit on bare metal as well. The former looks pretty unlikely as well, but maybe the guest maps w/b what bare metal would map differently? ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-16 13:24 ` Gabriel L. Somlo 2017-03-16 14:04 ` Michael S. Tsirkin @ 2017-03-16 14:08 ` Radim Krčmář 2017-03-16 15:44 ` Gabriel L. Somlo 1 sibling, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-03-16 14:08 UTC (permalink / raw) To: Gabriel L. Somlo Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-16 09:24-0400, Gabriel L. Somlo: > On Thu, Mar 16, 2017 at 01:41:28AM +0200, Michael S. Tsirkin wrote: > > On Wed, Mar 15, 2017 at 07:35:34PM -0400, Gabriel L. Somlo wrote: > > > On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote: > > > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: > > > > unless explicitly provided with kernel command line argument > > > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, > > > > without checking CPUID. > > > > > > > > We currently emulate that as a NOP but on VMX we can do better: let > > > > guest stop the CPU until timer, IPI or memory change. CPU will be busy > > > > but that isn't any worse than a NOP emulation. > > > > > > > > Note that mwait within guests is not the same as on real hardware > > > > because halt causes an exit while mwait doesn't. For this reason it > > > > might not be a good idea to use the regular MWAIT flag in CPUID to > > > > signal this capability. Add a flag in the hypervisor leaf instead. > > > > > > > > Additionally, we add a capability for QEMU - e.g. if it knows there's an > > > > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag > > > > to improve guest behaviour. > > > > > > Same behavior (on the mac pro 1,1 running F22 with custom-compiled > > > kernel from kvm git master, plus this patch on top). 
> > > > > > The OS X 10.7 kernel hangs (or at least progresses extremely slowly) > > > on boot, does not bring up guest graphical interface within the first > > > 10 minutes that I waited for it. That, in contrast with the default > > > nop-based emulation where the guest comes up within 30 seconds. > > > > > > Thanks a lot, meanwhile I'll try to write a unit-test and experiment > > with various behaviours. > > > > > I will run another round of tests on a newer Mac (4-year-old macbook > > > air) and report back tomorrow. > > > > > > Going off on a tangent, why would encouraging otherwise well-behaved > > > guests (like linux ones, for example) to use MWAIT be desirable to > > > begin with ? Is it a matter of minimizing the overhead associated with > > > exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 and > > > running guest-mode MWAIT in a tight loop will actually waste the host > > > CPU without the opportunity to yield to some other L0 thread. Sorry if > > > I fell into the middle of an ongoing conversation on this and missed > > > most of the relevant context, in which case please feel free to ignore > > > me... :) > > > > > > Thanks, > > > --G > > > > It's just some experiments I'm running, I'm not ready to describe it > > yet. I thought this part might be useful to at least some guests, so > > trying to upstream it right now. > > OK, so on a macbook air running F25 and the latest kvm git master plus > your v5 patch (4.11.0-rc2+), things appear to work. 
> > host-side cpuid output: > eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x021120 > > guest-side cpuid output: > eax=00000000 ebx=00000000 ecx=0x000003 edx=00000000 > > processor : 3 > vendor_id : GenuineIntel > cpu family : 6 > model : 42 > model name : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz > stepping : 7 > microcode : 0x29 > cpu MHz : 1157.849 > cache size : 4096 KB > physical id : 0 > siblings : 4 > core id : 1 > cpu cores : 2 > apicid : 3 > initial apicid : 3 > fpu : yes > fpu_exception : yes > cpuid level : 13 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts > bugs : > bogomips : 3604.68 > clflush size : 64 > cache_alignment : 64 > address sizes : 36 bits physical, 48 bits virtual > power management: > > After studying your patch a bit more carefully (sorry, it's crazy > around here right now :) ) I realized you're simply trying to > (selectively) decide when to exit L1 and emulate as NOP vs. when to > just allow L1 to execute MONITOR & MWAIT natively. > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > natively was one of the options Alex Graf and Rene Rebe used back in > the very early days of OS X on QEMU, at the time I got involved with > that project. 
Here's part of an out of tree patch against 3.4 which did > just that, and worked as far as I remember on *any* MWAIT capable > intel chip I had access to back in 2010: > > ############################################################################## > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27 > ############################################################################## > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > /* cpuid 1.ecx */ > const u32 kvm_supported_word4_x86_features = > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > 0 /* DS-CPL, VMX, SMX, EST */ | > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > 0 /* Reserved, DCA */ | F(XMM4_1) | > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > set_intercept(svm, INTERCEPT_VMSAVE); > set_intercept(svm, INTERCEPT_STGI); > set_intercept(svm, INTERCEPT_CLGI); > set_intercept(svm, INTERCEPT_SKINIT); > set_intercept(svm, INTERCEPT_WBINVD); > - set_intercept(svm, INTERCEPT_MONITOR); > - set_intercept(svm, INTERCEPT_MWAIT); > set_intercept(svm, INTERCEPT_XSETBV); > > control->iopm_base_pa = iopm_base; > control->msrpm_base_pa = __pa(svm->msrpm); > control->int_ctl = 
V_INTR_MASKING_MASK; > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > nested_vmx_procbased_ctls_low = 0; > nested_vmx_procbased_ctls_high &= > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > + CPU_BASED_CR3_LOAD_EXITING | > CPU_BASED_CR3_STORE_EXITING | > #ifdef CONFIG_X86_64 > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > #endif > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > CPU_BASED_CR3_LOAD_EXITING | > CPU_BASED_CR3_STORE_EXITING | > CPU_BASED_USE_IO_BITMAPS | > CPU_BASED_MOV_DR_EXITING | > CPU_BASED_USE_TSC_OFFSETING | > - CPU_BASED_MWAIT_EXITING | > - CPU_BASED_MONITOR_EXITING | > CPU_BASED_INVLPG_EXITING | > CPU_BASED_RDPMC_EXITING; > > opt = CPU_BASED_TPR_SHADOW | > CPU_BASED_USE_MSR_BITMAPS | > > If all you're trying to do is (selectively) revert to this behavior, > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > confused at this point :) > > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > didn't power down the physical CPU, just immediately moved on to the > next instruction. As such, there was no power saving and no > opportunity to yield to another L0 thread either, unlike with NOP > emulation at L0. > > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > doing something smarter than just acting as a guest-mode NOP) ? Probably, MWAIT in new intel chips enters power saving mode normally. 
If hardware-executed MWAIT acted as a NOP in your old chip, then that shouldn't be a problem either ... Maybe OS X gets confused into doing something really dumb because we do not expose the MONITOR/MWAIT feature bit correctly. Can you try this QEMU patch on the old hardware? diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 7aa762245a54..4b112e12188a 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -2764,10 +2764,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, break; case 5: /* mwait info: needed for Core compatibility */ - *eax = 0; /* Smallest monitor-line size in bytes */ - *ebx = 0; /* Largest monitor-line size in bytes */ - *ecx = CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; - *edx = 0; + host_cpuid(index, 0, eax, ebx, ecx, edx); break; case 6: /* Thermal and Power Leaf */ diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 55865dbee0aa..1eb78291b093 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -360,6 +360,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function, if (!kvm_irqchip_in_kernel()) { ret &= ~CPUID_EXT_X2APIC; } + ret |= CPUID_EXT_MONITOR; } else if (function == 6 && reg == R_EAX) { ret |= CPUID_6_EAX_ARAT; /* safe to allow because of emulated APIC */ } else if (function == 7 && index == 0 && reg == R_EBX) { Thanks. ^ permalink raw reply related [flat|nested] 54+ messages in thread
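Since the QEMU patch above swaps hard-coded CPUID leaf 5 values for the host's, it may help to see what those registers encode. This is a small stand-alone sketch, not QEMU or kernel code: the struct and function names are hypothetical, and the field layout follows the MONITOR/MWAIT leaf description in the Intel SDM as I understand it (line sizes in EAX/EBX bits 15:0, extension flags in ECX bits 0 and 1, per-C-state sub-state counts in the EDX nibbles).

```c
#include <stdint.h>

/* Hypothetical container for the decoded MONITOR/MWAIT leaf (CPUID 5). */
struct sketch_mwait_leaf {
	unsigned smallest_line;  /* EAX[15:0]: smallest monitor-line size, bytes */
	unsigned largest_line;   /* EBX[15:0]: largest monitor-line size, bytes */
	int has_extensions;      /* ECX bit 0: MWAIT extensions enumerated */
	int irq_break_event;     /* ECX bit 1: interrupts break MWAIT even if masked */
	unsigned substates[8];   /* EDX nibbles: MWAIT sub-states for C0..C7 */
};

struct sketch_mwait_leaf sketch_decode_leaf5(uint32_t eax, uint32_t ebx,
					     uint32_t ecx, uint32_t edx)
{
	struct sketch_mwait_leaf m;
	int i;

	m.smallest_line = eax & 0xffff;
	m.largest_line = ebx & 0xffff;
	m.has_extensions = (ecx >> 0) & 1;
	m.irq_break_event = (ecx >> 1) & 1;
	/* Each 4-bit field of EDX counts the sub-states of one C-state. */
	for (i = 0; i < 8; i++)
		m.substates[i] = (edx >> (4 * i)) & 0xf;
	return m;
}
```

Applied to the values quoted earlier in this message, the host-side leaf (eax=0x40, ebx=0x40, ecx=0x3, edx=0x021120) decodes to 64-byte monitor lines with both ECX flags set, while the guest-side leaf (eax=0, ebx=0, ecx=0x3, edx=0) reports zero-sized monitor lines and no sub-states. That difference is what the host_cpuid() change would remove.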
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 14:08 ` Radim Krčmář
@ 2017-03-16 15:44 ` Gabriel L. Somlo
  2017-03-16 15:54 ` Radim Krčmář
  0 siblings, 1 reply; 54+ messages in thread
From: Gabriel L. Somlo @ 2017-03-16 15:44 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm,
	linux-doc

On Thu, Mar 16, 2017 at 03:08:07PM +0100, Radim Krčmář wrote:
> 2017-03-16 09:24-0400, Gabriel L. Somlo:
> > On Thu, Mar 16, 2017 at 01:41:28AM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Mar 15, 2017 at 07:35:34PM -0400, Gabriel L. Somlo wrote:
> > > > On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> > > > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> > > > > unless explicitly provided with kernel command line argument
> > > > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> > > > > without checking CPUID.
> > > > >
> > > > > We currently emulate that as a NOP but on VMX we can do better: let
> > > > > guest stop the CPU until timer, IPI or memory change. CPU will be busy
> > > > > but that isn't any worse than a NOP emulation.
> > > > >
> > > > > Note that mwait within guests is not the same as on real hardware
> > > > > because halt causes an exit while mwait doesn't. For this reason it
> > > > > might not be a good idea to use the regular MWAIT flag in CPUID to
> > > > > signal this capability. Add a flag in the hypervisor leaf instead.
> > > > >
> > > > > Additionally, we add a capability for QEMU - e.g. if it knows there's an
> > > > > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> > > > > to improve guest behaviour.
> > > >
> > > > Same behavior (on the mac pro 1,1 running F22 with custom-compiled
> > > > kernel from kvm git master, plus this patch on top).
> > > >
> > > > The OS X 10.7 kernel hangs (or at least progresses extremely slowly)
> > > > on boot, does not bring up guest graphical interface within the first
> > > > 10 minutes that I waited for it. That, in contrast with the default
> > > > nop-based emulation where the guest comes up within 30 seconds.
> > >
> > >
> > > Thanks a lot, meanwhile I'll try to write a unit-test and experiment
> > > with various behaviours.
> > >
> > > > I will run another round of tests on a newer Mac (4-year-old macbook
> > > > air) and report back tomorrow.
> > > >
> > > > Going off on a tangent, why would encouraging otherwise well-behaved
> > > > guests (like linux ones, for example) to use MWAIT be desirable to
> > > > begin with ? Is it a matter of minimizing the overhead associated with
> > > > exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 and
> > > > running guest-mode MWAIT in a tight loop will actually waste the host
> > > > CPU without the opportunity to yield to some other L0 thread. Sorry if
> > > > I fell into the middle of an ongoing conversation on this and missed
> > > > most of the relevant context, in which case please feel free to ignore
> > > > me... :)
> > > >
> > > > Thanks,
> > > > --G
> > >
> > > It's just some experiments I'm running, I'm not ready to describe it
> > > yet. I thought this part might be useful to at least some guests, so
> > > trying to upstream it right now.
> >
> > OK, so on a macbook air running F25 and the latest kvm git master plus
> > your v5 patch (4.11.0-rc2+), things appear to work.
> >
> > host-side cpuid output:
> > eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x021120
> >
> > guest-side cpuid output:
> > eax=00000000 ebx=00000000 ecx=0x000003 edx=00000000
> >
> > processor : 3
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 42
> > model name : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
> > stepping : 7
> > microcode : 0x29
> > cpu MHz : 1157.849
> > cache size : 4096 KB
> > physical id : 0
> > siblings : 4
> > core id : 1
> > cpu cores : 2
> > apicid : 3
> > initial apicid : 3
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 13
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
> > bugs :
> > bogomips : 3604.68
> > clflush size : 64
> > cache_alignment : 64
> > address sizes : 36 bits physical, 48 bits virtual
> > power management:
> >
> > After studying your patch a bit more carefully (sorry, it's crazy
> > around here right now :) ) I realized you're simply trying to
> > (selectively) decide when to exit L1 and emulate as NOP vs. when to
> > just allow L1 to execute MONITOR & MWAIT natively.
> >
> > Is that right ? Because if so, the issues I saw on my MacPro1,1 are
> > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
> > natively was one of the options Alex Graf and Rene Rebe used back in
> > the very early days of OS X on QEMU, at the time I got involved with
> > that project. Here's part of an out of tree patch against 3.4 which did
> > just that, and worked as far as I remember on *any* MWAIT capable
> > intel chip I had access to back in 2010:
> >
> > ##############################################################################
> > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27
> > ##############################################################################
> > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c
> > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400
> > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400
> > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid
> >          f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> >          F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp |
> >          0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
> >      /* cpuid 1.ecx */
> >      const u32 kvm_supported_word4_x86_features =
> > -        F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> > +        F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
> >          0 /* DS-CPL, VMX, SMX, EST */ |
> >          0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> >          F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> >          0 /* Reserved, DCA */ | F(XMM4_1) |
> >          F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c
> > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400
> > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400
> > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s
> >      set_intercept(svm, INTERCEPT_VMSAVE);
> >      set_intercept(svm, INTERCEPT_STGI);
> >      set_intercept(svm, INTERCEPT_CLGI);
> >      set_intercept(svm, INTERCEPT_SKINIT);
> >      set_intercept(svm, INTERCEPT_WBINVD);
> > -    set_intercept(svm, INTERCEPT_MONITOR);
> > -    set_intercept(svm, INTERCEPT_MWAIT);
> >      set_intercept(svm, INTERCEPT_XSETBV);
> >
> >      control->iopm_base_pa = iopm_base;
> >      control->msrpm_base_pa = __pa(svm->msrpm);
> >      control->int_ctl = V_INTR_MASKING_MASK;
> > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c
> > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400
> > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400
> > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls
> >          nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
> >      nested_vmx_procbased_ctls_low = 0;
> >      nested_vmx_procbased_ctls_high &=
> >          CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING |
> >          CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
> > -        CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
> > +        CPU_BASED_CR3_LOAD_EXITING |
> >          CPU_BASED_CR3_STORE_EXITING |
> >  #ifdef CONFIG_X86_64
> >          CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING |
> >  #endif
> >          CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
> > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru
> >          CPU_BASED_CR3_LOAD_EXITING |
> >          CPU_BASED_CR3_STORE_EXITING |
> >          CPU_BASED_USE_IO_BITMAPS |
> >          CPU_BASED_MOV_DR_EXITING |
> >          CPU_BASED_USE_TSC_OFFSETING |
> > -        CPU_BASED_MWAIT_EXITING |
> > -        CPU_BASED_MONITOR_EXITING |
> >          CPU_BASED_INVLPG_EXITING |
> >          CPU_BASED_RDPMC_EXITING;
> >
> >      opt = CPU_BASED_TPR_SHADOW |
> >          CPU_BASED_USE_MSR_BITMAPS |
> >
> > If all you're trying to do is (selectively) revert to this behavior,
> > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly
> > confused at this point :)
> >
> > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP,
> > didn't power down the physical CPU, just immediately moved on to the
> > next instruction. As such, there was no power saving and no
> > opportunity to yield to another L0 thread either, unlike with NOP
> > emulation at L0.
> >
> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now
> > doing something smarter than just acting as a guest-mode NOP) ?
>
> Probably, MWAIT in new intel chips enters power saving mode normally.
>
> If hardware-executed MWAIT acted as a NOP in your old chip, then that
> shouldn't be a problem either ... Maybe OS X gets confused into doing
> something really dumb because we do not expose the MONITOR/MWAIT feature
> bit correctly.
>
> Can you try this QEMU patch on the old hardware?
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 7aa762245a54..4b112e12188a 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -2764,10 +2764,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          break;
>      case 5:
>          /* mwait info: needed for Core compatibility */
> -        *eax = 0; /* Smallest monitor-line size in bytes */
> -        *ebx = 0; /* Largest monitor-line size in bytes */
> -        *ecx = CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> -        *edx = 0;
> +        host_cpuid(index, 0, eax, ebx, ecx, edx);
>          break;
>      case 6:
>          /* Thermal and Power Leaf */
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 55865dbee0aa..1eb78291b093 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -360,6 +360,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>          if (!kvm_irqchip_in_kernel()) {
>              ret &= ~CPUID_EXT_X2APIC;
>          }
> +        ret |= CPUID_EXT_MONITOR;
>      } else if (function == 6 && reg == R_EAX) {
>          ret |= CPUID_6_EAX_ARAT; /* safe to allow because of emulated APIC */
>      } else if (function == 7 && index == 0 && reg == R_EBX) {
>
>
> Thanks.

No change, still hangs on boot.

Thanks,
--G
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 15:44 ` Gabriel L. Somlo
@ 2017-03-16 15:54 ` Radim Krčmář
  2017-03-16 16:26 ` Gabriel L. Somlo
  0 siblings, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2017-03-16 15:54 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm,
	linux-doc

2017-03-16 11:44-0400, Gabriel L. Somlo:
> On Thu, Mar 16, 2017 at 03:08:07PM +0100, Radim Krčmář wrote:
>> Can you try this QEMU patch on the old hardware?
>
> No change, still hangs on boot.

Hm, also with '-cpu host'?
(I forgot that the CPUID_EXT_MONITOR isn't visible in the guest
otherwise ...)

Thanks.
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-16 15:54 ` Radim Krčmář
@ 2017-03-16 16:26 ` Gabriel L. Somlo
  0 siblings, 0 replies; 54+ messages in thread
From: Gabriel L. Somlo @ 2017-03-16 16:26 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Michael S. Tsirkin, linux-kernel, Paolo Bonzini, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm,
	linux-doc

On Thu, Mar 16, 2017 at 04:54:06PM +0100, Radim Krčmář wrote:
> 2017-03-16 11:44-0400, Gabriel L. Somlo:
> > On Thu, Mar 16, 2017 at 03:08:07PM +0100, Radim Krčmář wrote:
> >> Can you try this QEMU patch on the old hardware?
> >
> > No change, still hangs on boot.
>
> Hm, also with '-cpu host'?
> (I forgot that the CPUID_EXT_MONITOR isn't visible in the guest
> otherwise ...)

Yeah, managed to get it started with '-cpu host', but same behavior.

Maybe that version of Xeon really was braindamaged in some way, and
never would have worked with L1 MWAIT regardless. I only ever used that
machine after the emulate-as-nop patch made it into KVM (commit
87c0057), so I honestly can't say whether it ever worked with MWAIT
run natively at L1...

Thanks,
--Gabriel
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-15 21:22 [PATCH v5 untested] kvm: better MWAIT emulation for guests Michael S. Tsirkin
  2017-03-15 23:35 ` Gabriel L. Somlo
@ 2017-03-21 16:16 ` Joerg Roedel
  2017-03-21 18:45   ` Michael S. Tsirkin
  2017-03-27 13:34 ` Alexander Graf
  2 siblings, 1 reply; 54+ messages in thread
From: Joerg Roedel @ 2017-03-21 16:16 UTC
To: Michael S. Tsirkin
Cc: linux-kernel, Gabriel L. Somlo, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d1efe2c..18e53bc 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_intercept(svm, INTERCEPT_CLGI);
>  	set_intercept(svm, INTERCEPT_SKINIT);
>  	set_intercept(svm, INTERCEPT_WBINVD);
> -	set_intercept(svm, INTERCEPT_MONITOR);
> -	set_intercept(svm, INTERCEPT_MWAIT);

Why do you remove the intercepts for AMD? The new kvm_mwait_in_guest()
function will always return false on AMD anyway, and on Intel you re-add
the intercepts for !kvm_mwait_in_guest().

	Joerg
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
  2017-03-21 16:16 ` Joerg Roedel
@ 2017-03-21 18:45   ` Michael S. Tsirkin
  0 siblings, 0 replies; 54+ messages in thread
From: Michael S. Tsirkin @ 2017-03-21 18:45 UTC
To: Joerg Roedel
Cc: linux-kernel, Gabriel L. Somlo, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, kvm, linux-doc

On Tue, Mar 21, 2017 at 05:16:32PM +0100, Joerg Roedel wrote:
> On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index d1efe2c..18e53bc 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
> >  	set_intercept(svm, INTERCEPT_CLGI);
> >  	set_intercept(svm, INTERCEPT_SKINIT);
> >  	set_intercept(svm, INTERCEPT_WBINVD);
> > -	set_intercept(svm, INTERCEPT_MONITOR);
> > -	set_intercept(svm, INTERCEPT_MWAIT);
>
> Why do you remove the intercepts for AMD? The new kvm_mwait_in_guest()
> function will always return false on AMD anyway,

I think that's a bug and I should fix it to return true there.

> and on Intel you re-add
> the intercepts for !kvm_mwait_in_guest().
>
>
> 	Joerg

Does AMD need some work-around similar to CPUID5_ECX_INTERRUPT_BREAK?
That's why we have kvm_mwait_in_guest ...

-- 
MST
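For readers following the kvm_mwait_in_guest() exchange above, the decision under discussion can be sketched in user-space C. This is only an illustration under stated assumptions, not the code from the patch: struct host_cpu, enum vendor and mwait_in_guest() are hypothetical names, and the AMD branch returns true only because that is the fix MST says he should make. The one hardware fact relied on is that CPUID.05H:ECX bit 1 reports that interrupts break MWAIT even when they are masked.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.05H:ECX bit 1: interrupts are a break event for MWAIT even if IF=0. */
#define CPUID5_ECX_INTERRUPT_BREAK 0x2u

/* Hypothetical vendor tag, for illustration only. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Hypothetical snapshot of the host CPU properties the check needs. */
struct host_cpu {
	enum vendor vendor;
	bool has_mwait;          /* CPUID.01H:ECX MONITOR/MWAIT feature bit */
	uint32_t max_basic_leaf; /* highest supported basic CPUID leaf */
	uint32_t cpuid5_ecx;     /* raw CPUID.05H:ECX value */
};

/*
 * Returns true if letting the guest execute MWAIT natively looks safe.
 * Without INTERRUPT_BREAK, a guest could disable interrupts, execute
 * MWAIT and stall the physical CPU, so such hosts keep the intercept.
 */
static bool mwait_in_guest(const struct host_cpu *c)
{
	if (!c->has_mwait)
		return false;
	if (c->vendor == VENDOR_AMD)
		return true; /* assumption: the behaviour MST proposes above */
	if (c->vendor != VENDOR_INTEL)
		return false;
	if (c->max_basic_leaf < 5)
		return false; /* no MONITOR/MWAIT leaf to consult */
	return (c->cpuid5_ecx & CPUID5_ECX_INTERRUPT_BREAK) != 0;
}
```

With a helper of this shape, a caller would install the MONITOR/MWAIT intercepts only when mwait_in_guest() returns false, which is the arrangement Joerg describes for Intel.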
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-15 21:22 [PATCH v5 untested] kvm: better MWAIT emulation for guests Michael S. Tsirkin 2017-03-15 23:35 ` Gabriel L. Somlo 2017-03-21 16:16 ` Joerg Roedel @ 2017-03-27 13:34 ` Alexander Graf 2017-03-28 14:28 ` Radim Krčmář 2 siblings, 1 reply; 54+ messages in thread From: Alexander Graf @ 2017-03-27 13:34 UTC (permalink / raw) To: Michael S. Tsirkin, linux-kernel Cc: Gabriel L. Somlo, Paolo Bonzini, Radim Krčmář, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc On 15/03/2017 22:22, Michael S. Tsirkin wrote: > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: > unless explicitly provided with kernel command line argument > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, > without checking CPUID. > > We currently emulate that as a NOP but on VMX we can do better: let > guest stop the CPU until timer, IPI or memory change. CPU will be busy > but that isn't any worse than a NOP emulation. > > Note that mwait within guests is not the same as on real hardware > because halt causes an exit while mwait doesn't. For this reason it > might not be a good idea to use the regular MWAIT flag in CPUID to > signal this capability. Add a flag in the hypervisor leaf instead. So imagine we had proper MWAIT emulation capabilities based on page faults. In that case, we could do something as fancy as Treat MWAIT as pass-through by default Have a per-vcpu monitor timer 10 times a second in the background that checks which instruction we're in If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, if $IP was in non-mwait within that time, reset counter. Or instead maybe just reuse the adapter hlt logic? Either way, with that we should be able to get super low latency IPIs running while still maintaining some sanity on systems which don't have dedicated CPUs for workloads. 
And we wouldn't need guest modifications, which is a great plus. So older guests (and Windows?) could benefit from mwait as well. Alex
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-27 13:34 ` Alexander Graf @ 2017-03-28 14:28 ` Radim Krčmář 2017-03-28 20:35 ` Jim Mattson 0 siblings, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-03-28 14:28 UTC (permalink / raw) To: Alexander Graf Cc: Michael S. Tsirkin, linux-kernel, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel, kvm, linux-doc 2017-03-27 15:34+0200, Alexander Graf: > On 15/03/2017 22:22, Michael S. Tsirkin wrote: >> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: >> unless explicitly provided with kernel command line argument >> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, >> without checking CPUID. >> >> We currently emulate that as a NOP but on VMX we can do better: let >> guest stop the CPU until timer, IPI or memory change. CPU will be busy >> but that isn't any worse than a NOP emulation. >> >> Note that mwait within guests is not the same as on real hardware >> because halt causes an exit while mwait doesn't. For this reason it >> might not be a good idea to use the regular MWAIT flag in CPUID to >> signal this capability. Add a flag in the hypervisor leaf instead. > > So imagine we had proper MWAIT emulation capabilities based on page faults. > In that case, we could do something as fancy as > > Treat MWAIT as pass-through by default > > Have a per-vcpu monitor timer 10 times a second in the background that > checks which instruction we're in > > If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, > if $IP was in non-mwait within that time, reset counter. Or we could reuse external interrupts for sampling. Exits triggered by them would check for current instruction (probably would be best to limit just to timer tick) and a sufficient ratio (> 0?) of other exits would imply that MWAIT is not used. > Or instead maybe just reuse the adapter hlt logic?
Emulated MWAIT is very similar to emulated HLT, so reusing the logic makes sense. We would just add new wakeup methods. > Either way, with that we should be able to get super low latency IPIs > running while still maintaining some sanity on systems which don't have > dedicated CPUs for workloads. > > And we wouldn't need guest modifications, which is a great plus. So older > guests (and Windows?) could benefit from mwait as well. There is no need for guest modifications -- it could be exposed as a standard MWAIT feature to the guest, with responsibilities for guest/host-impact on the user. I think that the page-fault based MWAIT would require paravirt if it should be enabled by default, because of performance concerns: Enabling write protection on a page needs a VM exit on all other VCPUs when beginning monitoring (to reload page permissions and prevent missed writes). We'd want to keep trapping writes to the page all the time because toggling is slow, but this could regress performance for an OS that has other data accessed by other VCPUs in that page. No current interface can tell the guest that it should reserve the whole page instead of what CPUID[5] says and that writes to the monitored page are not "cheap", but can trigger a VM exit ... And before we disable MWAIT exiting by default, we also have to understand the old OS X on Core 2 bug from Gabriel.
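The sampling idea quoted in this message (pass MWAIT through by default, sample the vCPU ten times a second, switch to emulation after about a second spent in MWAIT, reset the counter on any non-MWAIT sample) can be sketched as a small state machine. This is a hedged illustration only: the struct, enum and function names are hypothetical, nothing here is KVM code, and the thread does not say when or whether to switch back to pass-through, so the sketch leaves that out.

```c
#include <stdbool.h>

/* Values taken from the proposal: 10 samples per second, 1 second window. */
#define SAMPLES_PER_SEC       10u
#define EMULATE_AFTER_SAMPLES (1u * SAMPLES_PER_SEC)

/* Hypothetical per-vCPU policy state. */
enum mwait_mode { MWAIT_PASSTHROUGH, MWAIT_EMULATED };

struct vcpu_mwait_state {
	enum mwait_mode mode;
	unsigned int in_mwait_samples; /* consecutive samples seen inside MWAIT */
};

/*
 * Feed one sample into the policy. ip_at_mwait says whether the sampled
 * instruction pointer was sitting on MWAIT. Returns the mode to use next.
 */
static enum mwait_mode mwait_sample(struct vcpu_mwait_state *s, bool ip_at_mwait)
{
	if (!ip_at_mwait) {
		/* The guest did real work within the window: reset counter. */
		s->in_mwait_samples = 0;
		return s->mode;
	}
	if (s->mode == MWAIT_PASSTHROUGH &&
	    ++s->in_mwait_samples >= EMULATE_AFTER_SAMPLES)
		s->mode = MWAIT_EMULATED; /* idle for the whole window */
	return s->mode;
}
```

The external-interrupt variant suggested in the reply would call the same function from the timer-tick exit path instead of from a dedicated per-vCPU timer.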
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-28 14:28 ` Radim Krčmář @ 2017-03-28 20:35 ` Jim Mattson 2017-03-29 12:11 ` Radim Krčmář 0 siblings, 1 reply; 54+ messages in thread From: Jim Mattson @ 2017-03-28 20:35 UTC (permalink / raw) To: Radim Krčmář Cc: Alexander Graf, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > 2017-03-27 15:34+0200, Alexander Graf: >> On 15/03/2017 22:22, Michael S. Tsirkin wrote: >>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: >>> unless explicitly provided with kernel command line argument >>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, >>> without checking CPUID. >>> >>> We currently emulate that as a NOP but on VMX we can do better: let >>> guest stop the CPU until timer, IPI or memory change. CPU will be busy >>> but that isn't any worse than a NOP emulation. >>> >>> Note that mwait within guests is not the same as on real hardware >>> because halt causes an exit while mwait doesn't. For this reason it >>> might not be a good idea to use the regular MWAIT flag in CPUID to >>> signal this capability. Add a flag in the hypervisor leaf instead. >> >> So imagine we had proper MWAIT emulation capabilities based on page faults. >> In that case, we could do something as fancy as >> >> Treat MWAIT as pass-through by default >> >> Have a per-vcpu monitor timer 10 times a second in the background that >> checks which instruction we're in >> >> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, >> if $IP was in non-mwait within that time, reset counter. > > Or we could reuse external interrupts for sampling. 
Exits triggered by > them would check for current instruction (probably would be best to > limit just to timer tick) and a sufficient ratio (> 0?) of other exits > would imply that MWAIT is not used. > >> Or instead maybe just reuse the adapter hlt logic? > > Emulated MWAIT is very similar to emulated HLT, so reusing the logic > makes sense. We would just add new wakeup methods. > >> Either way, with that we should be able to get super low latency IPIs >> running while still maintaining some sanity on systems which don't have >> dedicated CPUs for workloads. >> >> And we wouldn't need guest modifications, which is a great plus. So older >> guests (and Windows?) could benefit from mwait as well. > > There is no need for guest modifications -- it could be exposed as a standard > MWAIT feature to the guest, with responsibilities for guest/host-impact > on the user. > > I think that the page-fault based MWAIT would require paravirt if it > should be enabled by default, because of performance concerns: > Enabling write protection on a page needs a VM exit on all other VCPUs > when beginning monitoring (to reload page permissions and prevent missed > writes). > We'd want to keep trapping writes to the page all the time because > toggling is slow, but this could regress performance for an OS that has > other data accessed by other VCPUs in that page. > No current interface can tell the guest that it should reserve the whole > page instead of what CPUID[5] says and that writes to the monitored page > are not "cheap", but can trigger a VM exit ... CPUID.05H:EBX is supposed to address the false sharing issue. IIRC, VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX when running Mac OS X guests. Per Intel's SDM volume 3, section 8.10.5, "To avoid false wake-ups; use the largest monitor line size to pad the data structure used to monitor writes.
Software must make sure that beyond the data structure, no unrelated data variable exists in the triggering area for MWAIT. A pad may be needed to avoid this situation." Unfortunately, most operating systems do not follow this advice. > > And before we disable MWAIT exiting by default, we also have to > understand the old OS X on Core 2 bug from Gabriel.
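As a small illustration of the CPUID.05H point above, the following C sketch decodes the smallest and largest monitor-line sizes from raw register values and rounds a structure size up to the largest line, which is the padding the SDM passage asks for. The helper names are hypothetical; the only facts assumed are that the sizes live in bits 15:0 of EAX and EBX respectively. Register values are passed in as arguments so the sketch does not depend on running on x86.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two sizes reported by CPUID leaf 5. */
struct monitor_line {
	uint32_t smallest; /* CPUID.05H:EAX[15:0], in bytes */
	uint32_t largest;  /* CPUID.05H:EBX[15:0], in bytes */
};

/* Extract the monitor-line sizes from raw EAX/EBX values of leaf 5. */
static struct monitor_line decode_cpuid5(uint32_t eax, uint32_t ebx)
{
	struct monitor_line m = { eax & 0xffffu, ebx & 0xffffu };
	return m;
}

/*
 * Round size up to a multiple of the largest monitor line so that no
 * unrelated data shares the triggering area. A reported size of 0 is
 * treated as "unknown" and leaves the size unchanged.
 */
static size_t pad_to_monitor_line(size_t size, uint32_t largest)
{
	if (largest == 0)
		return size;
	return (size + largest - 1) / largest * largest;
}
```

With the values Jim recalls for VMware Fusion (EAX = 64, EBX = 4096), an 8-byte wakeup flag would be padded to a full 4096-byte page. Padding alone is not sufficient: the structure also has to be aligned to the same boundary for the guarantee to hold.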
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-28 20:35 ` Jim Mattson @ 2017-03-29 12:11 ` Radim Krčmář 2017-04-03 10:04 ` Alexander Graf 0 siblings, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-03-29 12:11 UTC (permalink / raw) To: Jim Mattson Cc: Alexander Graf, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc 2017-03-28 13:35-0700, Jim Mattson: > On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: >> 2017-03-27 15:34+0200, Alexander Graf: >>> On 15/03/2017 22:22, Michael S. Tsirkin wrote: >>>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: >>>> unless explicitly provided with kernel command line argument >>>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, >>>> without checking CPUID. >>>> >>>> We currently emulate that as a NOP but on VMX we can do better: let >>>> guest stop the CPU until timer, IPI or memory change. CPU will be busy >>>> but that isn't any worse than a NOP emulation. >>>> >>>> Note that mwait within guests is not the same as on real hardware >>>> because halt causes an exit while mwait doesn't. For this reason it >>>> might not be a good idea to use the regular MWAIT flag in CPUID to >>>> signal this capability. Add a flag in the hypervisor leaf instead. >>> >>> So imagine we had proper MWAIT emulation capabilities based on page faults. >>> In that case, we could do something as fancy as >>> >>> Treat MWAIT as pass-through by default >>> >>> Have a per-vcpu monitor timer 10 times a second in the background that >>> checks which instruction we're in >>> >>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, >>> if $IP was in non-mwait within that time, reset counter. >> >> Or we could reuse external interrupts for sampling. 
Exits triggered by >> them would check for current instruction (probably would be best to >> limit just to timer tick) and a sufficient ratio (> 0?) of other exits >> would imply that MWAIT is not used. >> >>> Or instead maybe just reuse the adapter hlt logic? >> >> Emulated MWAIT is very similar to emulated HLT, so reusing the logic >> makes sense. We would just add new wakeup methods. >> >>> Either way, with that we should be able to get super low latency IPIs >>> running while still maintaining some sanity on systems which don't have >>> dedicated CPUs for workloads. >>> >>> And we wouldn't need guest modifications, which is a great plus. So older >>> guests (and Windows?) could benefit from mwait as well. >> >> There is no need for guest modifications -- it could be exposed as a standard >> MWAIT feature to the guest, with responsibilities for guest/host-impact >> on the user. >> >> I think that the page-fault based MWAIT would require paravirt if it >> should be enabled by default, because of performance concerns: >> Enabling write protection on a page needs a VM exit on all other VCPUs >> when beginning monitoring (to reload page permissions and prevent missed >> writes). >> We'd want to keep trapping writes to the page all the time because >> toggling is slow, but this could regress performance for an OS that has >> other data accessed by other VCPUs in that page. >> No current interface can tell the guest that it should reserve the whole >> page instead of what CPUID[5] says and that writes to the monitored page >> are not "cheap", but can trigger a VM exit ... > > CPUID.05H:EBX is supposed to address the false sharing issue. IIRC, > VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX > when running Mac OS X guests. Per Intel's SDM volume 3, section > 8.10.5, "To avoid false wake-ups; use the largest monitor line size to > pad the data structure used to monitor writes.
Software must make sure > that beyond the data structure, no unrelated data variable exists in > the triggering area for MWAIT. A pad may be needed to avoid this > situation." Unfortunately, most operating systems do not follow this > advice. Right, EBX provides what we need to expose that the whole page is monitored, thanks! > Unfortunately, most operating systems do not follow this > advice. Yeah ... KVM could add yet another heuristic to drop MWAIT emulation and use hardware if there were many traps while the target was not MWAITING. It's getting over-complicated, though :/
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-03-29 12:11 ` Radim Krčmář @ 2017-04-03 10:04 ` Alexander Graf 2017-04-04 12:39 ` Radim Krčmář 0 siblings, 1 reply; 54+ messages in thread From: Alexander Graf @ 2017-04-03 10:04 UTC (permalink / raw) To: Radim Krčmář, Jim Mattson Cc: Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc On 03/29/2017 02:11 PM, Radim Krčmář wrote: > 2017-03-28 13:35-0700, Jim Mattson: >> On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: >>> 2017-03-27 15:34+0200, Alexander Graf: >>>> On 15/03/2017 22:22, Michael S. Tsirkin wrote: >>>>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: >>>>> unless explicitly provided with kernel command line argument >>>>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, >>>>> without checking CPUID. >>>>> >>>>> We currently emulate that as a NOP but on VMX we can do better: let >>>>> guest stop the CPU until timer, IPI or memory change. CPU will be busy >>>>> but that isn't any worse than a NOP emulation. >>>>> >>>>> Note that mwait within guests is not the same as on real hardware >>>>> because halt causes an exit while mwait doesn't. For this reason it >>>>> might not be a good idea to use the regular MWAIT flag in CPUID to >>>>> signal this capability. Add a flag in the hypervisor leaf instead. >>>> So imagine we had proper MWAIT emulation capabilities based on page faults. >>>> In that case, we could do something as fancy as >>>> >>>> Treat MWAIT as pass-through by default >>>> >>>> Have a per-vcpu monitor timer 10 times a second in the background that >>>> checks which instruction we're in >>>> >>>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, >>>> if $IP was in non-mwait within that time, reset counter. 
>>> Or we could reuse external interrupts for sampling. Exits triggered by >>> them would check for current instruction (probably would be best to >>> limit just to timer tick) and a sufficient ratio (> 0?) of other exits >>> would imply that MWAIT is not used. >>> >>>> Or instead maybe just reuse the adapter hlt logic? >>> Emulated MWAIT is very similar to emulated HLT, so reusing the logic >>> makes sense. We would just add new wakeup methods. >>> >>>> Either way, with that we should be able to get super low latency IPIs >>>> running while still maintaining some sanity on systems which don't have >>>> dedicated CPUs for workloads. >>>> >>>> And we wouldn't need guest modifications, which is a great plus. So older >>>> guests (and Windows?) could benefit from mwait as well. >>> There is no need for guest modifications -- it could be exposed as a standard >>> MWAIT feature to the guest, with responsibilities for guest/host-impact >>> on the user. >>> >>> I think that the page-fault based MWAIT would require paravirt if it >>> should be enabled by default, because of performance concerns: >>> Enabling write protection on a page needs a VM exit on all other VCPUs >>> when beginning monitoring (to reload page permissions and prevent missed >>> writes). >>> We'd want to keep trapping writes to the page all the time because >>> toggling is slow, but this could regress performance for an OS that has >>> other data accessed by other VCPUs in that page. >>> No current interface can tell the guest that it should reserve the whole >>> page instead of what CPUID[5] says and that writes to the monitored page >>> are not "cheap", but can trigger a VM exit ... >> CPUID.05H:EBX is supposed to address the false sharing issue. IIRC, >> VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX >> when running Mac OS X guests.
Per Intel's SDM volume 3, section >> 8.10.5, "To avoid false wake-ups; use the largest monitor line size to >> pad the data structure used to monitor writes. Software must make sure >> that beyond the data structure, no unrelated data variable exists in >> the triggering area for MWAIT. A pad may be needed to avoid this >> situation." Unfortunately, most operating systems do not follow this >> advice. > Right, EBX provides what we need to expose that the whole page is > monitored, thanks! So coming back to the original patch, is there anything that should keep us from exposing MWAIT straight into the guest at all times? Alex
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-04-03 10:04 ` Alexander Graf @ 2017-04-04 12:39 ` Radim Krčmář 2017-04-04 12:51 ` Alexander Graf 0 siblings, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-04-04 12:39 UTC (permalink / raw) To: Alexander Graf Cc: Jim Mattson, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc 2017-04-03 12:04+0200, Alexander Graf: > On 03/29/2017 02:11 PM, Radim Krčmář wrote: >> 2017-03-28 13:35-0700, Jim Mattson: >> > On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: >> > > 2017-03-27 15:34+0200, Alexander Graf: >> > > > On 15/03/2017 22:22, Michael S. Tsirkin wrote: >> > > > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: >> > > > > unless explicitly provided with kernel command line argument >> > > > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, >> > > > > without checking CPUID. >> > > > > >> > > > > We currently emulate that as a NOP but on VMX we can do better: let >> > > > > guest stop the CPU until timer, IPI or memory change. CPU will be busy >> > > > > but that isn't any worse than a NOP emulation. >> > > > > >> > > > > Note that mwait within guests is not the same as on real hardware >> > > > > because halt causes an exit while mwait doesn't. For this reason it >> > > > > might not be a good idea to use the regular MWAIT flag in CPUID to >> > > > > signal this capability. Add a flag in the hypervisor leaf instead. >> > > > So imagine we had proper MWAIT emulation capabilities based on page faults. 
>> > > > In that case, we could do something as fancy as >> > > > >> > > > Treat MWAIT as pass-through by default >> > > > >> > > > Have a per-vcpu monitor timer 10 times a second in the background that >> > > > checks which instruction we're in >> > > > >> > > > If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, >> > > > if $IP was in non-mwait within that time, reset counter. >> > > Or we could reuse external interrupts for sampling. Exits triggered by >> > > them would check for current instruction (probably would be best to >> > > limit just to timer tick) and a sufficient ratio (> 0?) of other exits >> > > would imply that MWAIT is not used. >> > > >> > > > Or instead maybe just reuse the adapter hlt logic? >> > > Emulated MWAIT is very similar to emulated HLT, so reusing the logic >> > > makes sense. We would just add new wakeup methods. >> > > >> > > > Either way, with that we should be able to get super low latency IPIs >> > > > running while still maintaining some sanity on systems which don't have >> > > > dedicated CPUs for workloads. >> > > > >> > > > And we wouldn't need guest modifications, which is a great plus. So older >> > > > guests (and Windows?) could benefit from mwait as well. >> > > There is no need for guest modifications -- it could be exposed as a standard >> > > MWAIT feature to the guest, with responsibilities for guest/host-impact >> > > on the user. >> > > >> > > I think that the page-fault based MWAIT would require paravirt if it >> > > should be enabled by default, because of performance concerns: >> > > Enabling write protection on a page needs a VM exit on all other VCPUs >> > > when beginning monitoring (to reload page permissions and prevent missed >> > > writes). >> > > We'd want to keep trapping writes to the page all the time because >> > > toggling is slow, but this could regress performance for an OS that has >> > > other data accessed by other VCPUs in that page.
>> > > No current interface can tell the guest that it should reserve the whole >> > > page instead of what CPUID[5] says and that writes to the monitored page >> > > are not "cheap", but can trigger a VM exit ... >> > CPUID.05H:EBX is supposed to address the false sharing issue. IIRC, >> > VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX >> > when running Mac OS X guests. Per Intel's SDM volume 3, section >> > 8.10.5, "To avoid false wake-ups; use the largest monitor line size to >> > pad the data structure used to monitor writes. Software must make sure >> > that beyond the data structure, no unrelated data variable exists in >> > the triggering area for MWAIT. A pad may be needed to avoid this >> > situation." Unfortunately, most operating systems do not follow this >> > advice. >> Right, EBX provides what we need to expose that the whole page is >> monitored, thanks! > > So coming back to the original patch, is there anything that should keep us > from exposing MWAIT straight into the guest at all times? Just minor issues: * OS X on Core 2 fails for unknown reason if we disable the instruction trapping, which is an argument against doing it by default * idling guests would consume host CPU, which is a significant change in behavior and shouldn't be done without userspace's involvement I think the best compromise is to add a capability for the MWAIT VM-exit controls and let userspace expose MWAIT if it wishes to. Will send a patch.
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-04-04 12:39 ` Radim Krčmář @ 2017-04-04 12:51 ` Alexander Graf 2017-04-04 13:13 ` Radim Krčmář 0 siblings, 1 reply; 54+ messages in thread From: Alexander Graf @ 2017-04-04 12:51 UTC (permalink / raw) To: Radim Krčmář Cc: Jim Mattson, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc On 04/04/2017 02:39 PM, Radim Krčmář wrote: > 2017-04-03 12:04+0200, Alexander Graf: >> On 03/29/2017 02:11 PM, Radim Krčmář wrote: >>> 2017-03-28 13:35-0700, Jim Mattson: >>>> On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: >>>>> 2017-03-27 15:34+0200, Alexander Graf: >>>>>> On 15/03/2017 22:22, Michael S. Tsirkin wrote: >>>>>>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: >>>>>>> unless explicitly provided with kernel command line argument >>>>>>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, >>>>>>> without checking CPUID. >>>>>>> >>>>>>> We currently emulate that as a NOP but on VMX we can do better: let >>>>>>> guest stop the CPU until timer, IPI or memory change. CPU will be busy >>>>>>> but that isn't any worse than a NOP emulation. >>>>>>> >>>>>>> Note that mwait within guests is not the same as on real hardware >>>>>>> because halt causes an exit while mwait doesn't. For this reason it >>>>>>> might not be a good idea to use the regular MWAIT flag in CPUID to >>>>>>> signal this capability. Add a flag in the hypervisor leaf instead. >>>>>> So imagine we had proper MWAIT emulation capabilities based on page faults. 
>>>>>> In that case, we could do something as fancy as >>>>>> >>>>>> Treat MWAIT as pass-through by default >>>>>> >>>>>> Have a per-vcpu monitor timer 10 times a second in the background that >>>>>> checks which instruction we're in >>>>>> >>>>>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT, >>>>>> if $IP was in non-mwait within that time, reset counter. >>>>> Or we could reuse external interrupts for sampling. Exits triggered by >>>>> them would check for current instruction (probably would be best to >>>>> limit just to timer tick) and a sufficient ratio (> 0?) of other exits >>>>> would imply that MWAIT is not used. >>>>> >>>>>> Or instead maybe just reuse the adapter hlt logic? >>>>> Emulated MWAIT is very similar to emulated HLT, so reusing the logic >>>>> makes sense. We would just add new wakeup methods. >>>>> >>>>>> Either way, with that we should be able to get super low latency IPIs >>>>>> running while still maintaining some sanity on systems which don't have >>>>>> dedicated CPUs for workloads. >>>>>> >>>>>> And we wouldn't need guest modifications, which is a great plus. So older >>>>>> guests (and Windows?) could benefit from mwait as well. >>>>> There is no need for guest modifications -- it could be exposed as a standard >>>>> MWAIT feature to the guest, with responsibilities for guest/host-impact >>>>> on the user. >>>>> >>>>> I think that the page-fault based MWAIT would require paravirt if it >>>>> should be enabled by default, because of performance concerns: >>>>> Enabling write protection on a page needs a VM exit on all other VCPUs >>>>> when beginning monitoring (to reload page permissions and prevent missed >>>>> writes). >>>>> We'd want to keep trapping writes to the page all the time because >>>>> toggling is slow, but this could regress performance for an OS that has >>>>> other data accessed by other VCPUs in that page.
>>>>> No current interface can tell the guest that it should reserve the whole >>>>> page instead of what CPUID[5] says and that writes to the monitored page >>>>> are not "cheap", but can trigger a VM exit ... >>>> CPUID.05H:EBX is supposed to address the false sharing issue. IIRC, >>>> VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX >>>> when running Mac OS X guests. Per Intel's SDM volume 3, section >>>> 8.10.5, "To avoid false wake-ups; use the largest monitor line size to >>>> pad the data structure used to monitor writes. Software must make sure >>>> that beyond the data structure, no unrelated data variable exists in >>>> the triggering area for MWAIT. A pad may be needed to avoid this >>>> situation." Unfortunately, most operating systems do not follow this >>>> advice. >>> Right, EBX provides what we need to expose that the whole page is >>> monitored, thanks! >> So coming back to the original patch, is there anything that should keep us >> from exposing MWAIT straight into the guest at all times? > Just minor issues: > * OS X on Core 2 fails for unknown reason if we disable the instruction > trapping, which is an argument against doing it by default So for that we should try and see if changing the exposed CPUID MWAIT leaf helps. Currently we return 0/0 which is pretty bogus and might be the reason OSX fails. > * idling guests would consume host CPU, which is a significant change > in behavior and shouldn't be done without userspace's involvement That's the same as today, as idling guests with MWAIT would also today end up in a NOP emulated loop. Please bear in mind that I do not advocate to expose the MWAIT CPUID flag. This is only for the instruction trap. > I think the best compromise is to add a capability for the MWAIT VM-exit > controls and let userspace expose MWAIT if it wishes to. > Will send a patch. Please see my patch to force enable CPUID bits ;). Alex
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-04-04 12:51 ` Alexander Graf @ 2017-04-04 13:13 ` Radim Krčmář 2017-04-04 13:15 ` Alexander Graf 0 siblings, 1 reply; 54+ messages in thread From: Radim Krčmář @ 2017-04-04 13:13 UTC (permalink / raw) To: Alexander Graf Cc: Jim Mattson, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc 2017-04-04 14:51+0200, Alexander Graf: > On 04/04/2017 02:39 PM, Radim Krčmář wrote: >> 2017-04-03 12:04+0200, Alexander Graf: >> > So coming back to the original patch, is there anything that should keep us >> > from exposing MWAIT straight into the guest at all times? >> Just minor issues: >> * OS X on Core 2 fails for unknown reason if we disable the instruction >> trapping, which is an argument against doing it by default > > So for that we should try and see if changing the exposed CPUID MWAIT leaf > helps. Currently we return 0/0 which is pretty bogus and might be the reason > OSX fails. We have tried to pass host's CPUID MWAIT leaf and it still failed: https://www.spinics.net/lists/kvm/msg146686.html I wouldn't mind breaking that particular combination of OS X and hardware, but I'm worried to do it because we don't understand why it broke, so there could be more ... >> * idling guests would consume host CPU, which is a significant change >> in behavior and shouldn't be done without userspace's involvement > > That's the same as today, as idling guests with MWAIT would also today end > up in a NOP emulated loop. > > Please bear in mind that I do not advocate to expose the MWAIT CPUID flag. > This is only for the instruction trap. Ah, makes sense. >> I think the best compromise is to add a capability for the MWAIT VM-exit >> controls and let userspace expose MWAIT if it wishes to. >> Will send a patch. > > Please see my patch to force enable CPUID bits ;). Nice. 
MWAIT could also use setting of arbitrary values for its leaf, but a generic interface for that would probably look clunky on the command line ... ^ permalink raw reply [flat|nested] 54+ messages in thread
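For readers following the discussion of the MWAIT leaf values: CPUID.05H reports the smallest monitor-line size in EAX[15:0] and the largest in EBX[15:0], both in bytes. The sketch below decodes those two fields and computes the padding needed to fill whole largest-size lines, which is the padding the SDM advice quoted earlier in the thread asks for. The struct and function names are hypothetical and the code works on raw register values only; it does not execute CPUID itself. Treating a zero largest-line size as "no usable information" is an assumption of this sketch, chosen to model the 0/0 leaf mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Monitor-line sizes from CPUID leaf 05H (hypothetical container). */
struct mwait_leaf {
    uint16_t smallest_line;  /* CPUID.05H:EAX[15:0], bytes */
    uint16_t largest_line;   /* CPUID.05H:EBX[15:0], bytes */
};

/* Extract the two size fields from raw EAX/EBX values of leaf 05H. */
struct mwait_leaf decode_mwait_leaf(uint32_t eax, uint32_t ebx)
{
    struct mwait_leaf leaf = {
        .smallest_line = (uint16_t)(eax & 0xffff),
        .largest_line  = (uint16_t)(ebx & 0xffff),
    };
    return leaf;
}

/*
 * Bytes of padding to append to a monitored object of obj_size bytes so
 * that it occupies whole largest-size monitor lines and no unrelated
 * data shares the triggering area. Returns 0 when the leaf reports a
 * largest size of 0, because no padding can be derived from it.
 */
size_t monitor_padding(const struct mwait_leaf *leaf, size_t obj_size)
{
    assert(leaf != NULL);

    size_t line = leaf->largest_line;
    if (line == 0)
        return 0;

    size_t rem = obj_size % line;
    return rem ? line - rem : 0;
}
```

With the values reported for VMware Fusion in this thread (64 in EAX, 4096 in EBX), an 8-byte monitored variable would need 4088 bytes of padding, that is, a page to itself.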
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-04-04 13:13 ` Radim Krčmář @ 2017-04-04 13:15 ` Alexander Graf 2017-04-04 13:44 ` [Qemu-devel] " Radim Krčmář 0 siblings, 1 reply; 54+ messages in thread From: Alexander Graf @ 2017-04-04 13:15 UTC (permalink / raw) To: Radim Krčmář Cc: Jim Mattson, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc On 04/04/2017 03:13 PM, Radim Krčmář wrote: > 2017-04-04 14:51+0200, Alexander Graf: >> On 04/04/2017 02:39 PM, Radim Krčmář wrote: >>> 2017-04-03 12:04+0200, Alexander Graf: >>>> So coming back to the original patch, is there anything that should keep us >>>> from exposing MWAIT straight into the guest at all times? >>> Just minor issues: >>> * OS X on Core 2 fails for unknown reason if we disable the instruction >>> trapping, which is an argument against doing it by default >> So for that we should try and see if changing the exposed CPUID MWAIT leaf >> helps. Currently we return 0/0 which is pretty bogus and might be the reason >> OSX fails. > We have tried to pass host's CPUID MWAIT leaf and it still failed: > https://www.spinics.net/lists/kvm/msg146686.html > > I wouldn't mind breaking that particular combination of OS X and > hardware, but I'm worried to do it because we don't understand why it > broke, so there could be more ... > >>> * idling guests would consume host CPU, which is a significant change >>> in behavior and shouldn't be done without userspace's involvement >> That's the same as today, as idling guests with MWAIT would also today end >> up in a NOP emulated loop. >> >> Please bear in mind that I do not advocate to expose the MWAIT CPUID flag. >> This is only for the instruction trap. > Ah, makes sense. > >>> I think the best compromise is to add a capability for the MWAIT VM-exit >>> controls and let userspace expose MWAIT if it wishes to. 
>>> Will send a patch. >> Please see my patch to force enable CPUID bits ;). > Nice. MWAIT could also use setting of arbitrary values for its leaf, > but a generic interface for that would probably look clunky on the > command line ... I think we should have an interface similar to smbios for that eventually. Something where you can explicitly set arbitrary CPUID leaf information using leaf specific syntax. There are more leafs where it would make sense - cache topology for example. Alex ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests 2017-04-04 13:15 ` Alexander Graf @ 2017-04-04 13:44 ` Radim Krčmář 0 siblings, 0 replies; 54+ messages in thread From: Radim Krčmář @ 2017-04-04 13:44 UTC (permalink / raw) To: Alexander Graf Cc: Jim Mattson, Michael S. Tsirkin, LKML, Gabriel L. Somlo, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc, qemu-devel [Cc qemu-devel as we've gone off-topic] 2017-04-04 15:15+0200, Alexander Graf: > On 04/04/2017 03:13 PM, Radim Krčmář wrote: >> 2017-04-04 14:51+0200, Alexander Graf: >> > Please see my patch to force enable CPUID bits ;). >> Nice. MWAIT could also use setting of arbitrary values for its leaf, >> but a generic interface for that would probably look clunky on the >> command line ... > > > I think we should have an interface similar to smbios for that eventually. > Something where you can explicitly set arbitrary CPUID leaf information > using leaf specific syntax. There are more leafs where it would make sense - > cache topology for example. Right, separating cpuid from -cpu makes it bearable, like -cpuid leaf=%x[,subleaf=%x][,eax=%x][,ebx=%x][,ecx=%x][,edx=%x] Having multiple interfaces for the same thing would result in some corner case decisions ... I think QEMU should check that feature flags specified by -cpu are not cleared by -cpuid. I'm not sure if setters like "|=" and "&=~" would be beneficial in some cases. ^ permalink raw reply [flat|nested] 54+ messages in thread
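As an illustration of the syntax proposed in this message, here is a minimal C sketch that parses a string of the form leaf=%x[,subleaf=%x][,eax=%x][,ebx=%x][,ecx=%x][,edx=%x] and checks that an override does not clear feature bits requested elsewhere. Note that -cpuid is only a proposal in this thread, not an existing QEMU option, and everything below (cpuid_override, parse_cpuid_option, override_keeps_features) is hypothetical. Values are read as hexadecimal because the proposal uses %x; requiring the leaf key is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX, REG_COUNT };

/* Parsed form of one hypothetical -cpuid argument. */
struct cpuid_override {
    uint32_t leaf, subleaf;
    uint32_t reg[REG_COUNT];
    bool has_leaf, has_subleaf;
    bool has_reg[REG_COUNT];   /* which registers the user overrides */
};

/* Parse one 32-bit hex value; rejects empty, signed, or trailing junk. */
static bool parse_hex32(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v;

    if (!isxdigit((unsigned char)*s))
        return false;
    errno = 0;
    v = strtoul(s, &end, 16);
    if (errno != 0 || *end != '\0' || v > 0xffffffffUL)
        return false;
    *out = (uint32_t)v;
    return true;
}

/* Returns true on success; on failure *o may be partially filled. */
bool parse_cpuid_option(const char *arg, struct cpuid_override *o)
{
    static const char *const reg_names[REG_COUNT] = {
        "eax", "ebx", "ecx", "edx"
    };
    char buf[128];
    char *tok;

    assert(arg != NULL && o != NULL);
    if (strlen(arg) >= sizeof(buf))
        return false;
    strcpy(buf, arg);
    memset(o, 0, sizeof(*o));

    for (tok = buf; tok != NULL; ) {
        char *next = strchr(tok, ',');
        char *eq;
        uint32_t val;
        int i;

        if (next != NULL)
            *next++ = '\0';          /* terminate this key=value pair */
        eq = strchr(tok, '=');
        if (eq == NULL)
            return false;
        *eq = '\0';
        if (!parse_hex32(eq + 1, &val))
            return false;

        if (strcmp(tok, "leaf") == 0) {
            o->leaf = val;
            o->has_leaf = true;
        } else if (strcmp(tok, "subleaf") == 0) {
            o->subleaf = val;
            o->has_subleaf = true;
        } else {
            for (i = 0; i < REG_COUNT; i++)
                if (strcmp(tok, reg_names[i]) == 0)
                    break;
            if (i == REG_COUNT)
                return false;        /* unknown key */
            o->reg[i] = val;
            o->has_reg[i] = true;
        }
        tok = next;
    }
    return o->has_leaf;              /* leaf is the only mandatory key */
}

/* True if the override keeps every feature bit that -cpu asked for. */
bool override_keeps_features(uint32_t cpu_required_bits,
                             uint32_t override_value)
{
    return (cpu_required_bits & ~override_value) == 0;
}
```

For example, "leaf=5,eax=40,ebx=1000" would describe an MWAIT leaf with a 64-byte smallest and 4096-byte largest monitor line. The second function models the consistency check suggested above: an override of CPUID.01H:ECX that clears bit 3 (MONITOR/MWAIT) while -cpu requested it would be rejected.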