[PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop

* [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop
@ 2014-05-07 20:52 Gabriel L. Somlo
  2014-06-02 19:25 ` Gabriel L. Somlo
  2014-06-05 20:59 ` Eric Northup
  0 siblings, 2 replies; 35+ messages in thread
From: Gabriel L. Somlo @ 2014-05-07 20:52 UTC (permalink / raw)
  To: kvm; +Cc: mst, pbonzini, afaerber, agraf

Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.

Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.

Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
---

New in v2: remove invalid_op handler functions which were only used to
           handle exits caused by monitor and mwait

On Wed, May 07, 2014 at 08:31:27PM +0200, Alexander Graf wrote:
> On 05/07/2014 08:15 PM, Michael S. Tsirkin wrote:
> >If we really want to be paranoid and worry about guests
> >that use this strange way to trigger invalid opcode,
> >we can make it possible for userspace to enable/disable
> >this hack, and teach qemu to set it.
> >
> >That would make it even safer than it was.
> >
> >Not sure it's worth it, just a thought.
> 
> Since we don't trap on non-exposed other instructions (new SSE and
> whatdoiknow) I don't think it's really bad to just expose
> MONITOR/MWAIT as nops.

So AFAICT, linux prefers to use mwait for idling if cpuid tells it that
it's available. If we keep telling everyone that we do NOT have monitor
and mwait available, compliant guests will never end up using them, and
this hack would remain completely invisible to them, which is good
(better to use hlt-based idle loops when you're a vm guest, that would
actually allow the host to relax while you're halted :)

So the only time anyone would be able to tell we have this hack would be
when they're about to receive an invalid opcode for using monitor/mwait
in violation of what CPUID (would have) told them. That's what happens
to OS X prior to 10.8, which is when I'm hypothesizing the Apple devs
begain to seriously think about their OS running as a vm guest (on fusion
and parallels)...

Instead of killing the misbehaving guest with an invalid opcode, we'd
allow them to peg the host CPU with their monitor == mwait == nop idle
loop instead, which, at least on OS X, should be tolerable long enough
to run 'rm -rf System/Library/Extensions/AppleIntelCPUPowerManagement.kext'
and reboot the guest, after which things would settle down by reverting
the guest to a hlt-based idle loop.

The only reason I can think of to add functionality for enabling/disabling
this hack would be to protect against a malicious guest which would use
mwait *on purpose* to peg the host CPU. But a malicious guest could just
run "for(;;);" in ring 0 and accomplish the same goal, so we wouldn't
really gain anything in exchange for the added complexity...

Thanks,
  Gabriel

 arch/x86/kvm/cpuid.c |  2 ++
 arch/x86/kvm/svm.c   | 28 ++++++++++++++++++++--------
 arch/x86/kvm/vmx.c   | 20 ++++++++++++++++----
 3 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f47a104..d094fc6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -283,6 +283,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
 	/* cpuid 1.ecx */
 	const u32 kvm_supported_word4_x86_features =
+		/* NOTE: MONITOR (and MWAIT) are emulated as NOP,
+		 * but *not* advertised to guests via CPUID ! */
 		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
 		0 /* DS-CPL, VMX, SMX, EST */ |
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7f4f9c2..0b7d58d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2770,12 +2770,6 @@ static int xsetbv_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
-static int invalid_op_interception(struct vcpu_svm *svm)
-{
-	kvm_queue_exception(&svm->vcpu, UD_VECTOR);
-	return 1;
-}
-
 static int task_switch_interception(struct vcpu_svm *svm)
 {
 	u16 tss_selector;
@@ -3287,6 +3281,24 @@ static int pause_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static int nop_interception(struct vcpu_svm *svm)
+{
+	skip_emulated_instruction(&(svm->vcpu));
+	return 1;
+}
+
+static int monitor_interception(struct vcpu_svm *svm)
+{
+	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
+	return nop_interception(svm);
+}
+
+static int mwait_interception(struct vcpu_svm *svm)
+{
+	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
+	return nop_interception(svm);
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3344,8 +3356,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= skinit_interception,
 	[SVM_EXIT_WBINVD]                       = emulate_on_interception,
-	[SVM_EXIT_MONITOR]			= invalid_op_interception,
-	[SVM_EXIT_MWAIT]			= invalid_op_interception,
+	[SVM_EXIT_MONITOR]			= monitor_interception,
+	[SVM_EXIT_MWAIT]			= mwait_interception,
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_NPF]				= pf_interception,
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 33e8c02..3ccbcb1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5669,12 +5669,24 @@ static int handle_pause(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-static int handle_invalid_op(struct kvm_vcpu *vcpu)
+static int handle_nop(struct kvm_vcpu *vcpu)
 {
-	kvm_queue_exception(vcpu, UD_VECTOR);
+	skip_emulated_instruction(vcpu);
 	return 1;
 }
 
+static int handle_mwait(struct kvm_vcpu *vcpu)
+{
+	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
+	return handle_nop(vcpu);
+}
+
+static int handle_monitor(struct kvm_vcpu *vcpu)
+{
+	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
+	return handle_nop(vcpu);
+}
+
 /*
  * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
  * We could reuse a single VMCS for all the L2 guests, but we also want the
@@ -6571,8 +6583,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_EPT_VIOLATION]	      = handle_ept_violation,
 	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
 	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
-	[EXIT_REASON_MWAIT_INSTRUCTION]	      = handle_invalid_op,
-	[EXIT_REASON_MONITOR_INSTRUCTION]     = handle_invalid_op,
+	[EXIT_REASON_MWAIT_INSTRUCTION]	      = handle_mwait,
+	[EXIT_REASON_MONITOR_INSTRUCTION]     = handle_monitor,
 	[EXIT_REASON_INVEPT]                  = handle_invept,
 };
 
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread