* [PATCH 1/2] use explicit 64bit storage for sysenter values
@ 2009-05-28  9:56 Andre Przywara
  2009-05-28  9:56 ` [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode Andre Przywara
  2009-05-31  8:53 ` [PATCH 1/2] use explicit 64bit storage for sysenter values Avi Kivity
  0 siblings, 2 replies; 6+ messages in thread
From: Andre Przywara @ 2009-05-28  9:56 UTC (permalink / raw)
  To: avi; +Cc: kvm, Andre Przywara, Christoph Egger

Since AMD does not support sysenter in 64bit mode, the VMCB fields storing
the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values
in separate 64bit variables to avoid the truncation.

Signed-off-by: Christoph Egger <christoph.egger@amd.com>
---
 arch/x86/kvm/kvm_svm.h |    4 ++++
 arch/x86/kvm/svm.c     |   12 ++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)
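
(As an illustration of the failure mode this patch guards against -- a
minimal userspace sketch, not part of the patch; the struct below is a
hypothetical stand-in for the real VMCB layout:)

#include <stdint.h>
#include <stdio.h>

struct vmcb_save_like {			/* hypothetical stand-in for vmcb->save */
	uint32_t sysenter_eip;		/* hardware keeps only the low 32 bits */
};

int main(void)
{
	uint64_t msr = 0xffffffffc0100000ULL;	/* a 64bit SYSENTER_EIP value */
	struct vmcb_save_like save = { .sysenter_eip = (uint32_t)msr };
	uint64_t shadow = msr;			/* the separate 64bit storage */

	printf("vmcb copy: %#x  shadow: %#llx\n",
	       save.sysenter_eip, (unsigned long long)shadow);
	return 0;
}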

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index ed66e4c..4129dc1 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -27,6 +27,10 @@ struct vcpu_svm {
 	unsigned long vmcb_pa;
 	struct svm_cpu_data *svm_data;
 	uint64_t asid_generation;
+	uint64_t sysenter_cs;
+	uint64_t sysenter_esp;
+	uint64_t sysenter_eip;
+	struct kvm_segment user_cs; /* used in sysenter/sysexit emulation */
 
 	u64 next_rip;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dd667dd..f0f2885 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1978,13 +1978,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 		break;
 #endif
 	case MSR_IA32_SYSENTER_CS:
-		*data = svm->vmcb->save.sysenter_cs;
+		*data = svm->sysenter_cs;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
-		*data = svm->vmcb->save.sysenter_eip;
+		*data = svm->sysenter_eip;
 		break;
 	case MSR_IA32_SYSENTER_ESP:
-		*data = svm->vmcb->save.sysenter_esp;
+		*data = svm->sysenter_esp;
 		break;
 	/* Nobody will change the following 5 values in the VMCB so
 	   we can safely return them on rdmsr. They will always be 0
@@ -2068,13 +2068,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 		break;
 #endif
 	case MSR_IA32_SYSENTER_CS:
-		svm->vmcb->save.sysenter_cs = data;
+		svm->sysenter_cs = data;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
-		svm->vmcb->save.sysenter_eip = data;
+		svm->sysenter_eip = data;
 		break;
 	case MSR_IA32_SYSENTER_ESP:
-		svm->vmcb->save.sysenter_esp = data;
+		svm->sysenter_esp = data;
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		if (!svm_has(SVM_FEATURE_LBRV)) {
-- 
1.6.1.3




* [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode
  2009-05-28  9:56 [PATCH 1/2] use explicit 64bit storage for sysenter values Andre Przywara
@ 2009-05-28  9:56 ` Andre Przywara
  2009-05-31  8:59   ` Avi Kivity
  2009-05-31  8:53 ` [PATCH 1/2] use explicit 64bit storage for sysenter values Avi Kivity
  1 sibling, 1 reply; 6+ messages in thread
From: Andre Przywara @ 2009-05-28  9:56 UTC (permalink / raw)
  To: avi; +Cc: kvm, Andre Przywara, Amit Shah, Christoph Egger

sysenter/sysexit are not supported in AMD's 32bit compat mode, whereas
syscall is not supported in Intel's 32bit compat mode. To allow cross
vendor migration we emulate the missing instructions by setting up the
processor state as the respective other call would.
The sysenter code was originally sketched by Amit Shah; it was completed,
debugged, extended with syscall support and made to work by Christoph
Egger, and polished up by Andre Przywara.
Please note that sysret does not need to be emulated, because it will be
executed in 64bit mode, and returning to 32bit compat mode works on Intel.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
 arch/x86/kvm/x86.c         |   37 ++++-
 arch/x86/kvm/x86_emulate.c |  349 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 380 insertions(+), 6 deletions(-)
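
(For orientation while reading the syscall hunk below: the target CS/SS
selectors come out of MSR_STAR. A condensed sketch of the selector
arithmetic the patch performs -- the helper name is made up:)

#include <stdint.h>

/* sketch: SYSCALL target selectors, derived from MSR_STAR bits 47:32 */
static void star_to_syscall_selectors(uint64_t star, uint16_t *cs, uint16_t *ss)
{
	uint32_t hi = (uint32_t)(star >> 32);	/* as in the patch: msr_data >>= 32 */

	*cs = (uint16_t)(hi & 0xfffc);		/* kernel CS, RPL forced to 0 */
	*ss = (uint16_t)(hi + 8);		/* SS is the descriptor right after CS */
}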

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d44dd5..dae7726 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2593,11 +2593,38 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		/* Reject the instructions other than VMCALL/VMMCALL when
 		 * try to emulate invalid opcode */
 		c = &vcpu->arch.emulate_ctxt.decode;
-		if ((emulation_type & EMULTYPE_TRAP_UD) &&
-		    (!(c->twobyte && c->b == 0x01 &&
-		      (c->modrm_reg == 0 || c->modrm_reg == 3) &&
-		       c->modrm_mod == 3 && c->modrm_rm == 1)))
-			return EMULATE_FAIL;
+
+		if (emulation_type & EMULTYPE_TRAP_UD) {
+			if (!c->twobyte)
+				return EMULATE_FAIL;
+			switch (c->b) {
+			case 0x01: /* VMMCALL */
+				if (c->modrm_mod != 3)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 1)
+					return EMULATE_FAIL;
+				break;
+			case 0x34: /* sysenter */
+			case 0x35: /* sysexit */
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			case 0x05: /* syscall */
+				r = 0;
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			default:
+				return EMULATE_FAIL;
+			}
+
+			if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
+				return EMULATE_FAIL;
+		}
 
 		++vcpu->stat.insn_emulation;
 		if (r)  {
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 22c765d..41b78fa 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -32,6 +32,8 @@
 #include <linux/module.h>
 #include <asm/kvm_x86_emulate.h>
 
+#include "mmu.h"
+
 /*
  * Opcode effective-address decode tables.
  * Note that we only emulate instructions that have at least one memory
@@ -217,7 +219,9 @@ static u32 twobyte_table[256] = {
 	ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x30 - 0x3F */
-	ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	ImplicitOps, 0, ImplicitOps, 0,
+	ImplicitOps, ImplicitOps, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x40 - 0x47 */
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -320,8 +324,11 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_VM (1<<17)
+#define EFLG_RF (1<<16)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
+#define EFLG_IF (1<<9)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1985,10 +1992,114 @@ twobyte_insn:
 			goto cannot_emulate;
 		}
 		break;
+	case 0x05: { /* syscall */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		/* inject #UD if
+		 * 1. we are in real mode
+		 * 2. protected mode is not enabled
+		 * 3. LOCK prefix is used
+		 */
+		if ((ctxt->mode == X86EMUL_MODE_REAL)
+			|| (!(cr0 & X86_CR0_PE))
+			|| (c->lock_prefix)) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		/* inject #UD if syscall/sysret are disabled. */
+		kvm_x86_ops->get_msr(ctxt->vcpu, MSR_K6_EFER, &msr_data);
+		if ((msr_data & EFER_SCE) == 0) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+		msr_data >>= 32;
+
+		cs.selector = (u16)(msr_data & 0xfffc);
+		cs.l = 0;		/* will be adjusted later */
+		cs.base = 0;		/* flat segment */
+		cs.g = 1;		/* 4kb granularity */
+		cs.limit = 0xfffff;	/* 4GB limit */
+		cs.type = 0x0b;		/* Read, Execute, Accessed */
+		cs.dpl = 0;		/* will be adjusted later */
+		cs.present = 1;
+		cs.s = 1;
+		cs.db = 1;
+
+		ss.unusable = 0;
+		ss.selector = (u16)(msr_data + 8);
+		ss.base = 0;
+		ss.type = 0x03;		/* Read/Write, Expand-Up, Accessed */
+		ss.present = 1;
+		ss.s = 1;
+		ss.db = 1;
+
+		if (is_long_mode(ctxt->vcpu)) {
+
+			cs.db = 0;
+			cs.l = 1;	/* long mode */
+
+			c->regs[VCPU_REGS_RCX] = c->eip;
+			c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
+
+			switch (ctxt->mode) {
+			case X86EMUL_MODE_PROT64:
+				/* Intel cares about granularity (g bit),
+				 * so we don't set the effective limit.
+				 */
+				cs.g = 1;
+				cs.limit = 0xffffffff;
+
+				kvm_x86_ops->get_msr(ctxt->vcpu,
+					MSR_LSTAR, &msr_data);
+				break;
+			case X86EMUL_MODE_PROT32:
+				/* compat mode */
+				kvm_x86_ops->get_msr(ctxt->vcpu,
+					MSR_CSTAR, &msr_data);
+				break;
+			}
+
+			c->eip = msr_data;
+
+			kvm_x86_ops->get_msr(ctxt->vcpu,
+				MSR_SYSCALL_MASK, &msr_data);
+			ctxt->eflags &= ~(msr_data | EFLG_RF);
+		} else {
+			/* legacy mode */
+
+			kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+			c->regs[VCPU_REGS_RCX] = c->eip;
+			c->eip = (u32)msr_data;
+
+			ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
+		}
+
+		kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+		goto writeback;
+		break;
+	}
 	case 0x06:
 		emulate_clts(ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
+	case 0x07:		/* sysret */
 	case 0x08:		/* invd */
 	case 0x09:		/* wbinvd */
 	case 0x0d:		/* GrpP (prefetch) */
@@ -2051,6 +2162,242 @@ twobyte_insn:
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
 		break;
+	case 0x34: { /* sysenter */
+		/* Intel manual Vol 2b */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		/* XXX sysenter/sysexit have not been tested in 64bit mode.
+		 * Therefore, we inject an #UD.
+		 */
+		if (ctxt->mode == X86EMUL_MODE_PROT64) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		if ((ctxt->mode == X86EMUL_MODE_REAL) ||
+		    (!(cr0 & X86_CR0_PE))) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		/* inject #UD if LOCK prefix is used */
+		if (c->lock_prefix) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_CS, &msr_data);
+		switch (ctxt->mode) {
+		case X86EMUL_MODE_PROT32:
+			if ((msr_data & 0xfffc) != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		case X86EMUL_MODE_PROT64:
+			if (msr_data != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
+
+		kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+
+		cs.selector = (u16)msr_data;
+		cs.base = 0;		/* flat segment */
+		cs.limit = 0xfffff;	/* 4GB limit */
+		cs.g = 1;		/* 4kb granularity */
+		cs.s = 1;
+		cs.type = 0x0b;		/* Execute + Read, Accessed */
+		cs.db = 1;		/* 32bit code segment */
+		cs.dpl = 0;
+		cs.selector &= ~SELECTOR_RPL_MASK;
+		cs.present = 1;
+
+		/* No need to set cpl explicitly here. set_segment()
+		 * does this below based on the cs.dpl value.
+		 */
+
+		ss.unusable = 0;
+		ss.selector = cs.selector + 8;
+		ss.base = 0;		/* flat segment */
+		ss.limit = 0xfffff;	/* 4GB limit */
+		ss.g = 1;		/* 4kb granularity */
+		ss.s = 1;
+		ss.type = 0x03;		/* Read/Write, Accessed */
+		ss.db = 1;		/* 32bit stack segment */
+		ss.dpl = 0;
+		ss.selector &= ~SELECTOR_RPL_MASK;
+		ss.present = 1;
+
+		switch (ctxt->mode) {
+		case X86EMUL_MODE_PROT32:
+			if (!is_long_mode(ctxt->vcpu))
+				break;
+			/* fallthrough */
+		case X86EMUL_MODE_PROT64:
+			cs.base = 0;
+			cs.db = 0;
+			cs.l = 1;
+			cs.limit = 0xffffffff;
+			ss.base = 0;
+			ss.limit = 0xffffffff;
+			break;
+		default:
+			break;
+		}
+
+		kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_EIP, &msr_data);
+		c->eip = msr_data;
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_ESP, &msr_data);
+		c->regs[VCPU_REGS_RSP] = msr_data;
+
+		goto writeback;
+		break;
+	}
+	case 0x35: { /* sysexit */
+		/* Intel manual Vol 2b */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+		int usermode;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		if ((ctxt->mode == X86EMUL_MODE_REAL)
+		   || (!(cr0 & X86_CR0_PE))
+		   || (kvm_x86_ops->get_cpl(ctxt->vcpu) != 0)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		/* inject #UD if LOCK prefix is used */
+		if (c->lock_prefix) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		/* TODO: Check if rip and rsp are canonical.
+		 * inject_gp() if not
+		 */
+
+		/* if REX.W bit is set ... */
+		if ((c->rex_prefix & 0x8) != 0x0) {
+			/* Application is in 64bit mode */
+			usermode = X86EMUL_MODE_PROT64;
+		} else {
+			/* Application is in 32bit legacy/compat mode */
+			usermode = X86EMUL_MODE_PROT32;
+		}
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_CS, &msr_data);
+		kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		switch (usermode) {
+		case X86EMUL_MODE_PROT32:
+			cs.selector = (u16)(msr_data + 16);
+			if ((msr_data & 0xfffc) != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		case X86EMUL_MODE_PROT64:
+			cs.selector = (u16)(msr_data + 32);
+			if (msr_data != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		cs.base = 0;		/* flat segment */
+		cs.limit = 0xfffff;	/* 4GB limit */
+		cs.g = 1;		/* 4kb granularity */
+		cs.s = 1;
+		cs.type = 0x0b;		/* Execute, Read, Non-conforming code */
+		cs.db = 1;		/* 32bit code segment */
+		cs.dpl = 3;
+		cs.selector |= SELECTOR_RPL_MASK;
+		cs.present = 1;
+		cs.l = 0;		/* For return to compatibility mode */
+
+		/* No need to set cpl explicitly here. set_segment()
+		 * does this below based on the cs.dpl value.
+		 */
+
+		switch (usermode) {
+		case X86EMUL_MODE_PROT32:
+			ss.selector = (u16)(msr_data + 24);
+			break;
+		case X86EMUL_MODE_PROT64:
+			ss.selector = (cs.selector + 8);
+			break;
+		}
+		ss.base = 0;		/* flat segment */
+		ss.limit = 0xfffff;	/* 4GB limit */
+		ss.g = 1;		/* 4kb granularity */
+		ss.s = 1;
+		ss.type = 0x03;		/* Expand Up, Read/Write, Data */
+		ss.db = 1;		/* 32bit stack segment */
+		ss.dpl = 3;
+		ss.selector |= SELECTOR_RPL_MASK;
+		ss.present = 1;
+
+		switch (usermode) {
+		case X86EMUL_MODE_PROT32:
+			/* We don't care about cs.g/ss.g bits
+			 * (= 4kb granularity) so we have to set the effective
+			 * limit here or we get a #GP in the guest, otherwise.
+			 */
+			cs.limit = 0xffffffff;
+			ss.limit = 0xffffffff;
+			break;
+		case X86EMUL_MODE_PROT64:
+			/* We don't care about cs.g/ss.g bits
+			 * (= 4kb granularity) so we have to set the effective
+			 * limit here or we get a #GP in the guest, otherwise.
+			 */
+			cs.base = 0;
+			cs.db = 0;
+			cs.l = 1;
+			cs.limit = 0xffffffff;
+			ss.base = 0;
+			ss.limit = 0xffffffff;
+			break;
+		default:
+			break;
+		}
+		kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+		c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX];
+		c->regs[VCPU_REGS_RSP] = ctxt->vcpu->arch.regs[VCPU_REGS_RCX];
+
+		goto writeback;
+		break;
+	}
 	case 0x40 ... 0x4f:	/* cmov */
 		c->dst.val = c->dst.orig_val = c->src.val;
 		if (!test_cc(c->b, ctxt->eflags))
-- 
1.6.1.3




* Re: [PATCH 1/2] use explicit 64bit storage for sysenter values
  2009-05-28  9:56 [PATCH 1/2] use explicit 64bit storage for sysenter values Andre Przywara
  2009-05-28  9:56 ` [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode Andre Przywara
@ 2009-05-31  8:53 ` Avi Kivity
  1 sibling, 0 replies; 6+ messages in thread
From: Avi Kivity @ 2009-05-31  8:53 UTC (permalink / raw)
  To: Andre Przywara; +Cc: kvm, Christoph Egger

Andre Przywara wrote:
> Since AMD does not support sysenter in 64bit mode, the VMCB fields storing
> the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values
> in separate 64bit variables to avoid the truncation.
>   

Applied, thanks.

> +	struct kvm_segment user_cs; /* used in sysenter/sysexit emulation */
>   

Dropped this, unused.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode
  2009-05-28  9:56 ` [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode Andre Przywara
@ 2009-05-31  8:59   ` Avi Kivity
  2009-06-05 12:23     ` Andre Przywara
  0 siblings, 1 reply; 6+ messages in thread
From: Avi Kivity @ 2009-05-31  8:59 UTC (permalink / raw)
  To: Andre Przywara; +Cc: kvm, Amit Shah, Christoph Egger

Andre Przywara wrote:
> sysenter/sysexit are not supported in AMD's 32bit compat mode, whereas
> syscall is not supported in Intel's 32bit compat mode. To allow cross
> vendor migration we emulate the missing instructions by setting up the
> processor state as the respective other call would.
> The sysenter code was originally sketched by Amit Shah; it was completed,
> debugged, extended with syscall support and made to work by Christoph
> Egger, and polished up by Andre Przywara.
> Please note that sysret does not need to be emulated, because it will be
> executed in 64bit mode, and returning to 32bit compat mode works on Intel.
>  		++vcpu->stat.insn_emulation;
>  		if (r)  {
> diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
> index 22c765d..41b78fa 100644
> --- a/arch/x86/kvm/x86_emulate.c
> +++ b/arch/x86/kvm/x86_emulate.c
> @@ -32,6 +32,8 @@
>  #include <linux/module.h>
>  #include <asm/kvm_x86_emulate.h>
>  
> +#include "mmu.h"
> +
>   

I think this is unneeded?

> @@ -1985,10 +1992,114 @@ twobyte_insn:
>  			goto cannot_emulate;
>  		}
>  		break;
> +	case 0x05: { /* syscall */
> +		unsigned long cr0 = ctxt->vcpu->arch.cr0;
> +		struct kvm_segment cs, ss;
> +
> +		memset(&cs, 0, sizeof(struct kvm_segment));
> +		memset(&ss, 0, sizeof(struct kvm_segment));
> +
> +		/* inject #UD if
> +		 * 1. we are in real mode
> +		 * 2. protected mode is not enabled
> +		 * 3. LOCK prefix is used
> +		 */
> +		if ((ctxt->mode == X86EMUL_MODE_REAL)
> +			|| (!(cr0 & X86_CR0_PE))
> +			|| (c->lock_prefix)) {
> +			/* we don't need to inject #UD here, because
> +			 * when emulate_instruction() returns something else
> +			 * than EMULATE_DONE, then svm.c:ud_interception()
> +			 * will do that for us.
> +			 */
> +			goto cannot_emulate;
>   

I prefer explicit injection; relying on the caller is tricky and may change.

> +	case 0x07:		/* sysret */
>   

Since we don't emulate sysret, it should be dropped here.

> +			cs.limit = 0xffffffff;
> +			ss.base = 0;
> +			ss.limit = 0xffffffff;
>   

Once is enough.


Please move the code out of the switch and into separate functions.
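
A rough shape of what that refactoring could look like (the helper names
are only suggestions, not an existing API):

	static int emulate_syscall(struct x86_emulate_ctxt *ctxt);
	static int emulate_sysenter(struct x86_emulate_ctxt *ctxt);
	static int emulate_sysexit(struct x86_emulate_ctxt *ctxt);

	/* ... and in twobyte_insn: */
	case 0x05:	/* syscall */
		if (emulate_syscall(ctxt) == -1)
			goto cannot_emulate;
		goto writeback;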

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode
  2009-05-31  8:59   ` Avi Kivity
@ 2009-06-05 12:23     ` Andre Przywara
  2009-06-07  7:09       ` Avi Kivity
  0 siblings, 1 reply; 6+ messages in thread
From: Andre Przywara @ 2009-06-05 12:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Amit Shah, Christoph Egger

Hi Avi,

>> --- a/arch/x86/kvm/x86_emulate.c
>> +++ b/arch/x86/kvm/x86_emulate.c
>> @@ -32,6 +32,8 @@
>>  #include <linux/module.h>
>>  #include <asm/kvm_x86_emulate.h>
>>  
>> +#include "mmu.h"
>> +
>>   
> 
> I think this is unneeded?
Seems so. Probably a left-over from debugging.
> 
>> @@ -1985,10 +1992,114 @@ twobyte_insn:
>>              goto cannot_emulate;
>>          }
>>          break;
>> +    case 0x05: { /* syscall */
>> +        unsigned long cr0 = ctxt->vcpu->arch.cr0;
>> +        struct kvm_segment cs, ss;
>> +
>> +        memset(&cs, 0, sizeof(struct kvm_segment));
>> +        memset(&ss, 0, sizeof(struct kvm_segment));
>> +
>> +        /* inject #UD if
>> +         * 1. we are in real mode
>> +         * 2. protected mode is not enabled
>> +         * 3. LOCK prefix is used
>> +         */
>> +        if ((ctxt->mode == X86EMUL_MODE_REAL)
>> +            || (!(cr0 & X86_CR0_PE))
>> +            || (c->lock_prefix)) {
>> +            /* we don't need to inject #UD here, because
>> +             * when emulate_instruction() returns something else
>> +             * than EMULATE_DONE, then svm.c:ud_interception()
>> +             * will do that for us.
>> +             */
>> +            goto cannot_emulate;
>>   
> 
> I prefer explicit injection; relying on the caller is tricky and may
> change.
I don't agree. If this function cannot emulate an instruction, it 
returns -1 and lets the upper levels handle this. If we cannot rely on 
this, what else can we rely on? I could remove the comment in case this 
is confusing. The same functionality (return -1 to inject UD into the 
guest) is used in other places in this same file.

> 
>> +    case 0x07:        /* sysret */
>>   
> 
> Since we don't emulate sysret, it should be dropped here.
OK, will do.
> 
>> +            cs.limit = 0xffffffff;
>> +            ss.base = 0;
>> +            ss.limit = 0xffffffff;
>>   
> 
> Once is enough.
You are right about the ss.base assignment. But the limit goes from five
f's to eight f's. At first glance this should not matter (as the
granularity bit is set), but this is exactly where VMX and SVM differ,
so I'd like to leave it this way.
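
For reference, the arithmetic behind the five-versus-eight f's (a sketch
of how a granular segment limit expands):

	/* sketch: effective limit of a segment with the granularity bit set */
	static unsigned int effective_limit(unsigned int limit, int g)
	{
		/* with g=1 the 20bit raw limit is scaled in 4KiB units */
		return g ? (limit << 12) | 0xfff : limit;
	}

	/* effective_limit(0xfffff, 1) == 0xffffffff: with g honoured, five
	 * f's and eight f's describe the same 4GB flat segment -- the
	 * difference only matters if an implementation ignores g. */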
> 
> Please move the code out of the switch and into separate functions.

Ok, will do.

Thanks for the review!
An updated patch will follow.

Regards,
Andre.

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712



* Re: [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode
  2009-06-05 12:23     ` Andre Przywara
@ 2009-06-07  7:09       ` Avi Kivity
  0 siblings, 0 replies; 6+ messages in thread
From: Avi Kivity @ 2009-06-07  7:09 UTC (permalink / raw)
  To: Andre Przywara; +Cc: kvm, Amit Shah, Christoph Egger

Andre Przywara wrote:
>
>>
>>> @@ -1985,10 +1992,114 @@ twobyte_insn:
>>>              goto cannot_emulate;
>>>          }
>>>          break;
>>> +    case 0x05: { /* syscall */
>>> +        unsigned long cr0 = ctxt->vcpu->arch.cr0;
>>> +        struct kvm_segment cs, ss;
>>> +
>>> +        memset(&cs, 0, sizeof(struct kvm_segment));
>>> +        memset(&ss, 0, sizeof(struct kvm_segment));
>>> +
>>> +        /* inject #UD if
>>> +         * 1. we are in real mode
>>> +         * 2. protected mode is not enabled
>>> +         * 3. LOCK prefix is used
>>> +         */
>>> +        if ((ctxt->mode == X86EMUL_MODE_REAL)
>>> +            || (!(cr0 & X86_CR0_PE))
>>> +            || (c->lock_prefix)) {
>>> +            /* we don't need to inject #UD here, because
>>> +             * when emulate_instruction() returns something else
>>> +             * than EMULATE_DONE, then svm.c:ud_interception()
>>> +             * will do that for us.
>>> +             */
>>> +            goto cannot_emulate;
>>>   
>>
>> I prefer explicit injection; relying on the caller is tricky and may
>> change.
> I don't agree. If this function cannot emulate an instruction, it 
> returns -1 and lets the upper levels handle this. If we cannot rely on 
> this, what else can we rely on? I could remove the comment in case 
> this is confusing. The same functionality (return -1 to inject UD into 
> the guest) is used in other places in this same file.

We return -1, but that doesn't mean we inject #UD.  For some cases (page 
table emulation), we unshadow the page and retry (letting the guest 
execute natively).

We only inject #UD from that specific call site.  If we somehow emulated 
from another call site, we'd get different behaviour.
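
For context, the call site in question looks roughly like this
(paraphrased from svm.c of that era, not a verbatim quote):

	static int ud_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
	{
		int er;

		/* only this caller maps an emulation failure onto a #UD */
		er = emulate_instruction(&svm->vcpu, kvm_run, 0, 0,
					 EMULTYPE_TRAP_UD);
		if (er != EMULATE_DONE)
			kvm_queue_exception(&svm->vcpu, UD_VECTOR);
		return 1;
	}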

>
>>
>>> +            cs.limit = 0xffffffff;
>>> +            ss.base = 0;
>>> +            ss.limit = 0xffffffff;
>>>   
>>
>> Once is enough.
> You are right about the ss.base assignment. But the limit goes from 
> five f's to eight f's. On a first glance this should not matter (as 
> the granularity bit is set), but exactly here are differences between 
> VMX and SVM, so I'd like to leave it this way.

I misread; I thought you were setting ss.limit twice.  Certainly the
code is correct as is and should not be modified.  Sorry about the 
confusion.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


