* [PATCH 0/8] KVM: x86: Emulator fixes
@ 2014-12-25  0:52 Nadav Amit
  2014-12-25  0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

A few more emulator fixes. Each is logically independent of the others.

The first one is the most interesting one.  It appears that the current
behavior may cause the VM to enter the page-fault handler twice on certain
faulting write accesses. If you do not like my solution, please propose a
better one.

The fourth (JMP/CALL using call- or task-gate) is not a fix, but returns an
error instead of emulating the wrong (#GP) exception.

Thanks for reviewing the patches.

Nadav Amit (8):
  KVM: x86: #PF error-code on R/W operations is wrong
  KVM: x86: pop sreg accesses only 2 bytes
  KVM: x86: fnstcw and fnstsw may cause spurious exception
  KVM: x86: JMP/CALL using call- or task-gate causes exception
  KVM: x86: em_call_far should return failure result
  KVM: x86: POP [ESP] is not emulated correctly
  KVM: x86: Do not set access bit on accessed segments
  KVM: x86: Access to LDT/GDT that wraparound is incorrect

 arch/x86/include/asm/kvm_host.h |  12 ++++
 arch/x86/kvm/emulate.c          | 138 ++++++++++++++++++++++++++--------------
 arch/x86/kvm/mmu.h              |  12 ----
 3 files changed, 103 insertions(+), 59 deletions(-)

-- 
1.9.1



* [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-26  7:25   ` Wu, Feng
  2014-12-25  0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

When emulating an instruction that reads the destination memory operand (i.e.,
instructions without the Mov flag in the emulator), the operand is first read.
If a page fault is detected in this phase, the error code delivered to the VM
does not indicate that the access that caused the exception is a write. This
does not conform to real hardware, and may cause the VM to enter the page-fault
handler twice for no reason (once for the read, once for the write).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
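
Note (not part of the commit): a minimal standalone sketch of the intended
error-code adjustment, assuming only the PFERR_* bit layout shown in this
patch. fixup_rmw_read_fault() and its example are hypothetical names used for
illustration; the real change lives in x86_emulate_insn().

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as the PFERR_* definitions moved by this patch. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)

/*
 * Hypothetical helper: the destination of a read-modify-write instruction
 * faulted while it was being pre-read. Report the fault as a write so the
 * guest handler can resolve it in a single pass.
 */
static uint32_t fixup_rmw_read_fault(uint32_t error_code)
{
	return error_code | PFERR_WRITE_MASK;
}

/* Usage: a user-mode fault on a not-present page becomes a write fault. */
void fixup_rmw_read_fault_example(void)
{
	assert(fixup_rmw_read_fault(PFERR_USER_MASK) ==
	       (PFERR_USER_MASK | PFERR_WRITE_MASK));
}
```
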
 arch/x86/include/asm/kvm_host.h | 12 ++++++++++++
 arch/x86/kvm/emulate.c          |  6 +++++-
 arch/x86/kvm/mmu.h              | 12 ------------
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 73ccb12..d6f90ca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -162,6 +162,18 @@ enum {
 #define DR7_FIXED_1	0x00000400
 #define DR7_VOLATILE	0xffff2bff
 
+#define PFERR_PRESENT_BIT 0
+#define PFERR_WRITE_BIT 1
+#define PFERR_USER_BIT 2
+#define PFERR_RSVD_BIT 3
+#define PFERR_FETCH_BIT 4
+
+#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
+#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
+#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
+#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
+#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
+
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC	0
 /*
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7a9697f..e5a84be 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		/* optimisation - avoid slow emulated read if Mov */
 		rc = segmented_read(ctxt, ctxt->dst.addr.mem,
 				   &ctxt->dst.val, ctxt->dst.bytes);
-		if (rc != X86EMUL_CONTINUE)
+		if (rc != X86EMUL_CONTINUE) {
+			if (rc == X86EMUL_PROPAGATE_FAULT &&
+			    ctxt->exception.vector == PF_VECTOR)
+				ctxt->exception.error_code |= PFERR_WRITE_MASK;
 			goto done;
+		}
 	}
 	ctxt->dst.orig_val = ctxt->dst.val;
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6b34876b..daae711 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -44,18 +44,6 @@
 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
 
-#define PFERR_PRESENT_BIT 0
-#define PFERR_WRITE_BIT 1
-#define PFERR_USER_BIT 2
-#define PFERR_RSVD_BIT 3
-#define PFERR_FETCH_BIT 4
-
-#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
-#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
-#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
-#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
-#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
-
 static inline u64 rsvd_bits(int s, int e)
 {
 	return ((1ULL << (e - s + 1)) - 1) << s;
-- 
1.9.1



* [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
  2014-12-25  0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-25  9:10   ` Chen, Tiejun
  2014-12-25  0:52 ` [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception Nadav Amit
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read from the stack.  The emulator currently reads the full operand size, which
may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
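
Note (not part of the commit): a toy model of the behavior described above,
assuming a little-endian byte array as the stack. struct toy_stack and
toy_pop_sreg() are hypothetical names, not emulator code. Only 2 bytes are
read, yet the stack pointer advances by the full operand size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_stack {
	const uint8_t *mem;	/* toy guest stack memory */
	size_t size;		/* number of readable bytes in mem */
	size_t sp;		/* stack pointer, as an offset into mem */
};

/* Returns 0 on success, -1 if the 2-byte read would leave the mapping. */
static int toy_pop_sreg(struct toy_stack *st, unsigned int op_bytes,
			uint16_t *selector)
{
	assert(op_bytes == 2 || op_bytes == 4 || op_bytes == 8);

	if (st->sp > st->size || st->size - st->sp < 2)
		return -1;

	/* Read exactly 2 bytes, whatever the operand size is. */
	*selector = (uint16_t)(st->mem[st->sp] | (st->mem[st->sp + 1] << 8));

	/* The stack pointer still moves by the full operand size. */
	st->sp += op_bytes;
	return 0;
}
```

In this model a 4-byte pop succeeds even when only 2 readable bytes remain at
the top of the stack; reading the full operand size there would go out of
bounds, which corresponds to the spurious fault the patch is meant to avoid.
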
 arch/x86/kvm/emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
 	unsigned long selector;
 	int rc;
 
-	rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
+	rc = emulate_pop(ctxt, &selector, 2);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
 	if (ctxt->modrm_reg == VCPU_SREG_SS)
 		ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
+	if (ctxt->op_bytes > 2)
+		rsp_increment(ctxt, ctxt->op_bytes - 2);
 
 	rc = load_segment_descriptor(ctxt, (u16)selector, seg);
 	return rc;
-- 
1.9.1



* [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
  2014-12-25  0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
  2014-12-25  0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-25  0:52 ` [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception Nadav Amit
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

Since the operand size of fnstcw and fnstsw is set to 2 bytes only during
execution, the emulation may cause spurious exceptions, as it reads the
destination memory beforehand.

Mark these instructions as Mov (since the previous value is ignored) and as
DstMem16 to simplify setting the operand size.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 702da5e..19923e7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -86,6 +86,7 @@
 #define DstAcc      (OpAcc << DstShift)
 #define DstDI       (OpDI << DstShift)
 #define DstMem64    (OpMem64 << DstShift)
+#define DstMem16    (OpMem16 << DstShift)
 #define DstImmUByte (OpImmUByte << DstShift)
 #define DstDX       (OpDX << DstShift)
 #define DstAccLo    (OpAccLo << DstShift)
@@ -1059,8 +1060,6 @@ static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
 	asm volatile("fnstcw %0": "+m"(fcw));
 	ctxt->ops->put_fpu(ctxt);
 
-	/* force 2 byte destination */
-	ctxt->dst.bytes = 2;
 	ctxt->dst.val = fcw;
 
 	return X86EMUL_CONTINUE;
@@ -1077,8 +1076,6 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
 	asm volatile("fnstsw %0": "+m"(fsw));
 	ctxt->ops->put_fpu(ctxt);
 
-	/* force 2 byte destination */
-	ctxt->dst.bytes = 2;
 	ctxt->dst.val = fsw;
 
 	return X86EMUL_CONTINUE;
@@ -3930,7 +3927,7 @@ static const struct gprefix pfx_0f_e7 = {
 };
 
 static const struct escape escape_d9 = { {
-	N, N, N, N, N, N, N, I(DstMem, em_fnstcw),
+	N, N, N, N, N, N, N, I(DstMem16 | Mov, em_fnstcw),
 }, {
 	/* 0xC0 - 0xC7 */
 	N, N, N, N, N, N, N, N,
@@ -3972,7 +3969,7 @@ static const struct escape escape_db = { {
 } };
 
 static const struct escape escape_dd = { {
-	N, N, N, N, N, N, N, I(DstMem, em_fnstsw),
+	N, N, N, N, N, N, N, I(DstMem16 | Mov, em_fnstsw),
 }, {
 	/* 0xC0 - 0xC7 */
 	N, N, N, N, N, N, N, N,
-- 
1.9.1



* [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (2 preceding siblings ...)
  2014-12-25  0:52 ` [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-25  0:52 ` [PATCH 5/8] KVM: x86: em_call_far should return failure result Nadav Amit
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate.  This patch does not try to implement these scenarios, as they are
presumably rare; instead, it returns an X86EMUL_UNHANDLEABLE error in such
cases rather than generating an exception.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
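
Note (not part of the commit): a simplified sketch of the decision this patch
adds for the case where a system descriptor (S == 0) is loaded into one of
CS..GS. The toy_* names are hypothetical stand-ins for the X86_TRANSFER_* and
X86EMUL_* values in the diff; other checks done by
__load_segment_descriptor() are left out.

```c
#include <assert.h>

enum toy_transfer {
	TOY_TRANSFER_NONE,
	TOY_TRANSFER_CALL_JMP,
	TOY_TRANSFER_RET,
	TOY_TRANSFER_TASK_SWITCH,
};

enum toy_result {
	TOY_INJECT_GP,		/* inject #GP into the guest */
	TOY_INJECT_TS,		/* inject #TS into the guest */
	TOY_UNHANDLEABLE,	/* give up on emulating this instruction */
};

/* Outcome when a system descriptor is loaded into a segment register. */
static enum toy_result toy_system_desc_outcome(enum toy_transfer transfer)
{
	/* Far JMP/CALL may be targeting a call or task gate: not emulated. */
	if (transfer == TOY_TRANSFER_CALL_JMP)
		return TOY_UNHANDLEABLE;
	if (transfer == TOY_TRANSFER_TASK_SWITCH)
		return TOY_INJECT_TS;
	return TOY_INJECT_GP;
}

/* Usage: only the far JMP/CALL path changes its outcome. */
void toy_system_desc_outcome_example(void)
{
	assert(toy_system_desc_outcome(TOY_TRANSFER_CALL_JMP) ==
	       TOY_UNHANDLEABLE);
	assert(toy_system_desc_outcome(TOY_TRANSFER_RET) == TOY_INJECT_GP);
}
```
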
 arch/x86/kvm/emulate.c | 54 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 19923e7..fd89471 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -264,6 +264,13 @@ struct instr_dual {
 #define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
 #define EFLG_RESERVED_ONE_MASK 2
 
+enum x86_transfer_type {
+	X86_TRANSFER_NONE,
+	X86_TRANSFER_CALL_JMP,
+	X86_TRANSFER_RET,
+	X86_TRANSFER_TASK_SWITCH,
+};
+
 static ulong reg_read(struct x86_emulate_ctxt *ctxt, unsigned nr)
 {
 	if (!(ctxt->regs_valid & (1 << nr))) {
@@ -1474,7 +1481,7 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 /* Does not support long mode */
 static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 				     u16 selector, int seg, u8 cpl,
-				     bool in_task_switch,
+				     enum x86_transfer_type transfer,
 				     struct desc_struct *desc)
 {
 	struct desc_struct seg_desc, old_desc;
@@ -1528,11 +1535,15 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 		return ret;
 
 	err_code = selector & 0xfffc;
-	err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;
+	err_vec = (transfer == X86_TRANSFER_TASK_SWITCH) ? TS_VECTOR :
+							   GP_VECTOR;
 
 	/* can't load system descriptor into segment selector */
-	if (seg <= VCPU_SREG_GS && !seg_desc.s)
+	if (seg <= VCPU_SREG_GS && !seg_desc.s) {
+		if (transfer == X86_TRANSFER_CALL_JMP)
+			return X86EMUL_UNHANDLEABLE;
 		goto exception;
+	}
 
 	if (!seg_desc.p) {
 		err_vec = (seg == VCPU_SREG_SS) ? SS_VECTOR : NP_VECTOR;
@@ -1630,7 +1641,8 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 				   u16 selector, int seg)
 {
 	u8 cpl = ctxt->ops->cpl(ctxt);
-	return __load_segment_descriptor(ctxt, selector, seg, cpl, false, NULL);
+	return __load_segment_descriptor(ctxt, selector, seg, cpl,
+					 X86_TRANSFER_NONE, NULL);
 }
 
 static void write_register_operand(struct operand *op)
@@ -2042,7 +2054,8 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
 
 	memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
 
-	rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
+	rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+				       X86_TRANSFER_CALL_JMP,
 				       &new_desc);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
@@ -2131,7 +2144,8 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	/* Outer-privilege level return is not implemented */
 	if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
 		return X86EMUL_UNHANDLEABLE;
-	rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl, false,
+	rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl,
+				       X86_TRANSFER_RET,
 				       &new_desc);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
@@ -2569,23 +2583,23 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
 	 * it is handled in a context of new task
 	 */
 	ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 
@@ -2707,31 +2721,31 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
 	 * it is handled in a context of new task
 	 */
 	ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR,
-					cpl, true, NULL);
+					cpl, X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 	ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl,
-					true, NULL);
+					X86_TRANSFER_TASK_SWITCH, NULL);
 	if (ret != X86EMUL_CONTINUE)
 		return ret;
 
@@ -3017,8 +3031,8 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
 	ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
 
 	memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
-	rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
-				       &new_desc);
+	rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+				       X86_TRANSFER_CALL_JMP, &new_desc);
 	if (rc != X86EMUL_CONTINUE)
 		return X86EMUL_CONTINUE;
 
-- 
1.9.1



* [PATCH 5/8] KVM: x86: em_call_far should return failure result
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (3 preceding siblings ...)
  2014-12-25  0:52 ` [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-25  0:52 ` [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly Nadav Amit
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fd89471..7f80f01 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3034,7 +3034,7 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
 	rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
 				       X86_TRANSFER_CALL_JMP, &new_desc);
 	if (rc != X86EMUL_CONTINUE)
-		return X86EMUL_CONTINUE;
+		return rc;
 
 	rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
 	if (rc != X86EMUL_CONTINUE)
-- 
1.9.1



* [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (4 preceding siblings ...)
  2014-12-25  0:52 ` [PATCH 5/8] KVM: x86: em_call_far should return failure result Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-25  0:52 ` [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments Nadav Amit
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

According to Intel SDM: "If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."

The current emulation does not behave this way. The fix requires spending
another of the precious instruction flags and checking that flag in
decode_modrm.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
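
Note (not part of the commit): a toy 32-bit model of the rule quoted from the
SDM, for a destination of the form [base + disp]. toy_pop_dest_ea() is a
hypothetical helper; the actual fix is the IncSP check in decode_modrm().

```c
#include <assert.h>
#include <stdint.h>

/*
 * base_val is the value of the base register before the instruction
 * executes. When the base register is ESP, the address is computed from
 * the value ESP holds after the pop has incremented it.
 */
static uint32_t toy_pop_dest_ea(uint32_t base_val, int base_is_esp,
				int32_t disp, unsigned int op_bytes)
{
	uint32_t ea = base_val + (uint32_t)disp;

	assert(op_bytes == 2 || op_bytes == 4);

	if (base_is_esp)
		ea += op_bytes;
	return ea;
}
```

For example, with ESP == 0x1000 and a 4-byte operand, POP [ESP] reads the
value at 0x1000 and stores it at 0x1004 in this model, not back at 0x1000.
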
 arch/x86/kvm/emulate.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7f80f01..7bf3548 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -171,6 +171,7 @@
 #define PrivUD      ((u64)1 << 51)  /* #UD instead of #GP on CPL > 0 */
 #define NearBranch  ((u64)1 << 52)  /* Near branches */
 #define No16	    ((u64)1 << 53)  /* No 16 bit operand */
+#define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -1229,6 +1230,10 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 			else {
 				modrm_ea += reg_read(ctxt, base_reg);
 				adjust_modrm_seg(ctxt, base_reg);
+				/* Increment ESP on POP [ESP] */
+				if ((ctxt->d & IncSP) &&
+				    base_reg == VCPU_REGS_RSP)
+					modrm_ea += ctxt->op_bytes;
 			}
 			if (index_reg != 4)
 				modrm_ea += reg_read(ctxt, index_reg) << scale;
@@ -3825,7 +3830,7 @@ static const struct opcode group1[] = {
 };
 
 static const struct opcode group1A[] = {
-	I(DstMem | SrcNone | Mov | Stack, em_pop), N, N, N, N, N, N, N,
+	I(DstMem | SrcNone | Mov | Stack | IncSP, em_pop), N, N, N, N, N, N, N,
 };
 
 static const struct opcode group2[] = {
-- 
1.9.1



* [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (5 preceding siblings ...)
  2014-12-25  0:52 ` [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-25  0:52 ` [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect Nadav Amit
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

When a segment is loaded, its accessed bit is set and the descriptor is written
back unconditionally.  In fact, the descriptor should be written only if the
accessed bit was not already set. Skipping the redundant write can also improve
performance.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
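
Note (not part of the commit): a minimal sketch of the conditional write-back,
assuming bit 0 of the type field is the accessed bit as in the diff below.
struct toy_desc and its writes counter are hypothetical; the counter stands in
for calls to write_segment_descriptor().

```c
#include <assert.h>
#include <stdint.h>

struct toy_desc {
	uint8_t type;		/* 4-bit type field; bit 0 is "accessed" */
	unsigned int writes;	/* simulated descriptor-table writes */
};

static void toy_mark_accessed(struct toy_desc *d)
{
	assert(d);

	if (d->type & 1)
		return;		/* already accessed: nothing to write */

	d->type |= 1;
	d->writes++;		/* write back only on the 0 -> 1 transition */
}
```

Loading the same segment twice in this model produces a single write.
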
 arch/x86/kvm/emulate.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7bf3548..4fcd9fd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1620,10 +1620,13 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 
 	if (seg_desc.s) {
 		/* mark segment as accessed */
-		seg_desc.type |= 1;
-		ret = write_segment_descriptor(ctxt, selector, &seg_desc);
-		if (ret != X86EMUL_CONTINUE)
-			return ret;
+		if (!(seg_desc.type & 1)) {
+			seg_desc.type |= 1;
+			ret = write_segment_descriptor(ctxt, selector,
+						       &seg_desc);
+			if (ret != X86EMUL_CONTINUE)
+				return ret;
+		}
 	} else if (ctxt->mode == X86EMUL_MODE_PROT64) {
 		ret = ctxt->ops->read_std(ctxt, desc_addr+8, &base3,
 				sizeof(base3), &ctxt->exception);
-- 
1.9.1



* [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (6 preceding siblings ...)
  2014-12-25  0:52 ` [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments Nadav Amit
@ 2014-12-25  0:52 ` Nadav Amit
  2014-12-27 22:24 ` [PATCH 0/8] KVM: x86: Emulator fixes Paolo Bonzini
  2015-01-08 10:42 ` Paolo Bonzini
  9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  0:52 UTC
  To: pbonzini; +Cc: kvm, Nadav Amit

When an access to a descriptor in the LDT/GDT wraps around outside long mode,
the address of the descriptor should be truncated to 32 bits.  Citing Intel SDM
2.1.1.1 "Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR
registers are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode
and compatibility mode)."

So in all other modes, the address needs to be truncated. Add a new function
that returns the address of the descriptor, to avoid too much code duplication.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
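
Note (not part of the commit): a standalone sketch of the address computation,
where the long_mode_active parameter is an assumption standing in for the
EFER.LMA check in the diff below. toy_descriptor_addr() is a hypothetical
name; limit checking is omitted.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t toy_descriptor_addr(uint64_t table_base, uint16_t selector,
				    int long_mode_active)
{
	/* Each descriptor is 8 bytes; the index is selector bits 3..15. */
	uint64_t addr = table_base + (uint64_t)(selector >> 3) * 8;

	if (!long_mode_active)
		addr &= 0xffffffffULL;	/* wrap at 4 GiB outside long mode */
	return addr;
}

/* Usage: a table based just below 4 GiB, descriptor index 2. */
void toy_descriptor_addr_example(void)
{
	assert(toy_descriptor_addr(0xfffffff8ULL, 0x10, 0) == 0x8ULL);
	assert(toy_descriptor_addr(0xfffffff8ULL, 0x10, 1) == 0x100000008ULL);
}
```
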
 arch/x86/kvm/emulate.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4fcd9fd..cb6b8ef 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1446,10 +1446,8 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 		ops->get_gdt(ctxt, dt);
 }
 
-/* allowed just for 8 bytes segments */
-static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
-				   u16 selector, struct desc_struct *desc,
-				   ulong *desc_addr_p)
+static int get_descriptor_ptr(struct x86_emulate_ctxt *ctxt,
+			      u16 selector, ulong *desc_addr_p)
 {
 	struct desc_ptr dt;
 	u16 index = selector >> 3;
@@ -1460,8 +1458,32 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 	if (dt.size < index * 8 + 7)
 		return emulate_gp(ctxt, selector & 0xfffc);
 
-	*desc_addr_p = addr = dt.address + index * 8;
-	return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
+	addr = dt.address + index * 8;
+
+	if (addr >> 32 != 0) {
+		u64 efer = 0;
+
+		ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
+		if (!(efer & EFER_LMA))
+			addr &= (u32)-1;
+	}
+
+	*desc_addr_p = addr;
+	return X86EMUL_CONTINUE;
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+				   u16 selector, struct desc_struct *desc,
+				   ulong *desc_addr_p)
+{
+	int rc;
+
+	rc = get_descriptor_ptr(ctxt, selector, desc_addr_p);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	return ctxt->ops->read_std(ctxt, *desc_addr_p, desc, sizeof(*desc),
 				   &ctxt->exception);
 }
 
@@ -1469,16 +1491,13 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 				    u16 selector, struct desc_struct *desc)
 {
-	struct desc_ptr dt;
-	u16 index = selector >> 3;
+	int rc;
 	ulong addr;
 
-	get_descriptor_table_ptr(ctxt, selector, &dt);
-
-	if (dt.size < index * 8 + 7)
-		return emulate_gp(ctxt, selector & 0xfffc);
+	rc = get_descriptor_ptr(ctxt, selector, &addr);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
 
-	addr = dt.address + index * 8;
 	return ctxt->ops->write_std(ctxt, addr, desc, sizeof *desc,
 				    &ctxt->exception);
 }
-- 
1.9.1



* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
  2014-12-25  0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
@ 2014-12-25  9:10   ` Chen, Tiejun
  2014-12-25  9:55     ` Nadav Amit
  0 siblings, 1 reply; 18+ messages in thread
From: Chen, Tiejun @ 2014-12-25  9:10 UTC
  To: Nadav Amit, pbonzini; +Cc: kvm

On 2014/12/25 8:52, Nadav Amit wrote:
> Although pop sreg updates RSP according to the operand size, only 2 bytes are
> read.  The current behavior may result in incorrect #GP or #PF exceptions.
>
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>   arch/x86/kvm/emulate.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index e5a84be..702da5e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
>   	unsigned long selector;
>   	int rc;
>

It looks like we should just do something similar to em_push_sreg():

         unsigned long selector;
         int rc;

+       if (ctxt->op_bytes == 4) {
+               rsp_increment(ctxt, -2);
+               ctxt->op_bytes = 2;
+       }
         rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
         if (rc != X86EMUL_CONTINUE)
                 return rc;

Right?

Thanks
Tiejun

> -	rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
> +	rc = emulate_pop(ctxt, &selector, 2);
>   	if (rc != X86EMUL_CONTINUE)
>   		return rc;
>
>   	if (ctxt->modrm_reg == VCPU_SREG_SS)
>   		ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
> +	if (ctxt->op_bytes > 2)
> +		rsp_increment(ctxt, ctxt->op_bytes - 2);
>
>   	rc = load_segment_descriptor(ctxt, (u16)selector, seg);
>   	return rc;
>


* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
  2014-12-25  9:10   ` Chen, Tiejun
@ 2014-12-25  9:55     ` Nadav Amit
  2014-12-26  1:54       ` Chen, Tiejun
  2014-12-26  7:25       ` Wu, Feng
  0 siblings, 2 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25  9:55 UTC
  To: Chen, Tiejun; +Cc: Paolo Bonzini, kvm list

Tiejun <tiejun.chen@intel.com> wrote:

> On 2014/12/25 8:52, Nadav Amit wrote:
>> Although pop sreg updates RSP according to the operand size, only 2 bytes are
>> read.  The current behavior may result in incorrect #GP or #PF exceptions.
>> 
>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>> ---
>>  arch/x86/kvm/emulate.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index e5a84be..702da5e 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
>>  	unsigned long selector;
>>  	int rc;
> 
> Looks we just should do similar thing to em_push_sreg(),
> 
>        unsigned long selector;
>        int rc;
> 
> +       if (ctxt->op_bytes == 4) {
> +               rsp_increment(ctxt, -2);
> +               ctxt->op_bytes = 2;
> +       }
>        rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
>        if (rc != X86EMUL_CONTINUE)
>                return rc;
> 
> Right?
I don’t think so. It seems the behaviour of push and pop is a bit different.
For push: “If the source operand is a segment register (16 bits) and the
operand size is 64-bits, a zero-extended value is pushed on the stack; if
the operand size is 32-bits ... all recent Core and Atom processors perform
a 16-bit move, leaving the upper portion of the stack location unmodified.”

Therefore, for push in the case of op_bytes==8 we push a zero-extended value.

For pop the behaviour is not well-documented, but experimentally it appears
only the first two bytes are accessed. I cannot see why it would be
different when opsize is 8, since it is not like the push case, where the
segment register value was zero extended.
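
To make the push side concrete, here is a toy model of the SDM text quoted
above (an illustrative sketch only, assuming a little-endian byte array as the
stack; toy_push_sreg() is a hypothetical name, not emulator code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void toy_push_sreg(uint8_t *mem, size_t *sp, unsigned int op_bytes,
			  uint16_t selector)
{
	/* With a 32-bit operand size only the low 16 bits are stored. */
	unsigned int stored = (op_bytes == 4) ? 2 : op_bytes;
	unsigned int i;

	assert(op_bytes == 2 || op_bytes == 4 || op_bytes == 8);
	assert(*sp >= op_bytes);

	/* The stack pointer always moves by the full operand size. */
	*sp -= op_bytes;

	/* Bytes 0..1 hold the selector; for 64-bit pushes the rest is zero. */
	for (i = 0; i < stored; i++)
		mem[*sp + i] = (i < 2) ? (uint8_t)(selector >> (8 * i)) : 0;
}
```

With a 4-byte operand the upper two bytes of the stack slot keep their old
contents in this model, while an 8-byte push overwrites the whole slot with the
zero-extended selector.
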

If you feel strongly about it, I’ll create a unit test.

Nadav



* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
  2014-12-25  9:55     ` Nadav Amit
@ 2014-12-26  1:54       ` Chen, Tiejun
  2014-12-26  7:25       ` Wu, Feng
  1 sibling, 0 replies; 18+ messages in thread
From: Chen, Tiejun @ 2014-12-26  1:54 UTC
  To: Nadav Amit; +Cc: Paolo Bonzini, kvm list

On 2014/12/25 17:55, Nadav Amit wrote:
> Tiejun <tiejun.chen@intel.com> wrote:
>
>> On 2014/12/25 8:52, Nadav Amit wrote:
>>> Although pop sreg updates RSP according to the operand size, only 2 bytes are
>>> read.  The current behavior may result in incorrect #GP or #PF exceptions.
>>>
>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>> ---
>>>   arch/x86/kvm/emulate.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>> index e5a84be..702da5e 100644
>>> --- a/arch/x86/kvm/emulate.c
>>> +++ b/arch/x86/kvm/emulate.c
>>> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
>>>   	unsigned long selector;
>>>   	int rc;
>>
>> Looks we just should do similar thing to em_push_sreg(),
>>
>>         unsigned long selector;
>>         int rc;
>>
>> +       if (ctxt->op_bytes == 4) {
>> +               rsp_increment(ctxt, -2);
>> +               ctxt->op_bytes = 2;
>> +       }
>>         rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
>>         if (rc != X86EMUL_CONTINUE)
>>                 return rc;
>>
>> Right?
> I don’t think so. It seems the behaviour of push and pop is a bit different.
> For push: “If the source operand is a segment register (16 bits) and the
> operand size is 64-bits, a zero-extended value is pushed on the stack; if
> the operand size is 32-bits ... all recent Core and Atom processors perform
> a 16-bit move, leaving the upper portion of the stack location unmodified.”
>
> Therefore, for push in the case of op_bytes==8 we push zero-extended value.
>
> For pop the behaviour is not well-documented, but experimentally it appears
> only the first two bytes are accessed. I cannot see why it would be

Maybe we can add a comment here, like "/* Just force 2 byte
destination to already work well in most cases. */".

> different when opsize is 8, since it is not like the push case, where the
> segment register value was zero extended.

Thanks for your explanation.

>
> If you feel strongly about it, I’ll create a unit test.

Based on your description, I agree with you on this point.

Tiejun

>
> Nadav
>
>


* RE: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
  2014-12-25  9:55     ` Nadav Amit
  2014-12-26  1:54       ` Chen, Tiejun
@ 2014-12-26  7:25       ` Wu, Feng
  2014-12-27 20:05         ` Nadav Amit
  1 sibling, 1 reply; 18+ messages in thread
From: Wu, Feng @ 2014-12-26  7:25 UTC
  To: Nadav Amit, Chen, Tiejun; +Cc: Paolo Bonzini, kvm list, Wu, Feng



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Nadav Amit
> Sent: Thursday, December 25, 2014 5:55 PM
> To: Chen, Tiejun
> Cc: Paolo Bonzini; kvm list
> Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
> 
> Tiejun <tiejun.chen@intel.com> wrote:
> 
> > On 2014/12/25 8:52, Nadav Amit wrote:
> >> Although pop sreg updates RSP according to the operand size, only 2 bytes
> are
> >> read.  The current behavior may result in incorrect #GP or #PF exceptions.
> >>
> >> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >> ---
> >>  arch/x86/kvm/emulate.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >> index e5a84be..702da5e 100644
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct
> x86_emulate_ctxt *ctxt)
> >>  	unsigned long selector;
> >>  	int rc;
> >
> > Looks we just should do similar thing to em_push_sreg(),
> >
> >        unsigned long selector;
> >        int rc;
> >
> > +       if (ctxt->op_bytes == 4) {
> > +               rsp_increment(ctxt, -2);
> > +               ctxt->op_bytes = 2;
> > +       }
> >        rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
> >        if (rc != X86EMUL_CONTINUE)
> >                return rc;
> >
> > Right?
> I don't think so. It seems the behaviour of push and pop is a bit different.
> For push: "If the source operand is a segment register (16 bits) and the
> operand size is 64-bits, a zero-extended value is pushed on the stack; if
> the operand size is 32-bits ... all recent Core and Atom processors perform
> a 16-bit move, leaving the upper portion of the stack location unmodified."
> 
> Therefore, for push in the case of op_bytes==8 we push zero-extended value.
> 
> For pop the behaviour is not well-documented, but experimentally it appears
> only the first two bytes are accessed. I cannot see why it would be
> different when opsize is 8, since it is not like the push case, where the
> segment register value was zero extended.

Let's take a 64-bit operand size as an example. When pushing a segment register, the
zero-extended value is used, so 8 bytes are pushed onto the stack. When popping it,
the current code returns the top 8 bytes of the stack and uses only the lowest 2
bytes for load_segment_descriptor(). What is the issue here?

Thanks,
Feng

> 
> If you feel strongly about it, I'll create a unit test.
> 
> Nadav
> 


* RE: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
  2014-12-25  0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
@ 2014-12-26  7:25   ` Wu, Feng
  2014-12-27 19:55     ` Nadav Amit
  0 siblings, 1 reply; 18+ messages in thread
From: Wu, Feng @ 2014-12-26  7:25 UTC (permalink / raw)
  To: Nadav Amit, pbonzini; +Cc: kvm, Wu, Feng



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Nadav Amit
> Sent: Thursday, December 25, 2014 8:52 AM
> To: pbonzini@redhat.com
> Cc: kvm@vger.kernel.org; Nadav Amit
> Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
> 
> When emulating an instruction that reads the destination memory operand
> (i.e.,
> instructions without the Mov flag in the emulator), the operand is first read.
> If a page-fault is detected in this phase, the error-code which would be
> delivered to the VM does not indicate that the access that caused the
> exception
> is a write one. This does not conform with real hardware, and may cause the
> VM
> to enter the page-fault handler twice for no reason (once for read, once for
> write).
> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/include/asm/kvm_host.h | 12 ++++++++++++
>  arch/x86/kvm/emulate.c          |  6 +++++-
>  arch/x86/kvm/mmu.h              | 12 ------------
>  3 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> index 73ccb12..d6f90ca 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -162,6 +162,18 @@ enum {
>  #define DR7_FIXED_1	0x00000400
>  #define DR7_VOLATILE	0xffff2bff
> 
> +#define PFERR_PRESENT_BIT 0
> +#define PFERR_WRITE_BIT 1
> +#define PFERR_USER_BIT 2
> +#define PFERR_RSVD_BIT 3
> +#define PFERR_FETCH_BIT 4
> +
> +#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
> +#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
> +#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
> +#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
> +#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
> +
>  /* apic attention bits */
>  #define KVM_APIC_CHECK_VAPIC	0
>  /*
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 7a9697f..e5a84be 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt
> *ctxt)
>  		/* optimisation - avoid slow emulated read if Mov */
>  		rc = segmented_read(ctxt, ctxt->dst.addr.mem,
>  				   &ctxt->dst.val, ctxt->dst.bytes);

This is a write operation, do you know why we need to read the dst operand first here?

Thanks,
Feng

> -		if (rc != X86EMUL_CONTINUE)
> +		if (rc != X86EMUL_CONTINUE) {
> +			if (rc == X86EMUL_PROPAGATE_FAULT &&
> +			    ctxt->exception.vector == PF_VECTOR)
> +				ctxt->exception.error_code |= PFERR_WRITE_MASK;
>  			goto done;
> +		}
>  	}
>  	ctxt->dst.orig_val = ctxt->dst.val;
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 6b34876b..daae711 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -44,18 +44,6 @@
>  #define PT_DIRECTORY_LEVEL 2
>  #define PT_PAGE_TABLE_LEVEL 1
> 
> -#define PFERR_PRESENT_BIT 0
> -#define PFERR_WRITE_BIT 1
> -#define PFERR_USER_BIT 2
> -#define PFERR_RSVD_BIT 3
> -#define PFERR_FETCH_BIT 4
> -
> -#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
> -#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
> -#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
> -#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
> -#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
> -
>  static inline u64 rsvd_bits(int s, int e)
>  {
>  	return ((1ULL << (e - s + 1)) - 1) << s;
> --
> 1.9.1
> 


* Re: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
  2014-12-26  7:25   ` Wu, Feng
@ 2014-12-27 19:55     ` Nadav Amit
  0 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-27 19:55 UTC (permalink / raw)
  To: Wu, Feng; +Cc: Nadav Amit, pbonzini, kvm

Feng <feng.wu@intel.com> wrote:

> 
> 
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Nadav Amit
>> Sent: Thursday, December 25, 2014 8:52 AM
>> To: pbonzini@redhat.com
>> Cc: kvm@vger.kernel.org; Nadav Amit
>> Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
>> 
>> When emulating an instruction that reads the destination memory operand
>> (i.e.,
>> instructions without the Mov flag in the emulator), the operand is first read.
>> If a page-fault is detected in this phase, the error-code which would be
>> delivered to the VM does not indicate that the access that caused the
>> exception
>> is a write one. This does not conform with real hardware, and may cause the
>> VM
>> to enter the page-fault handler twice for no reason (once for read, once for
>> write).
>> 
>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>> ---
>> arch/x86/include/asm/kvm_host.h | 12 ++++++++++++
>> arch/x86/kvm/emulate.c          |  6 +++++-
>> arch/x86/kvm/mmu.h              | 12 ------------
>> 3 files changed, 17 insertions(+), 13 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h
>> index 73ccb12..d6f90ca 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -162,6 +162,18 @@ enum {
>> #define DR7_FIXED_1	0x00000400
>> #define DR7_VOLATILE	0xffff2bff
>> 
>> +#define PFERR_PRESENT_BIT 0
>> +#define PFERR_WRITE_BIT 1
>> +#define PFERR_USER_BIT 2
>> +#define PFERR_RSVD_BIT 3
>> +#define PFERR_FETCH_BIT 4
>> +
>> +#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
>> +#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
>> +#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
>> +#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
>> +#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
>> +
>> /* apic attention bits */
>> #define KVM_APIC_CHECK_VAPIC	0
>> /*
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 7a9697f..e5a84be 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt
>> *ctxt)
>> 		/* optimisation - avoid slow emulated read if Mov */
>> 		rc = segmented_read(ctxt, ctxt->dst.addr.mem,
>> 				   &ctxt->dst.val, ctxt->dst.bytes);
> 
> This is a write operation, do you know why we need to read the dst operand first here?
Some x86 instructions read the destination operand as part of their operation.

For instance, "ADD [EAX], EBX" (Intel syntax) performs
[EAX] = [EAX] + EBX. Therefore, the emulator first reads the memory at [EAX]
at this stage. Instructions with the Mov flag skip this read.

Nadav
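
To illustrate the point above: the sketch below is not code from the patch. It
is a minimal stand-alone model, assuming a hypothetical helper named
dst_read_fault_error_code(), of how the error code of a fault taken on the
initial destination read could be adjusted so that the guest sees a write
fault. The PFERR_* values match the definitions in the patch; everything else
is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the PFERR_* definitions moved by the patch. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)
#define PFERR_FETCH_MASK   (1U << 4)

/*
 * Hypothetical helper (not part of KVM): given the error code produced by
 * the emulator's initial *read* of a memory destination operand, return
 * the error code the guest should see.
 *
 * For a read-modify-write instruction such as ADD [mem], reg the access
 * counts as a write, so the W bit is set.  For a Mov-style instruction the
 * destination is not read first, so the code is returned unchanged.
 */
static uint32_t dst_read_fault_error_code(uint32_t raw_error_code, bool is_mov)
{
	/* A data operand access is never an instruction fetch. */
	assert(!(raw_error_code & PFERR_FETCH_MASK));

	if (is_mov)
		return raw_error_code;
	return raw_error_code | PFERR_WRITE_MASK;
}
```

With this model, a user-mode fault on a non-present page during ADD [mem], reg
is reported with both the U and W bits set, which is what a guest page-fault
handler would expect for a write.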



* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
  2014-12-26  7:25       ` Wu, Feng
@ 2014-12-27 20:05         ` Nadav Amit
  0 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-27 20:05 UTC (permalink / raw)
  To: Wu, Feng; +Cc: Chen, Tiejun, Paolo Bonzini, kvm list

Feng <feng.wu@intel.com> wrote:

> 
> 
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Nadav Amit
>> Sent: Thursday, December 25, 2014 5:55 PM
>> To: Chen, Tiejun
>> Cc: Paolo Bonzini; kvm list
>> Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
>> 
>> Tiejun <tiejun.chen@intel.com> wrote:
>> 
>>> On 2014/12/25 8:52, Nadav Amit wrote:
>>>> Although pop sreg updates RSP according to the operand size, only 2 bytes
>> are
>>>> read.  The current behavior may result in incorrect #GP or #PF exceptions.
>>>> 
>>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>>> ---
>>>> arch/x86/kvm/emulate.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>>> index e5a84be..702da5e 100644
>>>> --- a/arch/x86/kvm/emulate.c
>>>> +++ b/arch/x86/kvm/emulate.c
>>>> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct
>> x86_emulate_ctxt *ctxt)
>>>> unsigned long selector;
>>>> 	int rc;
>>> 
>>> Looks we just should do similar thing to em_push_sreg(),
>>> 
>>>       unsigned long selector;
>>>       int rc;
>>> 
>>> +       if (ctxt->op_bytes == 4) {
>>> +               rsp_increment(ctxt, -2);
>>> +               ctxt->op_bytes = 2;
>>> +       }
>>>       rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
>>>       if (rc != X86EMUL_CONTINUE)
>>>               return rc;
>>> 
>>> Right?
>> I don't think so. It seems the behaviour of push and pop is a bit different.
>> For push: "If the source operand is a segment register (16 bits) and the
>> operand size is 64-bits, a zero-extended value is pushed on the stack; if
>> the operand size is 32-bits ... all recent Core and Atom processors perform
>> a 16-bit move, leaving the upper portion of the stack location unmodified."
>> 
>> Therefore, for push in the case of op_bytes==8 we push zero-extended value.
>> 
>> For pop the behaviour is not well-documented, but experimentally it appears
>> only the first two bytes are accessed. I cannot see why it would be
>> different when opsize is 8, since it is not like the push case, where the
>> segment register value was zero extended.
> 
> Let's take a 64-bit operand size as an example. When pushing a segment register, the
> zero-extended value is used, so 8 bytes are pushed onto the stack. When popping it,
> the current code returns the top 8 bytes of the stack and uses only the lowest 2
> bytes for load_segment_descriptor(). What is the issue here?
The issue I am trying to solve is that, when the pop is emulated, the stack
read is performed using the wrong size (the operand size instead of the
segment selector size). As you indicated, the value loaded into the
destination of the pop instruction is identical before and after the fix.

However, with the full operand size the emulated read may cause a #PF that
does not occur on real hardware. Consider, for instance, a case in which the
operand size is 8, RSP=0xFFC and the page of [0x1000] is non-present. In this
case POP-SREG should not cause a #PF, yet on KVM it does.

Nadav
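
The arithmetic behind the example above can be checked with a small sketch.
This is not KVM code: it assumes 4 KiB pages, and the helper name
read_crosses_page() and the SKETCH_PAGE_SIZE constant are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: true if a read of `len` bytes starting at `addr`
 * touches more than one page, i.e. the first and last byte of the access
 * fall in different pages.
 */
static bool read_crosses_page(uint64_t addr, unsigned int len)
{
	assert(len > 0);
	return (addr / SKETCH_PAGE_SIZE) !=
	       ((addr + len - 1) / SKETCH_PAGE_SIZE);
}
```

For RSP=0xFFC, an 8-byte read covers 0xFFC..0x1003 and therefore touches the
non-present page at 0x1000, while a 2-byte read covers only 0xFFC..0xFFD and
stays within the first page.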



* Re: [PATCH 0/8] KVM: x86: Emulator fixes
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (7 preceding siblings ...)
  2014-12-25  0:52 ` [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect Nadav Amit
@ 2014-12-27 22:24 ` Paolo Bonzini
  2015-01-08 10:42 ` Paolo Bonzini
  9 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-12-27 22:24 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm



On 25/12/2014 01:52, Nadav Amit wrote:
> The first one is the most interesting one.  It appears that the current
> behavior may cause the VM to enter the page-fault handler twice on certain
> faulting write accesses. If you do not like my solution, please propose a
> better one.

My series here: http://www.spinics.net/lists/kvm/msg101498.html would
fix this as well, but vmx.flat shows it has a problem that I've not
managed to debug yet.  So your solution is fine for now.

Paolo


* Re: [PATCH 0/8] KVM: x86: Emulator fixes
  2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
                   ` (8 preceding siblings ...)
  2014-12-27 22:24 ` [PATCH 0/8] KVM: x86: Emulator fixes Paolo Bonzini
@ 2015-01-08 10:42 ` Paolo Bonzini
  9 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2015-01-08 10:42 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm



On 25/12/2014 01:52, Nadav Amit wrote:
> Few more emulator fixes. Each is logically independent from the others.
> 
> The first one is the most interesting one.  It appears that the current
> behavior may cause the VM to enter the page-fault handler twice on certain
> faulting write accesses. If you do not like my solution, please propose a
> better one.
> 
> The fourth (JMP/CALL using call- or task-gate) is not a fix, but returns an
> error instead of emulating the wrong (#GP) exception.
> 
> Thanks for reviewing the patches.
> 
> Nadav Amit (8):
>   KVM: x86: #PF error-code on R/W operations is wrong
>   KVM: x86: pop sreg accesses only 2 bytes
>   KVM: x86: fnstcw and fnstsw may cause spurious exception
>   KVM: x86: JMP/CALL using call- or task-gate causes exception
>   KVM: x86: em_call_far should return failure result
>   KVM: x86: POP [ESP] is not emulated correctly
>   KVM: x86: Do not set access bit on accessed segments
>   KVM: x86: Access to LDT/GDT that wraparound is incorrect
> 
>  arch/x86/include/asm/kvm_host.h |  12 ++++
>  arch/x86/kvm/emulate.c          | 138 ++++++++++++++++++++++++++--------------
>  arch/x86/kvm/mmu.h              |  12 ----
>  3 files changed, 103 insertions(+), 59 deletions(-)
> 

I'm applying patches 2-8.  I want to play a bit more with patch 1.

Paolo


end of thread, other threads:[~2015-01-08 10:42 UTC | newest]

Thread overview: 18+ messages
2014-12-25  0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
2014-12-25  0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
2014-12-26  7:25   ` Wu, Feng
2014-12-27 19:55     ` Nadav Amit
2014-12-25  0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
2014-12-25  9:10   ` Chen, Tiejun
2014-12-25  9:55     ` Nadav Amit
2014-12-26  1:54       ` Chen, Tiejun
2014-12-26  7:25       ` Wu, Feng
2014-12-27 20:05         ` Nadav Amit
2014-12-25  0:52 ` [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception Nadav Amit
2014-12-25  0:52 ` [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception Nadav Amit
2014-12-25  0:52 ` [PATCH 5/8] KVM: x86: em_call_far should return failure result Nadav Amit
2014-12-25  0:52 ` [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly Nadav Amit
2014-12-25  0:52 ` [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments Nadav Amit
2014-12-25  0:52 ` [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect Nadav Amit
2014-12-27 22:24 ` [PATCH 0/8] KVM: x86: Emulator fixes Paolo Bonzini
2015-01-08 10:42 ` Paolo Bonzini
