* [PATCH 0/8] KVM: x86: Emulator fixes
@ 2014-12-25 0:52 Nadav Amit
2014-12-25 0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
` (9 more replies)
0 siblings, 10 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
A few more emulator fixes. Each is logically independent of the others.
The first one is the most interesting one. It appears that the current
behavior may cause the VM to enter the page-fault handler twice on certain
faulting write accesses. If you do not like my solution, please propose a
better one.
The fourth (JMP/CALL using a call gate or task gate) is not a real fix; it
merely returns an error instead of raising the wrong (#GP) exception.
Thanks for reviewing the patches.
Nadav Amit (8):
KVM: x86: #PF error-code on R/W operations is wrong
KVM: x86: pop sreg accesses only 2 bytes
KVM: x86: fnstcw and fnstsw may cause spurious exception
KVM: x86: JMP/CALL using call- or task-gate causes exception
KVM: x86: em_call_far should return failure result
KVM: x86: POP [ESP] is not emulated correctly
KVM: x86: Do not set access bit on accessed segments
KVM: x86: Access to LDT/GDT that wraparound is incorrect
arch/x86/include/asm/kvm_host.h | 12 ++++
arch/x86/kvm/emulate.c | 138 ++++++++++++++++++++++++++--------------
arch/x86/kvm/mmu.h | 12 ----
3 files changed, 103 insertions(+), 59 deletions(-)
--
1.9.1
* [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-26 7:25 ` Wu, Feng
2014-12-25 0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
` (8 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
When emulating an instruction that reads the destination memory operand (i.e.,
an instruction without the Mov flag in the emulator), the operand is read first.
If a page fault is detected in this phase, the error code delivered to the VM
does not indicate that the faulting access is a write. This does not match
real hardware, and may cause the VM to enter the page-fault handler twice for
no reason (once for the read, once for the write).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
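For readers following along, here is a minimal user-space sketch of the idea,
not the kernel code itself. `struct fault` and `mark_rmw_fault()` are
hypothetical names invented for this illustration; the mask values and the
page-fault vector number (14) follow the definitions used in the patch.

```c
#include <stdint.h>

/* Bit positions mirror the PFERR_* definitions moved by this patch. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)

#define PF_VECTOR 14 /* x86 page-fault vector */

/* Hypothetical, simplified stand-in for the emulator's exception record. */
struct fault {
	uint8_t  vector;
	uint32_t error_code;
};

/*
 * Called when the read half of a read-modify-write access has faulted.
 * The guest should see a single fault that already reports a write, so
 * the W bit is OR-ed in. Other exception vectors are left untouched.
 */
static void mark_rmw_fault(struct fault *f)
{
	if (f->vector == PF_VECTOR)
		f->error_code |= PFERR_WRITE_MASK;
}
```

With the W bit set on the first fault, a guest handler can resolve the write
fault directly instead of first fixing up a read and then faulting again.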
arch/x86/include/asm/kvm_host.h | 12 ++++++++++++
arch/x86/kvm/emulate.c | 6 +++++-
arch/x86/kvm/mmu.h | 12 ------------
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 73ccb12..d6f90ca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -162,6 +162,18 @@ enum {
#define DR7_FIXED_1 0x00000400
#define DR7_VOLATILE 0xffff2bff
+#define PFERR_PRESENT_BIT 0
+#define PFERR_WRITE_BIT 1
+#define PFERR_USER_BIT 2
+#define PFERR_RSVD_BIT 3
+#define PFERR_FETCH_BIT 4
+
+#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
+#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
+#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
+#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
+#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
+
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
/*
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7a9697f..e5a84be 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
/* optimisation - avoid slow emulated read if Mov */
rc = segmented_read(ctxt, ctxt->dst.addr.mem,
&ctxt->dst.val, ctxt->dst.bytes);
- if (rc != X86EMUL_CONTINUE)
+ if (rc != X86EMUL_CONTINUE) {
+ if (rc == X86EMUL_PROPAGATE_FAULT &&
+ ctxt->exception.vector == PF_VECTOR)
+ ctxt->exception.error_code |= PFERR_WRITE_MASK;
goto done;
+ }
}
ctxt->dst.orig_val = ctxt->dst.val;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6b34876b..daae711 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -44,18 +44,6 @@
#define PT_DIRECTORY_LEVEL 2
#define PT_PAGE_TABLE_LEVEL 1
-#define PFERR_PRESENT_BIT 0
-#define PFERR_WRITE_BIT 1
-#define PFERR_USER_BIT 2
-#define PFERR_RSVD_BIT 3
-#define PFERR_FETCH_BIT 4
-
-#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
-#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
-#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
-#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
-#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
-
static inline u64 rsvd_bits(int s, int e)
{
return ((1ULL << (e - s + 1)) - 1) << s;
--
1.9.1
* [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
2014-12-25 0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-25 9:10 ` Chen, Tiejun
2014-12-25 0:52 ` [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception Nadav Amit
` (7 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
Although pop sreg updates RSP according to the operand size, only 2 bytes are
read from the stack. The emulator currently reads the full operand size, which
may result in incorrect #GP or #PF exceptions.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
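A toy model of the intended behavior, assuming a flat little-endian byte
array as the stack. `struct toy_stack` and `pop_selector()` are hypothetical
names for this sketch and do not exist in the emulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy stack: a byte array plus a stack-pointer offset. */
struct toy_stack {
	const uint8_t *mem;
	size_t sp;
};

/*
 * Pop a segment selector: access only the low 2 bytes at the top of the
 * stack, then advance SP by the full operand size (2, 4 or 8).
 */
static uint16_t pop_selector(struct toy_stack *s, unsigned int op_bytes)
{
	/* 2-byte little-endian read; the upper bytes are never touched. */
	uint16_t sel = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));

	s->sp += op_bytes; /* SP still moves by the operand size */
	return sel;
}
```

The point of splitting the two steps is that a fault check only ever covers
the 2 bytes actually read, while the stack pointer update is unchanged.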
arch/x86/kvm/emulate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
unsigned long selector;
int rc;
- rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
+ rc = emulate_pop(ctxt, &selector, 2);
if (rc != X86EMUL_CONTINUE)
return rc;
if (ctxt->modrm_reg == VCPU_SREG_SS)
ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
+ if (ctxt->op_bytes > 2)
+ rsp_increment(ctxt, ctxt->op_bytes - 2);
rc = load_segment_descriptor(ctxt, (u16)selector, seg);
return rc;
--
1.9.1
* [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
2014-12-25 0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
2014-12-25 0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception Nadav Amit
` (6 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
Since the operand size of fnstcw and fnstsw is updated during execution, the
emulation may cause spurious exceptions, as it reads the memory beforehand.
Mark these instructions as Mov (since the previous value is ignored) and as
DstMem16 to simplify setting the operand size.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
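One plausible way such a spurious exception can arise, sketched under the
assumption of 4 KiB pages: a pre-read using a 4-byte operand size can touch a
second page that the real 2-byte store never reaches. `TOY_PAGE_SIZE` and
`crosses_page()` are hypothetical names for this illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed 4 KiB pages */

/* True if an access of `bytes` bytes starting at `addr` spans two pages. */
static bool crosses_page(uint64_t addr, unsigned int bytes)
{
	return (addr / TOY_PAGE_SIZE) != ((addr + bytes - 1) / TOY_PAGE_SIZE);
}
```

For example, at address 0x1ffe a 4-byte access spans two pages while a
2-byte access stays within one, so only the oversized pre-read could fault
on the following page.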
arch/x86/kvm/emulate.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 702da5e..19923e7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -86,6 +86,7 @@
#define DstAcc (OpAcc << DstShift)
#define DstDI (OpDI << DstShift)
#define DstMem64 (OpMem64 << DstShift)
+#define DstMem16 (OpMem16 << DstShift)
#define DstImmUByte (OpImmUByte << DstShift)
#define DstDX (OpDX << DstShift)
#define DstAccLo (OpAccLo << DstShift)
@@ -1059,8 +1060,6 @@ static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
asm volatile("fnstcw %0": "+m"(fcw));
ctxt->ops->put_fpu(ctxt);
- /* force 2 byte destination */
- ctxt->dst.bytes = 2;
ctxt->dst.val = fcw;
return X86EMUL_CONTINUE;
@@ -1077,8 +1076,6 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
asm volatile("fnstsw %0": "+m"(fsw));
ctxt->ops->put_fpu(ctxt);
- /* force 2 byte destination */
- ctxt->dst.bytes = 2;
ctxt->dst.val = fsw;
return X86EMUL_CONTINUE;
@@ -3930,7 +3927,7 @@ static const struct gprefix pfx_0f_e7 = {
};
static const struct escape escape_d9 = { {
- N, N, N, N, N, N, N, I(DstMem, em_fnstcw),
+ N, N, N, N, N, N, N, I(DstMem16 | Mov, em_fnstcw),
}, {
/* 0xC0 - 0xC7 */
N, N, N, N, N, N, N, N,
@@ -3972,7 +3969,7 @@ static const struct escape escape_db = { {
} };
static const struct escape escape_dd = { {
- N, N, N, N, N, N, N, I(DstMem, em_fnstsw),
+ N, N, N, N, N, N, N, I(DstMem16 | Mov, em_fnstsw),
}, {
/* 0xC0 - 0xC7 */
N, N, N, N, N, N, N, N,
--
1.9.1
* [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (2 preceding siblings ...)
2014-12-25 0:52 ` [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 5/8] KVM: x86: em_call_far should return failure result Nadav Amit
` (5 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate. This patch does not try to implement these scenarios, as they are
presumably rare; instead, it returns an X86EMUL_UNHANDLEABLE error in such
cases rather than generating an exception.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
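The decision this patch adds can be summarized by the following stand-alone
sketch. The enum and function names here are hypothetical simplifications;
the real code works on `enum x86_transfer_type` and the emulator's return
codes, and covers only loads into ES..GS.

```c
#include <stdbool.h>

/* Simplified mirror of the transfer kinds introduced by the patch. */
enum transfer_type {
	TRANSFER_NONE,
	TRANSFER_CALL_JMP,
	TRANSFER_RET,
	TRANSFER_TASK_SWITCH,
};

/* Hypothetical outcome codes for this sketch. */
enum load_result { LOAD_OK, LOAD_EXCEPTION, LOAD_UNHANDLEABLE };

/*
 * is_system: the descriptor's S bit is clear (gate, TSS, LDT, ...).
 * A far CALL/JMP through such a descriptor is a gate transfer, which the
 * emulator does not implement, so it bails out rather than raising a
 * fault. Every other kind of load still takes the exception path.
 */
static enum load_result check_segment_load(bool is_system,
					   enum transfer_type transfer)
{
	if (!is_system)
		return LOAD_OK;
	if (transfer == TRANSFER_CALL_JMP)
		return LOAD_UNHANDLEABLE;
	return LOAD_EXCEPTION;
}
```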
arch/x86/kvm/emulate.c | 54 +++++++++++++++++++++++++++++++-------------------
1 file changed, 34 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 19923e7..fd89471 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -264,6 +264,13 @@ struct instr_dual {
#define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
#define EFLG_RESERVED_ONE_MASK 2
+enum x86_transfer_type {
+ X86_TRANSFER_NONE,
+ X86_TRANSFER_CALL_JMP,
+ X86_TRANSFER_RET,
+ X86_TRANSFER_TASK_SWITCH,
+};
+
static ulong reg_read(struct x86_emulate_ctxt *ctxt, unsigned nr)
{
if (!(ctxt->regs_valid & (1 << nr))) {
@@ -1474,7 +1481,7 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
/* Does not support long mode */
static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, int seg, u8 cpl,
- bool in_task_switch,
+ enum x86_transfer_type transfer,
struct desc_struct *desc)
{
struct desc_struct seg_desc, old_desc;
@@ -1528,11 +1535,15 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
return ret;
err_code = selector & 0xfffc;
- err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;
+ err_vec = (transfer == X86_TRANSFER_TASK_SWITCH) ? TS_VECTOR :
+ GP_VECTOR;
/* can't load system descriptor into segment selector */
- if (seg <= VCPU_SREG_GS && !seg_desc.s)
+ if (seg <= VCPU_SREG_GS && !seg_desc.s) {
+ if (transfer == X86_TRANSFER_CALL_JMP)
+ return X86EMUL_UNHANDLEABLE;
goto exception;
+ }
if (!seg_desc.p) {
err_vec = (seg == VCPU_SREG_SS) ? SS_VECTOR : NP_VECTOR;
@@ -1630,7 +1641,8 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, int seg)
{
u8 cpl = ctxt->ops->cpl(ctxt);
- return __load_segment_descriptor(ctxt, selector, seg, cpl, false, NULL);
+ return __load_segment_descriptor(ctxt, selector, seg, cpl,
+ X86_TRANSFER_NONE, NULL);
}
static void write_register_operand(struct operand *op)
@@ -2042,7 +2054,8 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
- rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+ X86_TRANSFER_CALL_JMP,
&new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
@@ -2131,7 +2144,8 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
/* Outer-privilege level return is not implemented */
if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
return X86EMUL_UNHANDLEABLE;
- rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl, false,
+ rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl,
+ X86_TRANSFER_RET,
&new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
@@ -2569,23 +2583,23 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
* it is handled in a context of new task
*/
ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
@@ -2707,31 +2721,31 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
* it is handled in a context of new task
*/
ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR,
- cpl, true, NULL);
+ cpl, X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl,
- true, NULL);
+ X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
@@ -3017,8 +3031,8 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
- rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
- &new_desc);
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+ X86_TRANSFER_CALL_JMP, &new_desc);
if (rc != X86EMUL_CONTINUE)
return X86EMUL_CONTINUE;
--
1.9.1
* [PATCH 5/8] KVM: x86: em_call_far should return failure result
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (3 preceding siblings ...)
2014-12-25 0:52 ` [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly Nadav Amit
` (4 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
arch/x86/kvm/emulate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fd89471..7f80f01 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3034,7 +3034,7 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
X86_TRANSFER_CALL_JMP, &new_desc);
if (rc != X86EMUL_CONTINUE)
- return X86EMUL_CONTINUE;
+ return rc;
rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
if (rc != X86EMUL_CONTINUE)
--
1.9.1
* [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (4 preceding siblings ...)
2014-12-25 0:52 ` [PATCH 5/8] KVM: x86: em_call_far should return failure result Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments Nadav Amit
` (3 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
According to Intel SDM: "If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."
The current emulation does not behave this way. The fix requires spending
another of the precious instruction flags and checking the flag in decode_modrm.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
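A small sketch of the address computation the SDM describes, assuming a
32-bit stack and no index register. `pop_dst_ea()` is a hypothetical helper
written for this note, not a function in emulate.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Effective address of the memory destination of POP [base + disp].
 * When the base register is ESP, the address is computed from the value
 * ESP holds after the pop has incremented it by the operand size.
 */
static uint32_t pop_dst_ea(uint32_t base, int32_t disp,
			   unsigned int op_bytes, bool base_is_esp)
{
	if (base_is_esp)
		base += op_bytes; /* use the post-increment ESP */
	return base + (uint32_t)disp;
}
```

So with ESP = 0x1000 and a 4-byte operand, POP [ESP] stores the popped value
at 0x1004, not at 0x1000.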
arch/x86/kvm/emulate.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7f80f01..7bf3548 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -171,6 +171,7 @@
#define PrivUD ((u64)1 << 51) /* #UD instead of #GP on CPL > 0 */
#define NearBranch ((u64)1 << 52) /* Near branches */
#define No16 ((u64)1 << 53) /* No 16 bit operand */
+#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
@@ -1229,6 +1230,10 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
else {
modrm_ea += reg_read(ctxt, base_reg);
adjust_modrm_seg(ctxt, base_reg);
+ /* Increment ESP on POP [ESP] */
+ if ((ctxt->d & IncSP) &&
+ base_reg == VCPU_REGS_RSP)
+ modrm_ea += ctxt->op_bytes;
}
if (index_reg != 4)
modrm_ea += reg_read(ctxt, index_reg) << scale;
@@ -3825,7 +3830,7 @@ static const struct opcode group1[] = {
};
static const struct opcode group1A[] = {
- I(DstMem | SrcNone | Mov | Stack, em_pop), N, N, N, N, N, N, N,
+ I(DstMem | SrcNone | Mov | Stack | IncSP, em_pop), N, N, N, N, N, N, N,
};
static const struct opcode group2[] = {
--
1.9.1
* [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (5 preceding siblings ...)
2014-12-25 0:52 ` [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect Nadav Amit
` (2 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
When a segment is loaded, the segment accessed bit is set unconditionally. In
fact, the descriptor should be written back only if it did not already have
the accessed bit set. Skipping the redundant write can also improve performance.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
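The pattern is a plain test-before-set. A minimal sketch, where
`DESC_TYPE_ACCESSED` and `mark_accessed()` are hypothetical names and the
return value stands in for "a descriptor write-back is needed":

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_TYPE_ACCESSED 0x1u /* bit 0 of the descriptor type field */

/*
 * Set the accessed bit in a descriptor type field.
 * Returns true only when the bit was previously clear, i.e. when the
 * caller actually has to write the descriptor back to the GDT/LDT.
 */
static bool mark_accessed(uint8_t *type)
{
	if (*type & DESC_TYPE_ACCESSED)
		return false; /* already accessed: no memory write */

	*type |= DESC_TYPE_ACCESSED;
	return true;
}
```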
arch/x86/kvm/emulate.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7bf3548..4fcd9fd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1620,10 +1620,13 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
if (seg_desc.s) {
/* mark segment as accessed */
- seg_desc.type |= 1;
- ret = write_segment_descriptor(ctxt, selector, &seg_desc);
- if (ret != X86EMUL_CONTINUE)
- return ret;
+ if (!(seg_desc.type & 1)) {
+ seg_desc.type |= 1;
+ ret = write_segment_descriptor(ctxt, selector,
+ &seg_desc);
+ if (ret != X86EMUL_CONTINUE)
+ return ret;
+ }
} else if (ctxt->mode == X86EMUL_MODE_PROT64) {
ret = ctxt->ops->read_std(ctxt, desc_addr+8, &base3,
sizeof(base3), &ctxt->exception);
--
1.9.1
* [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (6 preceding siblings ...)
2014-12-25 0:52 ` [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments Nadav Amit
@ 2014-12-25 0:52 ` Nadav Amit
2014-12-27 22:24 ` [PATCH 0/8] KVM: x86: Emulator fixes Paolo Bonzini
2015-01-08 10:42 ` Paolo Bonzini
9 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 0:52 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Nadav Amit
When an access to a descriptor in the LDT/GDT wraps around outside long mode,
the address of the descriptor should be truncated to 32 bits. Citing Intel SDM
2.1.1.1 "Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR
registers are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode
and compatibility mode)."
So in other cases, we need to truncate. Create a new function that returns a
pointer to the descriptor in the table, to avoid too much code duplication.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
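The arithmetic can be checked in isolation. In this sketch
`descriptor_addr()` is a hypothetical helper, and `long_mode_active` stands
in for the EFER.LMA test that the patch performs through get_msr().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Linear address of the 8-byte descriptor named by `selector`.
 * Outside long mode the sum wraps at 4 GiB, so it is truncated to
 * 32 bits; in long mode the full 64-bit address is kept.
 */
static uint64_t descriptor_addr(uint64_t table_base, uint16_t selector,
				bool long_mode_active)
{
	uint64_t addr = table_base + (uint64_t)(selector >> 3) * 8;

	if (!long_mode_active)
		addr &= 0xffffffffULL;
	return addr;
}
```

For a table base of 0xfffffff8 and selector 0x10 (index 2), the untruncated
sum is 0x100000008, which outside long mode wraps to 0x8.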
arch/x86/kvm/emulate.c | 45 ++++++++++++++++++++++++++++++++-------------
1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4fcd9fd..cb6b8ef 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1446,10 +1446,8 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
ops->get_gdt(ctxt, dt);
}
-/* allowed just for 8 bytes segments */
-static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
- u16 selector, struct desc_struct *desc,
- ulong *desc_addr_p)
+static int get_descriptor_ptr(struct x86_emulate_ctxt *ctxt,
+ u16 selector, ulong *desc_addr_p)
{
struct desc_ptr dt;
u16 index = selector >> 3;
@@ -1460,8 +1458,32 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
if (dt.size < index * 8 + 7)
return emulate_gp(ctxt, selector & 0xfffc);
- *desc_addr_p = addr = dt.address + index * 8;
- return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
+ addr = dt.address + index * 8;
+
+ if (addr >> 32 != 0) {
+ u64 efer = 0;
+
+ ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
+ if (!(efer & EFER_LMA))
+ addr &= (u32)-1;
+ }
+
+ *desc_addr_p = addr;
+ return X86EMUL_CONTINUE;
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+ u16 selector, struct desc_struct *desc,
+ ulong *desc_addr_p)
+{
+ int rc;
+
+ rc = get_descriptor_ptr(ctxt, selector, desc_addr_p);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ return ctxt->ops->read_std(ctxt, *desc_addr_p, desc, sizeof(*desc),
&ctxt->exception);
}
@@ -1469,16 +1491,13 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, struct desc_struct *desc)
{
- struct desc_ptr dt;
- u16 index = selector >> 3;
+ int rc;
ulong addr;
- get_descriptor_table_ptr(ctxt, selector, &dt);
-
- if (dt.size < index * 8 + 7)
- return emulate_gp(ctxt, selector & 0xfffc);
+ rc = get_descriptor_ptr(ctxt, selector, &addr);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
- addr = dt.address + index * 8;
return ctxt->ops->write_std(ctxt, addr, desc, sizeof *desc,
&ctxt->exception);
}
--
1.9.1
* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
2014-12-25 0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
@ 2014-12-25 9:10 ` Chen, Tiejun
2014-12-25 9:55 ` Nadav Amit
0 siblings, 1 reply; 18+ messages in thread
From: Chen, Tiejun @ 2014-12-25 9:10 UTC (permalink / raw)
To: Nadav Amit, pbonzini; +Cc: kvm
On 2014/12/25 8:52, Nadav Amit wrote:
> Although pop sreg updates RSP according to the operand size, only 2 bytes are
> read. The current behavior may result in incorrect #GP or #PF exceptions.
>
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
> arch/x86/kvm/emulate.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index e5a84be..702da5e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
> unsigned long selector;
> int rc;
>
It looks like we should just do something similar to em_push_sreg():
unsigned long selector;
int rc;
+ if (ctxt->op_bytes == 4) {
+ rsp_increment(ctxt, -2);
+ ctxt->op_bytes = 2;
+ }
rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
Right?
Thanks
Tiejun
> - rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
> + rc = emulate_pop(ctxt, &selector, 2);
> if (rc != X86EMUL_CONTINUE)
> return rc;
>
> if (ctxt->modrm_reg == VCPU_SREG_SS)
> ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
> + if (ctxt->op_bytes > 2)
> + rsp_increment(ctxt, ctxt->op_bytes - 2);
>
> rc = load_segment_descriptor(ctxt, (u16)selector, seg);
> return rc;
>
* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
2014-12-25 9:10 ` Chen, Tiejun
@ 2014-12-25 9:55 ` Nadav Amit
2014-12-26 1:54 ` Chen, Tiejun
2014-12-26 7:25 ` Wu, Feng
0 siblings, 2 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-25 9:55 UTC (permalink / raw)
To: Chen, Tiejun; +Cc: Paolo Bonzini, kvm list
Tiejun <tiejun.chen@intel.com> wrote:
> On 2014/12/25 8:52, Nadav Amit wrote:
>> Although pop sreg updates RSP according to the operand size, only 2 bytes are
>> read. The current behavior may result in incorrect #GP or #PF exceptions.
>>
>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>> ---
>> arch/x86/kvm/emulate.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index e5a84be..702da5e 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
>> unsigned long selector;
>> int rc;
>
> Looks we just should do similar thing to em_push_sreg(),
>
> unsigned long selector;
> int rc;
>
> + if (ctxt->op_bytes == 4) {
> + rsp_increment(ctxt, -2);
> + ctxt->op_bytes = 2;
> + }
> rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
> if (rc != X86EMUL_CONTINUE)
> return rc;
>
> Right?
I don’t think so. It seems the behaviour of push and pop is a bit different.
For push: “If the source operand is a segment register (16 bits) and the
operand size is 64-bits, a zero-extended value is pushed on the stack; if
the operand size is 32-bits ... all recent Core and Atom processors perform
a 16-bit move, leaving the upper portion of the stack location unmodified.”
Therefore, for push in the case of op_bytes==8 we push zero-extended value.
For pop the behaviour is not well-documented, but experimentally it appears
only the first two bytes are accessed. I cannot see why it would be
different when opsize is 8, since it is not like the push case, where the
segment register value was zero extended.
If you feel strongly about it, I’ll create a unit test.
Nadav
* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
2014-12-25 9:55 ` Nadav Amit
@ 2014-12-26 1:54 ` Chen, Tiejun
2014-12-26 7:25 ` Wu, Feng
1 sibling, 0 replies; 18+ messages in thread
From: Chen, Tiejun @ 2014-12-26 1:54 UTC (permalink / raw)
To: Nadav Amit; +Cc: Paolo Bonzini, kvm list
On 2014/12/25 17:55, Nadav Amit wrote:
> Tiejun <tiejun.chen@intel.com> wrote:
>
>> On 2014/12/25 8:52, Nadav Amit wrote:
>>> Although pop sreg updates RSP according to the operand size, only 2 bytes are
>>> read. The current behavior may result in incorrect #GP or #PF exceptions.
>>>
>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>> ---
>>> arch/x86/kvm/emulate.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>> index e5a84be..702da5e 100644
>>> --- a/arch/x86/kvm/emulate.c
>>> +++ b/arch/x86/kvm/emulate.c
>>> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
>>> unsigned long selector;
>>> int rc;
>>
>> Looks we just should do similar thing to em_push_sreg(),
>>
>> unsigned long selector;
>> int rc;
>>
>> + if (ctxt->op_bytes == 4) {
>> + rsp_increment(ctxt, -2);
>> + ctxt->op_bytes = 2;
>> + }
>> rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
>> if (rc != X86EMUL_CONTINUE)
>> return rc;
>>
>> Right?
> I don’t think so. It seems the behaviour of push and pop is a bit different.
> For push: “If the source operand is a segment register (16 bits) and the
> operand size is 64-bits, a zero-extended value is pushed on the stack; if
> the operand size is 32-bits ... all recent Core and Atom processors perform
> a 16-bit move, leaving the upper portion of the stack location unmodified.”
>
> Therefore, for push in the case of op_bytes==8 we push zero-extended value.
>
> For pop the behaviour is not well-documented, but experimentally it appears
> only the first two bytes are accessed. I cannot see why it would be
Maybe we can comment something here, like "/* Just force 2 byte
destination to already work well in most cases. */".
> different when opsize is 8, since it is not like the push case, where the
> segment register value was zero extended.
Thanks for your explanation.
>
> If you feel strongly about it, I’ll create a unit test.
Based on your description I think I can stand with you at this point.
Tiejun
>
> Nadav
>
>
* RE: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
2014-12-25 9:55 ` Nadav Amit
2014-12-26 1:54 ` Chen, Tiejun
@ 2014-12-26 7:25 ` Wu, Feng
2014-12-27 20:05 ` Nadav Amit
1 sibling, 1 reply; 18+ messages in thread
From: Wu, Feng @ 2014-12-26 7:25 UTC (permalink / raw)
To: Nadav Amit, Chen, Tiejun; +Cc: Paolo Bonzini, kvm list, Wu, Feng
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Nadav Amit
> Sent: Thursday, December 25, 2014 5:55 PM
> To: Chen, Tiejun
> Cc: Paolo Bonzini; kvm list
> Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
>
> Tiejun <tiejun.chen@intel.com> wrote:
>
> > On 2014/12/25 8:52, Nadav Amit wrote:
> >> Although pop sreg updates RSP according to the operand size, only 2 bytes
> are
> >> read. The current behavior may result in incorrect #GP or #PF exceptions.
> >>
> >> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >> ---
> >> arch/x86/kvm/emulate.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >> index e5a84be..702da5e 100644
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct
> x86_emulate_ctxt *ctxt)
> >> unsigned long selector;
> >> int rc;
> >
> > Looks we just should do similar thing to em_push_sreg(),
> >
> > unsigned long selector;
> > int rc;
> >
> > + if (ctxt->op_bytes == 4) {
> > + rsp_increment(ctxt, -2);
> > + ctxt->op_bytes = 2;
> > + }
> > rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
> > if (rc != X86EMUL_CONTINUE)
> > return rc;
> >
> > Right?
> I don't think so. It seems the behaviour of push and pop is a bit different.
> For push: "If the source operand is a segment register (16 bits) and the
> operand size is 64-bits, a zero-extended value is pushed on the stack; if
> the operand size is 32-bits ... all recent Core and Atom processors perform
> a 16-bit move, leaving the upper portion of the stack location unmodified."
>
> Therefore, for push in the case of op_bytes==8 we push zero-extended value.
>
> For pop the behaviour is not well-documented, but experimentally it appears
> only the first two bytes are accessed. I cannot see why it would be
> different when opsize is 8, since it is not like the push case, where the
> segment register value was zero extended.
Let's take a 64-bit operand size as an example. When pushing a segment register, a
zero-extended value is used, so 8 bytes are pushed on the stack. When popping it,
the current code returns the top 8 bytes of the stack and uses only the lowest 2
bytes for load_segment_descriptor(). What is the issue here?
Thanks,
Feng
>
> If you feel strongly about it, I'll create a unit test.
>
> Nadav
>
* RE: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
2014-12-25 0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
@ 2014-12-26 7:25 ` Wu, Feng
2014-12-27 19:55 ` Nadav Amit
0 siblings, 1 reply; 18+ messages in thread
From: Wu, Feng @ 2014-12-26 7:25 UTC (permalink / raw)
To: Nadav Amit, pbonzini; +Cc: kvm, Wu, Feng
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Nadav Amit
> Sent: Thursday, December 25, 2014 8:52 AM
> To: pbonzini@redhat.com
> Cc: kvm@vger.kernel.org; Nadav Amit
> Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
>
> When emulating an instruction that reads the destination memory operand
> (i.e.,
> instructions without the Mov flag in the emulator), the operand is first read.
> If a page-fault is detected in this phase, the error-code which would be
> delivered to the VM does not indicate that the access that caused the
> exception
> is a write one. This does not conform with real hardware, and may cause the
> VM
> to enter the page-fault handler twice for no reason (once for read, once for
> write).
>
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
> arch/x86/include/asm/kvm_host.h | 12 ++++++++++++
> arch/x86/kvm/emulate.c | 6 +++++-
> arch/x86/kvm/mmu.h | 12 ------------
> 3 files changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> index 73ccb12..d6f90ca 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -162,6 +162,18 @@ enum {
> #define DR7_FIXED_1 0x00000400
> #define DR7_VOLATILE 0xffff2bff
>
> +#define PFERR_PRESENT_BIT 0
> +#define PFERR_WRITE_BIT 1
> +#define PFERR_USER_BIT 2
> +#define PFERR_RSVD_BIT 3
> +#define PFERR_FETCH_BIT 4
> +
> +#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
> +#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
> +#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
> +#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
> +#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
> +
> /* apic attention bits */
> #define KVM_APIC_CHECK_VAPIC 0
> /*
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 7a9697f..e5a84be 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt
> *ctxt)
> /* optimisation - avoid slow emulated read if Mov */
> rc = segmented_read(ctxt, ctxt->dst.addr.mem,
> &ctxt->dst.val, ctxt->dst.bytes);
This is a write operation. Do you know why we need to read the dst operand first here?
Thanks,
Feng
> - if (rc != X86EMUL_CONTINUE)
> + if (rc != X86EMUL_CONTINUE) {
> + if (rc == X86EMUL_PROPAGATE_FAULT &&
> + ctxt->exception.vector == PF_VECTOR)
> + ctxt->exception.error_code |= PFERR_WRITE_MASK;
> goto done;
> + }
> }
> ctxt->dst.orig_val = ctxt->dst.val;
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 6b34876b..daae711 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -44,18 +44,6 @@
> #define PT_DIRECTORY_LEVEL 2
> #define PT_PAGE_TABLE_LEVEL 1
>
> -#define PFERR_PRESENT_BIT 0
> -#define PFERR_WRITE_BIT 1
> -#define PFERR_USER_BIT 2
> -#define PFERR_RSVD_BIT 3
> -#define PFERR_FETCH_BIT 4
> -
> -#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
> -#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
> -#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
> -#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
> -#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
> -
> static inline u64 rsvd_bits(int s, int e)
> {
> return ((1ULL << (e - s + 1)) - 1) << s;
> --
> 1.9.1
>
* Re: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
2014-12-26 7:25 ` Wu, Feng
@ 2014-12-27 19:55 ` Nadav Amit
0 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-27 19:55 UTC (permalink / raw)
To: Wu, Feng; +Cc: Nadav Amit, pbonzini, kvm
Feng <feng.wu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Nadav Amit
>> Sent: Thursday, December 25, 2014 8:52 AM
>> To: pbonzini@redhat.com
>> Cc: kvm@vger.kernel.org; Nadav Amit
>> Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
>>
>> When emulating an instruction that reads the destination memory operand
>> (i.e.,
>> instructions without the Mov flag in the emulator), the operand is first read.
>> If a page-fault is detected in this phase, the error-code which would be
>> delivered to the VM does not indicate that the access that caused the
>> exception
>> is a write one. This does not conform with real hardware, and may cause the
>> VM
>> to enter the page-fault handler twice for no reason (once for read, once for
>> write).
>>
>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>> ---
>> arch/x86/include/asm/kvm_host.h | 12 ++++++++++++
>> arch/x86/kvm/emulate.c | 6 +++++-
>> arch/x86/kvm/mmu.h | 12 ------------
>> 3 files changed, 17 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h
>> index 73ccb12..d6f90ca 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -162,6 +162,18 @@ enum {
>> #define DR7_FIXED_1 0x00000400
>> #define DR7_VOLATILE 0xffff2bff
>>
>> +#define PFERR_PRESENT_BIT 0
>> +#define PFERR_WRITE_BIT 1
>> +#define PFERR_USER_BIT 2
>> +#define PFERR_RSVD_BIT 3
>> +#define PFERR_FETCH_BIT 4
>> +
>> +#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
>> +#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
>> +#define PFERR_USER_MASK (1U << PFERR_USER_BIT)
>> +#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
>> +#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
>> +
>> /* apic attention bits */
>> #define KVM_APIC_CHECK_VAPIC 0
>> /*
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 7a9697f..e5a84be 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt
>> *ctxt)
>> /* optimisation - avoid slow emulated read if Mov */
>> rc = segmented_read(ctxt, ctxt->dst.addr.mem,
>> &ctxt->dst.val, ctxt->dst.bytes);
>
> This is a write operation, do you know why we need to read the dst operand first here?
Some x86 instructions read the destination operand as part of their operation.
For instance, "ADD [EAX], EBX" (Intel ASM format) performs
[EAX] = [EAX] + EBX. Therefore, the emulator first reads the memory at [EAX]
at this stage.
Nadav
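
A minimal sketch of the error-code adjustment that patch 1 makes, assuming a
user-space stand-in for the emulator context. The helper name
rmw_read_fault_error_code() and its dst_is_written parameter are hypothetical
and do not exist in KVM; only the PFERR_* bit positions are taken from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error-code bits, with the same positions as in the patch. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)

/*
 * Hypothetical helper, not a KVM function.
 *
 * read_error_code: error code produced when the emulated read of the
 *                  destination operand faulted.
 * dst_is_written:  true when the instruction will also write the
 *                  destination (a read-modify-write such as ADD).
 *
 * For a read-modify-write instruction the reported error code should
 * carry the write bit, so the guest can resolve the fault for the
 * write in a single pass through its page-fault handler.
 */
static uint32_t rmw_read_fault_error_code(uint32_t read_error_code,
                                          bool dst_is_written)
{
    if (dst_is_written)
        read_error_code |= PFERR_WRITE_MASK;
    return read_error_code;
}
```

With this sketch, a user-mode read fault on a non-present page
(PFERR_USER_MASK) taken while emulating ADD [EAX], EBX would be reported as
PFERR_USER_MASK | PFERR_WRITE_MASK.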
* Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
2014-12-26 7:25 ` Wu, Feng
@ 2014-12-27 20:05 ` Nadav Amit
0 siblings, 0 replies; 18+ messages in thread
From: Nadav Amit @ 2014-12-27 20:05 UTC (permalink / raw)
To: Wu, Feng; +Cc: Chen, Tiejun, Paolo Bonzini, kvm list
Feng <feng.wu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Nadav Amit
>> Sent: Thursday, December 25, 2014 5:55 PM
>> To: Chen, Tiejun
>> Cc: Paolo Bonzini; kvm list
>> Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
>>
>> Tiejun <tiejun.chen@intel.com> wrote:
>>
>>> On 2014/12/25 8:52, Nadav Amit wrote:
>>>> Although pop sreg updates RSP according to the operand size, only 2 bytes
>> are
>>>> read. The current behavior may result in incorrect #GP or #PF exceptions.
>>>>
>>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>>> ---
>>>> arch/x86/kvm/emulate.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>>> index e5a84be..702da5e 100644
>>>> --- a/arch/x86/kvm/emulate.c
>>>> +++ b/arch/x86/kvm/emulate.c
>>>> @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct
>> x86_emulate_ctxt *ctxt)
>>>> unsigned long selector;
>>>> int rc;
>>>
>>> Looks we just should do similar thing to em_push_sreg(),
>>>
>>> unsigned long selector;
>>> int rc;
>>>
>>> + if (ctxt->op_bytes == 4) {
>>> + rsp_increment(ctxt, -2);
>>> + ctxt->op_bytes = 2;
>>> + }
>>> rc = emulate_pop(ctxt, &selector, ctxt->op_bytes);
>>> if (rc != X86EMUL_CONTINUE)
>>> return rc;
>>>
>>> Right?
>> I don't think so. It seems the behaviour of push and pop is a bit different.
>> For push: "If the source operand is a segment register (16 bits) and the
>> operand size is 64-bits, a zero-extended value is pushed on the stack; if
>> the operand size is 32-bits ... all recent Core and Atom processors perform
>> a 16-bit move, leaving the upper portion of the stack location unmodified."
>>
>> Therefore, for push in the case of op_bytes==8 we push zero-extended value.
>>
>> For pop the behaviour is not well-documented, but experimentally it appears
>> only the first two bytes are accessed. I cannot see why it would be
>> different when opsize is 8, since it is not like the push case, where the
>> segment register value was zero extended.
>
> Let's take 64-bits operand size as an example. When pushing a segment register, it
> uses zero-extended value, so 8 bytes will be pushed on the stack. When popping it,
> the current code return the top 8 bytes in the stack, and it only uses the lowest 2
> bytes for load_segment_descriptor(). what is the issue here?
The issue I am trying to solve is that, while emulating the pop, the stack
read is performed using the wrong size (the operand size instead of the
segment selector size). As you indicated, the destination register/memory of
the pop instruction will be identical before and after the fix. However,
because it uses the operand size, the emulated read may cause a #PF that does
not occur on real hardware. Consider for instance a case in which the operand
size is 8, RSP=0xFFC and the page of [0x1000] is non-present. In this case
POP-SREG should not cause a #PF, yet on KVM it does.
Nadav
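
The RSP=0xFFC case above can be checked with a small sketch, assuming 4 KiB
pages. SKETCH_PAGE_SIZE and access_crosses_page() are hypothetical names used
only for illustration and are not part of KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch: 4 KiB. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/*
 * Hypothetical helper, not a KVM function.
 *
 * Returns true when an access of `bytes` bytes (bytes > 0) starting at
 * `addr` touches more than one page, i.e. when its first and last byte
 * lie on different pages.
 */
static bool access_crosses_page(uint64_t addr, unsigned int bytes)
{
    uint64_t first_page = addr / SKETCH_PAGE_SIZE;
    uint64_t last_page = (addr + bytes - 1) / SKETCH_PAGE_SIZE;

    return first_page != last_page;
}
```

An 8-byte read at 0xFFC ends at 0x1003 and therefore touches the non-present
page at 0x1000, while a 2-byte read ends at 0xFFD and stays within the first
page.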
* Re: [PATCH 0/8] KVM: x86: Emulator fixes
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (7 preceding siblings ...)
2014-12-25 0:52 ` [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect Nadav Amit
@ 2014-12-27 22:24 ` Paolo Bonzini
2015-01-08 10:42 ` Paolo Bonzini
9 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-12-27 22:24 UTC (permalink / raw)
To: Nadav Amit; +Cc: kvm
On 25/12/2014 01:52, Nadav Amit wrote:
> The first one is the most interesting one. It appears that the current
> behavior may cause the VM to enter the page-fault handler twice on certain
> faulting write accesses. If you do not like my solution, please propose a
> better one.
My series here: http://www.spinics.net/lists/kvm/msg101498.html would
fix this as well, but vmx.flat shows it has a problem that I have not
managed to debug yet. So your solution is fine for now.
Paolo
* Re: [PATCH 0/8] KVM: x86: Emulator fixes
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
` (8 preceding siblings ...)
2014-12-27 22:24 ` [PATCH 0/8] KVM: x86: Emulator fixes Paolo Bonzini
@ 2015-01-08 10:42 ` Paolo Bonzini
9 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2015-01-08 10:42 UTC (permalink / raw)
To: Nadav Amit; +Cc: kvm
On 25/12/2014 01:52, Nadav Amit wrote:
> Few more emulator fixes. Each is logically independent from the others.
>
> The first one is the most interesting one. It appears that the current
> behavior may cause the VM to enter the page-fault handler twice on certain
> faulting write accesses. If you do not like my solution, please propose a
> better one.
>
> The fourth (JMP/CALL using call- or task-gate) is not a fix, but returns an
> error instead of emulating the wrong (#GP) exception.
>
> Thanks for reviewing the patches.
>
> Nadav Amit (8):
> KVM: x86: #PF error-code on R/W operations is wrong
> KVM: x86: pop sreg accesses only 2 bytes
> KVM: x86: fnstcw and fnstsw may cause spurious exception
> KVM: x86: JMP/CALL using call- or task-gate causes exception
> KVM: x86: em_call_far should return failure result
> KVM: x86: POP [ESP] is not emulated correctly
> KVM: x86: Do not set access bit on accessed segments
> KVM: x86: Access to LDT/GDT that wraparound is incorrect
>
> arch/x86/include/asm/kvm_host.h | 12 ++++
> arch/x86/kvm/emulate.c | 138 ++++++++++++++++++++++++++--------------
> arch/x86/kvm/mmu.h | 12 ----
> 3 files changed, 103 insertions(+), 59 deletions(-)
>
I'm applying patches 2-8. I want to play a bit more with patch 1.
Paolo
Thread overview: 18+ messages
2014-12-25 0:52 [PATCH 0/8] KVM: x86: Emulator fixes Nadav Amit
2014-12-25 0:52 ` [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong Nadav Amit
2014-12-26 7:25 ` Wu, Feng
2014-12-27 19:55 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Nadav Amit
2014-12-25 9:10 ` Chen, Tiejun
2014-12-25 9:55 ` Nadav Amit
2014-12-26 1:54 ` Chen, Tiejun
2014-12-26 7:25 ` Wu, Feng
2014-12-27 20:05 ` Nadav Amit
2014-12-25 0:52 ` [PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception Nadav Amit
2014-12-25 0:52 ` [PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception Nadav Amit
2014-12-25 0:52 ` [PATCH 5/8] KVM: x86: em_call_far should return failure result Nadav Amit
2014-12-25 0:52 ` [PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly Nadav Amit
2014-12-25 0:52 ` [PATCH 7/8] KVM: x86: Do not set access bit on accessed segments Nadav Amit
2014-12-25 0:52 ` [PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect Nadav Amit
2014-12-27 22:24 ` [PATCH 0/8] KVM: x86: Emulator fixes Paolo Bonzini
2015-01-08 10:42 ` Paolo Bonzini