All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/21] Fixes for various KVM bugs
@ 2014-11-02  9:54 Nadav Amit
  2014-11-02  9:54 ` [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly Nadav Amit
                   ` (21 more replies)
  0 siblings, 22 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

This patch-set fixes various KVM bugs, mainly in the emulator. Each patch is
independent, except for patches 15-16 (which are intended to fix a single bug).
Patch 19 ist not a real fix for bug but improves the behavior of KVM when it
cannot handle a certain guest behavior.

Some previous fixes were found to be incomplete or faulty. Patches 18,20
handle these cases.

Thanks for reviewing the patches. A separate patch-set which deals with
__lienarize (emulator) related bugs would follow.

Nadav Amit (21):
  KVM: x86: decode_modrm does not regard modrm correctly
  KVM: x86: No error-code on real-mode exceptions
  KVM: x86: Emulator should set DR6 upon GD like real CPU
  KVM: x86: Clear DR6[0:3] on #DB during handle_dr
  KVM: x86: Breakpoints do not consider CS.base
  KVM: x86: Emulator MOV-sreg uses incorrect size
  KVM: x86: Emulator considers imm as memory operand
  KVM: x86: Reset FPU state during reset
  KVM: x86: SYSCALL cannot clear eflags[1]
  KVM: x86: Wrong flags on CMPS and SCAS emulation
  KVM: x86: Emulate push sreg as done in Core
  KVM: x86: MOV to CR3 can set bit 63
  KVM: x86: Do not update EFLAGS on faulting emulation
  KVM: x86: Software disabled APIC should still deliver NMIs
  KVM: x86: Combine the lgdt and lidt emulation logic
  KVM: x86: Inject #GP when loading system segments with non-canonical
    base
  KVM: x86: Remove redundant and incorrect cpl check on task-switch
  KVM: x86: Emulator mis-decodes VEX instructions on real-mode
  KVM: x86: Warn on APIC base relocation
  KVM: x86: MOVNTI emulation min opsize is not respected
  KVM: x86: Return UNHANDLABLE on unsupported SYSENTER

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/emulate.c          | 107 +++++++++++++++++++++++++---------------
 arch/x86/kvm/lapic.c            |  27 +++++++---
 arch/x86/kvm/vmx.c              |   5 +-
 arch/x86/kvm/x86.c              |  40 +++++++++------
 5 files changed, 116 insertions(+), 65 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-05 11:14   ` Paolo Bonzini
  2014-11-02  9:54 ` [PATCH 02/21] KVM: x86: No error-code on real-mode exceptions Nadav Amit
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

In one occassion, decode_modrm uses the rm field after it is extended with
REX.B to determine the addressing mode. Doing so causes it not to read the
offset for rip-relative addressing.

This patch uses the value after masking instead.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4b3fa70..efe7239 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1209,7 +1209,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 		}
 		switch (ctxt->modrm_mod) {
 		case 0:
-			if (ctxt->modrm_rm == 5)
+			if ((ctxt->modrm_rm & 7) == 5)
 				modrm_ea += insn_fetch(s32, ctxt);
 			break;
 		case 1:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 02/21] KVM: x86: No error-code on real-mode exceptions
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
  2014-11-02  9:54 ` [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 03/21] KVM: x86: Emulator should set DR6 upon GD like real CPU Nadav Amit
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Real-mode exceptions do not deliver error code. As can be seen in Intel SDM
volume 2, real-mode exceptions do not have parentheses, which indicate
error-code.  To avoid significant changes of the code, the error code is
"removed" during exception queueing.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14c49cf..e885029 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -353,6 +353,8 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 	if (!vcpu->arch.exception.pending) {
 	queue:
+		if (has_error && !is_protmode(vcpu))
+			has_error = false;
 		vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.has_error_code = has_error;
 		vcpu->arch.exception.nr = nr;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 03/21] KVM: x86: Emulator should set DR6 upon GD like real CPU
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
  2014-11-02  9:54 ` [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly Nadav Amit
  2014-11-02  9:54 ` [PATCH 02/21] KVM: x86: No error-code on real-mode exceptions Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 04/21] KVM: x86: Clear DR6[0:3] on #DB during handle_dr Nadav Amit
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

It should clear B0-B3 and set BD.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index efe7239..273c37e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3568,8 +3568,15 @@ static int check_dr_read(struct x86_emulate_ctxt *ctxt)
 	if ((cr4 & X86_CR4_DE) && (dr == 4 || dr == 5))
 		return emulate_ud(ctxt);
 
-	if (check_dr7_gd(ctxt))
+	if (check_dr7_gd(ctxt)) {
+		ulong dr6;
+
+		ctxt->ops->get_dr(ctxt, 6, &dr6);
+		dr6 &= ~15;
+		dr6 |= DR6_BD | DR6_RTM;
+		ctxt->ops->set_dr(ctxt, 6, dr6);
 		return emulate_db(ctxt);
+	}
 
 	return X86EMUL_CONTINUE;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 04/21] KVM: x86: Clear DR6[0:3] on #DB during handle_dr
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (2 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 03/21] KVM: x86: Emulator should set DR6 upon GD like real CPU Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 05/21] KVM: x86: Breakpoints do not consider CS.base Nadav Amit
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

DR6[0:3] (previous breakpoint indications) are cleared when #DB is injected
during handle_exception, just as real hardware does.  Similarily, handle_dr
should clear DR6[0:3].

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/vmx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fe4d2f4..8f81bae 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5192,6 +5192,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
 			return 0;
 		} else {
+			vcpu->arch.dr6 &= ~15;
 			vcpu->arch.dr6 |= DR6_BD | DR6_RTM;
 			kvm_queue_exception(vcpu, DB_VECTOR);
 			return 1;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 05/21] KVM: x86: Breakpoints do not consider CS.base
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (3 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 04/21] KVM: x86: Clear DR6[0:3] on #DB during handle_dr Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 06/21] KVM: x86: Emulator MOV-sreg uses incorrect size Nadav Amit
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

x86 debug registers hold a linear address. Therefore, breakpoints detection
should consider CS.base, and check whether instruction linear address equals
(CS.base + RIP). This patch introduces a function to evaluate RIP linear
address and uses it for breakpoints detection.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c              |  4 +---
 arch/x86/kvm/x86.c              | 29 ++++++++++++++++-------------
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3ac8076..904535f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1067,6 +1067,7 @@ void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 void kvm_define_shared_msr(unsigned index, u32 msr);
 int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
+unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
 void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8f81bae..42106be 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5185,9 +5185,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
 			vcpu->run->debug.arch.dr6 = vcpu->arch.dr6;
 			vcpu->run->debug.arch.dr7 = dr7;
-			vcpu->run->debug.arch.pc =
-				vmcs_readl(GUEST_CS_BASE) +
-				vmcs_readl(GUEST_RIP);
+			vcpu->run->debug.arch.pc = kvm_get_linear_rip(vcpu);
 			vcpu->run->debug.arch.exception = DB_VECTOR;
 			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
 			return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e885029..773c17e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5207,21 +5207,17 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflag
 
 static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r)
 {
-	struct kvm_run *kvm_run = vcpu->run;
-	unsigned long eip = vcpu->arch.emulate_ctxt.eip;
-	u32 dr6 = 0;
-
 	if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
 	    (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) {
-		dr6 = kvm_vcpu_check_hw_bp(eip, 0,
+		struct kvm_run *kvm_run = vcpu->run;
+		unsigned long eip = kvm_get_linear_rip(vcpu);
+		u32 dr6 = kvm_vcpu_check_hw_bp(eip, 0,
 					   vcpu->arch.guest_debug_dr7,
 					   vcpu->arch.eff_db);
 
 		if (dr6 != 0) {
 			kvm_run->debug.arch.dr6 = dr6 | DR6_FIXED_1 | DR6_RTM;
-			kvm_run->debug.arch.pc = kvm_rip_read(vcpu) +
-				get_segment_base(vcpu, VCPU_SREG_CS);
-
+			kvm_run->debug.arch.pc = eip;
 			kvm_run->debug.arch.exception = DB_VECTOR;
 			kvm_run->exit_reason = KVM_EXIT_DEBUG;
 			*r = EMULATE_USER_EXIT;
@@ -5231,7 +5227,8 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r)
 
 	if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK) &&
 	    !(kvm_get_rflags(vcpu) & X86_EFLAGS_RF)) {
-		dr6 = kvm_vcpu_check_hw_bp(eip, 0,
+		unsigned long eip = kvm_get_linear_rip(vcpu);
+		u32 dr6 = kvm_vcpu_check_hw_bp(eip, 0,
 					   vcpu->arch.dr7,
 					   vcpu->arch.db);
 
@@ -7538,12 +7535,18 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
 	return kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
-bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
+unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
 {
-	unsigned long current_rip = kvm_rip_read(vcpu) +
-		get_segment_base(vcpu, VCPU_SREG_CS);
+	if (is_64_bit_mode(vcpu))
+		return kvm_rip_read(vcpu);
+	return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
+		     kvm_rip_read(vcpu));
+}
+EXPORT_SYMBOL_GPL(kvm_get_linear_rip);
 
-	return current_rip == linear_rip;
+bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
+{
+	return kvm_get_linear_rip(vcpu) == linear_rip;
 }
 EXPORT_SYMBOL_GPL(kvm_is_linear_rip);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 06/21] KVM: x86: Emulator MOV-sreg uses incorrect size
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (4 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 05/21] KVM: x86: Breakpoints do not consider CS.base Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-05 11:28   ` Paolo Bonzini
  2014-11-02  9:54 ` [PATCH 07/21] KVM: x86: Emulator considers imm as memory operand Nadav Amit
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

In x86, you cannot MOV-sreg to memory is either 16-bits or 64-bits.  When
destination is registers, and the operand size is 32-bits, the high 16-bits in
modern CPUs is filled with zero.

In contrast, KVM may write to memory 32-bits on MOV-sreg. This patch fixes KVM
behavior, and sets the destination operand size to two, if the destination is
memory.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 273c37e..f456783 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3187,6 +3187,8 @@ static int em_mov_rm_sreg(struct x86_emulate_ctxt *ctxt)
 		return emulate_ud(ctxt);
 
 	ctxt->dst.val = get_segment_selector(ctxt, ctxt->modrm_reg);
+	if (ctxt->dst.bytes == 4 && ctxt->dst.type == OP_MEM)
+		ctxt->dst.bytes = 2;
 	return X86EMUL_CONTINUE;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 07/21] KVM: x86: Emulator considers imm as memory operand
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (5 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 06/21] KVM: x86: Emulator MOV-sreg uses incorrect size Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-05 11:36   ` Paolo Bonzini
  2014-11-02  9:54 ` [PATCH 08/21] KVM: x86: Reset FPU state during reset Nadav Amit
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

The emulator mistakenly considers some of the immediate operands as memory
operands, performs memory read and uses the wrong data.  By default, every
operand is marked as OP_MEM, so if it is not changed, memory read may be
wrongly emulated and the wrong value would be used.  Consider for instance the
ROR instruction - src2 (the number of times) would be read from memory instead
of being used as immediate.

Mark every immediate operand as such to avoid this problem.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f456783..e624d62 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4269,6 +4269,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 		fetch_register_operand(op);
 		break;
 	case OpCL:
+		op->type = OP_IMM;
 		op->bytes = 1;
 		op->val = reg_read(ctxt, VCPU_REGS_RCX) & 0xff;
 		break;
@@ -4276,6 +4277,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 		rc = decode_imm(ctxt, op, 1, true);
 		break;
 	case OpOne:
+		op->type = OP_IMM;
 		op->bytes = 1;
 		op->val = 1;
 		break;
@@ -4334,21 +4336,27 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 		ctxt->memop.bytes = ctxt->op_bytes + 2;
 		goto mem_common;
 	case OpES:
+		op->type = OP_IMM;
 		op->val = VCPU_SREG_ES;
 		break;
 	case OpCS:
+		op->type = OP_IMM;
 		op->val = VCPU_SREG_CS;
 		break;
 	case OpSS:
+		op->type = OP_IMM;
 		op->val = VCPU_SREG_SS;
 		break;
 	case OpDS:
+		op->type = OP_IMM;
 		op->val = VCPU_SREG_DS;
 		break;
 	case OpFS:
+		op->type = OP_IMM;
 		op->val = VCPU_SREG_FS;
 		break;
 	case OpGS:
+		op->type = OP_IMM;
 		op->val = VCPU_SREG_GS;
 		break;
 	case OpImplicit:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (6 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 07/21] KVM: x86: Emulator considers imm as memory operand Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-05 12:04   ` Paolo Bonzini
  2014-11-02  9:54 ` [PATCH 09/21] KVM: x86: SYSCALL cannot clear eflags[1] Nadav Amit
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

When resetting the VCPU, the FPU should be reset as well (e.g., XCR0 state).
Call fx_init during reset as well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 773c17e..9b90ea7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7020,6 +7020,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_avail = ~0;
 	vcpu->arch.regs_dirty = ~0;
 
+	/* should never fail, since fpu_alloc already done */
+	fx_init(vcpu);
+
 	kvm_x86_ops->vcpu_reset(vcpu);
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/21] KVM: x86: SYSCALL cannot clear eflags[1]
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (7 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 08/21] KVM: x86: Reset FPU state during reset Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 10/21] KVM: x86: Wrong flags on CMPS and SCAS emulation Nadav Amit
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

SYSCALL emulation currently clears in 64-bit mode eflags according to
MSR_SYSCALL_MASK.  However, on bare-metal eflags[1] which is fixed to one
cannot be cleared, even if MSR_SYSCALL_MASK masks the bit.  This wrong behavior
may result in failed VM-entry, as VT disallows entry with eflags[1] cleared.

This patch sets the bit after masking eflags on syscall.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e624d62..f1484bb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2286,6 +2286,7 @@ static int em_syscall(struct x86_emulate_ctxt *ctxt)
 
 		ops->get_msr(ctxt, MSR_SYSCALL_MASK, &msr_data);
 		ctxt->eflags &= ~msr_data;
+		ctxt->eflags |= EFLG_RESERVED_ONE_MASK;
 #endif
 	} else {
 		/* legacy mode */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 10/21] KVM: x86: Wrong flags on CMPS and SCAS emulation
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (8 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 09/21] KVM: x86: SYSCALL cannot clear eflags[1] Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 11/21] KVM: x86: Emulate push sreg as done in Core Nadav Amit
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

CMPS and SCAS instructions are evaluated in the wrong order.  For reference (of
CMPS), see http://www.fermimn.gov.it/linux/quarta/x86/cmps.htm : "Note that the
direction of subtraction for CMPS is [SI] - [DI] or [ESI] - [EDI]. The left
operand (SI or ESI) is the source and the right operand (DI or EDI) is the
destination. This is the reverse of the usual Intel convention in which the
left operand is the destination and the right operand is the source."

Introducing em_cmp_r for this matter that performs comparison in reverse order
using fastop infrastructure to avoid a wrapper function.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f1484bb..9dfd286 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -380,6 +380,15 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *));
 	ON64(FOP2E(op##q, rax, cl)) \
 	FOP_END
 
+/* 2 operand, src and dest are reversed */
+#define FASTOP2R(op, name) \
+	FOP_START(name) \
+	FOP2E(op##b, dl, al) \
+	FOP2E(op##w, dx, ax) \
+	FOP2E(op##l, edx, eax) \
+	ON64(FOP2E(op##q, rdx, rax)) \
+	FOP_END
+
 #define FOP3E(op,  dst, src, src2) \
 	FOP_ALIGN #op " %" #src2 ", %" #src ", %" #dst " \n\t" FOP_RET
 
@@ -890,6 +899,8 @@ FASTOP2W(btc);
 
 FASTOP2(xadd);
 
+FASTOP2R(cmp, cmp_r);
+
 static u8 test_cc(unsigned int condition, unsigned long flags)
 {
 	u8 rc;
@@ -3973,12 +3984,12 @@ static const struct opcode opcode_table[256] = {
 	I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
 	I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
 	I2bv(SrcSI | DstDI | Mov | String, em_mov),
-	F2bv(SrcSI | DstDI | String | NoWrite, em_cmp),
+	F2bv(SrcSI | DstDI | String | NoWrite, em_cmp_r),
 	/* 0xA8 - 0xAF */
 	F2bv(DstAcc | SrcImm | NoWrite, em_test),
 	I2bv(SrcAcc | DstDI | Mov | String, em_mov),
 	I2bv(SrcSI | DstAcc | Mov | String, em_mov),
-	F2bv(SrcAcc | DstDI | String | NoWrite, em_cmp),
+	F2bv(SrcAcc | DstDI | String | NoWrite, em_cmp_r),
 	/* 0xB0 - 0xB7 */
 	X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
 	/* 0xB8 - 0xBF */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 11/21] KVM: x86: Emulate push sreg as done in Core
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (9 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 10/21] KVM: x86: Wrong flags on CMPS and SCAS emulation Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63 Nadav Amit
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

According to Intel SDM push of segment selectors is done in the following
manner: "if the operand size is 32-bits, either a zero-extended value is pushed
on the stack or the segment selector is written on the stack using a 16-bit
move. For the last case, all recent Core and Atom processors perform a 16-bit
move, leaving the upper portion of the stack location unmodified."

This patch modifies the behavior to match the core behavior.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9dfd286..d45a57b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1797,6 +1797,10 @@ static int em_push_sreg(struct x86_emulate_ctxt *ctxt)
 	int seg = ctxt->src2.val;
 
 	ctxt->src.val = get_segment_selector(ctxt, seg);
+	if (ctxt->op_bytes == 4) {
+		rsp_increment(ctxt, -2);
+		ctxt->op_bytes = 2;
+	}
 
 	return em_push(ctxt);
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (10 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 11/21] KVM: x86: Emulate push sreg as done in Core Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2015-02-10 16:15   ` Jan Kiszka
  2014-11-02  9:54 ` [PATCH 13/21] KVM: x86: Do not update EFLAGS on faulting emulation Nadav Amit
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set.
As Intel SDM states in section 4.10.4 "Invalidation of TLBs and
Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the
instruction’s source operand is 0 ..."

In other words, bit 63 is not reserved. KVM emulator currently consider bit 63
as reserved. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/emulate.c          | 2 +-
 arch/x86/kvm/x86.c              | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 904535f..dc932d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -51,6 +51,7 @@
 			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
 
 #define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
+#define CR3_PCID_INVD		 (1UL << 63)
 #define CR4_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d45a57b..259c04b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3544,7 +3544,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
 
 		ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
 		if (efer & EFER_LMA)
-			rsvd = CR3_L_MODE_RESERVED_BITS;
+			rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
 
 		if (new_val & rsvd)
 			return emulate_gp(ctxt, 0);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b90ea7..204e5b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -750,6 +750,8 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4);
 
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
+	cr3 &= ~CR3_PCID_INVD;
+
 	if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
 		kvm_mmu_sync_roots(vcpu);
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 13/21] KVM: x86: Do not update EFLAGS on faulting emulation
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (11 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63 Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs Nadav Amit
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

If the emulation ends in fault, eflags should not be updated.  However, several
instruction emulations (actually all the fastops) currently update eflags, if
the fault was detected afterwards (e.g., #PF during writeback).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/x86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204e5b4..9cf8da4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5360,7 +5360,9 @@ restart:
 		kvm_rip_write(vcpu, ctxt->eip);
 		if (r == EMULATE_DONE)
 			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
-		__kvm_set_rflags(vcpu, ctxt->eflags);
+		if (!ctxt->have_exception ||
+		    exception_type(ctxt->exception.vector) == EXCPT_TRAP)
+			__kvm_set_rflags(vcpu, ctxt->eflags);
 
 		/*
 		 * For STI, interrupts are shadowed; so KVM_REQ_EVENT will
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (12 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 13/21] KVM: x86: Do not update EFLAGS on faulting emulation Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-05 12:30   ` Paolo Bonzini
  2014-11-02  9:54 ` [PATCH 15/21] KVM: x86: Combine the lgdt and lidt emulation logic Nadav Amit
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Currently, the APIC logical map does not consider VCPUs whose local-apic is
software-disabled.  However, NMIs, INIT, etc. should still be delivered to such
VCPUs. Therefore, the APIC mode should first be determined, and then the map,
considering all VCPUs should be constructed.

To address this issue, first find the APIC mode, and only then construct the
logical map.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/lapic.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66dd173..2a838af 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -156,8 +156,6 @@ static void recalculate_apic_map(struct kvm *kvm)
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
-		u16 cid, lid;
-		u32 ldr;
 
 		if (!kvm_apic_present(vcpu))
 			continue;
@@ -175,13 +173,22 @@ static void recalculate_apic_map(struct kvm *kvm)
 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
 			new->lid_mask = 0xffff;
 			new->broadcast = X2APIC_BROADCAST;
-		} else if (kvm_apic_sw_enabled(apic) &&
-				!new->cid_mask /* flat mode */ &&
-				kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_CLUSTER) {
-			new->cid_shift = 4;
-			new->cid_mask = 0xf;
-			new->lid_mask = 0xf;
+			break;
+		} else if (kvm_apic_sw_enabled(apic)) {
+			if (kvm_apic_get_reg(apic, APIC_DFR) ==
+							APIC_DFR_CLUSTER) {
+				new->cid_shift = 4;
+				new->cid_mask = 0xf;
+				new->lid_mask = 0xf;
+			}
+			break;
 		}
+	}
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct kvm_lapic *apic = vcpu->arch.apic;
+		u16 cid, lid;
+		u32 ldr;
 
 		new->phys_map[kvm_apic_id(apic)] = apic;
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 15/21] KVM: x86: Combine the lgdt and lidt emulation logic
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (13 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 16/21] KVM: x86: Inject #GP when loading system segments with non-canonical base Nadav Amit
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

LGDT and LIDT emulation logic is almost identical. Merge the logic into a
single point to avoid redundancy. This will be used by the next patch that
will ensure the bases of the loaded GDTR and IDTR are canonical.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 259c04b..d6bea35 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3307,7 +3307,7 @@ static int em_sidt(struct x86_emulate_ctxt *ctxt)
 	return emulate_store_desc_ptr(ctxt, ctxt->ops->get_idt);
 }
 
-static int em_lgdt(struct x86_emulate_ctxt *ctxt)
+static int em_lgdt_lidt(struct x86_emulate_ctxt *ctxt, bool lgdt)
 {
 	struct desc_ptr desc_ptr;
 	int rc;
@@ -3319,12 +3319,20 @@ static int em_lgdt(struct x86_emulate_ctxt *ctxt)
 			     ctxt->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	ctxt->ops->set_gdt(ctxt, &desc_ptr);
+	if (lgdt)
+		ctxt->ops->set_gdt(ctxt, &desc_ptr);
+	else
+		ctxt->ops->set_idt(ctxt, &desc_ptr);
 	/* Disable writeback. */
 	ctxt->dst.type = OP_NONE;
 	return X86EMUL_CONTINUE;
 }
 
+static int em_lgdt(struct x86_emulate_ctxt *ctxt)
+{
+	return em_lgdt_lidt(ctxt, true);
+}
+
 static int em_vmmcall(struct x86_emulate_ctxt *ctxt)
 {
 	int rc;
@@ -3338,20 +3346,7 @@ static int em_vmmcall(struct x86_emulate_ctxt *ctxt)
 
 static int em_lidt(struct x86_emulate_ctxt *ctxt)
 {
-	struct desc_ptr desc_ptr;
-	int rc;
-
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		ctxt->op_bytes = 8;
-	rc = read_descriptor(ctxt, ctxt->src.addr.mem,
-			     &desc_ptr.size, &desc_ptr.address,
-			     ctxt->op_bytes);
-	if (rc != X86EMUL_CONTINUE)
-		return rc;
-	ctxt->ops->set_idt(ctxt, &desc_ptr);
-	/* Disable writeback. */
-	ctxt->dst.type = OP_NONE;
-	return X86EMUL_CONTINUE;
+	return em_lgdt_lidt(ctxt, false);
 }
 
 static int em_smsw(struct x86_emulate_ctxt *ctxt)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 16/21] KVM: x86: Inject #GP when loading system segments with non-canonical base
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (14 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 15/21] KVM: x86: Combine the lgdt and lidt emulation logic Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 17/21] KVM: x86: Remove redundant and incorrect cpl check on task-switch Nadav Amit
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

When emulating LTR/LDTR/LGDT/LIDT, #GP should be injected if the base is
non-canonical. Otherwise, VM-entry will fail.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d6bea35..830da4b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1600,6 +1600,9 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 				sizeof(base3), &ctxt->exception);
 		if (ret != X86EMUL_CONTINUE)
 			return ret;
+		if (is_noncanonical_address(get_desc_base(&seg_desc) |
+					     ((u64)base3 << 32)))
+			return emulate_gp(ctxt, 0);
 	}
 load:
 	ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg);
@@ -3319,6 +3322,9 @@ static int em_lgdt_lidt(struct x86_emulate_ctxt *ctxt, bool lgdt)
 			     ctxt->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
+	if (ctxt->mode == X86EMUL_MODE_PROT64 &&
+	    is_noncanonical_address(desc_ptr.address))
+		return emulate_gp(ctxt, 0);
 	if (lgdt)
 		ctxt->ops->set_gdt(ctxt, &desc_ptr);
 	else
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 17/21] KVM: x86: Remove redundant and incorrect cpl check on task-switch
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (15 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 16/21] KVM: x86: Inject #GP when loading system segments with non-canonical base Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:54 ` [PATCH 18/21] KVM: x86: Emulator mis-decodes VEX instructions on real-mode Nadav Amit
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Task-switch emulation checks the privilege level prior to performing the
task-switch.  This check is incorrect in the case of task-gates, in which the
tss.dpl is ignored, and can cause superfluous exceptions.  Moreover this check
is unnecassary, since the CPU checks the privilege levels prior to exiting.
Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly
outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior
to exiting.  AMD 15.14.1 says "The intercept is checked before the task switch
takes place but after the incoming TSS and task gate (if one was involved) have
been checked for correctness."

This patch removes the CPL checks for CALL and JMP.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 830da4b..db8cb4d6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2804,7 +2804,8 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 	 *
 	 * 1. jmp/call/int to task gate: Check against DPL of the task gate
 	 * 2. Exception/IRQ/iret: No check is performed
-	 * 3. jmp/call to TSS: Check against DPL of the TSS
+	 * 3. jmp/call to TSS/task-gate: No check is performed since the
+	 *    hardware checks it before exiting.
 	 */
 	if (reason == TASK_SWITCH_GATE) {
 		if (idt_index != -1) {
@@ -2821,13 +2822,8 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 			if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl)
 				return emulate_gp(ctxt, (idt_index << 3) | 0x2);
 		}
-	} else if (reason != TASK_SWITCH_IRET) {
-		int dpl = next_tss_desc.dpl;
-		if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl)
-			return emulate_gp(ctxt, tss_selector);
 	}
 
-
 	desc_limit = desc_limit_scaled(&next_tss_desc);
 	if (!next_tss_desc.p ||
 	    ((desc_limit < 0x67 && (next_tss_desc.type & 8)) ||
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 18/21] KVM: x86: Emulator mis-decodes VEX instructions on real-mode
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (16 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 17/21] KVM: x86: Remove redundant and incorrect cpl check on task-switch Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-08  7:25   ` Paolo Bonzini
  2014-11-02  9:54 ` [PATCH 19/21] KVM: x86: Warn on APIC base relocation Nadav Amit
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Commit 7fe864dc942c ("KVM: x86: Emulator considers imm as memory operand")
marked VEX instructions as such in protected mode.  VEX-prefix instructions are
not supported relevant on real-mode and VM86, but should cause #UD instead of
being decoded as LES/LDS.

Fix this behaviour to be consistent with real hardware.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db8cb4d6..24b0df7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4505,8 +4505,7 @@ done_prefixes:
 
 	/* vex-prefix instructions are not implemented */
 	if (ctxt->opcode_len == 1 && (ctxt->b == 0xc5 || ctxt->b == 0xc4) &&
-	    (mode == X86EMUL_MODE_PROT64 ||
-	    (mode >= X86EMUL_MODE_PROT16 && (ctxt->modrm & 0x80)))) {
+	    (mode == X86EMUL_MODE_PROT64 || (ctxt->modrm & 0x80))) {
 		ctxt->d = NotImpl;
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 19/21] KVM: x86: Warn on APIC base relocation
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (17 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 18/21] KVM: x86: Emulator mis-decodes VEX instructions on real-mode Nadav Amit
@ 2014-11-02  9:54 ` Nadav Amit
  2014-11-02  9:55 ` [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected Nadav Amit
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:54 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

APIC base relocation is unsupported by KVM. If anyone uses it, the least should
be to report a warning in the hypervisor.

Note that KVM-unit-tests uses this feature for some reason, so running the
tests triggers the warning.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/lapic.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2a838af..a25fa41 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1463,6 +1463,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 	apic->base_address = apic->vcpu->arch.apic_base &
 			     MSR_IA32_APICBASE_BASE;
 
+	if ((value & MSR_IA32_APICBASE_ENABLE) &&
+	     apic->base_address != APIC_DEFAULT_PHYS_BASE)
+		pr_warn_once("APIC base relocation is unsupported by KVM");
+
 	/* with FSB delivery interrupt, we can restart APIC functionality */
 	apic_debug("apic base msr is 0x%016" PRIx64 ", and base address is "
 		   "0x%lx.\n", apic->vcpu->arch.apic_base, apic->base_address);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (18 preceding siblings ...)
  2014-11-02  9:54 ` [PATCH 19/21] KVM: x86: Warn on APIC base relocation Nadav Amit
@ 2014-11-02  9:55 ` Nadav Amit
  2014-11-05 12:18   ` Paolo Bonzini
  2014-11-06  9:23   ` Paolo Bonzini
  2014-11-02  9:55 ` [PATCH 21/21] KVM: x86: Return UNHANDLABLE on unsupported SYSENTER Nadav Amit
  2014-11-05 12:31 ` [PATCH 00/21] Fixes for various KVM bugs Paolo Bonzini
  21 siblings, 2 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:55 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Commit 3b32004a66e9 ("KVM: x86: movnti minimum op size of 32-bit is not kept")
did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI
may be mistakenly performed using 16-bit opsize.

This patch add No16 flag to mark an instruction does not support 16-bits
operand size.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 24b0df7..84a83dc 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -167,6 +167,7 @@
 #define NoBigReal   ((u64)1 << 50)  /* No big real mode */
 #define PrivUD      ((u64)1 << 51)  /* #UD instead of #GP on CPL > 0 */
 #define NearBranch  ((u64)1 << 52)  /* Near branches */
+#define No16	    ((u64)1 << 53)  /* No 16 bit operand */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -4116,7 +4117,7 @@ static const struct opcode twobyte_table[256] = {
 	D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
 	/* 0xC0 - 0xC7 */
 	F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd),
-	N, D(DstMem | SrcReg | ModRM | Mov),
+	N, I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov),
 	N, N, N, GD(0, &group9),
 	/* 0xC8 - 0xCF */
 	X8(I(DstReg, em_bswap)),
@@ -4561,7 +4562,8 @@ done_prefixes:
 		return EMULATION_FAILED;
 
 	if (unlikely(ctxt->d &
-	    (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm|NearBranch))) {
+	    (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm|NearBranch|
+	     No16))) {
 		/*
 		 * These are copied unconditionally here, and checked unconditionally
 		 * in x86_emulate_insn.
@@ -4586,6 +4588,9 @@ done_prefixes:
 				ctxt->op_bytes = 4;
 		}
 
+		if ((ctxt->d & No16) && ctxt->op_bytes == 2)
+			ctxt->op_bytes = 4;
+
 		if (ctxt->d & Sse)
 			ctxt->op_bytes = 16;
 		else if (ctxt->d & Mmx)
@@ -5051,11 +5056,6 @@ twobyte_insn:
 		ctxt->dst.val = (ctxt->src.bytes == 1) ? (s8) ctxt->src.val :
 							(s16) ctxt->src.val;
 		break;
-	case 0xc3:		/* movnti */
-		ctxt->dst.bytes = ctxt->op_bytes;
-		ctxt->dst.val = (ctxt->op_bytes == 8) ? (u64) ctxt->src.val :
-							(u32) ctxt->src.val;
-		break;
 	default:
 		goto cannot_emulate;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 21/21] KVM: x86: Return UNHANDLABLE on unsupported SYSENTER
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (19 preceding siblings ...)
  2014-11-02  9:55 ` [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected Nadav Amit
@ 2014-11-02  9:55 ` Nadav Amit
  2014-11-05 12:31 ` [PATCH 00/21] Fixes for various KVM bugs Paolo Bonzini
  21 siblings, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-02  9:55 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, nadav.amit, Nadav Amit

Now that KVM injects #UD on "unhandlable" error, it makes better sense to
return such error on sysenter instead of directly injecting #UD to the guest.
This allows to track more easily the unhandlable cases the emulator does not
support.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/kvm/emulate.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 84a83dc..7892c76 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2339,11 +2339,9 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt)
 	    && !vendor_intel(ctxt))
 		return emulate_ud(ctxt);
 
-	/* XXX sysenter/sysexit have not been tested in 64bit mode.
-	* Therefore, we inject an #UD.
-	*/
+	/* sysenter/sysexit have not been tested in 64bit mode. */
 	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		return emulate_ud(ctxt);
+		return X86EMUL_UNHANDLEABLE;
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly
  2014-11-02  9:54 ` [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly Nadav Amit
@ 2014-11-05 11:14   ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 11:14 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit

On 02/11/2014 10:54, Nadav Amit wrote:
> In one occassion, decode_modrm uses the rm field after it is extended with
> REX.B to determine the addressing mode. Doing so causes it not to read the
> offset for rip-relative addressing.
> 
> This patch uses the value after masking instead.
> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/kvm/emulate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 4b3fa70..efe7239 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1209,7 +1209,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>  		}
>  		switch (ctxt->modrm_mod) {
>  		case 0:
> -			if (ctxt->modrm_rm == 5)
> +			if ((ctxt->modrm_rm & 7) == 5)
>  				modrm_ea += insn_fetch(s32, ctxt);
>  			break;
>  		case 1:
> 

We can do instead:

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05ea85ff39c6..a4703eb9c1ed 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1223,6 +1223,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 			if (index_reg != 4)
 				modrm_ea += reg_read(ctxt, index_reg) << scale;
 		} else if ((ctxt->modrm_rm & 7) == 5 && ctxt->modrm_mod == 0) {
+			modrm_ea += insn_fetch(s32, ctxt);
 			if (ctxt->mode == X86EMUL_MODE_PROT64)
 				ctxt->rip_relative = 1;
 		} else {
@@ -1231,10 +1232,6 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 			adjust_modrm_seg(ctxt, base_reg);
 		}
 		switch (ctxt->modrm_mod) {
-		case 0:
-			if ((ctxt->modrm_rm & 7) == 5)
-				modrm_ea += insn_fetch(s32, ctxt);
-			break;
 		case 1:
 			modrm_ea += insn_fetch(s8, ctxt);
 			break;

which matches what we already do to handle the case where REX.B is
ignored for MOD=00, RM=0100:

                        if ((base_reg & 7) == 5 && ctxt->modrm_mod == 0)

I'll write a testcase, too.

Paolo

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/21] KVM: x86: Emulator MOV-sreg uses incorrect size
  2014-11-02  9:54 ` [PATCH 06/21] KVM: x86: Emulator MOV-sreg uses incorrect size Nadav Amit
@ 2014-11-05 11:28   ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 11:28 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit



On 02/11/2014 10:54, Nadav Amit wrote:
> In x86, you cannot MOV-sreg to memory is either 16-bits or 64-bits.  When
> destination is registers, and the operand size is 32-bits, the high 16-bits in
> modern CPUs is filled with zero.
> 
> In contrast, KVM may write to memory 32-bits on MOV-sreg. This patch fixes KVM
> behavior, and sets the destination operand size to two, if the destination is
> memory.

    KVM: x86: Emulation of MOV-sreg to memory uses incorrect size
    
    In x86, you can only MOV-sreg to memory with either 16-bits or 64-bits size.
    In contrast, KVM may write to 32-bits memory on MOV-sreg. This patch fixes KVM
    behavior, and sets the destination operand size to two, if the destination is
    memory.
    
    When destination is registers, and the operand size is 32-bits, the high
    16-bits in modern CPUs is filled with zero.  This is handled correctly.
    
    Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/kvm/emulate.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 273c37e..f456783 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -3187,6 +3187,8 @@ static int em_mov_rm_sreg(struct x86_emulate_ctxt *ctxt)
>  		return emulate_ud(ctxt);
>  
>  	ctxt->dst.val = get_segment_selector(ctxt, ctxt->modrm_reg);
> +	if (ctxt->dst.bytes == 4 && ctxt->dst.type == OP_MEM)
> +		ctxt->dst.bytes = 2;
>  	return X86EMUL_CONTINUE;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/21] KVM: x86: Emulator considers imm as memory operand
  2014-11-02  9:54 ` [PATCH 07/21] KVM: x86: Emulator considers imm as memory operand Nadav Amit
@ 2014-11-05 11:36   ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 11:36 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit

On 02/11/2014 10:54, Nadav Amit wrote:
> The emulator mistakenly considers some of the immediate operands as memory
> operands, performs memory read and uses the wrong data.  By default, every
> operand is marked as OP_MEM, so if it is not changed, memory read may be
> wrongly emulated and the wrong value would be used.  Consider for instance the
> ROR instruction - src2 (the number of times) would be read from memory instead
> of being used as immediate.
> 
> Mark every immediate operand as such to avoid this problem.
> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/kvm/emulate.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index f456783..e624d62 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4269,6 +4269,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
>  		fetch_register_operand(op);
>  		break;
>  	case OpCL:
> +		op->type = OP_IMM;
>  		op->bytes = 1;
>  		op->val = reg_read(ctxt, VCPU_REGS_RCX) & 0xff;
>  		break;
> @@ -4276,6 +4277,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
>  		rc = decode_imm(ctxt, op, 1, true);
>  		break;
>  	case OpOne:
> +		op->type = OP_IMM;
>  		op->bytes = 1;
>  		op->val = 1;
>  		break;
> @@ -4334,21 +4336,27 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
>  		ctxt->memop.bytes = ctxt->op_bytes + 2;
>  		goto mem_common;
>  	case OpES:
> +		op->type = OP_IMM;
>  		op->val = VCPU_SREG_ES;
>  		break;
>  	case OpCS:
> +		op->type = OP_IMM;
>  		op->val = VCPU_SREG_CS;
>  		break;
>  	case OpSS:
> +		op->type = OP_IMM;
>  		op->val = VCPU_SREG_SS;
>  		break;
>  	case OpDS:
> +		op->type = OP_IMM;
>  		op->val = VCPU_SREG_DS;
>  		break;
>  	case OpFS:
> +		op->type = OP_IMM;
>  		op->val = VCPU_SREG_FS;
>  		break;
>  	case OpGS:
> +		op->type = OP_IMM;
>  		op->val = VCPU_SREG_GS;
>  		break;
>  	case OpImplicit:
> 

I'm including this for stable@ because, until commit c44b4c6ab80e (KVM:
emulate: clean up initializations in init_decode_cache, 2014-04-16),
this would be harmless.  The type would be set to OP_REG instead of
OP_IMM, but it would not matter because no writeback is done for Src2
operands.

Now, the uninitialized op->type may be an OP_MEM from a previous
instruction.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-02  9:54 ` [PATCH 08/21] KVM: x86: Reset FPU state during reset Nadav Amit
@ 2014-11-05 12:04   ` Paolo Bonzini
  2014-11-05 13:20     ` Nadav Amit
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 12:04 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit



On 02/11/2014 10:54, Nadav Amit wrote:
> When resetting the VCPU, the FPU should be reset as well (e.g., XCR0 state).
> Call fx_init during reset as well.

Actually it shouldn't be after INIT.  XCR0 is not mentioned explicitly 
in Table 9-1 of the SDM (IA-32 Processor States Following Power-up, 
Reset, or INIT), but since MSR_IA32_XSS is not specified, I think XCR0 
should fall under "All other MSRs".

> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 773c17e..9b90ea7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7020,6 +7020,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.regs_avail = ~0;
>  	vcpu->arch.regs_dirty = ~0;
>  
> +	/* should never fail, since fpu_alloc already done */
> +	fx_init(vcpu);
> +
>  	kvm_x86_ops->vcpu_reset(vcpu);
>  }
>  
> 

Even then, I think this patch is not really nice...  The call sequence 
leading to kvm_vcpu_reset is:

kvm_vm_ioctl_create_vcpu
        kvm_arch_vcpu_create
                kvm_vcpu_init
                        kvm_arch_vcpu_init
                                fx_init (does fpu_alloc)
        kvm_arch_vcpu_setup
                kvm_vcpu_reset
                        fx_init (no fpu_alloc)

The FPU state is not needed between kvm_arch_vcpu_init and 
kvm_arch_vcpu_setup.  So we could simply move the reset from 
kvm_vcpu_init to kvm_vcpu_reset, like this:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 904535fe825e..eaa3be26dfdc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -914,8 +914,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-int fx_init(struct kvm_vcpu *vcpu);
-
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		       const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 773c17ec42dd..a0566efbb77f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6863,26 +6863,10 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 	return 0;
 }
 
-int fx_init(struct kvm_vcpu *vcpu)
+static int fx_init(struct kvm_vcpu *vcpu)
 {
-	int err;
-
-	err = fpu_alloc(&vcpu->arch.guest_fpu);
-	if (err)
-		return err;
-
-	fpu_finit(&vcpu->arch.guest_fpu);
-
-	/*
-	 * Ensure guest xcr0 is valid for loading
-	 */
-	vcpu->arch.xcr0 = XSTATE_FP;
-
-	vcpu->arch.cr0 |= X86_CR0_ET;
-
-	return 0;
+	return fpu_alloc(&vcpu->arch.guest_fpu);
 }
-EXPORT_SYMBOL_GPL(fx_init);
 
 static void fx_free(struct kvm_vcpu *vcpu)
 {
@@ -7020,6 +7004,15 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_avail = ~0;
 	vcpu->arch.regs_dirty = ~0;
 
+	fpu_finit(&vcpu->arch.guest_fpu);
+
+	/*
+	 * Ensure guest xcr0 is valid for loading
+	 */
+	vcpu->arch.xcr0 = XSTATE_FP;
+
+	vcpu->arch.cr0 |= X86_CR0_ET;
+
 	kvm_x86_ops->vcpu_reset(vcpu);
 }
 

However, as said above I'm not applying either patch, at least for now.

Paolo

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected
  2014-11-02  9:55 ` [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected Nadav Amit
@ 2014-11-05 12:18   ` Paolo Bonzini
  2014-11-05 19:58     ` Nadav Amit
  2014-11-05 19:58     ` Nadav Amit
  2014-11-06  9:23   ` Paolo Bonzini
  1 sibling, 2 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 12:18 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit



On 02/11/2014 10:55, Nadav Amit wrote:
> Commit 3b32004a66e9 ("KVM: x86: movnti minimum op size of 32-bit is not kept")
> did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI
> may be mistakenly performed using 16-bit opsize.
> 
> This patch add No16 flag to mark an instruction does not support 16-bits
> operand size.

So a

    .byte 0x66
    movntiw (%esi), %eax

will zero the higher two bytes of %eax before this patch, and load 4
bytes from (%esi) after?

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-02  9:54 ` [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs Nadav Amit
@ 2014-11-05 12:30   ` Paolo Bonzini
  2014-11-05 20:45     ` Nadav Amit
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 12:30 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit



On 02/11/2014 10:54, Nadav Amit wrote:
> Currently, the APIC logical map does not consider VCPUs whose local-apic is
> software-disabled.  However, NMIs, INIT, etc. should still be delivered to such
> VCPUs. Therefore, the APIC mode should first be determined, and then the map,
> considering all VCPUs should be constructed.
> 
> To address this issue, first find the APIC mode, and only then construct the
> logical map.
> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/kvm/lapic.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 66dd173..2a838af 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -156,8 +156,6 @@ static void recalculate_apic_map(struct kvm *kvm)
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		struct kvm_lapic *apic = vcpu->arch.apic;
> -		u16 cid, lid;
> -		u32 ldr;
>  
>  		if (!kvm_apic_present(vcpu))
>  			continue;
> @@ -175,13 +173,22 @@ static void recalculate_apic_map(struct kvm *kvm)
>  			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>  			new->lid_mask = 0xffff;
>  			new->broadcast = X2APIC_BROADCAST;
> -		} else if (kvm_apic_sw_enabled(apic) &&
> -				!new->cid_mask /* flat mode */ &&
> -				kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_CLUSTER) {
> -			new->cid_shift = 4;
> -			new->cid_mask = 0xf;
> -			new->lid_mask = 0xf;
> +			break;
> +		} else if (kvm_apic_sw_enabled(apic)) {
> +			if (kvm_apic_get_reg(apic, APIC_DFR) ==
> +							APIC_DFR_CLUSTER) {
> +				new->cid_shift = 4;
> +				new->cid_mask = 0xf;
> +				new->lid_mask = 0xf;
> +			}
> +			break;
>  		}

We need to take into account DFR even if all APICs are software disabled,
in case it will be used to send NMIs.

What about this on top of your patch?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c98b44d0ffe6..d8b11f47f45e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -160,29 +160,34 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (!kvm_apic_present(vcpu))
 			continue;
 
-		/*
-		 * All APICs have to be configured in the same mode by an OS.
-		 * We take advatage of this while building logical id loockup
-		 * table. After reset APICs are in xapic/flat mode, so if we
-		 * find apic with different setting we assume this is the mode
-		 * OS wants all apics to be in; build lookup table accordingly.
-		 */
 		if (apic_x2apic_mode(apic)) {
 			new->ldr_bits = 32;
 			new->cid_shift = 16;
 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
 			new->lid_mask = 0xffff;
 			new->broadcast = X2APIC_BROADCAST;
-			break;
-		} else if (kvm_apic_sw_enabled(apic)) {
+		} else if (kvm_apic_hw_enabled(apic)) {
 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
 							APIC_DFR_CLUSTER) {
 				new->cid_shift = 4;
 				new->cid_mask = 0xf;
 				new->lid_mask = 0xf;
+			} else {
+				new->cid_shift = 8;
+				new->cid_mask = 0;
+				new->lid_mask = 0xff;
 			}
-			break;
 		}
+
+		/*
+		 * All APICs have to be configured in the same mode by an OS.
+		 * We take advatage of this while building logical id loockup
+		 * table. After reset APICs are in software disabled mode, so if
+		 * we find apic with different setting we assume this is the mode
+		 * OS wants all apics to be in; build lookup table accordingly.
+		 */
+		if (kvm_apic_sw_enabled(apic))
+			break;
 	}
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {

Paolo

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/21] Fixes for various KVM bugs
  2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
                   ` (20 preceding siblings ...)
  2014-11-02  9:55 ` [PATCH 21/21] KVM: x86: Return UNHANDLABLE on unsupported SYSENTER Nadav Amit
@ 2014-11-05 12:31 ` Paolo Bonzini
  21 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 12:31 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit



On 02/11/2014 10:54, Nadav Amit wrote:
> Nadav Amit (21):
>   KVM: x86: decode_modrm does not regard modrm correctly
>   KVM: x86: No error-code on real-mode exceptions
>   KVM: x86: Emulator should set DR6 upon GD like real CPU
>   KVM: x86: Clear DR6[0:3] on #DB during handle_dr
>   KVM: x86: Breakpoints do not consider CS.base
>   KVM: x86: Emulator MOV-sreg uses incorrect size
>   KVM: x86: Emulator considers imm as memory operand
>   KVM: x86: SYSCALL cannot clear eflags[1]
>   KVM: x86: Wrong flags on CMPS and SCAS emulation
>   KVM: x86: Emulate push sreg as done in Core
>   KVM: x86: MOV to CR3 can set bit 63
>   KVM: x86: Do not update EFLAGS on faulting emulation
>   KVM: x86: Combine the lgdt and lidt emulation logic
>   KVM: x86: Inject #GP when loading system segments with non-canonical
>     base
>   KVM: x86: Remove redundant and incorrect cpl check on task-switch
>   KVM: x86: Emulator mis-decodes VEX instructions on real-mode
>   KVM: x86: Warn on APIC base relocation
>   KVM: x86: Return UNHANDLABLE on unsupported SYSENTER

Applied, thanks.

>   KVM: x86: MOVNTI emulation min opsize is not respected
>   KVM: x86: Reset FPU state during reset
>   KVM: x86: Software disabled APIC should still deliver NMIs

Not applied, see individual replies.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-05 12:04   ` Paolo Bonzini
@ 2014-11-05 13:20     ` Nadav Amit
  2014-11-05 14:55       ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-05 13:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 5, 2014, at 14:04, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 02/11/2014 10:54, Nadav Amit wrote:
>> When resetting the VCPU, the FPU should be reset as well (e.g., XCR0 state).
>> Call fx_init during reset as well.
> 
> Actually it shouldn't be after INIT.  XCR0 is not mentioned explicitly 
> in Table 9-1 of the SDM (IA-32 Processor States Following Power-up, 
> Reset, or INIT), but since MSR_IA32_XSS is not specified, I think XCR0 
> should fall under "All other MSRs”.


I should have given a reference, since Intel SDM is a wild place - see section 2.6 “EXTENDED CONTROL REGISTERS (INCLUDING XCR0)” : "After reset, all bits (except bit 0) in XCR0 are cleared to zero, XCR0[0] is set to 1."


> 
>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>> ---
>> arch/x86/kvm/x86.c | 3 +++
>> 1 file changed, 3 insertions(+)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 773c17e..9b90ea7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -7020,6 +7020,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>> 	vcpu->arch.regs_avail = ~0;
>> 	vcpu->arch.regs_dirty = ~0;
>> 
>> +	/* should never fail, since fpu_alloc already done */
>> +	fx_init(vcpu);
>> +
>> 	kvm_x86_ops->vcpu_reset(vcpu);
>> }
>> 
>> 
> 
> Even then, I think this patch is not really nice...  The call sequence 
> leading to kvm_vcpu_reset is:

I agree. I did a lousy job this time… (made a hack, and forgot to unhack it).

> 
> kvm_vm_ioctl_create_vcpu
>        kvm_arch_vcpu_create
>                kvm_vcpu_init
>                        kvm_arch_vcpu_init
>                                fx_init (does fpu_alloc)
>        kvm_arch_vcpu_setup
>                kvm_vcpu_reset
>                        fx_init (no fpu_alloc)
> 
> The FPU state is not needed between kvm_arch_vcpu_init and 
> kvm_arch_vcpu_setup.  So we could simply move the reset from 
> kvm_vcpu_init to kvm_vcpu_reset, like this:
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 904535fe825e..eaa3be26dfdc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -914,8 +914,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
> 
> void kvm_inject_nmi(struct kvm_vcpu *vcpu);
> 
> -int fx_init(struct kvm_vcpu *vcpu);
> -
> void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> 		       const u8 *new, int bytes);
> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 773c17ec42dd..a0566efbb77f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6863,26 +6863,10 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
> 	return 0;
> }
> 
> -int fx_init(struct kvm_vcpu *vcpu)
> +static int fx_init(struct kvm_vcpu *vcpu)
> {
> -	int err;
> -
> -	err = fpu_alloc(&vcpu->arch.guest_fpu);
> -	if (err)
> -		return err;
> -
> -	fpu_finit(&vcpu->arch.guest_fpu);
> -
> -	/*
> -	 * Ensure guest xcr0 is valid for loading
> -	 */
> -	vcpu->arch.xcr0 = XSTATE_FP;
> -
> -	vcpu->arch.cr0 |= X86_CR0_ET;
> -
> -	return 0;
> +	return fpu_alloc(&vcpu->arch.guest_fpu);
> }
> -EXPORT_SYMBOL_GPL(fx_init);
> 
> static void fx_free(struct kvm_vcpu *vcpu)
> {
> @@ -7020,6 +7004,15 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> 	vcpu->arch.regs_avail = ~0;
> 	vcpu->arch.regs_dirty = ~0;
> 
> +	fpu_finit(&vcpu->arch.guest_fpu);
> +
> +	/*
> +	 * Ensure guest xcr0 is valid for loading
> +	 */
> +	vcpu->arch.xcr0 = XSTATE_FP;
> +
> +	vcpu->arch.cr0 |= X86_CR0_ET;
> +
> 	kvm_x86_ops->vcpu_reset(vcpu);
> }
> 
> 
> However, as said above I'm not applying either patch, at least for now.

Ok. I hope you will apply it in the future, since we need to workaround it in our tests.
( I run some tests that stress the emulator, but this bug causes problems even without this stress ).

Thanks,
Nadav

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-05 13:20     ` Nadav Amit
@ 2014-11-05 14:55       ` Paolo Bonzini
  2014-11-05 20:31         ` Nadav Amit
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-05 14:55 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Nadav Amit, kvm



On 05/11/2014 14:20, Nadav Amit wrote:
>> > Actually it shouldn't be after INIT.  XCR0 is not mentioned explicitly 
>> > in Table 9-1 of the SDM (IA-32 Processor States Following Power-up, 
>> > Reset, or INIT), but since MSR_IA32_XSS is not specified, I think XCR0 
>> > should fall under "All other MSRs”.
> 
> I should have given a reference, since Intel SDM is a wild place - see section 2.6 “EXTENDED CONTROL REGISTERS (INCLUDING XCR0)” : "After reset, all bits (except bit 0) in XCR0 are cleared to zero, XCR0[0] is set to 1."

Yes, I found that, but INIT is not reset. :)

Reset is typically handled by userspace in the case of KVM.
kvm_vcpu_reset is only called by KVM when you get an INIT interrupt, in
kvm_accept_apic_events.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected
  2014-11-05 12:18   ` Paolo Bonzini
@ 2014-11-05 19:58     ` Nadav Amit
  2014-11-05 19:58     ` Nadav Amit
  1 sibling, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-05 19:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 5, 2014, at 14:18, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 02/11/2014 10:55, Nadav Amit wrote:
>> Commit 3b32004a66e9 ("KVM: x86: movnti minimum op size of 32-bit is not kept")
>> did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI
>> may be mistakenly performed using 16-bit opsize.
>> 
>> This patch add No16 flag to mark an instruction does not support 16-bits
>> operand size.
> 
> So a
> 
>    .byte 0x66
>    movntiw (%esi), %eax
> 
> will zero the higher two bytes of %eax before this patch, and load 4
> bytes from (%esi) after?
> 
Well, actually the 0x66 prefix is an illegal prefix for this instruction, so it will cause #UD.
But if the default operand size is 16 (e.g., CS.D = 0), then yes - after this patch it will load 4 bytes from (%esi), and this is the expected behaviour.

Here is a small test to show the behaviour (build with -m32 ).
We set CS to 16-bit segment, so default operand size is 16-bit, but 32-bits are assigned.
If you replace movntil with movl, you’ll see only 16-bits are stored, as you would expect from mov.

---
#include <sys/types.h>
#include <asm/ldt.h>
#include <stdio.h>

int main()
{
	unsigned int a = 0;
	unsigned int b = 0x87654321u;

	struct user_desc d = {
		.entry_number = 0,
		.base_addr = 0,
		.limit = 0xfffffu,
		.seg_32bit = 0, 
		.contents = 2,
		.read_exec_only = 1,
		.limit_in_pages = 1,
		.seg_not_present = 0,
		.useable = 1,
	};

	modify_ldt(1, &d, sizeof(d));
	asm volatile (  "lcall $0x7, $1f\n\t"
			"jmp 2f\n\t"
			"1: .byte 0x67\n\t"
			"movntil %%ebx, (%%eax)\n\t"
			".byte 0x66\n\t"
			"lret\n\t"
			"2:\n\t"
			: : "a"(&a), "b"(b) : "memory");
	printf("result %x\n", a);
	return 0;
}
---

Nadav


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected
  2014-11-05 12:18   ` Paolo Bonzini
  2014-11-05 19:58     ` Nadav Amit
@ 2014-11-05 19:58     ` Nadav Amit
  1 sibling, 0 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-05 19:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 5, 2014, at 14:18, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 02/11/2014 10:55, Nadav Amit wrote:
>> Commit 3b32004a66e9 ("KVM: x86: movnti minimum op size of 32-bit is not kept")
>> did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI
>> may be mistakenly performed using 16-bit opsize.
>> 
>> This patch add No16 flag to mark an instruction does not support 16-bits
>> operand size.
> 
> So a
> 
>   .byte 0x66
>   movntiw (%esi), %eax
> 
> will zero the higher two bytes of %eax before this patch, and load 4
> bytes from (%esi) after?
> 
Well, actually the 0x66 prefix is an illegal prefix for this instruction, so it will cause #UD.
But if the default operand size is 16 (e.g., CS.D = 0), then yes - after this patch it will load 4 bytes from (%esi), and this is the expected behaviour.

Here is a small test to show the behaviour (build with -m32 ).
We set CS to 16-bit segment, so default operand size is 16-bit, but 32-bits are assigned.
If you replace movntil with movl, you’ll see only 16-bits are stored, as you would expect from mov.

---
#include <sys/types.h>
#include <asm/ldt.h>
#include <stdio.h>

int main()
{
	unsigned int a = 0;
	unsigned int b = 0x87654321u;

	struct user_desc d = {
		.entry_number = 0,
		.base_addr = 0,
		.limit = 0xfffffu,
		.seg_32bit = 0, 
		.contents = 2,
		.read_exec_only = 1,
		.limit_in_pages = 1,
		.seg_not_present = 0,
		.useable = 1,
	};

	modify_ldt(1, &d, sizeof(d));
	asm volatile (  "lcall $0x7, $1f\n\t"
			"jmp 2f\n\t"
			"1: .byte 0x67\n\t"
			"movntil %%ebx, (%%eax)\n\t"
			".byte 0x66\n\t"
			"lret\n\t"
			"2:\n\t"
			: : "a"(&a), "b"(b) : "memory");
	printf("result %x\n", a);
	return 0;
}
---

Nadav


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-05 14:55       ` Paolo Bonzini
@ 2014-11-05 20:31         ` Nadav Amit
  2014-11-06  8:58           ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-05 20:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 5, 2014, at 16:55, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 05/11/2014 14:20, Nadav Amit wrote:
>>>> Actually it shouldn't be after INIT.  XCR0 is not mentioned explicitly 
>>>> in Table 9-1 of the SDM (IA-32 Processor States Following Power-up, 
>>>> Reset, or INIT), but since MSR_IA32_XSS is not specified, I think XCR0 
>>>> should fall under "All other MSRs”.
>> 
>> I should have given a reference, since Intel SDM is a wild place - see section 2.6 “EXTENDED CONTROL REGISTERS (INCLUDING XCR0)” : "After reset, all bits (except bit 0) in XCR0 are cleared to zero, XCR0[0] is set to 1."
> 
> Yes, I found that, but INIT is not reset. :)

Paolo, you kill me…
You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
"BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).


> Reset is typically handled by userspace in the case of KVM.
> kvm_vcpu_reset is only called by KVM when you get an INIT interrupt, in
> kvm_accept_apic_events.
I know. Yet, my testing environment relies on INIT…

Nadav



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-05 12:30   ` Paolo Bonzini
@ 2014-11-05 20:45     ` Nadav Amit
  2014-11-06  9:34       ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-05 20:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 5, 2014, at 14:30, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 02/11/2014 10:54, Nadav Amit wrote:
>> Currently, the APIC logical map does not consider VCPUs whose local-apic is
>> software-disabled.  However, NMIs, INIT, etc. should still be delivered to such
>> VCPUs. Therefore, the APIC mode should first be determined, and then the map,
>> considering all VCPUs should be constructed.
>> 
>> To address this issue, first find the APIC mode, and only then construct the
>> logical map.

[snip]

> 
> We need to take into account DFR even if all APICs are software disabled,
> in case it will be used to send NMIs.
> 
> What about this on top of your patch?
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c98b44d0ffe6..d8b11f47f45e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -160,29 +160,34 @@ static void recalculate_apic_map(struct kvm *kvm)
> 		if (!kvm_apic_present(vcpu))
> 			continue;
> 
> -		/*
> -		 * All APICs have to be configured in the same mode by an OS.
> -		 * We take advatage of this while building logical id loockup
> -		 * table. After reset APICs are in xapic/flat mode, so if we
> -		 * find apic with different setting we assume this is the mode
> -		 * OS wants all apics to be in; build lookup table accordingly.
> -		 */
> 		if (apic_x2apic_mode(apic)) {
> 			new->ldr_bits = 32;
> 			new->cid_shift = 16;
> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
> 			new->lid_mask = 0xffff;
> 			new->broadcast = X2APIC_BROADCAST;
> -			break;
> -		} else if (kvm_apic_sw_enabled(apic)) {
> +		} else if (kvm_apic_hw_enabled(apic)) {
> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
> 							APIC_DFR_CLUSTER) {
> 				new->cid_shift = 4;
> 				new->cid_mask = 0xf;
> 				new->lid_mask = 0xf;
> +			} else {
> +				new->cid_shift = 8;
> +				new->cid_mask = 0;
> +				new->lid_mask = 0xff;
> 			}
> -			break;
> 		}
> +
> +		/*
> +		 * All APICs have to be configured in the same mode by an OS.
> +		 * We take advatage of this while building logical id loockup
> +		 * table. After reset APICs are in software disabled mode, so if
> +		 * we find apic with different setting we assume this is the mode
> +		 * OS wants all apics to be in; build lookup table accordingly.
> +		 */
> +		if (kvm_apic_sw_enabled(apic))
> +			break;
> 	}
> 
> 	kvm_for_each_vcpu(i, vcpu, kvm) {
> 

I didn’t encounter the problem you try to fix, so I can’t say for sure, yet…

If I understand the SDM correctly, in such scenario (all APICs are software disabled) the mode is left as the default - flat mode (see section 10.6.2.2 "Logical Destination Mode”):
"All processors that have their APIC software enabled (using the spurious vector enable/disable bit) must have their DFRs (Destination Format Registers) programmed identically. The default mode for DFR is flat mode.”

So I think the previous behaviour (before the additional changes) is the correct one.
I might be able to confirm it, but anyhow only in a couple of weeks.

Nadav

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-05 20:31         ` Nadav Amit
@ 2014-11-06  8:58           ` Paolo Bonzini
  2014-11-06  9:13             ` Nadav Amit
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-06  8:58 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Nadav Amit, kvm

On 05/11/2014 21:31, Nadav Amit wrote:
> You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
> If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
> "BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).

Fair enough. :)

Does the patch in
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/129060 look good?

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-06  8:58           ` Paolo Bonzini
@ 2014-11-06  9:13             ` Nadav Amit
  2014-11-06  9:44               ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-06  9:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 6, 2014, at 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 05/11/2014 21:31, Nadav Amit wrote:
>> You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
>> If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
>> "BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).
> 
> Fair enough. :)
Thanks. It is turning harder to find references for the crazy x86 behaviour. :)

> 
> Does the patch in
> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/129060 look good?

Yes. However, re-reviewing the patches both my patch and yours actually do something slightly different than the spec: they clear XMM8-15 and YMM[128:…] which should not happen on INIT according to the spec.

Fixing it might be a bit intrusive. What do you say?

Nadav

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected
  2014-11-02  9:55 ` [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected Nadav Amit
  2014-11-05 12:18   ` Paolo Bonzini
@ 2014-11-06  9:23   ` Paolo Bonzini
  1 sibling, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-06  9:23 UTC (permalink / raw)
  To: kvm; +Cc: kvm, nadav.amit



On 02/11/2014 10:55, Nadav Amit wrote:
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---

Applied, thanks.

Paolo


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-05 20:45     ` Nadav Amit
@ 2014-11-06  9:34       ` Paolo Bonzini
  2014-11-06 16:45         ` Radim Krčmář
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-06  9:34 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Nadav Amit, kvm



On 05/11/2014 21:45, Nadav Amit wrote:
> If I understand the SDM correctly, in such scenario (all APICs are
> software disabled) the mode is left as the default - flat mode (see
> section 10.6.2.2 "Logical Destination Mode”): "All processors that
> have their APIC software enabled (using the spurious vector
> enable/disable bit) must have their DFRs (Destination Format
> Registers) programmed identically. The default mode for DFR is flat
> mode.”

That's not what either Bochs or QEMU do, though.  (Though in the case of
Bochs I cannot find the place where reception of IPIs is prevented for
software-disabled APICs, so I'm not sure how much to trust it in this case).

I'm not sure why software-disabled APICs could have different DFRs, if
the APICs can receive NMI IPIs.  I'll ask Intel.

Paolo


> So I think the previous behaviour (before the additional changes) is
> the correct one. I might be able to confirm it, but anyhow only in a
> couple of weeks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-06  9:13             ` Nadav Amit
@ 2014-11-06  9:44               ` Paolo Bonzini
  2014-11-06  9:56                 ` Nadav Amit
  2014-11-06 17:38                 ` Radim Krčmář
  0 siblings, 2 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-06  9:44 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Nadav Amit, kvm



On 06/11/2014 10:13, Nadav Amit wrote:
> 
>> On Nov 6, 2014, at 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 05/11/2014 21:31, Nadav Amit wrote:
>>> You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
>>> If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
>>> "BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).
>>
>> Fair enough. :)
> Thanks. It is turning harder to find references for the crazy x86 behaviour. :)

Indeed, I'll ask Intel to clarify this one too.

The crazy thing is that AMD doesn't say anything, either!  Their own 
manual just says "Hardware initializes XCR0 to 0000_0000_0000_0001h", 
but it doesn't say when.

>> Does the patch in
>> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/129060 look good?
> 
> Yes. However, re-reviewing the patches both my patch and yours actually do something slightly different than the spec: they clear XMM8-15 and YMM[128:…] which should not happen on INIT according to the spec.

Yes, my patch just wanted to have the same effect as yours, but 
fpu_finit must remain in fx_alloc.  Setting cr0 is also unnecessary, 
since vmx_vcpu_reset and svm_vcpu_reset both do this.

> Fixing it might be a bit intrusive. What do you say?

I think it's easy if we start with my version of the change:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc932d388c43..aba13df4e0ec 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -915,8 +915,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-int fx_init(struct kvm_vcpu *vcpu);
-
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		       const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0260ccd78a4..0ef4c0b27248 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6868,7 +6868,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 	return 0;
 }
 
-int fx_init(struct kvm_vcpu *vcpu)
+static int fx_init(struct kvm_vcpu *vcpu)
 {
 	int err;
 
@@ -6878,16 +6878,8 @@ int fx_init(struct kvm_vcpu *vcpu)
 
 	fpu_finit(&vcpu->arch.guest_fpu);
 
-	/*
-	 * Ensure guest xcr0 is valid for loading
-	 */
-	vcpu->arch.xcr0 = XSTATE_FP;
-
-	vcpu->arch.cr0 |= X86_CR0_ET;
-
 	return 0;
 }
-EXPORT_SYMBOL_GPL(fx_init);
 
 static void fx_free(struct kvm_vcpu *vcpu)
 {
@@ -7025,6 +7017,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_avail = ~0;
 	vcpu->arch.regs_dirty = ~0;
 
+	/*
+	 * Ensure guest xcr0 is valid for loading
+	 */
+	vcpu->arch.xcr0 = XSTATE_FP;
+
 	kvm_x86_ops->vcpu_reset(vcpu);
 }
 

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-06  9:44               ` Paolo Bonzini
@ 2014-11-06  9:56                 ` Nadav Amit
  2014-11-06 10:44                   ` Paolo Bonzini
  2014-11-06 17:38                 ` Radim Krčmář
  1 sibling, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-06  9:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm


> On Nov 6, 2014, at 11:44, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 06/11/2014 10:13, Nadav Amit wrote:
>> 
>>> On Nov 6, 2014, at 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> On 05/11/2014 21:31, Nadav Amit wrote:
>>>> You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
>>>> If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
>>>> "BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).
>>> 
>>> Fair enough. :)
>> Thanks. It is turning harder to find references for the crazy x86 behaviour. :)
> 
> Indeed, I'll ask Intel to clarify this one too.
> 
> The crazy thing is that AMD doesn't say anything, either!  Their own 
> manual just says "Hardware initializes XCR0 to 0000_0000_0000_0001h", 
> but it doesn't say when.
I know. I looked at their manual too.

> 
>>> Does the patch in
>>> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/129060 look good?
>> 
>> Yes. However, re-reviewing the patches both my patch and yours actually do something slightly different than the spec: they clear XMM8-15 and YMM[128:…] which should not happen on INIT according to the spec.
> 
> Yes, my patch just wanted to have the same effect as yours, but 
> fpu_finit must remain in fx_alloc.  Setting cr0 is also unnecessary, 
> since vmx_vcpu_reset and svm_vcpu_reset both do this.
> 
>> Fixing it might be a bit intrusive. What do you say?
> 
> I think it's easy if we start with my version of the change:
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index dc932d388c43..aba13df4e0ec 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -915,8 +915,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
> 
> void kvm_inject_nmi(struct kvm_vcpu *vcpu);
> 
> -int fx_init(struct kvm_vcpu *vcpu);
> -
> void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> 		       const u8 *new, int bytes);
> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e0260ccd78a4..0ef4c0b27248 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6868,7 +6868,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
> 	return 0;
> }
> 
> -int fx_init(struct kvm_vcpu *vcpu)
> +static int fx_init(struct kvm_vcpu *vcpu)
> {
> 	int err;
> 
> @@ -6878,16 +6878,8 @@ int fx_init(struct kvm_vcpu *vcpu)
> 
> 	fpu_finit(&vcpu->arch.guest_fpu);
> 
> -	/*
> -	 * Ensure guest xcr0 is valid for loading
> -	 */
> -	vcpu->arch.xcr0 = XSTATE_FP;
> -
> -	vcpu->arch.cr0 |= X86_CR0_ET;
I think this line was removed by mistake.

> -
> 	return 0;
> }
> -EXPORT_SYMBOL_GPL(fx_init);
> 
> static void fx_free(struct kvm_vcpu *vcpu)
> {
> @@ -7025,6 +7017,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> 	vcpu->arch.regs_avail = ~0;
> 	vcpu->arch.regs_dirty = ~0;
> 
> +	/*
> +	 * Ensure guest xcr0 is valid for loading
> +	 */
> +	vcpu->arch.xcr0 = XSTATE_FP;
> +
> 	kvm_x86_ops->vcpu_reset(vcpu);
> }
> 
I’ll give it a shot (in a couple of weeks).

Thanks,
Nadav


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-06  9:56                 ` Nadav Amit
@ 2014-11-06 10:44                   ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-06 10:44 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Nadav Amit, kvm



On 06/11/2014 10:56, Nadav Amit wrote:
> 
>> On Nov 6, 2014, at 11:44, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>>
>> On 06/11/2014 10:13, Nadav Amit wrote:
>>>
>>>> On Nov 6, 2014, at 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> On 05/11/2014 21:31, Nadav Amit wrote:
>>>>> You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
>>>>> If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
>>>>> "BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).
>>>>
>>>> Fair enough. :)
>>> Thanks. It is turning harder to find references for the crazy x86 behaviour. :)
>>
>> Indeed, I'll ask Intel to clarify this one too.
>>
>> The crazy thing is that AMD doesn't say anything, either!  Their own 
>> manual just says "Hardware initializes XCR0 to 0000_0000_0000_0001h", 
>> but it doesn't say when.
> I know. I looked at their manual too.
> 
>>
>>>> Does the patch in
>>>> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/129060 look good?
>>>
>>> Yes. However, re-reviewing the patches both my patch and yours actually do something slightly different than the spec: they clear XMM8-15 and YMM[128:…] which should not happen on INIT according to the spec.
>>
>> Yes, my patch just wanted to have the same effect as yours, but 
>> fpu_finit must remain in fx_alloc.  Setting cr0 is also unnecessary, 
>> since vmx_vcpu_reset and svm_vcpu_reset both do this.
>>
>>> Fixing it might be a bit intrusive. What do you say?
>>
>> I think it's easy if we start with my version of the change:
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index dc932d388c43..aba13df4e0ec 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -915,8 +915,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
>>
>> void kvm_inject_nmi(struct kvm_vcpu *vcpu);
>>
>> -int fx_init(struct kvm_vcpu *vcpu);
>> -
>> void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>> 		       const u8 *new, int bytes);
>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e0260ccd78a4..0ef4c0b27248 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6868,7 +6868,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>> 	return 0;
>> }
>>
>> -int fx_init(struct kvm_vcpu *vcpu)
>> +static int fx_init(struct kvm_vcpu *vcpu)
>> {
>> 	int err;
>>
>> @@ -6878,16 +6878,8 @@ int fx_init(struct kvm_vcpu *vcpu)
>>
>> 	fpu_finit(&vcpu->arch.guest_fpu);
>>
>> -	/*
>> -	 * Ensure guest xcr0 is valid for loading
>> -	 */
>> -	vcpu->arch.xcr0 = XSTATE_FP;
>> -
>> -	vcpu->arch.cr0 |= X86_CR0_ET;
> I think this line was removed by mistake.

It is set already by vmx_vcpu_setup and svm_vcpu_setup.

At startup we always set NW and CD in addition to ET.

Paolo

>> -
>> 	return 0;
>> }
>> -EXPORT_SYMBOL_GPL(fx_init);
>>
>> static void fx_free(struct kvm_vcpu *vcpu)
>> {
>> @@ -7025,6 +7017,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>> 	vcpu->arch.regs_avail = ~0;
>> 	vcpu->arch.regs_dirty = ~0;
>>
>> +	/*
>> +	 * Ensure guest xcr0 is valid for loading
>> +	 */
>> +	vcpu->arch.xcr0 = XSTATE_FP;
>> +
>> 	kvm_x86_ops->vcpu_reset(vcpu);
>> }
>>
> I’ll give it a shot (in a couple of weeks).
> 
> Thanks,
> Nadav
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-06  9:34       ` Paolo Bonzini
@ 2014-11-06 16:45         ` Radim Krčmář
  2014-11-10 17:35           ` Paolo Bonzini
  2014-11-14 15:00           ` Paolo Bonzini
  0 siblings, 2 replies; 59+ messages in thread
From: Radim Krčmář @ 2014-11-06 16:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, Nadav Amit, kvm

2014-11-06 10:34+0100, Paolo Bonzini:
> On 05/11/2014 21:45, Nadav Amit wrote:
> > If I understand the SDM correctly, in such scenario (all APICs are
> > software disabled) the mode is left as the default - flat mode (see

APIC doesn't have any global mode (it is just KVM's simplification), so
when a message lands on the system bus, it just compares MDA with LDR
and DFR ...

> > section 10.6.2.2 "Logical Destination Mode”): "All processors that
> > have their APIC software enabled (using the spurious vector
> > enable/disable bit) must have their DFRs (Destination Format
> > Registers) programmed identically. The default mode for DFR is flat
> > mode.”

I think the "default mode" points to reset state, which is flat DFR;
and it is mentioned only because of the following sentence
  If you are using cluster mode, DFRs must be programmed before the APIC
  is software enabled.

> That's not what either Bochs or QEMU do, though.  (Though in the case of
> Bochs I cannot find the place where reception of IPIs is prevented for
> software-disabled APICs, so I'm not sure how much to trust it in this case).
> 
> I'm not sure why software-disabled APICs could have different DFRs, if
> the APICs can receive NMI IPIs.  I'll ask Intel.

When changing the mode, we can't switch DFR synchronously, so it has to
happen and NMI may be needed (watchdog?) before APIC configuration.
Explicit statement might have been a hint to be _extra_ careful when
using logical destination for INIT, NMI, ... I wonder what they'll say.

Anyway, Paolo's patch seems to be in the right direction, I'd modify it
a bit though:

LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
don't set it and decide the mode by the last nonzero one.
This works in a situation, where one part is configured for cluster and
the rest is still in reset state.

(It gets harder if we allow nonzero LDRs with different DFR ...
 we'd need to split our logical map to handle it.)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 758f838..6da303e1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
 		goto out;
 
 	new->ldr_bits = 8;
-	/* flat mode is default */
-	new->cid_shift = 8;
-	new->cid_mask = 0;
-	new->lid_mask = 0xff;
 	new->broadcast = APIC_BROADCAST;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
 			new->lid_mask = 0xffff;
 			new->broadcast = X2APIC_BROADCAST;
-		} else if (kvm_apic_hw_enabled(apic)) {
+		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
 							APIC_DFR_CLUSTER) {
 				new->cid_shift = 4;

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/21] KVM: x86: Reset FPU state during reset
  2014-11-06  9:44               ` Paolo Bonzini
  2014-11-06  9:56                 ` Nadav Amit
@ 2014-11-06 17:38                 ` Radim Krčmář
  1 sibling, 0 replies; 59+ messages in thread
From: Radim Krčmář @ 2014-11-06 17:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, Nadav Amit, kvm

2014-11-06 10:44+0100, Paolo Bonzini:
> 
> 
> On 06/11/2014 10:13, Nadav Amit wrote:
> > 
> >> On Nov 6, 2014, at 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 05/11/2014 21:31, Nadav Amit wrote:
> >>> You are correct, it does not appear clearly in the SDM, but that is what real hardware does.
> >>> If you look at bochs - http://code.metager.de/source/xref/bochs/bochs/cpu/init.cc - you’ll see they call
> >>> "BX_CPU_THIS_PTR xcr0.set32(0x1);” regardless to whether it is hardware or software reset (the latter happens on INIT).
> >>
> >> Fair enough. :)
> > Thanks. It is turning harder to find references for the crazy x86 behaviour. :)
> 
> Indeed, I'll ask Intel to clarify this one too.
> 
> The crazy thing is that AMD doesn't say anything, either!  Their own 
> manual just says "Hardware initializes XCR0 to 0000_0000_0000_0001h", 
> but it doesn't say when.

I found AMD 2:15.21.8
  INIT reinitializes the control registers, segment registers and GP
  registers in a manner similar to RESET, but does not alter the
  contents of most MSRs, caches or numeric coprocessor (x87 or SSE)
  state,

So reseting XCR0 is ok, but we shouldn't call fx_init() on INIT,
the version below seems fine in this regard.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 18/21] KVM: x86: Emulator mis-decodes VEX instructions on real-mode
  2014-11-02  9:54 ` [PATCH 18/21] KVM: x86: Emulator mis-decodes VEX instructions on real-mode Nadav Amit
@ 2014-11-08  7:25   ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-08  7:25 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kvm, nadav.amit



On 02/11/2014 10:54, Nadav Amit wrote:
> Commit 7fe864dc942c ("KVM: x86: Emulator considers imm as memory operand")
> marked VEX instructions as such in protected mode.  VEX-prefix instructions are
> not supported relevant on real-mode and VM86, but should cause #UD instead of
> being decoded as LES/LDS.
> 
> Fix this behaviour to be consistent with real hardware.
> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/kvm/emulate.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index db8cb4d6..24b0df7 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4505,8 +4505,7 @@ done_prefixes:
>  
>  	/* vex-prefix instructions are not implemented */
>  	if (ctxt->opcode_len == 1 && (ctxt->b == 0xc5 || ctxt->b == 0xc4) &&
> -	    (mode == X86EMUL_MODE_PROT64 ||
> -	    (mode >= X86EMUL_MODE_PROT16 && (ctxt->modrm & 0x80)))) {
> +	    (mode == X86EMUL_MODE_PROT64 || (ctxt->modrm & 0x80))) {

This should also check for (ctxt->modrm & 0xc0) == 0xc0 instead of just
ctxt->modrm & 0x80.  Otherwise, installation of Windows XP and 2003 is
broken on pre-Westmere system, because they execute LDS in the process
of transitioning from 16- to 32-bit protected mode.

This was not visible before because at this point CS is already 32-bit;
I fixed the patch.

Paolo

>  		ctxt->d = NotImpl;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-06 16:45         ` Radim Krčmář
@ 2014-11-10 17:35           ` Paolo Bonzini
  2014-11-10 18:06             ` Radim Krčmář
  2014-11-14 15:00           ` Paolo Bonzini
  1 sibling, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-10 17:35 UTC (permalink / raw)
  To: kvm; +Cc: Nadav Amit, Nadav Amit, kvm

On 06/11/2014 17:45, Radim Krčmář wrote:
> -		} else if (kvm_apic_hw_enabled(apic)) {
> +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>  							APIC_DFR_CLUSTER) {
>  				new->cid_shift = 4;

Why are you removing the kvm_apic_hw_enabled(apic)?  We do not reset the
APIC to the power-up values (the SDM says that "may" happen) when the
enabled bit is turned off in the APIC_BASE MSR.

Paolo


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-10 17:35           ` Paolo Bonzini
@ 2014-11-10 18:06             ` Radim Krčmář
  0 siblings, 0 replies; 59+ messages in thread
From: Radim Krčmář @ 2014-11-10 18:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, Nadav Amit, kvm

2014-11-10 18:35+0100, Paolo Bonzini:
> On 06/11/2014 17:45, Radim Krčmář wrote:
> > -		} else if (kvm_apic_hw_enabled(apic)) {
> > +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
> >  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
> >  							APIC_DFR_CLUSTER) {
> >  				new->cid_shift = 4;
> 
> Why are you removing the kvm_apic_hw_enabled(apic)?  We do not reset the
> APIC to the power-up values (the SDM says that "may" happen) when the
> enabled bit is turned off in the APIC_BASE MSR.

We checked for that before, in kvm_apic_present().

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-06 16:45         ` Radim Krčmář
  2014-11-10 17:35           ` Paolo Bonzini
@ 2014-11-14 15:00           ` Paolo Bonzini
  2014-11-26 17:01             ` Nadav Amit
  1 sibling, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-14 15:00 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: Nadav Amit, Nadav Amit, kvm



On 06/11/2014 17:45, Radim Krčmář wrote:
> 2014-11-06 10:34+0100, Paolo Bonzini:
>> On 05/11/2014 21:45, Nadav Amit wrote:
>>> If I understand the SDM correctly, in such scenario (all APICs are
>>> software disabled) the mode is left as the default - flat mode (see
> 
> APIC doesn't have any global mode (it is just KVM's simplification), so
> when a message lands on the system bus, it just compares MDA with LDR
> and DFR ...
> 
>>> section 10.6.2.2 "Logical Destination Mode”): "All processors that
>>> have their APIC software enabled (using the spurious vector
>>> enable/disable bit) must have their DFRs (Destination Format
>>> Registers) programmed identically. The default mode for DFR is flat
>>> mode.”
> 
> I think the "default mode" points to reset state, which is flat DFR;
> and it is mentioned only because of the following sentence
>   If you are using cluster mode, DFRs must be programmed before the APIC
>   is software enabled.
> 
>> That's not what either Bochs or QEMU do, though.  (Though in the case of
>> Bochs I cannot find the place where reception of IPIs is prevented for
>> software-disabled APICs, so I'm not sure how much to trust it in this case).
>>
>> I'm not sure why software-disabled APICs could have different DFRs, if
>> the APICs can receive NMI IPIs.  I'll ask Intel.
> 
> When changing the mode, we can't switch DFR synchronously, so it has to
> happen and NMI may be needed (watchdog?) before APIC configuration.
> Explicit statement might have been a hint to be _extra_ careful when
> using logical destination for INIT, NMI, ... I wonder what they'll say.
> 
> Anyway, Paolo's patch seems to be in the right direction, I'd modify it
> a bit though:
> 
> LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
> don't set it and decide the mode by the last nonzero one.
> This works in a situation, where one part is configured for cluster and
> the rest is still in reset state.
> 
> (It gets harder if we allow nonzero LDRs with different DFR ...
>  we'd need to split our logical map to handle it.)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 758f838..6da303e1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		goto out;
>  
>  	new->ldr_bits = 8;
> -	/* flat mode is default */
> -	new->cid_shift = 8;
> -	new->cid_mask = 0;
> -	new->lid_mask = 0xff;
>  	new->broadcast = APIC_BROADCAST;
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>  			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>  			new->lid_mask = 0xffff;
>  			new->broadcast = X2APIC_BROADCAST;
> -		} else if (kvm_apic_hw_enabled(apic)) {
> +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>  							APIC_DFR_CLUSTER) {
>  				new->cid_shift = 4;
> 

I merged this patch and Nadav's.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-14 15:00           ` Paolo Bonzini
@ 2014-11-26 17:01             ` Nadav Amit
  2014-11-26 18:00               ` Paolo Bonzini
  2014-11-27 13:39               ` Radim Krčmář
  0 siblings, 2 replies; 59+ messages in thread
From: Nadav Amit @ 2014-11-26 17:01 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář; +Cc: kvm list

Paolo Bonzini <pbonzini@redhat.com> wrote:

> 
> 
> On 06/11/2014 17:45, Radim Krčmář wrote:
>> 2014-11-06 10:34+0100, Paolo Bonzini:
>>> On 05/11/2014 21:45, Nadav Amit wrote:
>>>> If I understand the SDM correctly, in such scenario (all APICs are
>>>> software disabled) the mode is left as the default - flat mode (see
>> 
>> APIC doesn't have any global mode (it is just KVM's simplification), so
>> when a message lands on the system bus, it just compares MDA with LDR
>> and DFR ...
>> 
>>>> section 10.6.2.2 "Logical Destination Mode”): "All processors that
>>>> have their APIC software enabled (using the spurious vector
>>>> enable/disable bit) must have their DFRs (Destination Format
>>>> Registers) programmed identically. The default mode for DFR is flat
>>>> mode.”
>> 
>> I think the "default mode" points to reset state, which is flat DFR;
>> and it is mentioned only because of the following sentence
>>  If you are using cluster mode, DFRs must be programmed before the APIC
>>  is software enabled.
>> 
>>> That's not what either Bochs or QEMU do, though.  (Though in the case of
>>> Bochs I cannot find the place where reception of IPIs is prevented for
>>> software-disabled APICs, so I'm not sure how much to trust it in this case).
>>> 
>>> I'm not sure why software-disabled APICs could have different DFRs, if
>>> the APICs can receive NMI IPIs.  I'll ask Intel.
>> 
>> When changing the mode, we can't switch DFR synchronously, so it has to
>> happen and NMI may be needed (watchdog?) before APIC configuration.
>> Explicit statement might have been a hint to be _extra_ careful when
>> using logical destination for INIT, NMI, ... I wonder what they'll say.
>> 
>> Anyway, Paolo's patch seems to be in the right direction, I'd modify it
>> a bit though:
>> 
>> LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
>> don't set it and decide the mode by the last nonzero one.
>> This works in a situation, where one part is configured for cluster and
>> the rest is still in reset state.
>> 
>> (It gets harder if we allow nonzero LDRs with different DFR ...
>> we'd need to split our logical map to handle it.)
>> 
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 758f838..6da303e1 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 		goto out;
>> 
>> 	new->ldr_bits = 8;
>> -	/* flat mode is default */
>> -	new->cid_shift = 8;
>> -	new->cid_mask = 0;
>> -	new->lid_mask = 0xff;
>> 	new->broadcast = APIC_BROADCAST;
>> 
>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>> @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>> 			new->lid_mask = 0xffff;
>> 			new->broadcast = X2APIC_BROADCAST;
>> -		} else if (kvm_apic_hw_enabled(apic)) {
>> +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>> 							APIC_DFR_CLUSTER) {
>> 				new->cid_shift = 4;
> 
> I merged this patch and Nadav’s.

Sorry for the late and long reply, but I got an issue with the new version
(and my previous version as well). Indeed, the SDM states that DFR should
be the same for enabled CPUs, and that the BIOS should get all CPUs in
either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
in xAPIC/x2APIC mode.

In my tests (which pass on bare-metal), I got a scenario in which some CPUs
are in xAPIC mode, the BSP changed (which is currently not handled correctly
by KVM) and the BSP has x2APIC enabled. All the core APICs are
software-enabled. The expected behaviour is that the CPUs with x2APIC
enabled would work in x2APIC mode.

I think such a transitory scenario is possible on real-systems as well,
perhaps during CPU hot-plug. It appears the previous version (before all of
our changes) handled it better. I presume the most efficient way is to start
determining the APIC logical mode from the BSP, and if it is disabled,
traverse the rest of the CPUs until finding the first one with APIC enabled.
Yet, I have not finished doing and checking the BSP fix and other dependent
INIT signal handling fixes.

In the meanwhile, would you be ok with restoring some of the previous
behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
whether APIC is software enabled), otherwise use the configuration of the
last enabled APIC?

-- >8 —
Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map

---
 arch/x86/kvm/lapic.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c90d31..6dc2be6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	int i;
+	bool any_enabled = false;
 
 	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
 
@@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (!kvm_apic_present(vcpu))
 			continue;
 
+		/*
+		 * All APICs DFRs have to be configured the same mode by an OS.
+		 * We take advatage of this while building logical id lookup
+		 * table. After reset APICs are in software disabled mode, so if
+		 * we find apic with different setting we assume this is the mode
+		 * OS wants all apics to be in; build lookup table accordingly.
+		 */
 		if (apic_x2apic_mode(apic)) {
 			new->ldr_bits = 32;
 			new->cid_shift = 16;
 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
 			new->lid_mask = 0xffff;
 			new->broadcast = X2APIC_BROADCAST;
-		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
+			break;
+		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
 							APIC_DFR_CLUSTER) {
 				new->cid_shift = 4;
@@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 			}
 		}
 
-		/*
-		 * All APICs have to be configured in the same mode by an OS.
-		 * We take advatage of this while building logical id loockup
-		 * table. After reset APICs are in software disabled mode, so if
-		 * we find apic with different setting we assume this is the mode
-		 * OS wants all apics to be in; build lookup table accordingly.
-		 */
 		if (kvm_apic_sw_enabled(apic))
-			break;
+			any_enabled = true;
 	}
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
-- 
1.9.1



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-26 17:01             ` Nadav Amit
@ 2014-11-26 18:00               ` Paolo Bonzini
  2014-11-27 13:39               ` Radim Krčmář
  1 sibling, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2014-11-26 18:00 UTC (permalink / raw)
  To: Nadav Amit, Radim Krčmář; +Cc: kvm list



On 26/11/2014 18:01, Nadav Amit wrote:
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>>
>>
>> On 06/11/2014 17:45, Radim Krčmář wrote:
>>> 2014-11-06 10:34+0100, Paolo Bonzini:
>>>> On 05/11/2014 21:45, Nadav Amit wrote:
>>>>> If I understand the SDM correctly, in such scenario (all APICs are
>>>>> software disabled) the mode is left as the default - flat mode (see
>>>
>>> APIC doesn't have any global mode (it is just KVM's simplification), so
>>> when a message lands on the system bus, it just compares MDA with LDR
>>> and DFR ...
>>>
>>>>> section 10.6.2.2 "Logical Destination Mode”): "All processors that
>>>>> have their APIC software enabled (using the spurious vector
>>>>> enable/disable bit) must have their DFRs (Destination Format
>>>>> Registers) programmed identically. The default mode for DFR is flat
>>>>> mode.”
>>>
>>> I think the "default mode" points to reset state, which is flat DFR;
>>> and it is mentioned only because of the following sentence
>>>  If you are using cluster mode, DFRs must be programmed before the APIC
>>>  is software enabled.
>>>
>>>> That's not what either Bochs or QEMU do, though.  (Though in the case of
>>>> Bochs I cannot find the place where reception of IPIs is prevented for
>>>> software-disabled APICs, so I'm not sure how much to trust it in this case).
>>>>
>>>> I'm not sure why software-disabled APICs could have different DFRs, if
>>>> the APICs can receive NMI IPIs.  I'll ask Intel.
>>>
>>> When changing the mode, we can't switch DFR synchronously, so it has to
>>> happen and NMI may be needed (watchdog?) before APIC configuration.
>>> Explicit statement might have been a hint to be _extra_ careful when
>>> using logical destination for INIT, NMI, ... I wonder what they'll say.
>>>
>>> Anyway, Paolo's patch seems to be in the right direction, I'd modify it
>>> a bit though:
>>>
>>> LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
>>> don't set it and decide the mode by the last nonzero one.
>>> This works in a situation, where one part is configured for cluster and
>>> the rest is still in reset state.
>>>
>>> (It gets harder if we allow nonzero LDRs with different DFR ...
>>> we'd need to split our logical map to handle it.)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 758f838..6da303e1 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
>>> 		goto out;
>>>
>>> 	new->ldr_bits = 8;
>>> -	/* flat mode is default */
>>> -	new->cid_shift = 8;
>>> -	new->cid_mask = 0;
>>> -	new->lid_mask = 0xff;
>>> 	new->broadcast = APIC_BROADCAST;
>>>
>>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>>> @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>>> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>>> 			new->lid_mask = 0xffff;
>>> 			new->broadcast = X2APIC_BROADCAST;
>>> -		} else if (kvm_apic_hw_enabled(apic)) {
>>> +		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>>> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>>> 							APIC_DFR_CLUSTER) {
>>> 				new->cid_shift = 4;
>>
>> I merged this patch and Nadav’s.
> 
> Sorry for the late and long reply, but I got an issue with the new version
> (and my previous version as well). Indeed, the SDM states that DFR should
> be the same for enabled CPUs, and that the BIOS should get all CPUs in
> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
> in xAPIC/x2APIC mode.
> 
> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
> are in xAPIC mode, the BSP changed (which is currently not handled correctly
> by KVM) and the BSP has x2APIC enabled. All the core APICs are
> software-enabled. The expected behaviour is that the CPUs with x2APIC
> enabled would work in x2APIC mode.
> 
> I think such a transitory scenario is possible on real-systems as well,
> perhaps during CPU hot-plug. It appears the previous version (before all of
> our changes) handled it better. I presume the most efficient way is to start
> determining the APIC logical mode from the BSP, and if it is disabled,
> traverse the rest of the CPUs until finding the first one with APIC enabled.
> Yet, I have not finished doing and checking the BSP fix and other dependent
> INIT signal handling fixes.
> 
> In the meanwhile, would you be ok with restoring some of the previous
> behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
> whether APIC is software enabled), otherwise use the configuration of the
> last enabled APIC?
> 
> -- >8 —
> Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
> 
> ---
>  arch/x86/kvm/lapic.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9c90d31..6dc2be6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>  	struct kvm_apic_map *new, *old = NULL;
>  	struct kvm_vcpu *vcpu;
>  	int i;
> +	bool any_enabled = false;
>  
>  	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>  
> @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		if (!kvm_apic_present(vcpu))
>  			continue;
>  
> +		/*
> +		 * All APICs DFRs have to be configured the same mode by an OS.
> +		 * We take advatage of this while building logical id lookup
> +		 * table. After reset APICs are in software disabled mode, so if
> +		 * we find apic with different setting we assume this is the mode
> +		 * OS wants all apics to be in; build lookup table accordingly.
> +		 */
>  		if (apic_x2apic_mode(apic)) {
>  			new->ldr_bits = 32;
>  			new->cid_shift = 16;
>  			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>  			new->lid_mask = 0xffff;
>  			new->broadcast = X2APIC_BROADCAST;
> -		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
> +			break;
> +		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
>  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>  							APIC_DFR_CLUSTER) {
>  				new->cid_shift = 4;
> @@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>  			}
>  		}
>  
> -		/*
> -		 * All APICs have to be configured in the same mode by an OS.
> -		 * We take advatage of this while building logical id loockup
> -		 * table. After reset APICs are in software disabled mode, so if
> -		 * we find apic with different setting we assume this is the mode
> -		 * OS wants all apics to be in; build lookup table accordingly.
> -		 */
>  		if (kvm_apic_sw_enabled(apic))
> -			break;
> +			any_enabled = true;
>  	}
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> 

Yes, this looks good.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-26 17:01             ` Nadav Amit
  2014-11-26 18:00               ` Paolo Bonzini
@ 2014-11-27 13:39               ` Radim Krčmář
  2014-11-27 21:45                 ` Nadav Amit
  1 sibling, 1 reply; 59+ messages in thread
From: Radim Krčmář @ 2014-11-27 13:39 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Paolo Bonzini, kvm list

2014-11-26 19:01+0200, Nadav Amit:
> Sorry for the late and long reply, but I got an issue with the new version
> (and my previous version as well). Indeed, the SDM states that DFR should
> be the same for enabled CPUs, and that the BIOS should get all CPUs in
> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
> in xAPIC/x2APIC mode.
> 
> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
> are in xAPIC mode, the BSP changed (which is currently not handled correctly
> by KVM) and the BSP has x2APIC enabled.

How many (V)CPUs were you using?
(We fail hard with logical destination x2APIC and 16+ VCPUs.)

>                                         All the core APICs are
> software-enabled. The expected behaviour is that the CPUs with x2APIC
> enabled would work in x2APIC mode.

(Nice, I bet that made some Intel designers happy.)

There shouldn't be any message conflict when using APIC IDs <255, so it
might be possible if the x2APIC isn't programmed to issue weird
messages, like physical to nonexistent APIC ID 0xff000000, which would
be also interpreted as xAPIC broadcast.

> I think such a transitory scenario is possible on real-systems as well,
> perhaps during CPU hot-plug. It appears the previous version (before all of
> our changes) handled it better. I presume the most efficient way is to start
> determining the APIC logical mode from the BSP, and if it is disabled,
> traverse the rest of the CPUs until finding the first one with APIC enabled.
> Yet, I have not finished doing and checking the BSP fix and other dependent
> INIT signal handling fixes.
> 
> In the meanwhile, would you be ok with restoring some of the previous
> behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
> whether APIC is software enabled), otherwise use the configuration of the
> last enabled APIC?

I don't think this patch improves anything.
(Both behaviors are wrong and I think the current one is a bit less so.)

Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
limitations, should have enabled it because we didn't emulate interrupt
remapping, which is an architectural requirement for x2APIC ...

And for more concrete points:
- Physical x2APIC isn't affected (only broadcast, which is incorrect
  either way)

- Logical x2APIC and xAPIC don't work at the same time
  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
    going to be inaccessible (ldr = 0)
  - Our map isn't designed to allow x2APIC and xAPIC at the same time

- Your patch does not cover the case where sw-disabled x2APIC is
  "before" sw-enabled xAPIC, only if it is after.

> -- >8 —
> Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
> 
> ---
>  arch/x86/kvm/lapic.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9c90d31..6dc2be6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>  	struct kvm_apic_map *new, *old = NULL;
>  	struct kvm_vcpu *vcpu;
>  	int i;
> +	bool any_enabled = false;
>  
>  	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>  
> @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		if (!kvm_apic_present(vcpu))
>  			continue;
>  
> +		/*
> +		 * All APICs DFRs have to be configured the same mode by an OS.
> +		 * We take advatage of this while building logical id lookup
> +		 * table. After reset APICs are in software disabled mode, so if
> +		 * we find apic with different setting we assume this is the mode
> +		 * OS wants all apics to be in; build lookup table accordingly.
> +		 */
>  		if (apic_x2apic_mode(apic)) {
>  			new->ldr_bits = 32;
>  			new->cid_shift = 16;
>  			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>  			new->lid_mask = 0xffff;
>  			new->broadcast = X2APIC_BROADCAST;
> -		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
> +			break;
> +		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
>  			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>  							APIC_DFR_CLUSTER) {
>  				new->cid_shift = 4;
> @@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>  			}
>  		}
>  
> -		/*
> -		 * All APICs have to be configured in the same mode by an OS.
> -		 * We take advatage of this while building logical id loockup
> -		 * table. After reset APICs are in software disabled mode, so if
> -		 * we find apic with different setting we assume this is the mode
> -		 * OS wants all apics to be in; build lookup table accordingly.
> -		 */
>  		if (kvm_apic_sw_enabled(apic))
> -			break;
> +			any_enabled = true;
>  	}
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-27 13:39               ` Radim Krčmář
@ 2014-11-27 21:45                 ` Nadav Amit
  2014-11-27 22:26                   ` Radim Krčmář
  0 siblings, 1 reply; 59+ messages in thread
From: Nadav Amit @ 2014-11-27 21:45 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: Paolo Bonzini, kvm list

Radim Krčmář <rkrcmar@redhat.com> wrote:

> 2014-11-26 19:01+0200, Nadav Amit:
>> Sorry for the late and long reply, but I got an issue with the new version
>> (and my previous version as well). Indeed, the SDM states that DFR should
>> be the same for enabled CPUs, and that the BIOS should get all CPUs in
>> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
>> in xAPIC/x2APIC mode.
>> 
>> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
>> are in xAPIC mode, the BSP changed (which is currently not handled correctly
>> by KVM) and the BSP has x2APIC enabled.
> 
> How many (V)CPUs were you using?
> (We fail hard with logical destination x2APIC and 16+ VCPUs.)
2 at the moment. What failure do you refer to?

> 
>>                                        All the core APICs are
>> software-enabled. The expected behaviour is that the CPUs with x2APIC
>> enabled would work in x2APIC mode.
> 
> (Nice, I bet that made some Intel designers happy.)
> 
> There shouldn't be any message conflict when using APIC IDs <255, so it
> might be possible if the x2APIC isn't programmed to issue weird
> messages, like physical to nonexistent APIC ID 0xff000000, which would
> be also interpreted as xAPIC broadcast.
> 
>> I think such a transitory scenario is possible on real-systems as well,
>> perhaps during CPU hot-plug. It appears the previous version (before all of
>> our changes) handled it better. I presume the most efficient way is to start
>> determining the APIC logical mode from the BSP, and if it is disabled,
>> traverse the rest of the CPUs until finding the first one with APIC enabled.
>> Yet, I have not finished doing and checking the BSP fix and other dependent
>> INIT signal handling fixes.
>> 
>> In the meanwhile, would you be ok with restoring some of the previous
>> behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
>> whether APIC is software enabled), otherwise use the configuration of the
>> last enabled APIC?
> 
> I don't think this patch improves anything.
> (Both behaviors are wrong and I think the current one is a bit less so.)
> 
> Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
> MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
> limitations, should have enabled it because we didn't emulate interrupt
> remapping, which is an architectural requirement for x2APIC …
It is a shame - I was under the impression QEMU emulation of the Intel IOMMU
would include it as well, and I now see they only did DMAR…

> And for more concrete points:
> - Physical x2APIC isn't affected (only broadcast, which is incorrect
>  either way)
> 
> - Logical x2APIC and xAPIC don't work at the same time
No, but it is important to determine what is the “consensus” APIC mode.

>  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
Why? It is as if there is only a single cluster. You can still send an APIC
message to multiple CPUs within the same cluster (0).

>  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
>    going to be inaccessible (ldr = 0)
>  - Our map isn't designed to allow x2APIC and xAPIC at the same time
> 
> - Your patch does not cover the case where sw-disabled x2APIC is
>  "before" sw-enabled xAPIC, only if it is after.
I thought I covered it. The rationale was that if any lapic is in x2APIC
mode, then the are all in x2APIC mode. It is done similarly to the previous
version (3.18).

Anyhow, I have my workarounds, so do as you find appropriately. Once I deal
with the BSP issues, I may resubmit another version.

Nadav

>> -- >8 —
>> Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
>> 
>> ---
>> arch/x86/kvm/lapic.c | 20 +++++++++++---------
>> 1 file changed, 11 insertions(+), 9 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 9c90d31..6dc2be6 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 	struct kvm_apic_map *new, *old = NULL;
>> 	struct kvm_vcpu *vcpu;
>> 	int i;
>> +	bool any_enabled = false;
>> 
>> 	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>> 
>> @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 		if (!kvm_apic_present(vcpu))
>> 			continue;
>> 
>> +		/*
>> +		 * All APICs DFRs have to be configured the same mode by an OS.
>> +		 * We take advatage of this while building logical id lookup
>> +		 * table. After reset APICs are in software disabled mode, so if
>> +		 * we find apic with different setting we assume this is the mode
>> +		 * OS wants all apics to be in; build lookup table accordingly.
>> +		 */
>> 		if (apic_x2apic_mode(apic)) {
>> 			new->ldr_bits = 32;
>> 			new->cid_shift = 16;
>> 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
>> 			new->lid_mask = 0xffff;
>> 			new->broadcast = X2APIC_BROADCAST;
>> -		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
>> +			break;
>> +		} else if (!any_enabled && kvm_apic_get_reg(apic, APIC_LDR)) {
>> 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
>> 							APIC_DFR_CLUSTER) {
>> 				new->cid_shift = 4;
>> @@ -179,15 +188,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>> 			}
>> 		}
>> 
>> -		/*
>> -		 * All APICs have to be configured in the same mode by an OS.
>> -		 * We take advatage of this while building logical id loockup
>> -		 * table. After reset APICs are in software disabled mode, so if
>> -		 * we find apic with different setting we assume this is the mode
>> -		 * OS wants all apics to be in; build lookup table accordingly.
>> -		 */
>> 		if (kvm_apic_sw_enabled(apic))
>> -			break;
>> +			any_enabled = true;
>> 	}
>> 
>> 	kvm_for_each_vcpu(i, vcpu, kvm) {
>> -- 
>> 1.9.1



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-27 21:45                 ` Nadav Amit
@ 2014-11-27 22:26                   ` Radim Krčmář
  2014-12-01 16:30                     ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Radim Krčmář @ 2014-11-27 22:26 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Paolo Bonzini, kvm list

2014-11-27 23:45+0200, Nadav Amit:
> Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2014-11-26 19:01+0200, Nadav Amit:
> >> Sorry for the late and long reply, but I got an issue with the new version
> >> (and my previous version as well). Indeed, the SDM states that DFR should
> >> be the same for enabled CPUs, and that the BIOS should get all CPUs in
> >> either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
> >> in xAPIC/x2APIC mode.
> >> 
> >> In my tests (which pass on bare-metal), I got a scenario in which some CPUs
> >> are in xAPIC mode, the BSP changed (which is currently not handled correctly
> >> by KVM) and the BSP has x2APIC enabled.
> > 
> > How many (V)CPUs were you using?
> > (We fail hard with logical destination x2APIC and 16+ VCPUs.)
> 2 at the moment. What failure do you refer to?

(I'll cover it under KVM_X2APIC_CID_BITS.)

xAPIC shouldn't have ever made it into the logical map under x2APIC ...
Were you testing with broadcasts?

> > Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
> > MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
> > limitations, should have enabled it because we didn't emulate interrupt
> > remapping, which is an architectural requirement for x2APIC …
> It is a shame - I was under the impression QEMU emulation of the Intel IOMMU
> would include it as well, and I now see they only did DMAR…

(and we had this x2APIC for years ...)

> > And for more concrete points:
> > - Physical x2APIC isn't affected (only broadcast, which is incorrect
> >  either way)
> > 
> > - Logical x2APIC and xAPIC don't work at the same time
> No, but it is important to determine what is the “consensus” APIC mode.

Only for our abstraction, SDM's APICs don't need it and I'd rather see
us not depend on it either ...
(Sanity check: if you do xAPIC broadcast when there is xAPIC and x2APIC
 on real hw, does the xAPIC receive it? And if x2APIC sends 0xff000000?)

> >  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
> Why? It is as if there is only a single cluster. You can still send an APIC
> message to multiple CPUs within the same cluster (0).

KVM_X2APIC_CID_BITS = 0 meant that all VCPUs and messages got mapped
into cluster 0.
If you had 32 VCPUs, at least half of them wouldn't have a pointer in
the map -- and those left out would most likely be within first 16 APIC
ids, so messages would go completely off.

> >  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
> >    going to be inaccessible (ldr = 0)
> >  - Our map isn't designed to allow x2APIC and xAPIC at the same time
> > 
> > - Your patch does not cover the case where sw-disabled x2APIC is
> >  "before" sw-enabled xAPIC, only if it is after.
> I thought I covered it. The rationale was that if any lapic is in x2APIC
> mode, then the are all in x2APIC mode. It is done similarly to the previous
> version (3.18).

True, sorry, I missed the 'break;' in x2apic path.

We can't deliver xAPIC and x2APIC broadcasts/logical messages at the
same time with current KVM and this patch just switches the working case
in favour of x2APIC, which is why I didn't think it was necessary ...
(And I didn't understand why prefer disabled x2APIC to enabled xAPIC.)

> Anyhow, I have my workarounds, so do as you find appropriately. Once I deal
> with the BSP issues, I may resubmit another version.

I don't really mind having it, guests worked even with more broken code,
and this patch helps at least one use case :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-11-27 22:26                   ` Radim Krčmář
@ 2014-12-01 16:30                     ` Paolo Bonzini
  2014-12-01 17:49                       ` Radim Krčmář
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2014-12-01 16:30 UTC (permalink / raw)
  To: Radim Krčmář, Nadav Amit; +Cc: kvm list



On 27/11/2014 23:26, Radim Krčmář wrote:
> We can't deliver xAPIC and x2APIC broadcasts/logical messages at the
> same time with current KVM and this patch just switches the working case
> in favour of x2APIC, which is why I didn't think it was necessary ...
> (And I didn't understand why prefer disabled x2APIC to enabled xAPIC.)

Indeed.  A possible thing to do perhaps would be to build two logical
maps, one for x2APIC and one for xAPIC, and consult both...  That would
slow down the common case when one map is empty, though.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
  2014-12-01 16:30                     ` Paolo Bonzini
@ 2014-12-01 17:49                       ` Radim Krčmář
  0 siblings, 0 replies; 59+ messages in thread
From: Radim Krčmář @ 2014-12-01 17:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Nadav Amit, kvm list

2014-12-01 17:30+0100, Paolo Bonzini:
> On 27/11/2014 23:26, Radim Krčmář wrote:
> > We can't deliver xAPIC and x2APIC broadcasts/logical messages at the
> > same time with current KVM and this patch just switches the working case
> > in favour of x2APIC, which is why I didn't think it was necessary ...
> > (And I didn't understand why prefer disabled x2APIC to enabled xAPIC.)
> 
> Indeed.  A possible thing to do perhaps would be to build two logical
> maps, one for x2APIC and one for xAPIC, and consult both...  That would
> slow down the common case when one map is empty, though.

Yeah, that seems like the cleanest solution ... I'll prepare the patch.
(so we can then throw it away :)

We'll need to take a look at the source (IO)APIC now.
Some design points I'm not sure about now:
 - Does xAPIC ignore reserved bytes in destination?
   (= Is x2APIC broadcast also delivered to xAPICs?)
 - we'll probably still have to hack it to deliver IOAPIC requests to
   low x2APIC logical addresses for compatibility reasons
 - to gain speed, we could try to encode them into the same cache-line
     struct {kvm_apic_t *xapic, kvm_apic_t *x2apic} logical_map[16][16];
   (and waste some space instead)


---
If we wanted to be super correct, with respect to sw-disabled xAPIC,
there should be three: x2APIC, flat xAPIC, and cluster xAPIC :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63
  2014-11-02  9:54 ` [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63 Nadav Amit
@ 2015-02-10 16:15   ` Jan Kiszka
  2015-02-10 16:18     ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Kiszka @ 2015-02-10 16:15 UTC (permalink / raw)
  To: Nadav Amit, pbonzini; +Cc: kvm, nadav.amit

On 2014-11-02 10:54, Nadav Amit wrote:
> Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set.
> As Intel SDM states in section 4.10.4 "Invalidation of TLBs and
> Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the
> instruction’s source operand is 0 ..."
> 
> In other words, bit 63 is not reserved. KVM emulator currently consider bit 63
> as reserved. Fix it.
> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/emulate.c          | 2 +-
>  arch/x86/kvm/x86.c              | 2 ++
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 904535f..dc932d3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -51,6 +51,7 @@
>  			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
>  
>  #define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
> +#define CR3_PCID_INVD		 (1UL << 63)

1ULL (for i386)

Paolo, there is no 32-bit test build anymore on your side, right? I was
about to drop them from kvm-kmod as well, but at least 2 remained in
place and caught this.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63
  2015-02-10 16:15   ` Jan Kiszka
@ 2015-02-10 16:18     ` Paolo Bonzini
  2015-02-10 16:34       ` Jan Kiszka
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2015-02-10 16:18 UTC (permalink / raw)
  To: Jan Kiszka, Nadav Amit; +Cc: kvm, nadav.amit



On 10/02/2015 17:15, Jan Kiszka wrote:
>> >  #define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
>> > +#define CR3_PCID_INVD		 (1UL << 63)
> 1ULL (for i386)

Already fixed:

commit cfaa790a3fb8a7efa98f4a6457e19dc3a0db35d3
Author: Borislav Petkov <bp@suse.de>
Date:   Thu Jan 15 09:44:56 2015 +0100

    kvm: Fix CR3_PCID_INVD type on 32-bit
    
    arch/x86/kvm/emulate.c: In function ‘check_cr_write’:
    arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type
        rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
    
    happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits
    to the left.
    
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

> Paolo, there is no 32-bit test build anymore on your side, right?

Reinstating it has been on my todo list for a while.  But as of now
I'm not doing 32-bit tests.

> I was
> about to drop them from kvm-kmod as well, but at least 2 remained in
> place and caught this.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63
  2015-02-10 16:18     ` Paolo Bonzini
@ 2015-02-10 16:34       ` Jan Kiszka
  2015-02-10 16:42         ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Kiszka @ 2015-02-10 16:34 UTC (permalink / raw)
  To: Paolo Bonzini, Nadav Amit; +Cc: kvm, nadav.amit

On 2015-02-10 17:18, Paolo Bonzini wrote:
> On 10/02/2015 17:15, Jan Kiszka wrote:
>>>>  #define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
>>>> +#define CR3_PCID_INVD		 (1UL << 63)
>> 1ULL (for i386)
> 
> Already fixed:
> 
> commit cfaa790a3fb8a7efa98f4a6457e19dc3a0db35d3
> Author: Borislav Petkov <bp@suse.de>
> Date:   Thu Jan 15 09:44:56 2015 +0100
> 
>     kvm: Fix CR3_PCID_INVD type on 32-bit
>     
>     arch/x86/kvm/emulate.c: In function ‘check_cr_write’:
>     arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type
>         rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
>     
>     happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits
>     to the left.
>     
>     Signed-off-by: Borislav Petkov <bp@suse.de>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Ah, sorry. Your kvm-kmod queue lacked a source link update!

Is this queued for stable as well? It's not in 3.19 apparently.

> 
>> Paolo, there is no 32-bit test build anymore on your side, right?
> 
> Reinstating it has been on my todo list for a while.  But as of now
> I'm not doing 32-bit tests.

OK.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63
  2015-02-10 16:34       ` Jan Kiszka
@ 2015-02-10 16:42         ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2015-02-10 16:42 UTC (permalink / raw)
  To: Jan Kiszka, Nadav Amit; +Cc: kvm, nadav.amit



On 10/02/2015 17:34, Jan Kiszka wrote:
>> > 
>> > commit cfaa790a3fb8a7efa98f4a6457e19dc3a0db35d3
>> > Author: Borislav Petkov <bp@suse.de>
>> > Date:   Thu Jan 15 09:44:56 2015 +0100
>> > 
>> >     kvm: Fix CR3_PCID_INVD type on 32-bit
>> >     
>> >     arch/x86/kvm/emulate.c: In function ‘check_cr_write’:
>> >     arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type
>> >         rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
>> >     
>> >     happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits
>> >     to the left.
>> >     
>> >     Signed-off-by: Borislav Petkov <bp@suse.de>
>> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Ah, sorry. Your kvm-kmod queue lacked a source link update!
> 
> Is this queued for stable as well? It's not in 3.19 apparently.

I'll queue it for stable as soon as it hits Linus's tree.  Thanks,

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2015-02-10 16:42 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-02  9:54 [PATCH 00/21] Fixes for various KVM bugs Nadav Amit
2014-11-02  9:54 ` [PATCH 01/21] KVM: x86: decode_modrm does not regard modrm correctly Nadav Amit
2014-11-05 11:14   ` Paolo Bonzini
2014-11-02  9:54 ` [PATCH 02/21] KVM: x86: No error-code on real-mode exceptions Nadav Amit
2014-11-02  9:54 ` [PATCH 03/21] KVM: x86: Emulator should set DR6 upon GD like real CPU Nadav Amit
2014-11-02  9:54 ` [PATCH 04/21] KVM: x86: Clear DR6[0:3] on #DB during handle_dr Nadav Amit
2014-11-02  9:54 ` [PATCH 05/21] KVM: x86: Breakpoints do not consider CS.base Nadav Amit
2014-11-02  9:54 ` [PATCH 06/21] KVM: x86: Emulator MOV-sreg uses incorrect size Nadav Amit
2014-11-05 11:28   ` Paolo Bonzini
2014-11-02  9:54 ` [PATCH 07/21] KVM: x86: Emulator considers imm as memory operand Nadav Amit
2014-11-05 11:36   ` Paolo Bonzini
2014-11-02  9:54 ` [PATCH 08/21] KVM: x86: Reset FPU state during reset Nadav Amit
2014-11-05 12:04   ` Paolo Bonzini
2014-11-05 13:20     ` Nadav Amit
2014-11-05 14:55       ` Paolo Bonzini
2014-11-05 20:31         ` Nadav Amit
2014-11-06  8:58           ` Paolo Bonzini
2014-11-06  9:13             ` Nadav Amit
2014-11-06  9:44               ` Paolo Bonzini
2014-11-06  9:56                 ` Nadav Amit
2014-11-06 10:44                   ` Paolo Bonzini
2014-11-06 17:38                 ` Radim Krčmář
2014-11-02  9:54 ` [PATCH 09/21] KVM: x86: SYSCALL cannot clear eflags[1] Nadav Amit
2014-11-02  9:54 ` [PATCH 10/21] KVM: x86: Wrong flags on CMPS and SCAS emulation Nadav Amit
2014-11-02  9:54 ` [PATCH 11/21] KVM: x86: Emulate push sreg as done in Core Nadav Amit
2014-11-02  9:54 ` [PATCH 12/21] KVM: x86: MOV to CR3 can set bit 63 Nadav Amit
2015-02-10 16:15   ` Jan Kiszka
2015-02-10 16:18     ` Paolo Bonzini
2015-02-10 16:34       ` Jan Kiszka
2015-02-10 16:42         ` Paolo Bonzini
2014-11-02  9:54 ` [PATCH 13/21] KVM: x86: Do not update EFLAGS on faulting emulation Nadav Amit
2014-11-02  9:54 ` [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs Nadav Amit
2014-11-05 12:30   ` Paolo Bonzini
2014-11-05 20:45     ` Nadav Amit
2014-11-06  9:34       ` Paolo Bonzini
2014-11-06 16:45         ` Radim Krčmář
2014-11-10 17:35           ` Paolo Bonzini
2014-11-10 18:06             ` Radim Krčmář
2014-11-14 15:00           ` Paolo Bonzini
2014-11-26 17:01             ` Nadav Amit
2014-11-26 18:00               ` Paolo Bonzini
2014-11-27 13:39               ` Radim Krčmář
2014-11-27 21:45                 ` Nadav Amit
2014-11-27 22:26                   ` Radim Krčmář
2014-12-01 16:30                     ` Paolo Bonzini
2014-12-01 17:49                       ` Radim Krčmář
2014-11-02  9:54 ` [PATCH 15/21] KVM: x86: Combine the lgdt and lidt emulation logic Nadav Amit
2014-11-02  9:54 ` [PATCH 16/21] KVM: x86: Inject #GP when loading system segments with non-canonical base Nadav Amit
2014-11-02  9:54 ` [PATCH 17/21] KVM: x86: Remove redundant and incorrect cpl check on task-switch Nadav Amit
2014-11-02  9:54 ` [PATCH 18/21] KVM: x86: Emulator mis-decodes VEX instructions on real-mode Nadav Amit
2014-11-08  7:25   ` Paolo Bonzini
2014-11-02  9:54 ` [PATCH 19/21] KVM: x86: Warn on APIC base relocation Nadav Amit
2014-11-02  9:55 ` [PATCH 20/21] KVM: x86: MOVNTI emulation min opsize is not respected Nadav Amit
2014-11-05 12:18   ` Paolo Bonzini
2014-11-05 19:58     ` Nadav Amit
2014-11-05 19:58     ` Nadav Amit
2014-11-06  9:23   ` Paolo Bonzini
2014-11-02  9:55 ` [PATCH 21/21] KVM: x86: Return UNHANDLABLE on unsupported SYSENTER Nadav Amit
2014-11-05 12:31 ` [PATCH 00/21] Fixes for various KVM bugs Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.