* [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
@ 2014-01-04 17:47 Jan Kiszka
  2014-01-04 17:47 ` [PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS Jan Kiszka
                   ` (12 more replies)
  0 siblings, 13 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

This series applies on top of next, after merging in the two patches of
mine that are currently only present in master.

Highlights:
 - reworked fix of DR6 reading on SVM
 - full check for invalid writes to IA32_APIC_BASE
 - fixed support for halting in L2 (nVMX)
 - fully emulated preemption timer (nVMX)
 - tracing of nested vmexits (nVMX)

The patch "KVM: nVMX: Leave VMX mode on clearing of feature control MSR"
is included again, unchanged from previous posting.

Most fixes are backed by KVM unit tests, to be posted soon as well.

Jan Kiszka (12):
  KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
  KVM: SVM: Fix reading of DR6
  KVM: VMX: Fix DR6 update on #DB exception
  KVM: x86: Validate guest writes to MSR_IA32_APICBASE
  KVM: nVMX: Leave VMX mode on clearing of feature control MSR
  KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
  KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
  KVM: nVMX: Clean up handling of VMX-related MSRs
  KVM: nVMX: Fix nested_run_pending on activity state HLT
  KVM: nVMX: Update guest activity state field on L2 exits
  KVM: nVMX: Rework interception of IRQs and NMIs
  KVM: nVMX: Fully emulate preemption timer

 arch/x86/include/asm/kvm_host.h       |   4 +
 arch/x86/include/uapi/asm/msr-index.h |   1 +
 arch/x86/kvm/cpuid.h                  |   8 +
 arch/x86/kvm/lapic.h                  |   2 +-
 arch/x86/kvm/svm.c                    |  15 ++
 arch/x86/kvm/vmx.c                    | 399 ++++++++++++++++++++--------------
 arch/x86/kvm/x86.c                    |  67 +++++-
 7 files changed, 318 insertions(+), 178 deletions(-)

-- 
1.8.1.1.298.ge7eed54



* [PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 02/12] KVM: SVM: Fix reading of DR6 Jan Kiszka
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Whenever we change arch.dr7, we also have to call kvm_update_dr7. If
guest debugging is off, this synchronizes the new state into hardware.
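
For reference, kvm_update_dr7 looks roughly like this (a sketch of the
existing helper, slightly simplified):

	static void kvm_update_dr7(struct kvm_vcpu *vcpu)
	{
		unsigned long dr7;

		/* While userspace debugs the guest with hardware
		 * breakpoints, its own DR7 value takes precedence. */
		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
			dr7 = vcpu->arch.guest_debug_dr7;
		else
			dr7 = vcpu->arch.dr7;
		kvm_x86_ops->set_dr7(vcpu, dr7);
	}

Without the added call, a DR7 value set via KVM_SET_DEBUGREGS only
landed in arch.dr7 and never reached the vendor code.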

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1dc0359..5f75230 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2988,6 +2988,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = dbgregs->dr6;
 	vcpu->arch.dr7 = dbgregs->dr7;
+	kvm_update_dr7(vcpu);
 
 	return 0;
 }
-- 
1.8.1.1.298.ge7eed54



* [PATCH 02/12] KVM: SVM: Fix reading of DR6
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
  2014-01-04 17:47 ` [PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 03/12] KVM: VMX: Fix DR6 update on #DB exception Jan Kiszka
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

In contrast to VMX, SVM does not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state up
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df0794f.
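
Note the asymmetry this creates: on SVM, the VMCB copy of DR6 is
authoritative, and arch.dr6 is pushed into it via the new kvm_update_dr6
whenever guest debugging does not own the register; on VMX, arch.dr6
remains authoritative, so vmx_set_dr6 is intentionally empty and
vmx_get_dr6 simply returns the cached value.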

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c              | 15 +++++++++++++++
 arch/x86/kvm/vmx.c              | 11 +++++++++++
 arch/x86/kvm/x86.c              | 19 +++++++++++++++++--
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..e73651b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -699,6 +699,8 @@ struct kvm_x86_ops {
 	void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+	u64 (*get_dr6)(struct kvm_vcpu *vcpu);
+	void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
 	void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
 	void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c7168a5..e81df8f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1671,6 +1671,19 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
 	mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
+static u64 svm_get_dr6(struct kvm_vcpu *vcpu)
+{
+	return to_svm(vcpu)->vmcb->save.dr6;
+}
+
+static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->save.dr6 = value;
+	mark_dirty(svm->vmcb, VMCB_DR);
+}
+
 static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4286,6 +4299,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_idt = svm_set_idt,
 	.get_gdt = svm_get_gdt,
 	.set_gdt = svm_set_gdt,
+	.get_dr6 = svm_get_dr6,
+	.set_dr6 = svm_set_dr6,
 	.set_dr7 = svm_set_dr7,
 	.cache_reg = svm_cache_reg,
 	.get_rflags = svm_get_rflags,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e947cba..55cb4b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5149,6 +5149,15 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static u64 vmx_get_dr6(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.dr6;
+}
+
+static void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val)
+{
+}
+
 static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vmcs_writel(GUEST_DR7, val);
@@ -8558,6 +8567,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_idt = vmx_set_idt,
 	.get_gdt = vmx_get_gdt,
 	.set_gdt = vmx_set_gdt,
+	.get_dr6 = vmx_get_dr6,
+	.set_dr6 = vmx_set_dr6,
 	.set_dr7 = vmx_set_dr7,
 	.cache_reg = vmx_cache_reg,
 	.get_rflags = vmx_get_rflags,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f75230..ea7c6a5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -719,6 +719,12 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
+static void kvm_update_dr6(struct kvm_vcpu *vcpu)
+{
+	if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
+		kvm_x86_ops->set_dr6(vcpu, vcpu->arch.dr6);
+}
+
 static void kvm_update_dr7(struct kvm_vcpu *vcpu)
 {
 	unsigned long dr7;
@@ -747,6 +753,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 		if (val & 0xffffffff00000000ULL)
 			return -1; /* #GP */
 		vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
+		kvm_update_dr6(vcpu);
 		break;
 	case 5:
 		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
@@ -788,7 +795,10 @@ static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 			return 1;
 		/* fall through */
 	case 6:
-		*val = vcpu->arch.dr6;
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+			*val = vcpu->arch.dr6;
+		else
+			*val = kvm_x86_ops->get_dr6(vcpu);
 		break;
 	case 5:
 		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
@@ -2972,8 +2982,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
 					     struct kvm_debugregs *dbgregs)
 {
+	unsigned long val;
+
 	memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
-	dbgregs->dr6 = vcpu->arch.dr6;
+	_kvm_get_dr(vcpu, 6, &val);
+	dbgregs->dr6 = val;
 	dbgregs->dr7 = vcpu->arch.dr7;
 	dbgregs->flags = 0;
 	memset(&dbgregs->reserved, 0, sizeof(dbgregs->reserved));
@@ -2987,6 +3000,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 
 	memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = dbgregs->dr6;
+	kvm_update_dr6(vcpu);
 	vcpu->arch.dr7 = dbgregs->dr7;
 	kvm_update_dr7(vcpu);
 
@@ -6761,6 +6775,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = DR6_FIXED_1;
+	kvm_update_dr6(vcpu);
 	vcpu->arch.dr7 = DR7_FIXED_1;
 	kvm_update_dr7(vcpu);
 
-- 
1.8.1.1.298.ge7eed54



* [PATCH 03/12] KVM: VMX: Fix DR6 update on #DB exception
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
  2014-01-04 17:47 ` [PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS Jan Kiszka
  2014-01-04 17:47 ` [PATCH 02/12] KVM: SVM: Fix reading of DR6 Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE Jan Kiszka
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

According to the SDM, only bits 0-3 of DR6 "may" be cleared by "certain"
debug exceptions. So update only those bits on a #DB exception in KVM,
but leave the rest alone, setting only BD and BS in addition to the
already-set bits in DR6. This also aligns us with
kvm_vcpu_check_singlestep.
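
A hypothetical example: assume DR6 currently holds 0xffff4ff0
(DR6_FIXED_1 with BS set) and the #DB's exit qualification reports B1,
i.e. 0x2:

	old: dr6 = 0x2 | DR6_FIXED_1        /* = 0xffff0ff2, BS lost */
	new: dr6 = (0xffff4ff0 & ~15) | 0x2 /* = 0xffff4ff2, BS kept */

Only B0-B3 are replaced; all other already-set bits survive.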

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 55cb4b6..2a95ce0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4869,7 +4869,8 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
 		      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
-			vcpu->arch.dr6 = dr6 | DR6_FIXED_1;
+			vcpu->arch.dr6 &= ~15;
+			vcpu->arch.dr6 |= dr6;
 			kvm_queue_exception(vcpu, DB_VECTOR);
 			return 1;
 		}
-- 
1.8.1.1.298.ge7eed54



* [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (2 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 03/12] KVM: VMX: Fix DR6 update on #DB exception Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-16 14:07   ` Paolo Bonzini
  2014-01-04 17:47 ` [PATCH 05/12] KVM: nVMX: Leave VMX mode on clearing of feature control MSR Jan Kiszka
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Check for invalid state transitions on guest-initiated updates of
MSR_IA32_APICBASE. This addresses both enabling of the x2APIC when it is
not supported and all invalid transitions as described in SDM section
10.12.5. It also checks that no reserved bit is set in APICBASE by the
guest.
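
For orientation, the (MSR_IA32_APICBASE_ENABLE, X2APIC_ENABLE)
transitions that SDM section 10.12.5 describes, as checked below:

	old state          new state          result
	disabled (0,0)     xAPIC    (1,0)     ok
	xAPIC    (1,0)     x2APIC   (1,1)     ok
	x2APIC   (1,1)     disabled (0,0)     ok
	x2APIC   (1,1)     xAPIC    (1,0)     #GP
	disabled (0,0)     x2APIC   (1,1)     #GP
	any                invalid  (0,1)     #GP

Host-initiated writes bypass these checks so that userspace can restore
any saved state.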

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/cpuid.h |  8 ++++++++
 arch/x86/kvm/lapic.h |  2 +-
 arch/x86/kvm/vmx.c   |  9 +++++----
 arch/x86/kvm/x86.c   | 32 +++++++++++++++++++++++++-------
 4 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f1e4895..a2a1bb7 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -72,4 +72,12 @@ static inline bool guest_cpuid_has_pcid(struct kvm_vcpu *vcpu)
 	return best && (best->ecx & bit(X86_FEATURE_PCID));
 }
 
+static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 1, 0);
+	return best && (best->ecx & bit(X86_FEATURE_X2APIC));
+}
+
 #endif
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c730ac9..3ee60ef 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,7 +65,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map);
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
-void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
+int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2a95ce0..9fa8a1c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4417,7 +4417,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 msr;
+	struct msr_data apic_base_msr;
 
 	vmx->rmode.vm86_active = 0;
 
@@ -4425,10 +4425,11 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(&vmx->vcpu, 0);
-	msr = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&vmx->vcpu))
-		msr |= MSR_IA32_APICBASE_BSP;
-	kvm_set_apic_base(&vmx->vcpu, msr);
+		apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
+	apic_base_msr.host_initiated = true;
+	kvm_set_apic_base(&vmx->vcpu, &apic_base_msr);
 
 	vmx_segment_cache_clear(vmx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ea7c6a5..559ae75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -254,10 +254,26 @@ u64 kvm_get_apic_base(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_apic_base);
 
-void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
-{
-	/* TODO: reserve bits check */
-	kvm_lapic_set_base(vcpu, data);
+int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	u64 old_state = vcpu->arch.apic_base &
+		(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
+	u64 new_state = msr_info->data &
+		(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
+	u64 reserved_bits = ((~0ULL) << boot_cpu_data.x86_phys_bits) | 0x2ff |
+		(guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE);
+
+	if (!msr_info->host_initiated &&
+	    ((msr_info->data & reserved_bits) != 0 ||
+	     new_state == X2APIC_ENABLE ||
+	     (new_state == MSR_IA32_APICBASE_ENABLE &&
+	      old_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) ||
+	     (new_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE) &&
+	      old_state == 0)))
+		return 1;
+
+	kvm_lapic_set_base(vcpu, msr_info->data);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_set_apic_base);
 
@@ -2027,8 +2043,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case 0x200 ... 0x2ff:
 		return set_msr_mtrr(vcpu, msr, data);
 	case MSR_IA32_APICBASE:
-		kvm_set_apic_base(vcpu, data);
-		break;
+		return kvm_set_apic_base(vcpu, msr_info);
 	case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
 		return kvm_x2apic_msr_write(vcpu, msr, data);
 	case MSR_IA32_TSCDEADLINE:
@@ -6459,6 +6474,7 @@ EXPORT_SYMBOL_GPL(kvm_task_switch);
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs)
 {
+	struct msr_data apic_base_msr;
 	int mmu_reset_needed = 0;
 	int pending_vec, max_bits, idx;
 	struct desc_ptr dt;
@@ -6482,7 +6498,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 	mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
 	kvm_x86_ops->set_efer(vcpu, sregs->efer);
-	kvm_set_apic_base(vcpu, sregs->apic_base);
+	apic_base_msr.data = sregs->apic_base;
+	apic_base_msr.host_initiated = true;
+	kvm_set_apic_base(vcpu, &apic_base_msr);
 
 	mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
 	kvm_x86_ops->set_cr0(vcpu, sregs->cr0);
-- 
1.8.1.1.298.ge7eed54



* [PATCH 05/12] KVM: nVMX: Leave VMX mode on clearing of feature control MSR
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (3 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 06/12] KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit Jan Kiszka
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave
root and non-root mode, fully disabling VMX. The register state of the
VCPU is undefined after this step, so userspace has to set it to a
proper state afterward.

This makes it possible to reboot a VM while it is running some
hypervisor code.
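
A minimal sketch of how userspace (e.g. QEMU on system reset) could use
this, relying only on the existing KVM ioctl ABI - the surrounding
sequence is illustrative, not taken from any actual code:

	/* assumes <linux/kvm.h>, <asm/msr-index.h> and an open vcpu_fd */
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msr_data = {
		.hdr.nmsrs   = 1,
		.entry.index = MSR_IA32_FEATURE_CONTROL,
		.entry.data  = 0,	/* forces vmx_leave_nested */
	};

	ioctl(vcpu_fd, KVM_SET_MSRS, &msr_data);
	/* register state is undefined now - reload it completely */
	ioctl(vcpu_fd, KVM_SET_SREGS, &reset_sregs);
	ioctl(vcpu_fd, KVM_SET_REGS, &reset_regs);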

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9fa8a1c..3edf08f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2455,6 +2455,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	return 1;
 }
 
+static void vmx_leave_nested(struct kvm_vcpu *vcpu);
+
 static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u32 msr_index = msr_info->index;
@@ -2470,6 +2472,8 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 				& FEATURE_CONTROL_LOCKED)
 			return 0;
 		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
+		if (host_initialized && data == 0)
+			vmx_leave_nested(vcpu);
 		return 1;
 	}
 
@@ -8507,6 +8511,16 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Forcibly leave nested mode in order to be able to reset the VCPU later on.
+ */
+static void vmx_leave_nested(struct kvm_vcpu *vcpu)
+{
+	if (is_guest_mode(vcpu))
+		nested_vmx_vmexit(vcpu);
+	free_nested(to_vmx(vcpu));
+}
+
+/*
  * L1's failure to enter L2 is a subset of a normal exit, as explained in
  * 23.7 "VM-entry failures during or after loading guest state" (this also
  * lists the acceptable exit-reason and exit-qualification parameters).
-- 
1.8.1.1.298.ge7eed54



* [PATCH 06/12] KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (4 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 05/12] KVM: nVMX: Leave VMX mode on clearing of feature control MSR Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 07/12] KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject Jan Kiszka
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Instead of fixing up the vmcs12 after the nested vmexit, pass the key
parameters directly when calling nested_vmx_vmexit. This will help with
tracing those vmexits.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 63 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3edf08f..0bd0509 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1058,7 +1058,9 @@ static inline bool is_exception(u32 intr_info)
 		== (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
 }
 
-static void nested_vmx_vmexit(struct kvm_vcpu *vcpu);
+static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
+			      u32 exit_intr_info,
+			      unsigned long exit_qualification);
 static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
 			struct vmcs12 *vmcs12,
 			u32 reason, unsigned long qualification);
@@ -1967,7 +1969,9 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 
-	nested_vmx_vmexit(vcpu);
+	nested_vmx_vmexit(vcpu, to_vmx(vcpu)->exit_reason,
+			  vmcs_read32(VM_EXIT_INTR_INFO),
+			  vmcs_readl(EXIT_QUALIFICATION));
 	return 1;
 }
 
@@ -4650,15 +4654,12 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu)) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
 		if (to_vmx(vcpu)->nested.nested_run_pending)
 			return 0;
 		if (nested_exit_on_nmi(vcpu)) {
-			nested_vmx_vmexit(vcpu);
-			vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
-			vmcs12->vm_exit_intr_info = NMI_VECTOR |
-				INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK;
+			nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
+					  NMI_VECTOR | INTR_TYPE_NMI_INTR |
+					  INTR_INFO_VALID_MASK, 0);
 			/*
 			 * The NMI-triggered VM exit counts as injection:
 			 * clear this one and block further NMIs.
@@ -4680,15 +4681,11 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu)) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
 		if (to_vmx(vcpu)->nested.nested_run_pending)
 			return 0;
 		if (nested_exit_on_intr(vcpu)) {
-			nested_vmx_vmexit(vcpu);
-			vmcs12->vm_exit_reason =
-				EXIT_REASON_EXTERNAL_INTERRUPT;
-			vmcs12->vm_exit_intr_info = 0;
+			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
+					  0, 0);
 			/*
 			 * fall through to normal code, but now in L1, not L2
 			 */
@@ -6853,7 +6850,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 		return handle_invalid_guest_state(vcpu);
 
 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
-		nested_vmx_vmexit(vcpu);
+		nested_vmx_vmexit(vcpu, exit_reason,
+				  vmcs_read32(VM_EXIT_INTR_INFO),
+				  vmcs_readl(EXIT_QUALIFICATION));
 		return 1;
 	}
 
@@ -7594,15 +7593,14 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		struct x86_exception *fault)
 {
-	struct vmcs12 *vmcs12;
-	nested_vmx_vmexit(vcpu);
-	vmcs12 = get_vmcs12(vcpu);
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	u32 exit_reason;
 
 	if (fault->error_code & PFERR_RSVD_MASK)
-		vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+		exit_reason = EXIT_REASON_EPT_MISCONFIG;
 	else
-		vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
-	vmcs12->exit_qualification = vcpu->arch.exit_qualification;
+		exit_reason = EXIT_REASON_EPT_VIOLATION;
+	nested_vmx_vmexit(vcpu, exit_reason, 0, vcpu->arch.exit_qualification);
 	vmcs12->guest_physical_address = fault->address;
 }
 
@@ -7640,7 +7638,9 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 
 	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
 	if (vmcs12->exception_bitmap & (1u << PF_VECTOR))
-		nested_vmx_vmexit(vcpu);
+		nested_vmx_vmexit(vcpu, to_vmx(vcpu)->exit_reason,
+				  vmcs_read32(VM_EXIT_INTR_INFO),
+				  vmcs_readl(EXIT_QUALIFICATION));
 	else
 		kvm_inject_page_fault(vcpu, fault);
 }
@@ -8195,7 +8195,9 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
  * exit-information fields only. Other fields are modified by L1 with VMWRITE,
  * which already writes to vmcs12 directly.
  */
-static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
+			   u32 exit_reason, u32 exit_intr_info,
+			   unsigned long exit_qualification)
 {
 	/* update guest state fields: */
 	vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12);
@@ -8286,10 +8288,10 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 
 	/* update exit information fields: */
 
-	vmcs12->vm_exit_reason  = to_vmx(vcpu)->exit_reason;
-	vmcs12->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	vmcs12->vm_exit_reason = exit_reason;
+	vmcs12->exit_qualification = exit_qualification;
 
-	vmcs12->vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+	vmcs12->vm_exit_intr_info = exit_intr_info;
 	if ((vmcs12->vm_exit_intr_info &
 	     (INTR_INFO_VALID_MASK | INTR_INFO_DELIVER_CODE_MASK)) ==
 	    (INTR_INFO_VALID_MASK | INTR_INFO_DELIVER_CODE_MASK))
@@ -8455,7 +8457,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
  * and modify vmcs12 to make it see what it would expect to see there if
  * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
  */
-static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
+static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
+			      u32 exit_intr_info,
+			      unsigned long exit_qualification)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int cpu;
@@ -8465,7 +8469,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
 	leave_guest_mode(vcpu);
-	prepare_vmcs12(vcpu, vmcs12);
+	prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
+		       exit_qualification);
 
 	cpu = get_cpu();
 	vmx->loaded_vmcs = &vmx->vmcs01;
@@ -8516,7 +8521,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 static void vmx_leave_nested(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu))
-		nested_vmx_vmexit(vcpu);
+		nested_vmx_vmexit(vcpu, -1, 0, 0);
 	free_nested(to_vmx(vcpu));
 }
 
-- 
1.8.1.1.298.ge7eed54



* [PATCH 07/12] KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (5 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 06/12] KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 08/12] KVM: nVMX: Clean up handling of VMX-related MSRs Jan Kiszka
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Use the tracepoints already introduced by SVM for tracing nested
vmexits: kvm_nested_vmexit marks exits from L2 to L0, while
kvm_nested_vmexit_inject marks vmexits that are reflected to L1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0bd0509..9cd6eb7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6701,6 +6701,13 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	u32 exit_reason = vmx->exit_reason;
 
+	trace_kvm_nested_vmexit(kvm_rip_read(vcpu), exit_reason,
+				vmcs_readl(EXIT_QUALIFICATION),
+				vmx->idt_vectoring_info,
+				intr_info,
+				vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
+				KVM_ISA_VMX);
+
 	if (vmx->nested.nested_run_pending)
 		return 0;
 
@@ -8472,6 +8479,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
 		       exit_qualification);
 
+	trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
+				       vmcs12->exit_qualification,
+				       vmcs12->idt_vectoring_info_field,
+				       vmcs12->vm_exit_intr_info,
+				       vmcs12->vm_exit_intr_error_code,
+				       KVM_ISA_VMX);
+
 	cpu = get_cpu();
 	vmx->loaded_vmcs = &vmx->vmcs01;
 	vmx_vcpu_put(vcpu);
-- 
1.8.1.1.298.ge7eed54



* [PATCH 08/12] KVM: nVMX: Clean up handling of VMX-related MSRs
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (6 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 07/12] KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 09/12] KVM: nVMX: Fix nested_run_pending on activity state HLT Jan Kiszka
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

This simplifies the code and also stops issuing warnings about writes to
unhandled MSRs when VMX is disabled or the feature control MSR is
locked - we do handle them all according to the spec.
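
Concretely, the new case labels cover the whole VMX capability range,
0x480 (MSR_IA32_VMX_BASIC) through 0x491 (MSR_IA32_VMX_VMFUNC) - hence
the added msr-index.h definition: reads are forwarded to
vmx_get_vmx_msr, while writes uniformly fail with #GP because these
MSRs are read-only.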

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 79 ++++++++++-------------------------
 2 files changed, 24 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 37813b5..2e4a42d 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -527,6 +527,7 @@
 #define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x0000048e
 #define MSR_IA32_VMX_TRUE_EXIT_CTLS      0x0000048f
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
+#define MSR_IA32_VMX_VMFUNC             0x00000491
 
 /* VMX_BASIC bits and bitmasks */
 #define VMX_BASIC_VMCS_SIZE_SHIFT	32
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9cd6eb7..36efd47 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2361,32 +2361,10 @@ static inline u64 vmx_control_msr(u32 low, u32 high)
 	return low | ((u64)high << 32);
 }
 
-/*
- * If we allow our guest to use VMX instructions (i.e., nested VMX), we should
- * also let it use VMX-specific MSRs.
- * vmx_get_vmx_msr() and vmx_set_vmx_msr() return 1 when we handled a
- * VMX-specific MSR, or 0 when we haven't (and the caller should handle it
- * like all other MSRs).
- */
+/* Returns 0 on success, non-0 otherwise. */
 static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 {
-	if (!nested_vmx_allowed(vcpu) && msr_index >= MSR_IA32_VMX_BASIC &&
-		     msr_index <= MSR_IA32_VMX_TRUE_ENTRY_CTLS) {
-		/*
-		 * According to the spec, processors which do not support VMX
-		 * should throw a #GP(0) when VMX capability MSRs are read.
-		 */
-		kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
-		return 1;
-	}
-
 	switch (msr_index) {
-	case MSR_IA32_FEATURE_CONTROL:
-		if (nested_vmx_allowed(vcpu)) {
-			*pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
-			break;
-		}
-		return 0;
 	case MSR_IA32_VMX_BASIC:
 		/*
 		 * This MSR reports some information about VMX support. We
@@ -2453,38 +2431,9 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 		*pdata = nested_vmx_ept_caps;
 		break;
 	default:
-		return 0;
-	}
-
-	return 1;
-}
-
-static void vmx_leave_nested(struct kvm_vcpu *vcpu);
-
-static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
-	u32 msr_index = msr_info->index;
-	u64 data = msr_info->data;
-	bool host_initialized = msr_info->host_initiated;
-
-	if (!nested_vmx_allowed(vcpu))
-		return 0;
-
-	if (msr_index == MSR_IA32_FEATURE_CONTROL) {
-		if (!host_initialized &&
-				to_vmx(vcpu)->nested.msr_ia32_feature_control
-				& FEATURE_CONTROL_LOCKED)
-			return 0;
-		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
-		if (host_initialized && data == 0)
-			vmx_leave_nested(vcpu);
 		return 1;
 	}
 
-	/*
-	 * No need to treat VMX capability MSRs specially: If we don't handle
-	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
-	 */
 	return 0;
 }
 
@@ -2530,13 +2479,20 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	case MSR_IA32_SYSENTER_ESP:
 		data = vmcs_readl(GUEST_SYSENTER_ESP);
 		break;
+	case MSR_IA32_FEATURE_CONTROL:
+		if (!nested_vmx_allowed(vcpu))
+			return 1;
+		data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+		break;
+	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
+		if (!nested_vmx_allowed(vcpu))
+			return 1;
+		return vmx_get_vmx_msr(vcpu, msr_index, pdata);
 	case MSR_TSC_AUX:
 		if (!to_vmx(vcpu)->rdtscp_enabled)
 			return 1;
 		/* Otherwise falls through */
 	default:
-		if (vmx_get_vmx_msr(vcpu, msr_index, pdata))
-			return 0;
 		msr = find_msr_entry(to_vmx(vcpu), msr_index);
 		if (msr) {
 			data = msr->data;
@@ -2549,6 +2505,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	return 0;
 }
 
+static void vmx_leave_nested(struct kvm_vcpu *vcpu);
+
 /*
  * Writes msr value into into the appropriate "register".
  * Returns 0 on success, non-0 otherwise.
@@ -2603,6 +2561,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC_ADJUST:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
+	case MSR_IA32_FEATURE_CONTROL:
+		if (!nested_vmx_allowed(vcpu) ||
+		    (to_vmx(vcpu)->nested.msr_ia32_feature_control &
+		     FEATURE_CONTROL_LOCKED && !msr_info->host_initiated))
+			return 1;
+		vmx->nested.msr_ia32_feature_control = data;
+		if (msr_info->host_initiated && data == 0)
+			vmx_leave_nested(vcpu);
+		break;
+	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
+		return 1; /* they are read-only */
 	case MSR_TSC_AUX:
 		if (!vmx->rdtscp_enabled)
 			return 1;
@@ -2611,8 +2580,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		/* Otherwise falls through */
 	default:
-		if (vmx_set_vmx_msr(vcpu, msr_info))
-			break;
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {
 			msr->data = data;
-- 
1.8.1.1.298.ge7eed54



* [PATCH 09/12] KVM: nVMX: Fix nested_run_pending on activity state HLT
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (7 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 08/12] KVM: nVMX: Clean up handling of VMX-related MSRs Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-04 17:47 ` [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits Jan Kiszka
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

When we suspend the guest in HLT state, the nested run is no longer
pending - we have emulated it completely. So set nested_run_pending only
after checking the activity state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 36efd47..bde8ddd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8050,8 +8050,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 
 	enter_guest_mode(vcpu);
 
-	vmx->nested.nested_run_pending = 1;
-
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
 
 	cpu = get_cpu();
@@ -8070,6 +8068,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
 		return kvm_emulate_halt(vcpu);
 
+	vmx->nested.nested_run_pending = 1;
+
 	/*
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
-- 
1.8.1.1.298.ge7eed54



* [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (8 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 09/12] KVM: nVMX: Fix nested_run_pending on activity state HLT Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-05 20:01   ` Paolo Bonzini
  2014-01-04 17:47 ` [PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs Jan Kiszka
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Set the guest activity state in L1's VMCS according to the VCPU's
mp_state. This ensures we report the correct state in case L2 executed
HLT or in case we put L2 into HLT state and it has now been woken up by
an event.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bde8ddd..1245ff1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8223,6 +8223,10 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+		vmcs12->guest_activity_state = GUEST_ACTIVITY_HLT;
+	else
+		vmcs12->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
 
 	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
 	    (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
-- 
1.8.1.1.298.ge7eed54



* [PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (9 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-16 15:08   ` Paolo Bonzini
  2014-01-04 17:47 ` [PATCH 12/12] KVM: nVMX: Fully emulate preemption timer Jan Kiszka
  2014-01-16 15:08 ` [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Paolo Bonzini
  12 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handlers into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback's task is to switch from L2 to L1 if needed and
to inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.
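
The contract of the new hook, summarized (names match the patch; the
comment form is only for illustration):

	/*
	 * check_nested_events(vcpu, external_intr):
	 *  - called while in guest mode from inject_pending_event,
	 *    vcpu_enter_guest and kvm_arch_vcpu_runnable
	 *  - returns -EBUSY if a nested run is still pending;
	 *    vcpu_enter_guest folds a non-zero return into
	 *    req_immediate_exit so the event is retried later
	 *  - otherwise performs the L2->L1 vmexit for an intercepted
	 *    NMI/IRQ and returns 0
	 */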

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx.c              | 67 +++++++++++++++++++++++------------------
 arch/x86/kvm/x86.c              | 15 +++++++--
 3 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e73651b..d195421 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -764,6 +764,8 @@ struct kvm_x86_ops {
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage);
 	void (*handle_external_intr)(struct kvm_vcpu *vcpu);
+
+	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1245ff1..ec8a976 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4620,22 +4620,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu)) {
-		if (to_vmx(vcpu)->nested.nested_run_pending)
-			return 0;
-		if (nested_exit_on_nmi(vcpu)) {
-			nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
-					  NMI_VECTOR | INTR_TYPE_NMI_INTR |
-					  INTR_INFO_VALID_MASK, 0);
-			/*
-			 * The NMI-triggered VM exit counts as injection:
-			 * clear this one and block further NMIs.
-			 */
-			vcpu->arch.nmi_pending = 0;
-			vmx_set_nmi_mask(vcpu, true);
-			return 0;
-		}
-	}
+	if (to_vmx(vcpu)->nested.nested_run_pending)
+		return 0;
 
 	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
 		return 0;
@@ -4647,19 +4633,8 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu)) {
-		if (to_vmx(vcpu)->nested.nested_run_pending)
-			return 0;
-		if (nested_exit_on_intr(vcpu)) {
-			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
-					  0, 0);
-			/*
-			 * fall through to normal code, but now in L1, not L2
-			 */
-		}
-	}
-
-	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
+	return (!to_vmx(vcpu)->nested.nested_run_pending &&
+		vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }
@@ -8158,6 +8133,35 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	}
 }
 
+static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
+		if (vmx->nested.nested_run_pending)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
+				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
+				  INTR_INFO_VALID_MASK, 0);
+		/*
+		 * The NMI-triggered VM exit counts as injection:
+		 * clear this one and block further NMIs.
+		 */
+		vcpu->arch.nmi_pending = 0;
+		vmx_set_nmi_mask(vcpu, true);
+		return 0;
+	}
+
+	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
+	    nested_exit_on_intr(vcpu)) {
+		if (vmx->nested.nested_run_pending)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+	}
+
+	return 0;
+}
+
 /*
  * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
  * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
@@ -8498,6 +8502,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 		nested_vmx_succeed(vcpu);
 	if (enable_shadow_vmcs)
 		vmx->nested.sync_shadow_vmcs = true;
+
+	/* in case we halted in L2 */
+	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 }
 
 /*
@@ -8637,6 +8644,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
 	.check_intercept = vmx_check_intercept,
 	.handle_external_intr = vmx_handle_external_intr,
+
+	.check_nested_events = vmx_check_nested_events,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 559ae75..8746b7e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5846,6 +5846,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 		return;
 	}
 
+	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
+		kvm_x86_ops->check_nested_events(vcpu, false);
+
 	/* try to inject new event if pending */
 	if (vcpu->arch.nmi_pending) {
 		if (kvm_x86_ops->nmi_allowed(vcpu)) {
@@ -5966,12 +5969,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		inject_pending_event(vcpu);
 
+		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
+			req_immediate_exit |=
+				kvm_x86_ops->check_nested_events(vcpu,
+								 req_int_win);
+
 		/* enable NMI/IRQ window open exits if needed */
 		if (vcpu->arch.nmi_pending)
-			req_immediate_exit =
+			req_immediate_exit |=
 				kvm_x86_ops->enable_nmi_window(vcpu) != 0;
 		else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
-			req_immediate_exit =
+			req_immediate_exit |=
 				kvm_x86_ops->enable_irq_window(vcpu) != 0;
 
 		if (kvm_lapic_enabled(vcpu)) {
@@ -7295,6 +7303,9 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
+		kvm_x86_ops->check_nested_events(vcpu, false);
+
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
-- 
1.8.1.1.298.ge7eed54



* [PATCH 12/12] KVM: nVMX: Fully emulate preemption timer
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (10 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs Jan Kiszka
@ 2014-01-04 17:47 ` Jan Kiszka
  2014-01-16 15:08 ` [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Paolo Bonzini
  12 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-04 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov, Marcelo Tosatti; +Cc: kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

We cannot rely on the hardware-provided preemption timer support because
we are holding L2 in HLT outside non-root mode. Furthermore, emulating
the preemption timer also works around the tick rate errata on older
Intel CPUs.

The emulation is based on an hrtimer that is started on L2 entry,
stopped on L2 exit and evaluated via the new check_nested_events hook.
As we no longer rely on hardware features, we can enable both the
preemption timer support and value saving unconditionally.
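
To illustrate the unit conversion in vmx_start_preemption_timer with
hypothetical numbers: a timer value of 1000, the advertised rate of 2^5
TSC ticks per timer tick (VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE = 5)
and virtual_tsc_khz = 2000000 (a 2 GHz guest TSC) give

	tsc_ticks = 1000 << 5                  /* = 32000    */
	ns        = 32000 * 1000000 / 2000000  /* = 16000 ns */

so the hrtimer is armed 16 us out. vmx_get_preemption_timer_value
applies the inverse conversion when the remaining value has to be
stored back into vmcs12.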

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c | 151 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 96 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ec8a976..51d13c7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -31,6 +31,7 @@
 #include <linux/ftrace_event.h>
 #include <linux/slab.h>
 #include <linux/tboot.h>
+#include <linux/hrtimer.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -110,6 +111,8 @@ module_param(nested, bool, S_IRUGO);
 
 #define RMODE_GUEST_OWNED_EFLAGS_BITS (~(X86_EFLAGS_IOPL | X86_EFLAGS_VM))
 
+#define VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE 5
+
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:    upper bound on the amount of time between two successive
@@ -374,6 +377,9 @@ struct nested_vmx {
 	 */
 	struct page *apic_access_page;
 	u64 msr_ia32_feature_control;
+
+	struct hrtimer preemption_timer;
+	bool preemption_timer_expired;
 };
 
 #define POSTED_INTR_ON  0
@@ -1047,6 +1053,12 @@ static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12)
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline bool nested_cpu_has_preemption_timer(struct vmcs12 *vmcs12)
+{
+	return vmcs12->pin_based_vm_exec_control &
+		PIN_BASED_VMX_PREEMPTION_TIMER;
+}
+
 static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
 {
 	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
@@ -2248,9 +2260,9 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	 */
 	nested_vmx_pinbased_ctls_low |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
 	nested_vmx_pinbased_ctls_high &= PIN_BASED_EXT_INTR_MASK |
-		PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS |
+		PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS;
+	nested_vmx_pinbased_ctls_high |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
 		PIN_BASED_VMX_PREEMPTION_TIMER;
-	nested_vmx_pinbased_ctls_high |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
 
 	/*
 	 * Exit controls
@@ -2265,15 +2277,10 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
+		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
 		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
-	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
-	    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
-		nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
-		nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
-	}
-	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
-		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER);
 
 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -2342,9 +2349,9 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC, nested_vmx_misc_low, nested_vmx_misc_high);
-	nested_vmx_misc_low &= VMX_MISC_PREEMPTION_TIMER_RATE_MASK |
-		VMX_MISC_SAVE_EFER_LMA;
-	nested_vmx_misc_low |= VMX_MISC_ACTIVITY_HLT;
+	nested_vmx_misc_low &= VMX_MISC_SAVE_EFER_LMA;
+	nested_vmx_misc_low |= VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
+		VMX_MISC_ACTIVITY_HLT;
 	nested_vmx_misc_high = 0;
 }
 
@@ -5702,6 +5709,18 @@ static void nested_vmx_failValid(struct kvm_vcpu *vcpu,
 	 */
 }
 
+static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer)
+{
+	struct vcpu_vmx *vmx =
+		container_of(timer, struct vcpu_vmx, nested.preemption_timer);
+
+	vmx->nested.preemption_timer_expired = true;
+	kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu);
+	kvm_vcpu_kick(&vmx->vcpu);
+
+	return HRTIMER_NORESTART;
+}
+
 /*
  * Emulate the VMXON instruction.
  * Currently, we just remember that VMX is active, and do not save or even
@@ -5766,6 +5785,10 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
 	vmx->nested.vmcs02_num = 0;
 
+	hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
+		     HRTIMER_MODE_REL);
+	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
+
 	vmx->nested.vmxon = true;
 
 	skip_emulated_instruction(vcpu);
@@ -6745,9 +6768,6 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		 * table is L0's fault.
 		 */
 		return 0;
-	case EXIT_REASON_PREEMPTION_TIMER:
-		return vmcs12->pin_based_vm_exec_control &
-			PIN_BASED_VMX_PREEMPTION_TIMER;
 	case EXIT_REASON_WBINVD:
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING);
 	case EXIT_REASON_XSETBV:
@@ -6763,27 +6783,6 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
 
-static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
-{
-	u64 delta_tsc_l1;
-	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
-
-	if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
-			PIN_BASED_VMX_PREEMPTION_TIMER))
-		return;
-	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
-			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
-	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
-	delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
-		- vcpu->arch.last_guest_tsc;
-	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
-	if (preempt_val_l2 <= preempt_val_l1)
-		preempt_val_l2 = 0;
-	else
-		preempt_val_l2 -= preempt_val_l1;
-	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
-}
-
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -7196,8 +7195,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	atomic_switch_perf_msrs(vmx);
 	debugctlmsr = get_debugctlmsr();
 
-	if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending)
-		nested_adjust_preemption_timer(vcpu);
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -7594,6 +7591,28 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 		kvm_inject_page_fault(vcpu, fault);
 }
 
+static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 preemption_timeout = get_vmcs12(vcpu)->vmx_preemption_timer_value;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vcpu->arch.virtual_tsc_khz == 0)
+		return;
+
+	/* Make sure short timeouts reliably trigger an immediate vmexit.
+	 * hrtimer_start does not guarantee this. */
+	if (preemption_timeout <= 1) {
+		vmx_preemption_timer_fn(&vmx->nested.preemption_timer);
+		return;
+	}
+
+	preemption_timeout <<= VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE;
+	preemption_timeout *= 1000000;
+	do_div(preemption_timeout, vcpu->arch.virtual_tsc_khz);
+	hrtimer_start(&vmx->nested.preemption_timer,
+		      ns_to_ktime(preemption_timeout), HRTIMER_MODE_REL);
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7607,7 +7626,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
-	u32 exit_control;
 
 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7665,13 +7683,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 
 	vmcs_write64(VMCS_LINK_POINTER, -1ull);
 
-	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
-		(vmcs_config.pin_based_exec_ctrl |
-		 vmcs12->pin_based_vm_exec_control));
+	exec_control = vmcs12->pin_based_vm_exec_control;
+	exec_control |= vmcs_config.pin_based_exec_ctrl;
+	exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
 
-	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
-		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
-			     vmcs12->vmx_preemption_timer_value);
+	vmx->nested.preemption_timer_expired = false;
+	if (nested_cpu_has_preemption_timer(vmcs12))
+		vmx_start_preemption_timer(vcpu);
 
 	/*
 	 * Whether page-faults are trapped is determined by a combination of
@@ -7699,7 +7718,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		enable_ept ? vmcs12->page_fault_error_code_match : 0);
 
 	if (cpu_has_secondary_exec_ctrls()) {
-		u32 exec_control = vmx_secondary_exec_control(vmx);
+		exec_control = vmx_secondary_exec_control(vmx);
 		if (!vmx->rdtscp_enabled)
 			exec_control &= ~SECONDARY_EXEC_RDTSCP;
 		/* Take the following fields only from vmcs12 */
@@ -7786,10 +7805,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	exit_control = vmcs_config.vmexit_ctrl;
-	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
-		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
-	vm_exit_controls_init(vmx, exit_control);
+	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
 
 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -8137,6 +8153,14 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
+	    vmx->nested.preemption_timer_expired) {
+		if (vmx->nested.nested_run_pending)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
+		return 0;
+	}
+
 	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
@@ -8162,6 +8186,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 	return 0;
 }
 
+static u32 vmx_get_preemption_timer_value(struct kvm_vcpu *vcpu)
+{
+	ktime_t remaining =
+		hrtimer_get_remaining(&to_vmx(vcpu)->nested.preemption_timer);
+	u64 value;
+
+	if (ktime_to_ns(remaining) <= 0)
+		return 0;
+
+	value = ktime_to_ns(remaining) * vcpu->arch.virtual_tsc_khz;
+	do_div(value, 1000000);
+	return value >> VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE;
+}
+
 /*
  * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
  * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
@@ -8232,10 +8270,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	else
 		vmcs12->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
 
-	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
-	    (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
-		vmcs12->vmx_preemption_timer_value =
-			vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	if (nested_cpu_has_preemption_timer(vmcs12)) {
+		if (vmcs12->vm_exit_controls &
+		    VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+			vmcs12->vmx_preemption_timer_value =
+				vmx_get_preemption_timer_value(vcpu);
+		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
+	}
 
 	/*
 	 * In some cases (usually, nested EPT), L2 is allowed to change its
-- 
1.8.1.1.298.ge7eed54


^ permalink raw reply related	[flat|nested] 23+ messages in thread
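
A note on the arithmetic in vmx_get_preemption_timer_value() above: the
remaining hrtimer time is converted back into guest TSC cycles
(ns * virtual_tsc_khz / 10^6) and then shifted right by the emulated
timer rate, since the VMX preemption timer ticks at the TSC frequency
divided by 2^rate. A minimal worked example with hypothetical numbers
(1 ms remaining, a 2.6 GHz guest TSC, and an assumed rate shift of 5):

	u64 value = 1000000ULL * 2600000;	/* ns * virtual_tsc_khz */
	do_div(value, 1000000);			/* = 2600000 TSC cycles */
	value >>= 5;				/* = 81250 timer ticks  */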

* Re: [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits
  2014-01-04 17:47 ` [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits Jan Kiszka
@ 2014-01-05 20:01   ` Paolo Bonzini
  2014-01-05 20:16     ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2014-01-05 20:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 04/01/2014 18:47, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Set guest activity state in L1's VMCS according to the VCPU's mp_state.
> This ensures we report the correct state in case L2 executed HLT or in
> case we put L2 into HLT state and it has since been woken up by an event.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/kvm/vmx.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index bde8ddd..1245ff1 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8223,6 +8223,10 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
>  	vmcs12->guest_pending_dbg_exceptions =
>  		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
> +	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
> +		vmcs12->guest_activity_state = GUEST_ACTIVITY_HLT;
> +	else
> +		vmcs12->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
>  
>  	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
>  	    (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
> 

So, without this patch, CPU_BASED_HLT_EXITING should have been set in
nested_vmx_procbased_ctls_low: disabling HLT exits means L2 can go
"spontaneously" to KVM_MP_STATE_HALTED, and without this patch it would
leave the HLT state again on a subsequent unrelated vmexit.
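
Concretely, that would have been a one-liner in the nested
capability-MSR setup, along these lines (untested sketch; bits set in
the low half of the capability MSR are must-be-1 for L1, so this would
forbid L1 from disabling HLT exits):

	nested_vmx_procbased_ctls_low |= CPU_BASED_HLT_EXITING;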

I cannot think of any other control that we do not support disabling,
but perhaps I'm missing something.

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits
  2014-01-05 20:01   ` Paolo Bonzini
@ 2014-01-05 20:16     ` Jan Kiszka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-05 20:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 2014-01-05 21:01, Paolo Bonzini wrote:
> On 04/01/2014 18:47, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> Set guest activity state in L1's VMCS according to the VCPU's mp_state.
>> This ensures we report the correct state in case L2 executed HLT or in
>> case we put L2 into HLT state and it has since been woken up by an event.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/kvm/vmx.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index bde8ddd..1245ff1 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -8223,6 +8223,10 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>  		vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
>>  	vmcs12->guest_pending_dbg_exceptions =
>>  		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>> +	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
>> +		vmcs12->guest_activity_state = GUEST_ACTIVITY_HLT;
>> +	else
>> +		vmcs12->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
>>  
>>  	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
>>  	    (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>>
> 
> So, without this patch, CPU_BASED_HLT_EXITING should have been set in
> nested_vmx_procbased_ctls_low (because disabling HLT exits means L2 can
> go "spontaneously" to KVM_MP_STATE_HALTED and without this patch it
> would exit the HLT on a subsequent unrelated vmexit).

I guess so. I've no hardware around that does not support
GUEST_ACTIVITY_HLT, but it makes the most sense that those machines
enforced HLT_EXITING that way - and so should we have done. But that
inconsistency is hopefully fixed now.

> 
> I cannot think of any other control that we do not support disabling,
> but perhaps I'm missing something.

Yeah, I wouldn't be surprised to find a few more broken corner cases. As
the number of hypervisors continues to increase, they may help us find
them. That reminds me that I still want to run [1] over my queue. They
claim to use the preemption timer extensively.

Jan

[1] http://muen.codelabs.ch/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE
  2014-01-04 17:47 ` [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE Jan Kiszka
@ 2014-01-16 14:07   ` Paolo Bonzini
  2014-01-16 14:19     ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2014-01-16 14:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 04/01/2014 18:47, Jan Kiszka wrote:
> +	u64 old_state = vcpu->arch.apic_base &
> +		(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
> +	u64 new_state = msr_info->data &
> +		(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
> +	u64 reserved_bits = ((~0ULL) << boot_cpu_data.x86_phys_bits) | 0x2ff |
> +		(guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE);
> +

Should this use the guest CPUID instead?

> +	if (!msr_info->host_initiated &&

Is this check on host_initiated just for backwards compatibility, or is
there another case that I am missing?

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE
  2014-01-16 14:07   ` Paolo Bonzini
@ 2014-01-16 14:19     ` Jan Kiszka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-16 14:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 2014-01-16 15:07, Paolo Bonzini wrote:
> On 04/01/2014 18:47, Jan Kiszka wrote:
>> +	u64 old_state = vcpu->arch.apic_base &
>> +		(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
>> +	u64 new_state = msr_info->data &
>> +		(MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
>> +	u64 reserved_bits = ((~0ULL) << boot_cpu_data.x86_phys_bits) | 0x2ff |
>> +		(guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE);
>> +
> 
> Should this use the guest CPUID instead?

Hmm, they may differ... Then yes.

> 
>> +	if (!msr_info->host_initiated &&
> 
> Is this check on host_initiated just for backwards compatibility, or is
> there another case that I am missing?

The path is taken for both host-initiated and guest-initiated APICBASE
updates. Host-initiated ones are allowed to perform architecturally
invalid state transitions. And the MSR is emulated, so if the host wants
to set a reserved bit...
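
In code, the guest-initiated rejection boils down to something like this
(abbreviated sketch of the check, not the exact hunk):

	if (!msr_info->host_initiated &&
	    ((msr_info->data & reserved_bits) ||
	     /* enabling x2APIC while the APIC is globally disabled */
	     new_state == X2APIC_ENABLE ||
	     /* leaving x2APIC mode without disabling the APIC first */
	     (new_state == MSR_IA32_APICBASE_ENABLE &&
	      old_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE))))
		return 1;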

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
  2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
                   ` (11 preceding siblings ...)
  2014-01-04 17:47 ` [PATCH 12/12] KVM: nVMX: Fully emulate preemption timer Jan Kiszka
@ 2014-01-16 15:08 ` Paolo Bonzini
  2014-01-16 15:12   ` Jan Kiszka
  2014-01-21 15:32   ` Jan Kiszka
  12 siblings, 2 replies; 23+ messages in thread
From: Paolo Bonzini @ 2014-01-16 15:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 04/01/2014 18:47, Jan Kiszka wrote:
> This is on top of next after merging in the two patches of mine that are
> only present in master ATM.
> 
> Highlights:
>  - reworked fix of DR6 reading on SVM
>  - full check for invalid writes to IA32_APIC_BASE
>  - fixed support for halting in L2 (nVMX)
>  - fully emulated preemption timer (nVMX)
>  - tracing of nested vmexits (nVMX)
> 
> The patch "KVM: nVMX: Leave VMX mode on clearing of feature control MSR"
> is included again, unchanged from previous posting.
> 
> Most fixes are backed by KVM unit tests, to be posted soon as well.

I'm applying patches 1-10 (for now to kvm/queue).

For the last two, I prefer to wait for 3.15.

Also, for patch 11 I would really prefer to use check_nested_events for
both VMX and SVM.  I will look at SVM next week.

Thanks,

Paolo

> Jan Kiszka (12):
>   KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
>   KVM: SVM: Fix reading of DR6
>   KVM: VMX: Fix DR6 update on #DB exception
>   KVM: x86: Validate guest writes to MSR_IA32_APICBASE
>   KVM: nVMX: Leave VMX mode on clearing of feature control MSR
>   KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
>   KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
>   KVM: nVMX: Clean up handling of VMX-related MSRs
>   KVM: nVMX: Fix nested_run_pending on activity state HLT
>   KVM: nVMX: Update guest activity state field on L2 exits
>   KVM: nVMX: Rework interception of IRQs and NMIs
>   KVM: nVMX: Fully emulate preemption timer
> 
>  arch/x86/include/asm/kvm_host.h       |   4 +
>  arch/x86/include/uapi/asm/msr-index.h |   1 +
>  arch/x86/kvm/cpuid.h                  |   8 +
>  arch/x86/kvm/lapic.h                  |   2 +-
>  arch/x86/kvm/svm.c                    |  15 ++
>  arch/x86/kvm/vmx.c                    | 399 ++++++++++++++++++++--------------
>  arch/x86/kvm/x86.c                    |  67 +++++-
>  7 files changed, 318 insertions(+), 178 deletions(-)
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs
  2014-01-04 17:47 ` [PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs Jan Kiszka
@ 2014-01-16 15:08   ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2014-01-16 15:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 04/01/2014 18:47, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Move the check for leaving L2 on pending and intercepted IRQs or NMIs
> from the *_allowed handler into a dedicated callback. Invoke this
> callback at the relevant points before KVM checks if IRQs/NMIs can be
> injected. The callback is responsible for switching from L2 to L1 if
> needed and for injecting the proper vmexit events.
> 
> The rework fixes L2 wakeups from HLT and provides the foundation for
> preemption timer emulation.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/vmx.c              | 67 +++++++++++++++++++++++------------------
>  arch/x86/kvm/x86.c              | 15 +++++++--
>  3 files changed, 53 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e73651b..d195421 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -764,6 +764,8 @@ struct kvm_x86_ops {
>  			       struct x86_instruction_info *info,
>  			       enum x86_intercept_stage stage);
>  	void (*handle_external_intr)(struct kvm_vcpu *vcpu);
> +
> +	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1245ff1..ec8a976 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4620,22 +4620,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>  
>  static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
> -	if (is_guest_mode(vcpu)) {
> -		if (to_vmx(vcpu)->nested.nested_run_pending)
> -			return 0;
> -		if (nested_exit_on_nmi(vcpu)) {
> -			nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> -					  NMI_VECTOR | INTR_TYPE_NMI_INTR |
> -					  INTR_INFO_VALID_MASK, 0);
> -			/*
> -			 * The NMI-triggered VM exit counts as injection:
> -			 * clear this one and block further NMIs.
> -			 */
> -			vcpu->arch.nmi_pending = 0;
> -			vmx_set_nmi_mask(vcpu, true);
> -			return 0;
> -		}
> -	}
> +	if (to_vmx(vcpu)->nested.nested_run_pending)
> +		return 0;
>  
>  	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
>  		return 0;
> @@ -4647,19 +4633,8 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>  
>  static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>  {
> -	if (is_guest_mode(vcpu)) {
> -		if (to_vmx(vcpu)->nested.nested_run_pending)
> -			return 0;
> -		if (nested_exit_on_intr(vcpu)) {
> -			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
> -					  0, 0);
> -			/*
> -			 * fall through to normal code, but now in L1, not L2
> -			 */
> -		}
> -	}
> -
> -	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
> +	return (!to_vmx(vcpu)->nested.nested_run_pending &&
> +		vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>  		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>  			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
>  }
> @@ -8158,6 +8133,35 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +		if (vmx->nested.nested_run_pending)
> +			return -EBUSY;
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> +				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
> +				  INTR_INFO_VALID_MASK, 0);
> +		/*
> +		 * The NMI-triggered VM exit counts as injection:
> +		 * clear this one and block further NMIs.
> +		 */
> +		vcpu->arch.nmi_pending = 0;
> +		vmx_set_nmi_mask(vcpu, true);
> +		return 0;
> +	}
> +
> +	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> +	    nested_exit_on_intr(vcpu)) {
> +		if (vmx->nested.nested_run_pending)
> +			return -EBUSY;
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
>   * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
> @@ -8498,6 +8502,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  		nested_vmx_succeed(vcpu);
>  	if (enable_shadow_vmcs)
>  		vmx->nested.sync_shadow_vmcs = true;
> +
> +	/* in case we halted in L2 */
> +	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  }
>  
>  /*
> @@ -8637,6 +8644,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  
>  	.check_intercept = vmx_check_intercept,
>  	.handle_external_intr = vmx_handle_external_intr,
> +
> +	.check_nested_events = vmx_check_nested_events,
>  };
>  
>  static int __init vmx_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 559ae75..8746b7e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5846,6 +5846,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>  		return;
>  	}
>  
> +	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
> +		kvm_x86_ops->check_nested_events(vcpu, false);
> +
>  	/* try to inject new event if pending */
>  	if (vcpu->arch.nmi_pending) {
>  		if (kvm_x86_ops->nmi_allowed(vcpu)) {
> @@ -5966,12 +5969,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  		inject_pending_event(vcpu);
>  
> +		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
> +			req_immediate_exit |=
> +				kvm_x86_ops->check_nested_events(vcpu,
> +								 req_int_win);

Please add "!= 0" like below.  For now I only have this cosmetic
comment, I may have more questions when I port SVM to the new framework.
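
That is, matching the enable_nmi_window/enable_irq_window lines below:

	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
		req_immediate_exit |=
			kvm_x86_ops->check_nested_events(vcpu,
							 req_int_win) != 0;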

Thanks,

Paolo

>  		/* enable NMI/IRQ window open exits if needed */
>  		if (vcpu->arch.nmi_pending)
> -			req_immediate_exit =
> +			req_immediate_exit |=
>  				kvm_x86_ops->enable_nmi_window(vcpu) != 0;
>  		else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
> -			req_immediate_exit =
> +			req_immediate_exit |=
>  				kvm_x86_ops->enable_irq_window(vcpu) != 0;
>  
>  		if (kvm_lapic_enabled(vcpu)) {
> @@ -7295,6 +7303,9 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  {
> +	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
> +		kvm_x86_ops->check_nested_events(vcpu, false);
> +
>  	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>  		!vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
  2014-01-16 15:08 ` [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Paolo Bonzini
@ 2014-01-16 15:12   ` Jan Kiszka
  2014-01-16 15:20     ` Paolo Bonzini
  2014-01-21 15:32   ` Jan Kiszka
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2014-01-16 15:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 2014-01-16 16:08, Paolo Bonzini wrote:
> On 04/01/2014 18:47, Jan Kiszka wrote:
>> This is on top of next after merging in the two patches of mine that are
>> only present in master ATM.
>>
>> Highlights:
>>  - reworked fix of DR6 reading on SVM
>>  - full check for invalid writes to IA32_APIC_BASE
>>  - fixed support for halting in L2 (nVMX)
>>  - fully emulated preemption timer (nVMX)
>>  - tracing of nested vmexits (nVMX)
>>
>> The patch "KVM: nVMX: Leave VMX mode on clearing of feature control MSR"
>> is included again, unchanged from previous posting.
>>
>> Most fixes are backed by KVM unit tests, to be posted soon as well.
> 
> I'm applying patches 1-10 (for now to kvm/queue).
> 
> For the last two, I prefer to wait for 3.15.

Should we disable the broken features in 3.14?

> 
> Also, for patch 11 I would really prefer to use check_nested_events for
> both VMX and SVM.  I will look at SVM next week.

OK, thanks. I will send an updated patch 11 and also a patch on top of 4
to read the physical bit width from the guest cpuid.
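
For the latter, the idea is simply to derive the reserved-bit mask from
the guest's view of MAXPHYADDR, e.g. (untested):

	u64 reserved_bits = ((~0ULL) << cpuid_maxphyaddr(vcpu)) | 0x2ff |
		(guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE);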

Jan

> 
> Thanks,
> 
> Paolo
> 
>> Jan Kiszka (12):
>>   KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
>>   KVM: SVM: Fix reading of DR6
>>   KVM: VMX: Fix DR6 update on #DB exception
>>   KVM: x86: Validate guest writes to MSR_IA32_APICBASE
>>   KVM: nVMX: Leave VMX mode on clearing of feature control MSR
>>   KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
>>   KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
>>   KVM: nVMX: Clean up handling of VMX-related MSRs
>>   KVM: nVMX: Fix nested_run_pending on activity state HLT
>>   KVM: nVMX: Update guest activity state field on L2 exits
>>   KVM: nVMX: Rework interception of IRQs and NMIs
>>   KVM: nVMX: Fully emulate preemption timer
>>
>>  arch/x86/include/asm/kvm_host.h       |   4 +
>>  arch/x86/include/uapi/asm/msr-index.h |   1 +
>>  arch/x86/kvm/cpuid.h                  |   8 +
>>  arch/x86/kvm/lapic.h                  |   2 +-
>>  arch/x86/kvm/svm.c                    |  15 ++
>>  arch/x86/kvm/vmx.c                    | 399 ++++++++++++++++++++--------------
>>  arch/x86/kvm/x86.c                    |  67 +++++-
>>  7 files changed, 318 insertions(+), 178 deletions(-)
>>
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
  2014-01-16 15:12   ` Jan Kiszka
@ 2014-01-16 15:20     ` Paolo Bonzini
  2014-01-16 15:38       ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2014-01-16 15:20 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 16/01/2014 16:12, Jan Kiszka wrote:
>> > I'm applying patches 1-10 (for now to kvm/queue).
>> > 
>> > For the last two, I prefer to wait for 3.15.
> Should we disable the broken features in 3.14?

I was thinking about it; that would have meant applying patches 1-8 only.

In the end I decided not to do it because the patch doesn't affect
KVM-on-KVM emulation.  If the patch already helped you in developing
Jailhouse, the feature must not be _completely_ broken.  So the churn of
reverting the feature now and reapplying it later is not warranted, IMO.

Thanks,

Paolo

>> > 
>> > Also, for patch 11 I would really prefer to use check_nested_events for
>> > both VMX and SVM.  I will look at SVM next week.
> OK, thanks. I will send an updated patch 11 and also a patch on top of 4
> to read the physical bit width from the guest cpuid.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
  2014-01-16 15:20     ` Paolo Bonzini
@ 2014-01-16 15:38       ` Jan Kiszka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-16 15:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 2014-01-16 16:20, Paolo Bonzini wrote:
> On 16/01/2014 16:12, Jan Kiszka wrote:
>>>> I'm applying patches 1-10 (for now to kvm/queue).
>>>>
>>>> For the last two, I prefer to wait for 3.15.
>> Should we disable the broken features in 3.14?
> 
> I was thinking about it---which would have meant applying patches 1-8 only.

And we would have to disable the option to turn off HLT interception.

> 
> In the end I decided not to do it because the patch doesn't affect
> KVM-on-KVM emulation.  If the patch already helped you in developing
> Jailhouse, so the feature must not be _completely_ broken.  So the churn
> of reverting the feature now and reapplying it later is not warranted, IMO.

Well, in the end nested=1 remains the sign that this feature is still
experimental.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
  2014-01-16 15:08 ` [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Paolo Bonzini
  2014-01-16 15:12   ` Jan Kiszka
@ 2014-01-21 15:32   ` Jan Kiszka
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2014-01-21 15:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 2014-01-16 16:08, Paolo Bonzini wrote:
> On 04/01/2014 18:47, Jan Kiszka wrote:
>> This is on top of next after merging in the two patches of mine that are
>> only present in master ATM.
>>
>> Highlights:
>>  - reworked fix of DR6 reading on SVM
>>  - full check for invalid writes to IA32_APIC_BASE
>>  - fixed support for halting in L2 (nVMX)
>>  - fully emulated preemption timer (nVMX)
>>  - tracing of nested vmexits (nVMX)
>>
>> The patch "KVM: nVMX: Leave VMX mode on clearing of feature control MSR"
>> is included again, unchanged from previous posting.
>>
>> Most fixes are backed by KVM unit tests, to be posted soon as well.
> 
> I'm applying patches 1-10 (for now to kvm/queue).
> 
> For the last two, I prefer to wait for 3.15.

FWIW, I'm seeing very rare cases of valid IDT vectoring info on NMI
exits. As Jailhouse intercepts no exceptions, and none of the other
reasons given in 27.2.3 of the SDM apply here, this must not happen -
and real hardware confirms this. I will dig into this once time permits.

Jan

> 
> Also, for patch 11 I would really prefer to use check_nested_events for
> both VMX and SVM.  I will look at SVM next week.
> 
> Thanks,
> 
> Paolo
> 
>> Jan Kiszka (12):
>>   KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
>>   KVM: SVM: Fix reading of DR6
>>   KVM: VMX: Fix DR6 update on #DB exception
>>   KVM: x86: Validate guest writes to MSR_IA32_APICBASE
>>   KVM: nVMX: Leave VMX mode on clearing of feature control MSR
>>   KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
>>   KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
>>   KVM: nVMX: Clean up handling of VMX-related MSRs
>>   KVM: nVMX: Fix nested_run_pending on activity state HLT
>>   KVM: nVMX: Update guest activity state field on L2 exits
>>   KVM: nVMX: Rework interception of IRQs and NMIs
>>   KVM: nVMX: Fully emulate preemption timer
>>
>>  arch/x86/include/asm/kvm_host.h       |   4 +
>>  arch/x86/include/uapi/asm/msr-index.h |   1 +
>>  arch/x86/kvm/cpuid.h                  |   8 +
>>  arch/x86/kvm/lapic.h                  |   2 +-
>>  arch/x86/kvm/svm.c                    |  15 ++
>>  arch/x86/kvm/vmx.c                    | 399 ++++++++++++++++++++--------------
>>  arch/x86/kvm/x86.c                    |  67 +++++-
>>  7 files changed, 318 insertions(+), 178 deletions(-)
>>

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-01-21 15:32 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-04 17:47 [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Jan Kiszka
2014-01-04 17:47 ` [PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS Jan Kiszka
2014-01-04 17:47 ` [PATCH 02/12] KVM: SVM: Fix reading of DR6 Jan Kiszka
2014-01-04 17:47 ` [PATCH 03/12] KVM: VMX: Fix DR6 update on #DB exception Jan Kiszka
2014-01-04 17:47 ` [PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE Jan Kiszka
2014-01-16 14:07   ` Paolo Bonzini
2014-01-16 14:19     ` Jan Kiszka
2014-01-04 17:47 ` [PATCH 05/12] KVM: nVMX: Leave VMX mode on clearing of feature control MSR Jan Kiszka
2014-01-04 17:47 ` [PATCH 06/12] KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit Jan Kiszka
2014-01-04 17:47 ` [PATCH 07/12] KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject Jan Kiszka
2014-01-04 17:47 ` [PATCH 08/12] KVM: nVMX: Clean up handling of VMX-related MSRs Jan Kiszka
2014-01-04 17:47 ` [PATCH 09/12] KVM: nVMX: Fix nested_run_pending on activity state HLT Jan Kiszka
2014-01-04 17:47 ` [PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits Jan Kiszka
2014-01-05 20:01   ` Paolo Bonzini
2014-01-05 20:16     ` Jan Kiszka
2014-01-04 17:47 ` [PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs Jan Kiszka
2014-01-16 15:08   ` Paolo Bonzini
2014-01-04 17:47 ` [PATCH 12/12] KVM: nVMX: Fully emulate preemption timer Jan Kiszka
2014-01-16 15:08 ` [PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX Paolo Bonzini
2014-01-16 15:12   ` Jan Kiszka
2014-01-16 15:20     ` Paolo Bonzini
2014-01-16 15:38       ` Jan Kiszka
2014-01-21 15:32   ` Jan Kiszka
