linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] KVM: nested x86: nested page faults fixes
@ 2014-09-02 15:13 Paolo Bonzini
  2014-09-02 15:13 ` [PATCH 1/4] KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 15:13 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: jroedel, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

Patch 1 implements AMD semantics for non-leaf PDPEs and PML4Es, which
are a bit different from Intel's.  The SVM test relies on this, so fix it.

Patch 2 lets nested SVM implement nested page faults correctly.  We were
not setting bits 32/33 of the exit info.

Patches 3 and 4 fix the interaction between the emulator and nested EPT/NPT,
which was reported by Valentine.

Reviews are very welcome; I'm walking on thin ice here...

Paolo

Paolo Bonzini (4):
  KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD
  KVM: nSVM: propagate the NPF EXITINFO to the guest
  KVM: x86: inject nested page faults on emulated instructions
  KVM: x86: propagate exception from permission checks on the nested page fault

 arch/x86/include/asm/kvm_host.h |  9 ++++++---
 arch/x86/kvm/cpuid.h            |  8 ++++++++
 arch/x86/kvm/mmu.c              | 15 ++++++++++++---
 arch/x86/kvm/paging_tmpl.h      | 13 ++++++++++---
 arch/x86/kvm/svm.c              | 26 ++++++++++++++++++++++----
 arch/x86/kvm/x86.c              | 27 +++++++++++++++++++--------
 6 files changed, 77 insertions(+), 21 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD
  2014-09-02 15:13 [PATCH 0/4] KVM: nested x86: nested page faults fixes Paolo Bonzini
@ 2014-09-02 15:13 ` Paolo Bonzini
  2014-09-02 15:13 ` [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 15:13 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: jroedel, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
page table entries.  Intel ignores it; AMD ignores it in PDEs, but reserves it
in PDPEs and PML4Es.  The SVM test is relying on this behavior, so enforce it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/cpuid.h |  8 ++++++++
 arch/x86/kvm/mmu.c   | 13 +++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index a5380590ab0e..43b33e301e68 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -88,6 +88,14 @@ static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
 	return best && (best->ecx & bit(X86_FEATURE_X2APIC));
 }
 
+static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0, 0);
+	return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
+}
+
 static inline bool guest_cpuid_has_gbpages(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6b6df0c5be3d..5b93a597e0c8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3512,6 +3512,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	int maxphyaddr = cpuid_maxphyaddr(vcpu);
 	u64 exb_bit_rsvd = 0;
 	u64 gbpages_bit_rsvd = 0;
+	u64 nonleaf_bit8_rsvd = 0;
 
 	context->bad_mt_xwr = 0;
 
@@ -3519,6 +3520,14 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 		exb_bit_rsvd = rsvd_bits(63, 63);
 	if (!guest_cpuid_has_gbpages(vcpu))
 		gbpages_bit_rsvd = rsvd_bits(7, 7);
+
+	/*
+	 * Non-leaf PML4Es and PDPEs reserve bit 8 (which would be the G bit for
+	 * leaf entries) on AMD CPUs only.
+	 */
+	if (guest_cpuid_is_amd(vcpu))
+		nonleaf_bit8_rsvd = rsvd_bits(8, 8);
+
 	switch (context->root_level) {
 	case PT32_ROOT_LEVEL:
 		/* no rsvd bits for 2 level 4K page table entries */
@@ -3553,9 +3562,9 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 		break;
 	case PT64_ROOT_LEVEL:
 		context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
-			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 7);
+			nonleaf_bit8_rsvd | rsvd_bits(7, 7) | rsvd_bits(maxphyaddr, 51);
 		context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
-			gbpages_bit_rsvd | rsvd_bits(maxphyaddr, 51);
+			nonleaf_bit8_rsvd | gbpages_bit_rsvd | rsvd_bits(maxphyaddr, 51);
 		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
 			rsvd_bits(maxphyaddr, 51);
 		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
-- 
1.8.3.1




* [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
  2014-09-02 15:13 [PATCH 0/4] KVM: nested x86: nested page faults fixes Paolo Bonzini
  2014-09-02 15:13 ` [PATCH 1/4] KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD Paolo Bonzini
@ 2014-09-02 15:13 ` Paolo Bonzini
  2014-09-02 16:33   ` Joerg Roedel
  2014-09-02 15:13 ` [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 15:13 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: jroedel, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

This is similar to what the EPT code does with the exit qualification.
This allows the guest to see a valid value for bits 33:32.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h |  6 ++++++
 arch/x86/kvm/svm.c         | 26 ++++++++++++++++++++++----
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 410776528265..99d4c4e836a0 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -322,8 +322,14 @@ retry_walk:
 
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
 					      PFERR_USER_MASK|PFERR_WRITE_MASK);
+
+		/*
+		 * Can this happen (except if the guest is playing TOCTTOU games)?
+		 * We should have gotten a nested page fault on table_gfn instead.
+		 */
 		if (unlikely(real_gfn == UNMAPPED_GVA))
 			goto error;
+
 		real_gfn = gpa_to_gfn(real_gfn);
 
 		host_addr = gfn_to_hva_prot(vcpu->kvm, real_gfn,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8ef704037370..bd78a3baca31 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->control.exit_code = SVM_EXIT_NPF;
-	svm->vmcb->control.exit_code_hi = 0;
-	svm->vmcb->control.exit_info_1 = fault->error_code;
-	svm->vmcb->control.exit_info_2 = fault->address;
+	/*
+	 * We can keep the value that the processor stored in the VMCB,
+	 * but make up something sensible if we hit the WARN.
+	 */
+	if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) {
+		svm->vmcb->control.exit_code = SVM_EXIT_NPF;
+		svm->vmcb->control.exit_code_hi = 0;
+		svm->vmcb->control.exit_info_1 = (1ULL << 32);
+		svm->vmcb->control.exit_info_2 = fault->address;
+	}
+
+	/*
+	 * The present bit is always zero for page structure faults on real
+	 * hardware, but we compute it as 1 in fault->error_code.  We can
+	 * just keep the original exit info for page structure faults---but
+	 * not for other faults, because the processor's error code might
+	 * be wrong if we have just created a shadow PTE.
+	 */
+	if (svm->vmcb->control.exit_info_1 & (1ULL << 32)) {
+		svm->vmcb->control.exit_info_1 &= ~0xffffffffULL;
+		svm->vmcb->control.exit_info_1 |= fault->error_code;
+	}
 
 	nested_svm_vmexit(svm);
 }
-- 
1.8.3.1




* [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-02 15:13 [PATCH 0/4] KVM: nested x86: nested page faults fixes Paolo Bonzini
  2014-09-02 15:13 ` [PATCH 1/4] KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD Paolo Bonzini
  2014-09-02 15:13 ` [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest Paolo Bonzini
@ 2014-09-02 15:13 ` Paolo Bonzini
  2014-09-04  7:02   ` Gleb Natapov
  2014-09-02 15:13 ` [PATCH 4/4] KVM: x86: propagate exception from permission checks on the nested page fault Paolo Bonzini
  2014-09-02 16:02 ` [PATCH 0/4] KVM: nested x86: nested page faults fixes Valentine Sinitsyn
  4 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 15:13 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: jroedel, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

This is required for the following patch to work correctly.  If a nested page
fault happens during emulation, we must inject a vmexit, not a page fault.
Luckily we already have the required machinery: it is enough to return
X86EMUL_INTERCEPTED instead of X86EMUL_PROPAGATE_FAULT.

Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e4ed85e07a01..9e3b74c044ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -416,6 +416,16 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 		vcpu->arch.mmu.inject_page_fault(vcpu, fault);
 }
 
+static inline int kvm_propagate_or_intercept(struct kvm_vcpu *vcpu,
+					     struct x86_exception *exception)
+{
+	if (likely(!exception->nested_page_fault))
+		return X86EMUL_PROPAGATE_FAULT;
+
+	vcpu->arch.mmu.inject_page_fault(vcpu, exception);
+	return X86EMUL_INTERCEPTED;
+}
+
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	atomic_inc(&vcpu->arch.nmi_queued);
@@ -4122,7 +4132,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
 		int ret;
 
 		if (gpa == UNMAPPED_GVA)
-			return X86EMUL_PROPAGATE_FAULT;
+			return kvm_propagate_or_intercept(vcpu, exception);
 		ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, data,
 					  offset, toread);
 		if (ret < 0) {
@@ -4152,7 +4162,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
 	gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK,
 						    exception);
 	if (unlikely(gpa == UNMAPPED_GVA))
-		return X86EMUL_PROPAGATE_FAULT;
+		return kvm_propagate_or_intercept(vcpu, exception);
 
 	offset = addr & (PAGE_SIZE-1);
 	if (WARN_ON(offset + bytes > PAGE_SIZE))
@@ -4203,7 +4213,7 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 		int ret;
 
 		if (gpa == UNMAPPED_GVA)
-			return X86EMUL_PROPAGATE_FAULT;
+			return kvm_propagate_or_intercept(vcpu, exception);
 		ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite);
 		if (ret < 0) {
 			r = X86EMUL_IO_NEEDED;
@@ -4350,7 +4360,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 	ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write);
 
 	if (ret < 0)
-		return X86EMUL_PROPAGATE_FAULT;
+		return kvm_propagate_or_intercept(vcpu, exception);
 
 	/* For APIC access vmexit */
 	if (ret)
-- 
1.8.3.1




* [PATCH 4/4] KVM: x86: propagate exception from permission checks on the nested page fault
  2014-09-02 15:13 [PATCH 0/4] KVM: nested x86: nested page faults fixes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2014-09-02 15:13 ` [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions Paolo Bonzini
@ 2014-09-02 15:13 ` Paolo Bonzini
  2014-09-02 16:02 ` [PATCH 0/4] KVM: nested x86: nested page faults fixes Valentine Sinitsyn
  4 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 15:13 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: jroedel, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault.  To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpa instead operates on the "outer" (NPT)
instance.

Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 9 ++++++---
 arch/x86/kvm/mmu.c              | 2 +-
 arch/x86/kvm/paging_tmpl.h      | 7 ++++---
 arch/x86/kvm/x86.c              | 9 +++++----
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 08cc299ec6f4..6bf2f3c7e180 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -262,7 +262,8 @@ struct kvm_mmu {
 				  struct x86_exception *fault);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
 			    struct x86_exception *exception);
-	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access);
+	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
+			       struct x86_exception *exception);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
 			 struct kvm_mmu_page *sp);
 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
@@ -924,7 +925,8 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
-gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access);
+gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
+			   struct x86_exception *exception);
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
 			      struct x86_exception *exception);
 gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
@@ -944,7 +946,8 @@ void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
 void kvm_enable_tdp(void);
 void kvm_disable_tdp(void);
 
-static inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access)
+static inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
+				  struct x86_exception *exception)
 {
 	return gpa;
 }
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b93a597e0c8..76398fe15df2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3200,7 +3200,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 {
 	if (exception)
 		exception->error_code = 0;
-	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
+	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access, exception);
 }
 
 static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 99d4c4e836a0..a97e9214ebce 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -321,14 +321,15 @@ retry_walk:
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
-					      PFERR_USER_MASK|PFERR_WRITE_MASK);
+					      PFERR_USER_MASK|PFERR_WRITE_MASK,
+					      &walker->fault);
 
 		/*
 		 * Can this happen (except if the guest is playing TOCTTOU games)?
 		 * We should have gotten a nested page fault on table_gfn instead.
 		 */
 		if (unlikely(real_gfn == UNMAPPED_GVA))
-			goto error;
+			return 0;
 
 		real_gfn = gpa_to_gfn(real_gfn);
 
@@ -370,7 +371,7 @@ retry_walk:
 	if (PTTYPE == 32 && walker->level == PT_DIRECTORY_LEVEL && is_cpuid_PSE36())
 		gfn += pse36_gfn_delta(pte);
 
-	real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access);
+	real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access, &walker->fault);
 	if (real_gpa == UNMAPPED_GVA)
 		return 0;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e3b74c044ed..022513bf92f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -467,11 +467,12 @@ int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gfn_t ngfn, void *data, int offset, int len,
 			    u32 access)
 {
+	struct x86_exception exception;
 	gfn_t real_gfn;
 	gpa_t ngpa;
 
 	ngpa     = gfn_to_gpa(ngfn);
-	real_gfn = mmu->translate_gpa(vcpu, ngpa, access);
+	real_gfn = mmu->translate_gpa(vcpu, ngpa, access, &exception);
 	if (real_gfn == UNMAPPED_GVA)
 		return -EFAULT;
 
@@ -4073,16 +4074,16 @@ void kvm_get_segment(struct kvm_vcpu *vcpu,
 	kvm_x86_ops->get_segment(vcpu, var, seg);
 }
 
-gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access)
+gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
+			   struct x86_exception *exception)
 {
 	gpa_t t_gpa;
-	struct x86_exception exception;
 
 	BUG_ON(!mmu_is_nested(vcpu));
 
 	/* NPT walks are always user-walks */
 	access |= PFERR_USER_MASK;
-	t_gpa  = vcpu->arch.mmu.gva_to_gpa(vcpu, gpa, access, &exception);
+	t_gpa  = vcpu->arch.mmu.gva_to_gpa(vcpu, gpa, access, exception);
 
 	return t_gpa;
 }
-- 
1.8.3.1



* Re: [PATCH 0/4] KVM: nested x86: nested page faults fixes
  2014-09-02 15:13 [PATCH 0/4] KVM: nested x86: nested page faults fixes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2014-09-02 15:13 ` [PATCH 4/4] KVM: x86: propagate exception from permission checks on the nested page fault Paolo Bonzini
@ 2014-09-02 16:02 ` Valentine Sinitsyn
  2014-09-02 16:56   ` Paolo Bonzini
  4 siblings, 1 reply; 18+ messages in thread
From: Valentine Sinitsyn @ 2014-09-02 16:02 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: jroedel, agraf, jan.kiszka, gleb, avi

Hi Paolo,

On 02.09.2014 21:13, Paolo Bonzini wrote:
> Patches 3 and 4 fix the interaction between emulator and nested EPT/NPT,
> which was reported by Valentine.
I can confirm the initial bug I observed is fixed with these patches 
(applied to 3.16.1).

All tests in kvm-unit-tests' master also pass, except for ioio, which is
(probably) affected by another (unrelated) bug fixed by Jan back in June
but not in 3.16 [1,2].

1. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124071/
2. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124067/

Thanks,
Valentine


* Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
  2014-09-02 15:13 ` [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest Paolo Bonzini
@ 2014-09-02 16:33   ` Joerg Roedel
  2014-09-02 16:46     ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2014-09-02 16:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

Ah, here you add emulation of these bits.

On Tue, Sep 02, 2014 at 05:13:48PM +0200, Paolo Bonzini wrote:
> This is similar to what the EPT code does with the exit qualification.
> This allows the guest to see a valid value for bits 33:32.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/paging_tmpl.h |  6 ++++++
>  arch/x86/kvm/svm.c         | 26 ++++++++++++++++++++++----
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 410776528265..99d4c4e836a0 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -322,8 +322,14 @@ retry_walk:
>  
>  		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
>  					      PFERR_USER_MASK|PFERR_WRITE_MASK);
> +
> +		/*
> +		 * Can this happen (except if the guest is playing TOCTTOU games)?
> +		 * We should have gotten a nested page fault on table_gfn instead.
> +		 */

Comment is true, but doesn't make the check below obsolete, no?

>  		if (unlikely(real_gfn == UNMAPPED_GVA))
>  			goto error;
> @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	svm->vmcb->control.exit_code = SVM_EXIT_NPF;
> -	svm->vmcb->control.exit_code_hi = 0;
> -	svm->vmcb->control.exit_info_1 = fault->error_code;
> -	svm->vmcb->control.exit_info_2 = fault->address;
> +	/*
> +	 * We can keep the value that the processor stored in the VMCB,
> +	 * but make up something sensible if we hit the WARN.
> +	 */
> +	if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) {
> +		svm->vmcb->control.exit_code = SVM_EXIT_NPF;
> +		svm->vmcb->control.exit_code_hi = 0;
> +		svm->vmcb->control.exit_info_1 = (1ULL << 32);
> +		svm->vmcb->control.exit_info_2 = fault->address;
> +	}

It's been a while since I looked into this, but is an injected NPF exit
always the result of a real NPF exit? How about an io-port emulated on
L1 but passed through to L2 by the nested hypervisor. On emulation of
INS or OUTS, KVM would need to read/write to an L2 address space, maybe
causing NPF faults to be injected. In this case an IOIO exit would cause
an injected NPF exit for L1.


	Joerg



* Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
  2014-09-02 16:33   ` Joerg Roedel
@ 2014-09-02 16:46     ` Paolo Bonzini
  2014-09-02 17:01       ` Paolo Bonzini
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 16:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-kernel, kvm, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

Il 02/09/2014 18:33, Joerg Roedel ha scritto:
> Ah, here you add emulation of these bits.
> 
> On Tue, Sep 02, 2014 at 05:13:48PM +0200, Paolo Bonzini wrote:
>> This is similar to what the EPT code does with the exit qualification.
>> This allows the guest to see a valid value for bits 33:32.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/paging_tmpl.h |  6 ++++++
>>  arch/x86/kvm/svm.c         | 26 ++++++++++++++++++++++----
>>  2 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 410776528265..99d4c4e836a0 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -322,8 +322,14 @@ retry_walk:
>>  
>>  		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
>>  					      PFERR_USER_MASK|PFERR_WRITE_MASK);
>> +
>> +		/*
>> +		 * Can this happen (except if the guest is playing TOCTTOU games)?
>> +		 * We should have gotten a nested page fault on table_gfn instead.
>> +		 */
> 
> Comment is true, but doesn't make the check below obsolete, no?

No, it doesn't.  I'll rewrite it as

	/*
	 * This cannot happen unless the guest is playing TOCTTOU games,
	 * because we would have gotten a nested page fault on table_gfn
	 * instead.  If this happens, the exit qualification / exit info
	 * field will incorrectly have "guest page access" as the
	 * nested page fault's cause, instead of "guest page structure
	 * access".
	 */

>>  		if (unlikely(real_gfn == UNMAPPED_GVA))
>>  			goto error;
>> @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>  
>> -	svm->vmcb->control.exit_code = SVM_EXIT_NPF;
>> -	svm->vmcb->control.exit_code_hi = 0;
>> -	svm->vmcb->control.exit_info_1 = fault->error_code;
>> -	svm->vmcb->control.exit_info_2 = fault->address;
>> +	/*
>> +	 * We can keep the value that the processor stored in the VMCB,
>> +	 * but make up something sensible if we hit the WARN.
>> +	 */
>> +	if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) {
>> +		svm->vmcb->control.exit_code = SVM_EXIT_NPF;
>> +		svm->vmcb->control.exit_code_hi = 0;
>> +		svm->vmcb->control.exit_info_1 = (1ULL << 32);
>> +		svm->vmcb->control.exit_info_2 = fault->address;
>> +	}
> 
> Its been a while since I looked into this, but is an injected NPF exit
> always the result of a real NPF exit?

I think so, but that's why I CCed you. :)

> How about an io-port emulated on
> L1 but passed through to L2 by the nested hypervisor. On emulation of
> INS or OUTS, KVM would need to read/write to an L2 address space,

It would need to read/write to *L1* (that's where the VMCB's IOIO map
lies), which could result in a regular page fault injected into L1.

Paolo

> maybe
> causing NPF faults to be injected. In this case an IOIO exit would cause
> an injected NPF exit for L1.



* Re: [PATCH 0/4] KVM: nested x86: nested page faults fixes
  2014-09-02 16:02 ` [PATCH 0/4] KVM: nested x86: nested page faults fixes Valentine Sinitsyn
@ 2014-09-02 16:56   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 16:56 UTC (permalink / raw)
  To: Valentine Sinitsyn, linux-kernel, kvm
  Cc: jroedel, agraf, jan.kiszka, gleb, avi

Il 02/09/2014 18:02, Valentine Sinitsyn ha scritto:
>>
> I can confirm the initial bug I observed is fixed with these patches
> (applied to 3.16.1).
> 
> All tests in kvm-unit-test's master also pass, except for ioio which is
> (probably) affected by another (unrelated) bug fixed by Jan back in June
> but not in 3.16 [1,2].
> 
> 1. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124071/
> 2. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124067/

Indeed, ioio also hangs for me in 3.16 (not in 3.17).  Thanks, I'll add
your Tested-by.

Paolo


* Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
  2014-09-02 16:46     ` Paolo Bonzini
@ 2014-09-02 17:01       ` Paolo Bonzini
  2014-09-02 17:01       ` Joerg Roedel
  2014-09-02 17:47       ` Avi Kivity
  2 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-02 17:01 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-kernel, kvm, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

Il 02/09/2014 18:46, Paolo Bonzini ha scritto:
>> > How about an io-port emulated on
>> > L1 but passed through to L2 by the nested hypervisor. On emulation of
>> > INS or OUTS, KVM would need to read/write to an L2 address space,
> It would need to read/write to *L1* (that's where the VMCB's IOIO map
> lies), which could result into a regular page fault injected into L1.

Nevermind, the VMCB's IO bitmap is a guest physical address.  I see what
you mean now.  nested_svm_intercept_ioio would return NESTED_EXIT_HOST
and proceed to emulate the read/write.  This could indeed cause a NPF.
For now I'll remove the WARN_ON.

Paolo


* Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
  2014-09-02 16:46     ` Paolo Bonzini
  2014-09-02 17:01       ` Paolo Bonzini
@ 2014-09-02 17:01       ` Joerg Roedel
  2014-09-02 17:47       ` Avi Kivity
  2 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2014-09-02 17:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi

On Tue, Sep 02, 2014 at 06:46:06PM +0200, Paolo Bonzini wrote:
> Il 02/09/2014 18:33, Joerg Roedel ha scritto:
> > Comment is true, but doesn't make the check below obsolete, no?
> 
> No, it doesn't.  I'll rewrite it as
> 
> 	/*
> 	 * This cannot happen unless the guest is playing TOCTTOU games,
> 	 * because we would have gotten a nested page fault on table_gfn
> 	 * instead.  If this happens, the exit qualification / exit info
> 	 * field will incorrectly have "guest page access" as the
> 	 * nested page fault's cause, instead of "guest page structure
> 	 * access".
> 	 */

Sounds good.

> > Its been a while since I looked into this, but is an injected NPF exit
> > always the result of a real NPF exit?
> 
> I think so, but that's why I CCed you. :)
> 
> > How about an io-port emulated on
> > L1 but passed through to L2 by the nested hypervisor. On emulation of
> > INS or OUTS, KVM would need to read/write to an L2 address space,
> 
> It would need to read/write to *L1* (that's where the VMCB's IOIO map
> lies), which could result into a regular page fault injected into L1.

Hmm, INS and OUTS read/write to a virtual memory location, right? So if
we have an io-port intercepted by bare-metal KVM but not by the L1
hypervisor, and the L2 guest accesses this io-port, bare-metal KVM would
get an IOIO exit it has to emulate itself (it can't inject an IOIO exit
to L1 anyway, since L1 does not intercept the io-port).

During emulation the bare-metal KVM would read/write at an L2 virtual
address, because that is the context the instruction is executed in.
And while resolving this L2 virtual address, an NPF fault could be
detected that needs to be injected into the L1 hypervisor, right?



	Joerg



* Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
  2014-09-02 16:46     ` Paolo Bonzini
  2014-09-02 17:01       ` Paolo Bonzini
  2014-09-02 17:01       ` Joerg Roedel
@ 2014-09-02 17:47       ` Avi Kivity
  2 siblings, 0 replies; 18+ messages in thread
From: Avi Kivity @ 2014-09-02 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, Joerg Roedel
  Cc: linux-kernel, kvm, agraf, valentine.sinitsyn, jan.kiszka, gleb, avi


On 09/02/2014 07:46 PM, Paolo Bonzini wrote:
> */
>>>   		if (unlikely(real_gfn == UNMAPPED_GVA))
>>>   			goto error;
>>> @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
>>>   {
>>>   	struct vcpu_svm *svm = to_svm(vcpu);
>>>   
>>> -	svm->vmcb->control.exit_code = SVM_EXIT_NPF;
>>> -	svm->vmcb->control.exit_code_hi = 0;
>>> -	svm->vmcb->control.exit_info_1 = fault->error_code;
>>> -	svm->vmcb->control.exit_info_2 = fault->address;
>>> +	/*
>>> +	 * We can keep the value that the processor stored in the VMCB,
>>> +	 * but make up something sensible if we hit the WARN.
>>> +	 */
>>> +	if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) {
>>> +		svm->vmcb->control.exit_code = SVM_EXIT_NPF;
>>> +		svm->vmcb->control.exit_code_hi = 0;
>>> +		svm->vmcb->control.exit_info_1 = (1ULL << 32);
>>> +		svm->vmcb->control.exit_info_2 = fault->address;
>>> +	}
>> Its been a while since I looked into this, but is an injected NPF exit
>> always the result of a real NPF exit?
> I think so, but that's why I CCed you. :)

It could always be the result of emulation into which L0 was tricked.  I 
don't think it's a safe assumption.

>> How about an io-port emulated on
>> L1 but passed through to L2 by the nested hypervisor. On emulation of
>> INS or OUTS, KVM would need to read/write to an L2 address space,
> It would need to read/write to *L1* (that's where the VMCB's IOIO map
> lies), which could result into a regular page fault injected into L1.
>
> Paolo
>
>> maybe
>> causing NPF faults to be injected. In this case an IOIO exit would cause
>> an injected NPF exit for L1.



* Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-02 15:13 ` [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions Paolo Bonzini
@ 2014-09-04  7:02   ` Gleb Natapov
  2014-09-04 14:12     ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Gleb Natapov @ 2014-09-04  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, jroedel, agraf, valentine.sinitsyn, jan.kiszka, avi

On Tue, Sep 02, 2014 at 05:13:49PM +0200, Paolo Bonzini wrote:
> This is required for the following patch to work correctly.  If a nested page
> fault happens during emulation, we must inject a vmexit, not a page fault.
> Luckily we already have the required machinery: it is enough to return
> X86EMUL_INTERCEPTED instead of X86EMUL_PROPAGATE_FAULT.
> 
I wonder why this patch is needed. X86EMUL_PROPAGATE_FAULT causes
ctxt->have_exception to be set to true in x86_emulate_insn().
x86_emulate_instruction() checks ctxt->have_exception and calls
inject_emulated_exception() if it is true. inject_emulated_exception()
calls kvm_propagate_fault() where we check if the fault was nested and
generate vmexit or a page fault accordingly.
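
[The path described above boils down to one decision; here is a tiny stand-alone model of it. The names and signature are simplified stand-ins for illustration, not the real KVM API: kvm_propagate_fault() actually takes the vcpu and an x86_exception, and injection goes through the MMU callbacks.]

```c
#include <assert.h>
#include <stdbool.h>

enum outcome { INJECT_PAGE_FAULT, INJECT_VMEXIT };

/* Condensed model of the kvm_propagate_fault() check: while a nested
 * guest is running, a fault that did NOT come from the nested (L2->L1)
 * translation is delivered as an ordinary page fault through the nested
 * MMU; a nested page fault must instead surface to L1 as a vmexit
 * (SVM_EXIT_NPF / EPT violation). */
static enum outcome propagate_fault(bool mmu_is_nested, bool nested_page_fault)
{
	if (mmu_is_nested && !nested_page_fault)
		return INJECT_PAGE_FAULT;	/* nested_mmu.inject_page_fault */
	return nested_page_fault ? INJECT_VMEXIT	/* to L1 */
				 : INJECT_PAGE_FAULT;	/* ordinary #PF */
}
```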

> Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e4ed85e07a01..9e3b74c044ed 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -416,6 +416,16 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  		vcpu->arch.mmu.inject_page_fault(vcpu, fault);
>  }
>  
> +static inline int kvm_propagate_or_intercept(struct kvm_vcpu *vcpu,
> +					     struct x86_exception *exception)
> +{
> +	if (likely(!exception->nested_page_fault))
> +		return X86EMUL_PROPAGATE_FAULT;
> +
> +	vcpu->arch.mmu.inject_page_fault(vcpu, exception);
> +	return X86EMUL_INTERCEPTED;
> +}
> +
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>  {
>  	atomic_inc(&vcpu->arch.nmi_queued);
> @@ -4122,7 +4132,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
>  		int ret;
>  
>  		if (gpa == UNMAPPED_GVA)
> -			return X86EMUL_PROPAGATE_FAULT;
> +			return kvm_propagate_or_intercept(vcpu, exception);
>  		ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, data,
>  					  offset, toread);
>  		if (ret < 0) {
> @@ -4152,7 +4162,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
>  	gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK,
>  						    exception);
>  	if (unlikely(gpa == UNMAPPED_GVA))
> -		return X86EMUL_PROPAGATE_FAULT;
> +		return kvm_propagate_or_intercept(vcpu, exception);
>  
>  	offset = addr & (PAGE_SIZE-1);
>  	if (WARN_ON(offset + bytes > PAGE_SIZE))
> @@ -4203,7 +4213,7 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
>  		int ret;
>  
>  		if (gpa == UNMAPPED_GVA)
> -			return X86EMUL_PROPAGATE_FAULT;
> +			return kvm_propagate_or_intercept(vcpu, exception);
>  		ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite);
>  		if (ret < 0) {
>  			r = X86EMUL_IO_NEEDED;
> @@ -4350,7 +4360,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
>  	ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write);
>  
>  	if (ret < 0)
> -		return X86EMUL_PROPAGATE_FAULT;
> +		return kvm_propagate_or_intercept(vcpu, exception);
>  
>  	/* For APIC access vmexit */
>  	if (ret)
> -- 
> 1.8.3.1
> 
> 

--
			Gleb.


* Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-04  7:02   ` Gleb Natapov
@ 2014-09-04 14:12     ` Paolo Bonzini
  2014-09-04 15:05       ` Gleb Natapov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-04 14:12 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: linux-kernel, kvm, jroedel, agraf, valentine.sinitsyn, jan.kiszka, avi

Il 04/09/2014 09:02, Gleb Natapov ha scritto:
> On Tue, Sep 02, 2014 at 05:13:49PM +0200, Paolo Bonzini wrote:
>> > This is required for the following patch to work correctly.  If a nested page
>> > fault happens during emulation, we must inject a vmexit, not a page fault.
>> > Luckily we already have the required machinery: it is enough to return
>> > X86EMUL_INTERCEPTED instead of X86EMUL_PROPAGATE_FAULT.
>> > 
> I wonder why this patch is needed. X86EMUL_PROPAGATE_FAULT causes
> ctxt->have_exception to be set to true in x86_emulate_insn().
> x86_emulate_instruction() checks ctxt->have_exception and calls
> inject_emulated_exception() if it is true. inject_emulated_exception()
> calls kvm_propagate_fault() where we check if the fault was nested and
> generate vmexit or a page fault accordingly.

Good question. :)

If you do that, KVM gets down to the "if (writeback)" and writes the 
ctxt->eip from L2 into the L1 EIP.

Possibly this patch can be replaced by just this?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 022513b..475e979 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5312,7 +5312,7 @@ restart:
 
        if (ctxt->have_exception) {
                inject_emulated_exception(vcpu);
-               r = EMULATE_DONE;
+               return EMULATE_DONE;
        } else if (vcpu->arch.pio.count) {
                if (!vcpu->arch.pio.in) {
                        /* FIXME: return into emulator if single-stepping.  */

But I'm not sure how to test it, and I like the idea of treating nested page
faults like other nested vmexits during emulation (which is what this patch
does).

If I included this patch, I could then remove kvm_propagate_fault
like (I think) this:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92493e10937c..e096db566ac2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4910,9 +4902,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 static void inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
-	if (ctxt->exception.vector == PF_VECTOR)
-		kvm_propagate_fault(vcpu, &ctxt->exception);
-	else if (ctxt->exception.error_code_valid)
+	if (ctxt->exception.vector == PF_VECTOR) {
+		WARN_ON(ctxt->exception.nested_page_fault);
+		vcpu->arch.walk_mmu->inject_page_fault(vcpu, &ctxt->exception);
+	} else if (ctxt->exception.error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
 				      ctxt->exception.error_code);
 	else

What do you think?

Paolo


* Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-04 14:12     ` Paolo Bonzini
@ 2014-09-04 15:05       ` Gleb Natapov
  2014-09-04 17:17         ` Paolo Bonzini
  2014-09-04 17:44         ` Paolo Bonzini
  0 siblings, 2 replies; 18+ messages in thread
From: Gleb Natapov @ 2014-09-04 15:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, linux-kernel, kvm, jroedel, agraf,
	valentine.sinitsyn, jan.kiszka, avi

On Thu, Sep 04, 2014 at 04:12:19PM +0200, Paolo Bonzini wrote:
> Il 04/09/2014 09:02, Gleb Natapov ha scritto:
> > On Tue, Sep 02, 2014 at 05:13:49PM +0200, Paolo Bonzini wrote:
> >> > This is required for the following patch to work correctly.  If a nested page
> >> > fault happens during emulation, we must inject a vmexit, not a page fault.
> >> > Luckily we already have the required machinery: it is enough to return
> >> > X86EMUL_INTERCEPTED instead of X86EMUL_PROPAGATE_FAULT.
> >> > 
> > I wonder why this patch is needed. X86EMUL_PROPAGATE_FAULT causes
> > ctxt->have_exception to be set to true in x86_emulate_insn().
> > x86_emulate_instruction() checks ctxt->have_exception and calls
> > inject_emulated_exception() if it is true. inject_emulated_exception()
> > calls kvm_propagate_fault() where we check if the fault was nested and
> > generate vmexit or a page fault accordingly.
> 
> Good question. :)
> 
> If you do that, KVM gets down to the "if (writeback)" and writes the 
> ctxt->eip from L2 into the L1 EIP.
Heh, that's a bummer. We should not write back if an instruction caused a vmexit.

> 
> Possibly this patch can be replaced by just this?
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 022513b..475e979 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5312,7 +5312,7 @@ restart:
>  
>         if (ctxt->have_exception) {
>                 inject_emulated_exception(vcpu);
> -               r = EMULATE_DONE;
> +               return EMULATE_DONE;
If there was no vmexit we still want to write back. Perhaps:
    writeback = inject_emulated_exception(vcpu);
and return false if there was a vmexit due to a nested page fault (or any fault;
can't L1 ask for #GP/#UD intercepts that need to be handled here too?)

>         } else if (vcpu->arch.pio.count) {
>                 if (!vcpu->arch.pio.in) {
>                         /* FIXME: return into emulator if single-stepping.  */
> 
> But I'm not sure how to test it, and I like the idea of treating nested page
> faults like other nested vmexits during emulation (which is what this patch
> does).
IMO exits due to instruction intercepts and exits due to other interceptable events
that may happen during instruction emulation are sufficiently different to be handled
slightly differently. If my assumptions about #GP above are correct, with the current
approach it can be easily handled inside inject_emulated_exception().
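
[For reference, the writeback rule being negotiated here can be sketched as a
self-contained fragment. The struct and names below are invented for
illustration; in KVM proper the state lives in x86_emulate_ctxt and the vcpu.]

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for the relevant emulator state. */
struct emu_result {
	bool have_exception;	/* emulation raised an exception        */
	bool causes_vmexit;	/* injecting it vmexits to L1           */
	unsigned long ctxt_eip;	/* rip computed by the emulator (L2)    */
};

/* The eip that should land in the vcpu after emulation: on normal
 * completion (including a plain exception injected into the same guest)
 * the emulator's rip is written back; if injecting the exception caused
 * a vmexit, writeback must be skipped so the L2 rip never leaks into
 * L1 state. */
static unsigned long finish_emulation(const struct emu_result *r,
				      unsigned long old_eip)
{
	if (r->have_exception && r->causes_vmexit)
		return old_eip;		/* skip writeback on vmexit */
	return r->ctxt_eip;		/* normal writeback path    */
}
```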

--
			Gleb.


* Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-04 15:05       ` Gleb Natapov
@ 2014-09-04 17:17         ` Paolo Bonzini
  2014-09-04 17:44         ` Paolo Bonzini
  1 sibling, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-04 17:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Gleb Natapov, linux-kernel, kvm, jroedel, agraf,
	valentine.sinitsyn, jan.kiszka, avi

Il 04/09/2014 17:05, Gleb Natapov ha scritto:
>> >         if (ctxt->have_exception) {
>> >                 inject_emulated_exception(vcpu);
>> > -               r = EMULATE_DONE;
>> > +               return EMULATE_DONE;
> If there was no vmexit we still want to writeback. Perhaps:
>     writeback = inject_emulated_exception(vcpu);
> and return false if there was vmexit due to nested page fault (or any fault,
> can't L1 ask for #GP/#UD intercept that need to be handled here too?)
> 

Sounds good.

Paolo


* Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-04 15:05       ` Gleb Natapov
  2014-09-04 17:17         ` Paolo Bonzini
@ 2014-09-04 17:44         ` Paolo Bonzini
  2014-09-05  9:47           ` Gleb Natapov
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2014-09-04 17:44 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Gleb Natapov, linux-kernel, kvm, jroedel, agraf,
	valentine.sinitsyn, jan.kiszka, avi

Il 04/09/2014 17:05, Gleb Natapov ha scritto:
>> > 
>> > If you do that, KVM gets down to the "if (writeback)" and writes the 
>> > ctxt->eip from L2 into the L1 EIP.
> Heh, that's a bummer. We should not write back if an instruction caused a vmexit.
> 

You're right, that works.

Paolo

-------------- 8< -------------
Subject: [PATCH] KVM: x86: skip writeback on injection of nested exception

If a nested page fault happens during emulation, we will inject a vmexit,
not a page fault.  However because writeback happens after the injection,
we will write ctxt->eip from L2 into the L1 EIP.  We do not write back
if an instruction caused an interception vmexit---do the same for page
faults.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/x86.c              | 15 ++++++++++-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 08cc299..c989651 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -893,7 +893,6 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gfn_t gfn, void *data, int offset, int len,
 			    u32 access);
-void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 
 static inline int __kvm_irq_line_state(unsigned long *irq_state,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e4ed85e..3541946 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -408,12 +408,14 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
-void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
+static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
 	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
 		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
 	else
 		vcpu->arch.mmu.inject_page_fault(vcpu, fault);
+
+	return fault->nested_page_fault;
 }
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
@@ -4929,16 +4931,18 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 	}
 }
 
-static void inject_emulated_exception(struct kvm_vcpu *vcpu)
+static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
 	if (ctxt->exception.vector == PF_VECTOR)
-		kvm_propagate_fault(vcpu, &ctxt->exception);
-	else if (ctxt->exception.error_code_valid)
+		return kvm_propagate_fault(vcpu, &ctxt->exception);
+
+	if (ctxt->exception.error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
 				      ctxt->exception.error_code);
 	else
 		kvm_queue_exception(vcpu, ctxt->exception.vector);
+	return false;
 }
 
 static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
@@ -5300,8 +5304,9 @@ restart:
 	}
 
 	if (ctxt->have_exception) {
-		inject_emulated_exception(vcpu);
 		r = EMULATE_DONE;
+		if (inject_emulated_exception(vcpu))
+			return r;
 	} else if (vcpu->arch.pio.count) {
 		if (!vcpu->arch.pio.in) {
 			/* FIXME: return into emulator if single-stepping.  */
-- 
1.9.3




* Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
  2014-09-04 17:44         ` Paolo Bonzini
@ 2014-09-05  9:47           ` Gleb Natapov
  0 siblings, 0 replies; 18+ messages in thread
From: Gleb Natapov @ 2014-09-05  9:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, linux-kernel, kvm, jroedel, agraf,
	valentine.sinitsyn, jan.kiszka, avi

On Thu, Sep 04, 2014 at 07:44:51PM +0200, Paolo Bonzini wrote:
> Il 04/09/2014 17:05, Gleb Natapov ha scritto:
> >> > 
> >> > If you do that, KVM gets down to the "if (writeback)" and writes the 
> >> > ctxt->eip from L2 into the L1 EIP.
> > Heh, that's a bummer. We should not write back if an instruction caused a vmexit.
> > 
> 
> You're right, that works.
Looks good!

Reviewed-by: Gleb Natapov <gleb@kernel.org>

> 
> Paolo
> 
> -------------- 8< -------------
> Subject: [PATCH] KVM: x86: skip writeback on injection of nested exception
> 
> If a nested page fault happens during emulation, we will inject a vmexit,
> not a page fault.  However because writeback happens after the injection,
> we will write ctxt->eip from L2 into the L1 EIP.  We do not write back
> if an instruction caused an interception vmexit---do the same for page
> faults.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/x86.c              | 15 ++++++++++-----
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 08cc299..c989651 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -893,7 +893,6 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
>  int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			    gfn_t gfn, void *data, int offset, int len,
>  			    u32 access);
> -void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
>  bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
>  
>  static inline int __kvm_irq_line_state(unsigned long *irq_state,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e4ed85e..3541946 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -408,12 +408,14 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
>  
> -void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> +static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  {
>  	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
>  		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
>  	else
>  		vcpu->arch.mmu.inject_page_fault(vcpu, fault);
> +
> +	return fault->nested_page_fault;
>  }
>  
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
> @@ -4929,16 +4931,18 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>  	}
>  }
>  
> -static void inject_emulated_exception(struct kvm_vcpu *vcpu)
> +static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
>  {
>  	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
>  	if (ctxt->exception.vector == PF_VECTOR)
> -		kvm_propagate_fault(vcpu, &ctxt->exception);
> -	else if (ctxt->exception.error_code_valid)
> +		return kvm_propagate_fault(vcpu, &ctxt->exception);
> +
> +	if (ctxt->exception.error_code_valid)
>  		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
>  				      ctxt->exception.error_code);
>  	else
>  		kvm_queue_exception(vcpu, ctxt->exception.vector);
> +	return false;
>  }
>  
>  static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
> @@ -5300,8 +5304,9 @@ restart:
>  	}
>  
>  	if (ctxt->have_exception) {
> -		inject_emulated_exception(vcpu);
>  		r = EMULATE_DONE;
> +		if (inject_emulated_exception(vcpu))
> +			return r;
>  	} else if (vcpu->arch.pio.count) {
>  		if (!vcpu->arch.pio.in) {
>  			/* FIXME: return into emulator if single-stepping.  */
> -- 
> 1.9.3
> 
> 

--
			Gleb.


end of thread, other threads:[~2014-09-05  9:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 15:13 [PATCH 0/4] KVM: nested x86: nested page faults fixes Paolo Bonzini
2014-09-02 15:13 ` [PATCH 1/4] KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD Paolo Bonzini
2014-09-02 15:13 ` [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest Paolo Bonzini
2014-09-02 16:33   ` Joerg Roedel
2014-09-02 16:46     ` Paolo Bonzini
2014-09-02 17:01       ` Paolo Bonzini
2014-09-02 17:01       ` Joerg Roedel
2014-09-02 17:47       ` Avi Kivity
2014-09-02 15:13 ` [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions Paolo Bonzini
2014-09-04  7:02   ` Gleb Natapov
2014-09-04 14:12     ` Paolo Bonzini
2014-09-04 15:05       ` Gleb Natapov
2014-09-04 17:17         ` Paolo Bonzini
2014-09-04 17:44         ` Paolo Bonzini
2014-09-05  9:47           ` Gleb Natapov
2014-09-02 15:13 ` [PATCH 4/4] KVM: x86: propagate exception from permission checks on the nested page fault Paolo Bonzini
2014-09-02 16:02 ` [PATCH 0/4] KVM: nested x86: nested page faults fixes Valentine Sinitsyn
2014-09-02 16:56   ` Paolo Bonzini
