* [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
@ 2010-03-03 19:12 Joerg Roedel
  2010-03-03 19:12 ` [PATCH 01/18] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
                   ` (22 more replies)
  0 siblings, 23 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Alexander Graf, kvm, linux-kernel

Hi,

here are the patches that implement nested paging support for nested
svm. They are somewhat intrusive to the soft-mmu, so I post them as an
RFC in this first round to get feedback about the general direction of
the changes. Nevertheless I am proud to report that with these patches
the famous kernel-compile benchmark runs only 4% slower in the l2 guest
than in the l1 guest when l2 is single-processor. With SMP guests the
situation is very different: the more vcpus the guest has, the larger
the performance drop from l1 to l2.
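
For reference, the translation chain that has to be emulated when the
l1 guest runs its l2 guest with nested paging enabled looks roughly
like this (sketch):

	l2 guest virtual  --(l2 page table)-->          l2 guest physical
	l2 guest physical --(l1 nested page table)-->   l1 guest physical
	l1 guest physical --(host nested page table)--> host physical
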
Anyway, this post is to get feedback about the overall concept of these
patches.  Please review and give feedback :-)

Thanks,

	Joerg

Diffstat:

 arch/x86/include/asm/kvm_host.h |   21 ++++++
 arch/x86/kvm/mmu.c              |  152 ++++++++++++++++++++++++++++++---------
 arch/x86/kvm/mmu.h              |    2 +
 arch/x86/kvm/paging_tmpl.h      |   81 ++++++++++++++++++---
 arch/x86/kvm/svm.c              |  126 +++++++++++++++++++++++++++-----
 arch/x86/kvm/vmx.c              |    9 +++
 arch/x86/kvm/x86.c              |   19 +++++-
 include/linux/kvm.h             |    1 +
 include/linux/kvm_host.h        |    5 ++
 9 files changed, 354 insertions(+), 62 deletions(-)

Shortlog:

Joerg Roedel (18):
      KVM: MMU: Check for root_level instead of long mode
      KVM: MMU: Make tdp_enabled a mmu-context parameter
      KVM: MMU: Make set_cr3 a function pointer in kvm_mmu
      KVM: X86: Introduce a tdp_set_cr3 function
      KVM: MMU: Introduce get_cr3 function pointer
      KVM: MMU: Introduce inject_page_fault function pointer
      KVM: SVM: Implement MMU helper functions for Nested Nested Paging
      KVM: MMU: Change init_kvm_softmmu to take a context as parameter
      KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu
      KVM: MMU: Introduce generic walk_addr function
      KVM: MMU: Add infrastructure for two-level page walker
      KVM: MMU: Implement nested gva_to_gpa functions
      KVM: MMU: Introduce Nested MMU context
      KVM: SVM: Initialize Nested Nested MMU context on VMRUN
      KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
      KVM: X86: Add callback to let modules decide over some supported cpuid bits
      KVM: SVM: Report Nested Paging support to userspace
      KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED



^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/18] KVM: MMU: Check for root_level instead of long mode
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

The walk_addr function checks for !is_long_mode in its
64-bit version. But what is really meant here is a check for
PAE paging. Change the condition to explicitly check for PAE
paging so that it also works with nested nested paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/paging_tmpl.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..92b6bb5 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -128,7 +128,7 @@ walk:
 	walker->level = vcpu->arch.mmu.root_level;
 	pte = vcpu->arch.cr3;
 #if PTTYPE == 64
-	if (!is_long_mode(vcpu)) {
+	if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte))
@@ -194,7 +194,7 @@ walk:
 				(PTTYPE == 64 || is_pse(vcpu))) ||
 		    ((walker->level == PT_PDPE_LEVEL) &&
 				(pte & PT_PAGE_SIZE_MASK)  &&
-				is_long_mode(vcpu))) {
+				vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL)) {
 			int lvl = walker->level;
 
 			walker->gfn = gpte_to_gfn_lvl(pte, lvl);
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
  2010-03-03 19:12 ` [PATCH 01/18] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-08  9:17   ` Avi Kivity
  2010-03-03 19:12 ` [PATCH 03/18] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu Joerg Roedel
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch moves the tdp_enabled flag from its global
meaning into the mmu context. This is necessary for Nested
SVM with emulation of Nested Paging, where we need an extra
MMU context to shadow the Nested Nested Page Table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    8 +++++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..e7bef19 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_mmu {
 	int root_level;
 	int shadow_root_level;
 	union kvm_mmu_page_role base_role;
+	bool tdp_enabled;
 
 	u64 *pae_root;
 	u64 rsvd_bits_mask[2][4];
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 741373e..5c66c99 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1812,7 +1812,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		spte |= shadow_user_mask;
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
-	if (tdp_enabled)
+	if (vcpu->arch.mmu.tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));
 
@@ -2077,7 +2077,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		ASSERT(!VALID_PAGE(root));
-		if (tdp_enabled)
+		if (vcpu->arch.mmu.tdp_enabled)
 			direct = 1;
 		if (mmu_check_root(vcpu, root_gfn))
 			return 1;
@@ -2090,7 +2090,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 	direct = !is_paging(vcpu);
-	if (tdp_enabled)
+	if (vcpu->arch.mmu.tdp_enabled)
 		direct = 1;
 	for (i = 0; i < 4; ++i) {
 		hpa_t root = vcpu->arch.mmu.pae_root[i];
@@ -2397,6 +2397,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->invlpg = nonpaging_invlpg;
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
+	vcpu->arch.mmu.tdp_enabled = true;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2435,6 +2436,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 		r = paging32_init_context(vcpu);
 
 	vcpu->arch.mmu.base_role.glevels = vcpu->arch.mmu.root_level;
+	vcpu->arch.mmu.tdp_enabled       = false;
 
 	return r;
 }
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 03/18] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
  2010-03-03 19:12 ` [PATCH 01/18] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
  2010-03-03 19:12 ` [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 04/18] KVM: X86: Introduce a tdp_set_cr3 function Joerg Roedel
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e7bef19..887a1f7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -241,6 +241,7 @@ struct kvm_pio_request {
  */
 struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
+	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5c66c99..ec3da11 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2398,6 +2398,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 	vcpu->arch.mmu.tdp_enabled = true;
+	vcpu->arch.mmu.set_cr3 = kvm_x86_ops->set_cr3;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2437,6 +2438,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu.base_role.glevels = vcpu->arch.mmu.root_level;
 	vcpu->arch.mmu.tdp_enabled       = false;
+	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
 
 	return r;
 }
@@ -2482,7 +2484,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	if (r)
 		goto out;
 	/* set_cr3() should ensure TLB has been flushed */
-	kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
+	vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
 out:
 	return r;
 }
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 04/18] KVM: X86: Introduce a tdp_set_cr3 function
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (2 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 03/18] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 05/18] KVM: MMU: Introduce get_cr3 function pointer Joerg Roedel
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tdp-enabled mmu
contexts. This allows removing some hacks from the svm code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    2 +-
 arch/x86/kvm/svm.c              |   23 ++++++++++++++---------
 arch/x86/kvm/vmx.c              |    3 +++
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 887a1f7..1bf8501 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -537,6 +537,7 @@ struct kvm_x86_ops {
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 	int (*get_lpage_level)(void);
 	bool (*rdtscp_supported)(void);
+	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
 
 	const struct trace_print_flags *exit_reasons_str;
 };
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ec3da11..84e3209 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2398,7 +2398,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 	vcpu->arch.mmu.tdp_enabled = true;
-	vcpu->arch.mmu.set_cr3 = kvm_x86_ops->set_cr3;
+	vcpu->arch.mmu.set_cr3 = kvm_x86_ops->set_tdp_cr3;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 15b8852..a7a76f5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3044,9 +3044,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	gs_selector = kvm_read_gs();
 	ldt_selector = kvm_read_ldt();
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;
-	/* required for live migration with NPT */
-	if (npt_enabled)
-		svm->vmcb->save.cr3 = vcpu->arch.cr3;
 
 	clgi();
 
@@ -3155,16 +3152,22 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (npt_enabled) {
-		svm->vmcb->control.nested_cr3 = root;
-		force_new_asid(vcpu);
-		return;
-	}
-
 	svm->vmcb->save.cr3 = root;
 	force_new_asid(vcpu);
 }
 
+static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.nested_cr3 = root;
+
+	/* Also sync guest cr3 here in case we live migrate */
+	svm->vmcb->save.cr3 = vcpu->arch.cr3;
+
+	force_new_asid(vcpu);
+}
+
 static int is_disabled(void)
 {
 	u64 vm_cr;
@@ -3361,6 +3364,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.cpuid_update = svm_cpuid_update,
 
 	.rdtscp_supported = svm_rdtscp_supported,
+
+	.set_tdp_cr3 = set_tdp_cr3,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..530d14d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4212,11 +4212,14 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.get_mt_mask = vmx_get_mt_mask,
 
 	.exit_reasons_str = vmx_exit_reasons_str,
+
 	.get_lpage_level = vmx_get_lpage_level,
 
 	.cpuid_update = vmx_cpuid_update,
 
 	.rdtscp_supported = vmx_rdtscp_supported,
+
+	.set_tdp_cr3 = vmx_set_cr3,
 };
 
 static int __init vmx_init(void)
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 05/18] KVM: MMU: Introduce get_cr3 function pointer
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (3 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 04/18] KVM: X86: Introduce a tdp_set_cr3 function Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 06/18] KVM: MMU: Introduce inject_page_fault " Joerg Roedel
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This function pointer in the MMU context is required to
implement Nested Nested Paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    9 ++++++++-
 arch/x86/kvm/paging_tmpl.h      |    4 ++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1bf8501..37d0145 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -242,6 +242,7 @@ struct kvm_pio_request {
 struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
+	unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 84e3209..189c68d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,7 +2071,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	int direct = 0;
 	u64 pdptr;
 
-	root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+	root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
@@ -2254,6 +2254,11 @@ static void paging_new_cr3(struct kvm_vcpu *vcpu)
 	mmu_free_roots(vcpu);
 }
 
+static unsigned long get_cr3(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.cr3;
+}
+
 static void inject_page_fault(struct kvm_vcpu *vcpu,
 			      u64 addr,
 			      u32 err_code)
@@ -2399,6 +2404,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->root_hpa = INVALID_PAGE;
 	vcpu->arch.mmu.tdp_enabled = true;
 	vcpu->arch.mmu.set_cr3 = kvm_x86_ops->set_tdp_cr3;
+	vcpu->arch.mmu.get_cr3 = get_cr3;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2439,6 +2445,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.base_role.glevels = vcpu->arch.mmu.root_level;
 	vcpu->arch.mmu.tdp_enabled       = false;
 	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
+	vcpu->arch.mmu.get_cr3           = get_cr3;
 
 	return r;
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 92b6bb5..1149daa 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -126,7 +126,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 				     fetch_fault);
 walk:
 	walker->level = vcpu->arch.mmu.root_level;
-	pte = vcpu->arch.cr3;
+	pte = vcpu->arch.mmu.get_cr3(vcpu);
 #if PTTYPE == 64
 	if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
@@ -137,7 +137,7 @@ walk:
 	}
 #endif
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-	       (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0);
+	       (vcpu->arch.mmu.get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
 	pt_access = ACC_ALL;
 
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 06/18] KVM: MMU: Introduce inject_page_fault function pointer
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (4 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 05/18] KVM: MMU: Introduce get_cr3 function pointer Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 07/18] KVM: SVM: Implement MMU helper functions for Nested Nested Paging Joerg Roedel
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which is used to inject a page fault
into the guest. It will be needed later when Nested Nested
Paging is implemented.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/mmu.c              |    3 ++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 37d0145..c0b5576 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -244,6 +244,9 @@ struct kvm_mmu {
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
 	unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
+	void (*inject_page_fault)(struct kvm_vcpu *vcpu,
+				  unsigned long addr,
+				  u32 error_code);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
 			    u32 *error);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 189c68d..8f835f1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2263,7 +2263,7 @@ static void inject_page_fault(struct kvm_vcpu *vcpu,
 			      u64 addr,
 			      u32 err_code)
 {
-	kvm_inject_page_fault(vcpu, addr, err_code);
+	vcpu->arch.mmu.inject_page_fault(vcpu, addr, err_code);
 }
 
 static void paging_free(struct kvm_vcpu *vcpu)
@@ -2446,6 +2446,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.tdp_enabled       = false;
 	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.mmu.get_cr3           = get_cr3;
+	vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
 
 	return r;
 }
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 07/18] KVM: SVM: Implement MMU helper functions for Nested Nested Paging
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (5 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 06/18] KVM: MMU: Introduce inject_page_fault " Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 08/18] KVM: MMU: Change init_kvm_softmmu to take a context as parameter Joerg Roedel
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch adds the helper functions which will be used in
the mmu context for handling nested nested page faults.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a7a76f5..a6c08e0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -91,6 +91,9 @@ struct nested_state {
 	u32 intercept_exceptions;
 	u64 intercept;
 
+	/* Nested Paging related state */
+	u64 nested_cr3;
+
 };
 
 #define MSRPM_OFFSETS	16
@@ -1545,6 +1548,36 @@ static int vmmcall_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->nested.nested_cr3;
+}
+
+static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
+				   unsigned long root)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.nested_cr3 = root;
+	force_new_asid(vcpu);
+}
+
+static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
+				       unsigned long addr,
+				       u32 error_code)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.exit_code = SVM_EXIT_NPF;
+	svm->vmcb->control.exit_code_hi = 0;
+	svm->vmcb->control.exit_info_1 = error_code;
+	svm->vmcb->control.exit_info_2 = addr;
+
+	nested_svm_vmexit(svm);
+}
+
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
 	if (!(svm->vcpu.arch.efer & EFER_SVME)
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 08/18] KVM: MMU: Change init_kvm_softmmu to take a context as parameter
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (6 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 07/18] KVM: SVM: Implement MMU helper functions for Nested Nested Paging Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 09/18] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu Joerg Roedel
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

Some logic of this function is required to build the Nested
Nested Paging context. So factor the required logic into a
separate function and export it.
Also make the whole init path suitable for more than one mmu
context.
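
The SVM code can then reuse this function and just override the mmu
callbacks it needs, roughly like this (sketch of what the later
nested-NPT patch in this series does):

	kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
	vcpu->arch.mmu.set_cr3           = nested_svm_set_tdp_cr3;
	vcpu->arch.mmu.get_cr3           = nested_svm_get_tdp_cr3;
	vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;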

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c |   60 ++++++++++++++++++++++++++++++---------------------
 arch/x86/kvm/mmu.h |    1 +
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8f835f1..560ecb6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2225,10 +2225,9 @@ static void nonpaging_free(struct kvm_vcpu *vcpu)
 	mmu_free_roots(vcpu);
 }
 
-static int nonpaging_init_context(struct kvm_vcpu *vcpu)
+static int nonpaging_init_context(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu *context)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
-
 	context->new_cr3 = nonpaging_new_cr3;
 	context->page_fault = nonpaging_page_fault;
 	context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2287,9 +2286,10 @@ static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
 #include "paging_tmpl.h"
 #undef PTTYPE
 
-static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
+static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu *context,
+				  int level)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
 	int maxphyaddr = cpuid_maxphyaddr(vcpu);
 	u64 exb_bit_rsvd = 0;
 
@@ -2342,9 +2342,11 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 	}
 }
 
-static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
+static int paging64_init_context_common(struct kvm_vcpu *vcpu,
+					struct kvm_mmu *context,
+					int level)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
+	reset_rsvds_bits_mask(vcpu, context, level);
 
 	ASSERT(is_pae(vcpu));
 	context->new_cr3 = paging_new_cr3;
@@ -2360,17 +2362,17 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 	return 0;
 }
 
-static int paging64_init_context(struct kvm_vcpu *vcpu)
+static int paging64_init_context(struct kvm_vcpu *vcpu,
+				 struct kvm_mmu *context)
 {
-	reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
-	return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
+	return paging64_init_context_common(vcpu, context, PT64_ROOT_LEVEL);
 }
 
-static int paging32_init_context(struct kvm_vcpu *vcpu)
+static int paging32_init_context(struct kvm_vcpu *vcpu,
+				 struct kvm_mmu *context)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
+	reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
 
-	reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging32_page_fault;
 	context->gva_to_gpa = paging32_gva_to_gpa;
@@ -2384,10 +2386,10 @@ static int paging32_init_context(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int paging32E_init_context(struct kvm_vcpu *vcpu)
+static int paging32E_init_context(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu *context)
 {
-	reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
-	return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
+	return paging64_init_context_common(vcpu, context, PT32E_ROOT_LEVEL);
 }
 
 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
@@ -2410,15 +2412,15 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
-		reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
+		reset_rsvds_bits_mask(vcpu, context, PT64_ROOT_LEVEL);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 		context->root_level = PT64_ROOT_LEVEL;
 	} else if (is_pae(vcpu)) {
-		reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
+		reset_rsvds_bits_mask(vcpu, context, PT32E_ROOT_LEVEL);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 		context->root_level = PT32E_ROOT_LEVEL;
 	} else {
-		reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
+		reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
 		context->gva_to_gpa = paging32_gva_to_gpa;
 		context->root_level = PT32_ROOT_LEVEL;
 	}
@@ -2426,24 +2428,32 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
+int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
 	int r;
-
 	ASSERT(vcpu);
 	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
 	if (!is_paging(vcpu))
-		r = nonpaging_init_context(vcpu);
+		r = nonpaging_init_context(vcpu, context);
 	else if (is_long_mode(vcpu))
-		r = paging64_init_context(vcpu);
+		r = paging64_init_context(vcpu, context);
 	else if (is_pae(vcpu))
-		r = paging32E_init_context(vcpu);
+		r = paging32E_init_context(vcpu, context);
 	else
-		r = paging32_init_context(vcpu);
+		r = paging32_init_context(vcpu, context);
 
 	vcpu->arch.mmu.base_role.glevels = vcpu->arch.mmu.root_level;
 	vcpu->arch.mmu.tdp_enabled       = false;
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
+
+static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
+{
+	int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+
 	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.mmu.get_cr3           = get_cr3;
 	vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index be66759..64f619b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -49,6 +49,7 @@
 #define PFERR_FETCH_MASK (1U << 4)
 
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
+int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 {
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 09/18] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (7 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 08/18] KVM: MMU: Change init_kvm_softmmu to take a context as parameter Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 10/18] KVM: MMU: Introduce generic walk_addr function Joerg Roedel
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch changes the is_rsvd_bits_set() function prototype
to take only a kvm_mmu context instead of a full vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c         |    4 ++--
 arch/x86/kvm/paging_tmpl.h |    3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 560ecb6..647353d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2270,12 +2270,12 @@ static void paging_free(struct kvm_vcpu *vcpu)
 	nonpaging_free(vcpu);
 }
 
-static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
+static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
 {
 	int bit7;
 
 	bit7 = (gpte >> 7) & 1;
-	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
+	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
 #define PTTYPE 64
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 1149daa..8608439 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -158,7 +158,8 @@ walk:
 		if (!is_present_gpte(pte))
 			goto not_present;
 
-		rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker->level);
+		rsvd_fault = is_rsvd_bits_set(&vcpu->arch.mmu, pte,
+					      walker->level);
 		if (rsvd_fault)
 			goto access_error;
 
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 10/18] KVM: MMU: Introduce generic walk_addr function
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (8 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 09/18] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This is the first patch in the series towards a generic
walk_addr implementation which will eventually be able to
walk two-dimensional (nested) page tables. In this first
step the walk_addr function is renamed to walk_addr_generic,
which takes an mmu context as an additional parameter.
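
The end goal is that the same walker code can run on two different
sets of page tables (sketch; walk_addr_nested is added later in this
series):

	FNAME(walk_addr)(...)         walks &vcpu->arch.mmu
	FNAME(walk_addr_nested)(...)  walks &vcpu->arch.nested_mmu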

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/paging_tmpl.h |   30 ++++++++++++++++++++----------
 1 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8608439..6c55a31 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -112,9 +112,10 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 /*
  * Fetch a guest pte for a guest virtual address
  */
-static int FNAME(walk_addr)(struct guest_walker *walker,
-			    struct kvm_vcpu *vcpu, gva_t addr,
-			    int write_fault, int user_fault, int fetch_fault)
+static int FNAME(walk_addr_generic)(struct guest_walker *walker,
+				    struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				    gva_t addr, int write_fault,
+				    int user_fault, int fetch_fault)
 {
 	pt_element_t pte;
 	gfn_t table_gfn;
@@ -125,10 +126,12 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
 walk:
-	walker->level = vcpu->arch.mmu.root_level;
-	pte = vcpu->arch.mmu.get_cr3(vcpu);
+
+	walker->level = mmu->root_level;
+	pte =           mmu->get_cr3(vcpu);
+
 #if PTTYPE == 64
-	if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
+	if (walker->level == PT32E_ROOT_LEVEL) {
 		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte))
@@ -137,7 +140,7 @@ walk:
 	}
 #endif
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-	       (vcpu->arch.mmu.get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
+	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
 	pt_access = ACC_ALL;
 
@@ -158,8 +161,7 @@ walk:
 		if (!is_present_gpte(pte))
 			goto not_present;
 
-		rsvd_fault = is_rsvd_bits_set(&vcpu->arch.mmu, pte,
-					      walker->level);
+		rsvd_fault = is_rsvd_bits_set(mmu, pte, walker->level);
 		if (rsvd_fault)
 			goto access_error;
 
@@ -195,7 +197,7 @@ walk:
 				(PTTYPE == 64 || is_pse(vcpu))) ||
 		    ((walker->level == PT_PDPE_LEVEL) &&
 				(pte & PT_PAGE_SIZE_MASK)  &&
-				vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL)) {
+				mmu->root_level == PT64_ROOT_LEVEL)) {
 			int lvl = walker->level;
 
 			walker->gfn = gpte_to_gfn_lvl(pte, lvl);
@@ -253,6 +255,14 @@ err:
 	return 0;
 }
 
+static int FNAME(walk_addr)(struct guest_walker *walker,
+			    struct kvm_vcpu *vcpu, gva_t addr,
+			    int write_fault, int user_fault, int fetch_fault)
+{
+	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.mmu, addr,
+				        write_fault, user_fault, fetch_fault);
+}
+
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 			      u64 *spte, const void *pte)
 {
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (9 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 10/18] KVM: MMU: Introduce generic walk_addr function Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-08  9:37   ` Avi Kivity
  2010-03-03 19:12 ` [PATCH 12/18] KVM: MMU: Implement nested gva_to_gpa functions Joerg Roedel
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch introduces an mmu callback to translate gpa
addresses in the walk_addr code. It is later used to
translate l2_gpa addresses into l1_gpa addresses.
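
The intended use in the page table walker looks roughly like this
(sketch, matching the names used later in this series):

	/* every gpa the walker touches goes through the callback */
	pte_gpa = mmu->translate_gpa(vcpu, pte_gpa, &error);

	/*
	 * non-nested walk: translate_gpa() just returns the gpa
	 * nested walk:     translate_gpa() resolves the l2_gpa through
	 *                  the l1 nested page table and returns an l1_gpa
	 */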

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    7 +++++++
 arch/x86/kvm/paging_tmpl.h      |   19 +++++++++++++++++++
 include/linux/kvm_host.h        |    5 +++++
 4 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c0b5576..76c8b5f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -250,6 +250,7 @@ struct kvm_mmu {
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
 			    u32 *error);
+	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
 			      struct kvm_mmu_page *page);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 647353d..ec3830c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2149,6 +2149,11 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
 
+static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
+{
+	return gpa;
+}
+
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
 				  u32 access, u32 *error)
 {
@@ -2399,6 +2404,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->new_cr3 = nonpaging_new_cr3;
 	context->page_fault = tdp_page_fault;
 	context->free = nonpaging_free;
+	context->translate_gpa = translate_gpa;
 	context->prefetch_page = nonpaging_prefetch_page;
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = nonpaging_invlpg;
@@ -2443,6 +2449,7 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 	else
 		r = paging32_init_context(vcpu, context);
 
+	vcpu->arch.mmu.translate_gpa = translate_gpa;
 	vcpu->arch.mmu.base_role.glevels = vcpu->arch.mmu.root_level;
 	vcpu->arch.mmu.tdp_enabled       = false;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c55a31..a72d5ea 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -122,6 +122,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	unsigned index, pt_access, pte_access;
 	gpa_t pte_gpa;
 	int rsvd_fault = 0;
+	u32 error;
 
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
@@ -150,6 +151,15 @@ walk:
 		table_gfn = gpte_to_gfn(pte);
 		pte_gpa = gfn_to_gpa(table_gfn);
 		pte_gpa += index * sizeof(pt_element_t);
+
+		pte_gpa = mmu->translate_gpa(vcpu, pte_gpa, &error);
+		if (pte_gpa == UNMAPPED_GVA) {
+			walker->error_code = error;
+			return 0;
+		}
+		/* pte_gpa might have changed - recalculate table_gfn */
+		table_gfn = gpa_to_gfn(pte_gpa);
+
 		walker->table_gfn[walker->level - 1] = table_gfn;
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
@@ -209,6 +219,15 @@ walk:
 			    is_cpuid_PSE36())
 				walker->gfn += pse36_gfn_delta(pte);
 
+			/* Do the final translation */
+			pte_gpa = gfn_to_gpa(walker->gfn);
+			pte_gpa = mmu->translate_gpa(vcpu, pte_gpa, &error);
+			if (pte_gpa == UNMAPPED_GVA) {
+				walker->error_code = error;
+				return 0;
+			}
+			walker->gfn = gpa_to_gfn(pte_gpa);
+
 			break;
 		}
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a3fd0f9..ef2e81a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -503,6 +503,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
 	return (gpa_t)gfn << PAGE_SHIFT;
 }
 
+static inline gfn_t gpa_to_gfn(gpa_t gpa)
+{
+	return (gfn_t)gpa >> PAGE_SHIFT;
+}
+
 static inline hpa_t pfn_to_hpa(pfn_t pfn)
 {
 	return (hpa_t)pfn << PAGE_SHIFT;
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 12/18] KVM: MMU: Implement nested gva_to_gpa functions
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (10 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 13/18] KVM: MMU: Introduce Nested MMU context Joerg Roedel
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch adds the functions to do a nested l2_gva to
l1_gpa page table walk.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/mmu.c              |    8 ++++++++
 arch/x86/kvm/paging_tmpl.h      |   31 +++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 76c8b5f..20dd1ce 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -295,6 +295,9 @@ struct kvm_vcpu_arch {
 	bool tpr_access_reporting;
 
 	struct kvm_mmu mmu;
+
+	struct kvm_mmu nested_mmu;
+
 	/* only needed in kvm_pv_mmu_op() path, but it's hot so
 	 * put it here to avoid allocation */
 	struct kvm_pv_mmu_op_buffer mmu_op_buffer;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ec3830c..c831955 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2162,6 +2162,14 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
 	return vaddr;
 }
 
+static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
+					 u32 access, u32 *error)
+{
+	if (error)
+		*error = 0;
+	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, error);
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a72d5ea..c0158d8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -282,6 +282,16 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 				        write_fault, user_fault, fetch_fault);
 }
 
+static int FNAME(walk_addr_nested)(struct guest_walker *walker,
+				   struct kvm_vcpu *vcpu, gva_t addr,
+				   int write_fault, int user_fault,
+				   int fetch_fault)
+{
+	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
+					addr, write_fault, user_fault,
+					fetch_fault);
+}
+
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 			      u64 *spte, const void *pte)
 {
@@ -541,6 +551,27 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
 	return gpa;
 }
 
+static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
+				      u32 access, u32 *error)
+{
+	struct guest_walker walker;
+	gpa_t gpa = UNMAPPED_GVA;
+	int r;
+
+	r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr,
+				    !!(access & PFERR_WRITE_MASK),
+				    !!(access & PFERR_USER_MASK),
+				    !!(access & PFERR_FETCH_MASK));
+
+	if (r) {
+		gpa = gfn_to_gpa(walker.gfn);
+		gpa |= vaddr & ~PAGE_MASK;
+	} else if (error)
+		*error = walker.error_code;
+
+	return gpa;
+}
+
 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 				 struct kvm_mmu_page *sp)
 {
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 13/18] KVM: MMU: Introduce Nested MMU context
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (11 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 12/18] KVM: MMU: Implement nested gva_to_gpa functions Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 14/18] KVM: SVM: Initialize Nested Nested MMU context on VMRUN Joerg Roedel
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch introduces a second MMU context which will hold
the paging information for the l2 guest.
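
Together with the SVM bits from the next patch the two contexts end up
doing the following (rough sketch):

	/* walks the l2 page table; every table gpa is translated through
	 * the l1 nested page table via translate_nested_gpa() */
	l1_gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, l2_gva, access, &error);

	/* walks only the l1 nested page table */
	l1_gpa = vcpu->arch.nested_mmu.gva_to_gpa(vcpu, l2_gpa, access, &error);

This is the swap of the gva_to_gpa functions mentioned in the comment
added below.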

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    8 ++++++
 arch/x86/kvm/mmu.c              |   48 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20dd1ce..66a698e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -264,6 +264,13 @@ struct kvm_mmu {
 
 	u64 *pae_root;
 	u64 rsvd_bits_mask[2][4];
+
+	/*
+	 * If true the mmu runs in two-level mode.
+	 * vcpu->arch.nested_mmu needs to contain meaningful
+	 * values then.
+	 */
+	bool nested;
 };
 
 struct kvm_vcpu_arch {
@@ -296,6 +303,7 @@ struct kvm_vcpu_arch {
 
 	struct kvm_mmu mmu;
 
+	/* This will hold the mmu context of the second level guest */
 	struct kvm_mmu nested_mmu;
 
 	/* only needed in kvm_pv_mmu_op() path, but it's hot so
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c831955..ccaf6b1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2154,6 +2154,18 @@ static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
 	return gpa;
 }
 
+static gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
+{
+	u32 access;
+
+	BUG_ON(!vcpu->arch.mmu.nested);
+
+	/* NPT walks are treated as user writes */
+	access = PFERR_WRITE_MASK | PFERR_USER_MASK;
+
+	return vcpu->arch.nested_mmu.gva_to_gpa(vcpu, gpa, access, error);
+}
+
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
 				  u32 access, u32 *error)
 {
@@ -2476,11 +2488,45 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;
+	struct kvm_mmu *h_context = &vcpu->arch.mmu;
+
+	g_context->get_cr3           = get_cr3;
+	g_context->translate_gpa     = translate_nested_gpa;
+	g_context->inject_page_fault = kvm_inject_page_fault;
+
+	/*
+	 * Note that arch.mmu.gva_to_gpa translates l2_gva to l1_gpa. The
+	 * translation of l2_gpa to l1_gpa addresses is done using the
+	 * arch.nested_mmu.gva_to_gpa function. Basically the gva_to_gpa
+	 * functions between mmu and nested_mmu are swapped.
+	 */
+	if (!is_paging(vcpu)) {
+		g_context->root_level = 0;
+		h_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
+	} else if (is_long_mode(vcpu)) {
+		g_context->root_level = PT64_ROOT_LEVEL;
+		h_context->gva_to_gpa = paging64_gva_to_gpa_nested;
+	} else if (is_pae(vcpu)) {
+		g_context->root_level = PT32E_ROOT_LEVEL;
+		h_context->gva_to_gpa = paging64_gva_to_gpa_nested;
+	} else {
+		g_context->root_level = PT32_ROOT_LEVEL;
+		h_context->gva_to_gpa = paging32_gva_to_gpa_nested;
+	}
+
+	return 0;
+}
+
 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.update_pte.pfn = bad_pfn;
 
-	if (tdp_enabled)
+	if (vcpu->arch.mmu.nested)
+		return init_kvm_nested_mmu(vcpu);
+	else if (tdp_enabled)
 		return init_kvm_tdp_mmu(vcpu);
 	else
 		return init_kvm_softmmu(vcpu);
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 14/18] KVM: SVM: Initialize Nested Nested MMU context on VMRUN
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (12 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 13/18] KVM: MMU: Introduce Nested MMU context Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch adds code to initialize the Nested Nested Paging
MMU context when the L1 guest executes a VMRUN instruction
and has nested paging enabled in its VMCB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c |    1 +
 arch/x86/kvm/svm.c |   56 ++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ccaf6b1..b929d84 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2573,6 +2573,7 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
 	mmu_free_roots(vcpu);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_unload);
 
 static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu_page *sp,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a6c08e0..bce10fe 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -93,7 +93,6 @@ struct nested_state {
 
 	/* Nested Paging related state */
 	u64 nested_cr3;
-
 };
 
 #define MSRPM_OFFSETS	16
@@ -282,6 +281,15 @@ static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
 	force_new_asid(vcpu);
 }
 
+static int get_npt_level(void)
+{
+#ifdef CONFIG_X86_64
+	return PT64_ROOT_LEVEL;
+#else
+	return PT32E_ROOT_LEVEL;
+#endif
+}
+
 static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	if (!npt_enabled && !(efer & EFER_LMA))
@@ -1578,6 +1586,27 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 	nested_svm_vmexit(svm);
 }
 
+static int nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+	int r;
+
+	r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+
+	vcpu->arch.mmu.set_cr3           = nested_svm_set_tdp_cr3;
+	vcpu->arch.mmu.get_cr3           = nested_svm_get_tdp_cr3;
+	vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
+	vcpu->arch.mmu.shadow_root_level = get_npt_level();
+	vcpu->arch.nested_mmu.gva_to_gpa = vcpu->arch.mmu.gva_to_gpa;
+	vcpu->arch.mmu.nested            = true;
+
+	return r;
+}
+
+static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.mmu.nested = false;
+}
+
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
 	if (!(svm->vcpu.arch.efer & EFER_SVME)
@@ -1942,6 +1971,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 	kvm_clear_exception_queue(&svm->vcpu);
 	kvm_clear_interrupt_queue(&svm->vcpu);
 
+	svm->nested.nested_cr3 = 0;
+
 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
 	svm->vmcb->save.cs = hsave->save.cs;
@@ -1968,6 +1999,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	nested_svm_unmap(page);
 
+	nested_svm_uninit_mmu_context(&svm->vcpu);
 	kvm_mmu_reset_context(&svm->vcpu);
 	kvm_mmu_load(&svm->vcpu);
 
@@ -2021,6 +2053,13 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	if (!nested_vmcb)
 		return false;
 
+	/* Do check if nested paging is allowed for the guest */
+	if (nested_vmcb->control.nested_ctl && !npt_enabled) {
+		nested_vmcb->control.exit_code = SVM_EXIT_ERR;
+		nested_svm_unmap(page);
+		return false;
+	}
+
 	trace_kvm_nested_vmrun(svm->vmcb->save.rip - 3, vmcb_gpa,
 			       nested_vmcb->save.rip,
 			       nested_vmcb->control.int_ctl,
@@ -2065,6 +2104,12 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	else
 		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
 
+	if (nested_vmcb->control.nested_ctl) {
+		kvm_mmu_unload(&svm->vcpu);
+		svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
+		nested_svm_init_mmu_context(&svm->vcpu);
+	}
+
 	/* Load the nested guest state */
 	svm->vmcb->save.es = nested_vmcb->save.es;
 	svm->vmcb->save.cs = nested_vmcb->save.cs;
@@ -3233,15 +3278,6 @@ static bool svm_cpu_has_accelerated_tpr(void)
 	return false;
 }
 
-static int get_npt_level(void)
-{
-#ifdef CONFIG_X86_64
-	return PT64_ROOT_LEVEL;
-#else
-	return PT32E_ROOT_LEVEL;
-#endif
-}
-
 static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	return 0;
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (13 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 14/18] KVM: SVM: Initialize Nested Nested MMU context on VMRUN Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-15  4:30   ` Daniel K.
  2010-03-15  7:36   ` Avi Kivity
  2010-03-03 19:12 ` [PATCH 16/18] KVM: X86: Add callback to let modules decide over some supported cpuid bits Joerg Roedel
                   ` (7 subsequent siblings)
  22 siblings, 2 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch implements the logic to make sure that either a
page-fault (or page-fault vmexit) or a nested-page-fault
vmexit is propagated back to the right guest level.
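
In short, the fault is routed like this (sketch):

	fault while walking the l2 page table itself
	    -> inject a #PF into the l2 guest (which may in turn cause a
	       page-fault vmexit to the l1 guest if it intercepts #PF)
	fault while translating an l2_gpa through the l1 nested page table
	    (the walker marks this with PFERR_NESTED_MASK)
	    -> emulate a nested-page-fault vmexit to the l1 guest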

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.h         |    1 +
 arch/x86/kvm/paging_tmpl.h |    2 ++
 arch/x86/kvm/x86.c         |   15 ++++++++++++++-
 3 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 64f619b..b42b27e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -47,6 +47,7 @@
 #define PFERR_USER_MASK (1U << 2)
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
+#define PFERR_NESTED_MASK (1U << 31)
 
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index c0158d8..9fc5fb1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -154,6 +154,7 @@ walk:
 
 		pte_gpa = mmu->translate_gpa(vcpu, pte_gpa, &error);
 		if (pte_gpa == UNMAPPED_GVA) {
+			error |= PFERR_NESTED_MASK;
 			walker->error_code = error;
 			return 0;
 		}
@@ -223,6 +224,7 @@ walk:
 			pte_gpa = gfn_to_gpa(walker->gfn);
 			pte_gpa = mmu->translate_gpa(vcpu, pte_gpa, &error);
 			if (pte_gpa == UNMAPPED_GVA) {
+				error |= PFERR_NESTED_MASK;
 				walker->error_code = error;
 				return 0;
 			}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2883ce8..9f8b02d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -314,6 +314,19 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
 }
 
+void kvm_propagate_fault(struct kvm_vcpu *vcpu, unsigned long addr, u32 error_code)
+{
+	u32 nested, error;
+
+	nested = error_code &  PFERR_NESTED_MASK;
+	error  = error_code & ~PFERR_NESTED_MASK;
+
+	if (vcpu->arch.mmu.nested && !(error_code & PFERR_NESTED_MASK))
+		vcpu->arch.nested_mmu.inject_page_fault(vcpu, addr, error);
+	else
+		vcpu->arch.mmu.inject_page_fault(vcpu, addr, error);
+}
+
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.nmi_pending = 1;
@@ -3546,7 +3559,7 @@ static int pio_copy_data(struct kvm_vcpu *vcpu)
 		ret = kvm_read_guest_virt(q, p, bytes, vcpu, &error_code);
 
 	if (ret == X86EMUL_PROPAGATE_FAULT)
-		kvm_inject_page_fault(vcpu, q, error_code);
+		kvm_propagate_fault(vcpu, q, error_code);
 
 	return ret;
 }
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 16/18] KVM: X86: Add callback to let modules decide over some supported cpuid bits
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (14 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 19:12 ` [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch adds the set_supported_cpuid callback to
kvm_x86_ops. It will be used in do_cpuid_ent to delegate the
decision about some supported cpuid bits to the
architecture modules.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/svm.c              |    6 ++++++
 arch/x86/kvm/vmx.c              |    6 ++++++
 arch/x86/kvm/x86.c              |    3 +++
 4 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 66a698e..7d649f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -555,6 +555,8 @@ struct kvm_x86_ops {
 	bool (*rdtscp_supported)(void);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
 
+	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
+
 	const struct trace_print_flags *exit_reasons_str;
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bce10fe..fe1398e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3287,6 +3287,10 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 {
 }
 
+static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
+{
+}
+
 static const struct trace_print_flags svm_exit_reasons_str[] = {
 	{ SVM_EXIT_READ_CR0,			"read_cr0" },
 	{ SVM_EXIT_READ_CR3,			"read_cr3" },
@@ -3435,6 +3439,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.rdtscp_supported = svm_rdtscp_supported,
 
 	.set_tdp_cr3 = set_tdp_cr3,
+
+	.set_supported_cpuid = svm_set_supported_cpuid,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 530d14d..9216867 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4146,6 +4146,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
+{
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -4220,6 +4224,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.rdtscp_supported = vmx_rdtscp_supported,
 
 	.set_tdp_cr3 = vmx_set_cr3,
+
+	.set_supported_cpuid = vmx_set_supported_cpuid,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9f8b02d..53360de 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1921,6 +1921,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->ecx &= kvm_supported_word6_x86_features;
 		break;
 	}
+
+	kvm_x86_ops->set_supported_cpuid(function, entry);
+
 	put_cpu();
 }
 
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (15 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 16/18] KVM: X86: Add callback to let modules decide over some supported cpuid bits Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-03 23:37   ` Alexander Graf
  2010-03-03 19:12 ` [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED Joerg Roedel
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This patch implements the reporting of the nested paging
feature support to userspace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index fe1398e..ce71023 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3289,6 +3289,16 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
+	switch (func) {
+	case 0x8000000A:
+		if (!npt_enabled)
+			break;
+
+		/* NPT feature is supported by Nested SVM */
+		entry->edx = SVM_FEATURE_NPT;
+
+		break;
+	}
 }
 
 static const struct trace_print_flags svm_exit_reasons_str[] = {
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (16 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
@ 2010-03-03 19:12 ` Joerg Roedel
  2010-03-08  9:39   ` Avi Kivity
  2010-03-03 23:10 ` [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Jan Kiszka
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-03 19:12 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, kvm, linux-kernel, Joerg Roedel

This capability shows userspace that it can trust the values
of cpuid[0x8000000A] that it gets from the kernel. Old
behavior was to just return the host cpuid values which is
broken because all additional svm-features need support in
the svm emulation code.
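
For illustration, userspace would probe this with the usual
KVM_CHECK_EXTENSION ioctl before trusting the leaf (minimal sketch,
error handling omitted, the helper name is made up):

	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int svm_cpuid_is_fixed(void)
	{
		int kvm_fd = open("/dev/kvm", O_RDWR);
		/* > 0 means the kernel filters cpuid[0x8000000A] itself */
		int fixed = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SVM_CPUID_FIXED);

		close(kvm_fd);
		return fixed > 0;
	}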

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/x86.c  |    1 +
 include/linux/kvm.h |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 53360de..51bad08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1557,6 +1557,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PCI_SEGMENT:
 	case KVM_CAP_DEBUGREGS:
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
+	case KVM_CAP_SVM_CPUID_FIXED:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..86caf32 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -507,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_DEBUGREGS 50
 #endif
 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
+#define KVM_CAP_SVM_CPUID_FIXED 52
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (17 preceding siblings ...)
  2010-03-03 19:12 ` [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED Joerg Roedel
@ 2010-03-03 23:10 ` Jan Kiszka
  2010-03-03 23:44 ` Alexander Graf
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 49+ messages in thread
From: Jan Kiszka @ 2010-03-03 23:10 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

Joerg Roedel wrote:
> Hi,
> 
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes.  Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor. With SMP guests the

Wow!

Jan

> situation is very different. The more vcpus the guest has the more is
> the performance drop from l1 to l2. 
> Anyway, this post is to get feedback about the overall concept of these
> patches.  Please review and give feedback :-)
> 
> Thanks,
> 
> 	Joerg
> 
> Diffstat:
> 
>  arch/x86/include/asm/kvm_host.h |   21 ++++++
>  arch/x86/kvm/mmu.c              |  152 ++++++++++++++++++++++++++++++---------
>  arch/x86/kvm/mmu.h              |    2 +
>  arch/x86/kvm/paging_tmpl.h      |   81 ++++++++++++++++++---
>  arch/x86/kvm/svm.c              |  126 +++++++++++++++++++++++++++-----
>  arch/x86/kvm/vmx.c              |    9 +++
>  arch/x86/kvm/x86.c              |   19 +++++-
>  include/linux/kvm.h             |    1 +
>  include/linux/kvm_host.h        |    5 ++
>  9 files changed, 354 insertions(+), 62 deletions(-)
> 
> Shortlog:
> 
> Joerg Roedel (18):
>       KVM: MMU: Check for root_level instead of long mode
>       KVM: MMU: Make tdp_enabled a mmu-context parameter
>       KVM: MMU: Make set_cr3 a function pointer in kvm_mmu
>       KVM: X86: Introduce a tdp_set_cr3 function
>       KVM: MMU: Introduce get_cr3 function pointer
>       KVM: MMU: Introduce inject_page_fault function pointer
>       KVM: SVM: Implement MMU helper functions for Nested Nested Paging
>       KVM: MMU: Change init_kvm_softmmu to take a context as parameter
>       KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu
>       KVM: MMU: Introduce generic walk_addr function
>       KVM: MMU: Add infrastructure for two-level page walker
>       KVM: MMU: Implement nested gva_to_gpa functions
>       KVM: MMU: Introduce Nested MMU context
>       KVM: SVM: Initialize Nested Nested MMU context on VMRUN
>       KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
>       KVM: X86: Add callback to let modules decide over some supported cpuid bits
>       KVM: SVM: Report Nested Paging support to userspace
>       KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED
> 
> 



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace
  2010-03-03 19:12 ` [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
@ 2010-03-03 23:37   ` Alexander Graf
  2010-03-04 11:27     ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Alexander Graf @ 2010-03-03 23:37 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel


On 03.03.2010, at 20:12, Joerg Roedel wrote:

> This patch implements the reporting of the nested paging
> feature support to userspace.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
> arch/x86/kvm/svm.c |   10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index fe1398e..ce71023 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3289,6 +3289,16 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> 
> static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> {
> +	switch (func) {
> +	case 0x8000000A:
> +		if (!npt_enabled)
> +			break;

if (!nested)
  break;

Alex

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (18 preceding siblings ...)
  2010-03-03 23:10 ` [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Jan Kiszka
@ 2010-03-03 23:44 ` Alexander Graf
  2010-03-04 11:29   ` Joerg Roedel
  2010-03-04  0:35 ` Anthony Liguori
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 49+ messages in thread
From: Alexander Graf @ 2010-03-03 23:44 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel


On 03.03.2010, at 20:12, Joerg Roedel wrote:

> Hi,
> 
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes.  Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor. With SMP guests the
> situation is very different. The more vcpus the guest has the more is
> the performance drop from l1 to l2. 
> Anyway, this post is to get feedback about the overall concept of these
> patches.  Please review and give feedback :-)

Nice job! It's great to see you finally got around to it :-).

Have you tracked what slows down SMP l2 guests yet? So far I've been assuming that IPIs just completely kill the performance, but I guess it shouldn't be that bad, especially now where you have sped up the #VMEXIT path that much.


Alex

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (19 preceding siblings ...)
  2010-03-03 23:44 ` Alexander Graf
@ 2010-03-04  0:35 ` Anthony Liguori
  2010-03-04 14:42 ` Marcelo Tosatti
  2010-03-08  9:41 ` Avi Kivity
  22 siblings, 0 replies; 49+ messages in thread
From: Anthony Liguori @ 2010-03-04  0:35 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/03/2010 01:12 PM, Joerg Roedel wrote:
> Hi,
>
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes.  Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor.

That's an awesome result.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace
  2010-03-03 23:37   ` Alexander Graf
@ 2010-03-04 11:27     ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-04 11:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel

On Thu, Mar 04, 2010 at 12:37:42AM +0100, Alexander Graf wrote:
> 
> On 03.03.2010, at 20:12, Joerg Roedel wrote:
> 
> > This patch implements the reporting of the nested paging
> > feature support to userspace.
> > 
> > Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> > ---
> > arch/x86/kvm/svm.c |   10 ++++++++++
> > 1 files changed, 10 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index fe1398e..ce71023 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -3289,6 +3289,16 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> > 
> > static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > {
> > +	switch (func) {
> > +	case 0x8000000A:
> > +		if (!npt_enabled)
> > +			break;
> 
> if (!nested)
>   break;

True, but shouldn't matter much because if nested is off the guest will
not see the svm bit. It would only see that the processor could do
nested paging if it had svm support ;-)
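
For completeness, folding your check in would look roughly like this
(sketch only):

	case 0x8000000A:
		if (!nested || !npt_enabled)
			break;

		/* only advertise NPT if svm itself is exposed to the guest */
		entry->edx = SVM_FEATURE_NPT;

		break;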

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-03 23:44 ` Alexander Graf
@ 2010-03-04 11:29   ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-04 11:29 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel

On Thu, Mar 04, 2010 at 12:44:48AM +0100, Alexander Graf wrote:
> 
> On 03.03.2010, at 20:12, Joerg Roedel wrote:
> 
> > Hi,
> > 
> > here are the patches that implement nested paging support for nested
> > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > in the first round to get feedback about the general direction of the
> > changes.  Nevertheless I am proud to report that with these patches the
> > famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> > in the l1 guest when l2 is single-processor. With SMP guests the
> > situation is very different. The more vcpus the guest has the more is
> > the performance drop from l1 to l2. 
> > Anyway, this post is to get feedback about the overall concept of these
> > patches.  Please review and give feedback :-)
> 
> Nice job! It's great to see you finally got around to it :-).
> 
> Have you tracked what slows down SMP l2 guests yet? So far I've been
> assuming that IPIs just completely kill the performance, but I guess
> it shouldn't be that bad, especially now where you have sped up the
> #VMEXIT path that much.

I have not yet looked deeper into this issue. I also suspect lockholder
preemption to be the cause for this. I did the test with a
populated nested page table too and the slowdown is still there. But
thats all guessing, I need to do some research for the exact reasons.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (20 preceding siblings ...)
  2010-03-04  0:35 ` Anthony Liguori
@ 2010-03-04 14:42 ` Marcelo Tosatti
  2010-03-04 15:58   ` Joerg Roedel
  2010-03-08  9:41 ` Avi Kivity
  22 siblings, 1 reply; 49+ messages in thread
From: Marcelo Tosatti @ 2010-03-04 14:42 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Alexander Graf, kvm, linux-kernel

On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> Hi,
> 
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes.  Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor. With SMP guests the
> situation is very different. The more vcpus the guest has the more is
> the performance drop from l1 to l2. 
> Anyway, this post is to get feedback about the overall concept of these
> patches.  Please review and give feedback :-)

Joerg,

What perf gain does this bring ? (i'm not aware of the current
overhead).

Overall comments:

Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
pagetable, and pass that to the kvm tdp fault path (with the correct
context setup)?

You probably need to include a flag in base_role to differentiate
between l1 / l2 shadow tables (say if they use the same cr3 value).


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-04 14:42 ` Marcelo Tosatti
@ 2010-03-04 15:58   ` Joerg Roedel
  2010-03-11 20:58     ` Marcelo Tosatti
  2010-03-12  7:41     ` Avi Kivity
  0 siblings, 2 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-04 15:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Alexander Graf, kvm, linux-kernel

On Thu, Mar 04, 2010 at 11:42:55AM -0300, Marcelo Tosatti wrote:
> On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> > Hi,
> > 
> > here are the patches that implement nested paging support for nested
> > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > in the first round to get feedback about the general direction of the
> > changes.  Nevertheless I am proud to report that with these patches the
> > famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> > in the l1 guest when l2 is single-processor. With SMP guests the
> > situation is very different. The more vcpus the guest has the more is
> > the performance drop from l1 to l2. 
> > Anyway, this post is to get feedback about the overall concept of these
> > patches.  Please review and give feedback :-)
> 
> Joerg,
> 
> What perf gain does this bring ? (i'm not aware of the current
> overhead).

The benchmark was an allnoconfig kernel compile in tmpfs which took with
the same guest image:

as l1-guest with npt:
		
		2m23s

as l2-guest with l1(nested)-l2(shadow):
	
		around 8-9 minutes

as l2-guest with l1(nested)-l2(shadow) without the recent msrpm
optimization:

		around 19 minutes

as l2-guest with l1(nested)-l2(nested) [this patchset]:

		2m25s-2m30s

> Overall comments:
> 
> Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
> pagetable, and pass that to the kvm tdp fault path (with the correct
> context setup)?

If I understand your suggestion correctly, I think thats exactly whats
done in the patches. Some words about the design:

For nested-nested we need to shadow the l1-nested-ptable on the host.
This is done using the vcpu->arch.mmu context which holds the l1 paging
modes while the l2 is running. On a npt-fault from the l2 we just
instrument the shadow-ptable code. This is the common case. because it
happens all the time while the l2 is running.

The other thing is that vcpu->arch.mmu.gva_to_gpa is expected to still
work and translate virtual addresses of the l2 into physical addresses
of the l1 (so it can be accessed with kvm functions).

To do this we need to be aware of the L2 paging mode. It is stored in
vcpu->arch.nested_mmu context. This context is only used for gva_to_gpa
translations. It is not used to build shadow page tables or anything
else. Thats the reason only the parts necessary for gva_to_gpa
translations of the nested_mmu context are initialized.

Since we can not use mmu.gva_to_gpa to translate only between l2_gpa and
l1_gpa because this function is required to translate l2_gva to l1_gpa
by other parts of kvm, the function which does this translation is moved
to nested_mmu.gva_to_gpa. So basically the gva_to_gpa function pointers
are swapped between mmu and nested_mmu.

The nested_mmu.gva_to_gpa function is used in translate_gpa_nested which
is assigned to the newly introduced translate_gpa callback of nested_mmu
context.

This callback is used in the walk_addr function to translate every
l2_gpa address we read from cr3 or the guest ptes into l1_gpa to read
the next step from the guest memory.

In the old unnested case the translate_gpa callback would point to a
function which just returns the gpa it is passed to it unmodified. The
walk_addr function is generalized and now there are basically two
versions of it:

	* walk_addr which translates using vcpu->arch.mmu context
	* walk_addr_nested which translates using vcpu->arch.nested_mmu
	  context

Thats pretty much how these patches work.
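
To make the callback swap a bit more concrete, the idea is roughly the
following (simplified sketch, not the literal patch code; parameter
lists are trimmed):

	/* assigned to nested_mmu.translate_gpa: l2_gpa -> l1_gpa using the
	 * l2 paging mode kept in nested_mmu */
	static gpa_t translate_gpa_nested(struct kvm_vcpu *vcpu, gpa_t l2_gpa, u32 *error)
	{
		return vcpu->arch.nested_mmu.gva_to_gpa(vcpu, l2_gpa, 0, error);
	}

	/* unnested case: the gpa is already a host-visible gpa */
	static gpa_t translate_gpa_identity(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
	{
		return gpa;
	}

	/* the two walker entry points only differ in the mmu context they
	 * hand to the generic walker */
	static int walk_addr(struct guest_walker *walker, struct kvm_vcpu *vcpu,
			     gva_t addr, u32 access)
	{
		return walk_addr_generic(walker, vcpu, &vcpu->arch.mmu, addr, access);
	}

	static int walk_addr_nested(struct guest_walker *walker, struct kvm_vcpu *vcpu,
				    gva_t addr, u32 access)
	{
		return walk_addr_generic(walker, vcpu, &vcpu->arch.nested_mmu, addr, access);
	}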

> You probably need to include a flag in base_role to differentiate
> between l1 / l2 shadow tables (say if they use the same cr3 value).

Not sure if this is necessary. It may be necessary when large pages come
into play. Otherwise the host npt pages are distinguished from the shadow
npt pages by the direct-flag.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-03 19:12 ` [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
@ 2010-03-08  9:17   ` Avi Kivity
  2010-03-10 14:44     ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-08  9:17 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This patch changes the tdp_enabled flag from its global
> meaning to the mmu-context. This is necessary for Nested SVM
> with emulation of Nested Paging where we need an extra MMU
> context to shadow the Nested Nested Page Table.
>
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ec891a2..e7bef19 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -254,6 +254,7 @@ struct kvm_mmu {
>   	int root_level;
>   	int shadow_root_level;
>   	union kvm_mmu_page_role base_role;
> +	bool tdp_enabled;
>
>    

This needs a different name, since the old one is still around.  Perhaps 
we could call it parent_mmu and make it a kvm_mmu pointer.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker
  2010-03-03 19:12 ` [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
@ 2010-03-08  9:37   ` Avi Kivity
  2010-03-10 14:46     ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-08  9:37 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This patch introduces a mmu-callback to translate gpa
> addresses in the walk_addr code. This is later used to
> translate l2_gpa addresses into l1_gpa addresses.
>
> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
> ---
>   arch/x86/include/asm/kvm_host.h |    1 +
>   arch/x86/kvm/mmu.c              |    7 +++++++
>   arch/x86/kvm/paging_tmpl.h      |   19 +++++++++++++++++++
>   include/linux/kvm_host.h        |    5 +++++
>   4 files changed, 32 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c0b5576..76c8b5f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -250,6 +250,7 @@ struct kvm_mmu {
>   	void (*free)(struct kvm_vcpu *vcpu);
>   	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
>   			    u32 *error);
> +	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
>   	void (*prefetch_page)(struct kvm_vcpu *vcpu,
>   			      struct kvm_mmu_page *page);
>   	int (*sync_page)(struct kvm_vcpu *vcpu,
>    

I think placing this here means we will miss a few translations, namely 
when we do a physical access (say, reading PDPTEs or similar).

We need to do this on the level of kvm_read_guest() so we capture 
physical accesses:

kvm_read_guest_virt
   -> walk_addr
      -> kvm_read_guest_tdp
          -> kvm_read_guest_virt
             -> walk_addr
                 -> kvm_read_guest_tdp
                      -> kvm_read_guest

Of course, not all accesses will use kvm_read_guest_tdp; for example 
kvmclock accesses should still go untranslated.
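
Roughly something like this (only a sketch -- kvm_read_guest_tdp is a
name made up for this discussion, the translate_gpa callback and the
mmu.nested flag are the ones from your patches):

	static int kvm_read_guest_tdp(struct kvm_vcpu *vcpu, gpa_t gpa,
				      void *data, unsigned long len, u32 *error)
	{
		/* translate l2_gpa -> l1_gpa first if a nested mmu is active */
		if (vcpu->arch.mmu.nested) {
			gpa = vcpu->arch.nested_mmu.translate_gpa(vcpu, gpa, error);
			if (gpa == UNMAPPED_GVA)
				return -EFAULT;
		}

		return kvm_read_guest(vcpu->kvm, gpa, data, len);
	}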

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED
  2010-03-03 19:12 ` [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED Joerg Roedel
@ 2010-03-08  9:39   ` Avi Kivity
  2010-03-10 14:46     ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-08  9:39 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This capability shows userspace that it can trust the values
> of cpuid[0x8000000A] that it gets from the kernel. Old
> behavior was to just return the host cpuid values which is
> broken because all additional svm-features need support in
> the svm emulation code.
>
>    

I think we can simply fix the bug and push the fix to the various stable 
queues.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
                   ` (21 preceding siblings ...)
  2010-03-04 14:42 ` Marcelo Tosatti
@ 2010-03-08  9:41 ` Avi Kivity
  22 siblings, 0 replies; 49+ messages in thread
From: Avi Kivity @ 2010-03-08  9:41 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> Hi,
>
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes.  Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor. With SMP guests the
> situation is very different. The more vcpus the guest has the more is
> the performance drop from l1 to l2.
> Anyway, this post is to get feedback about the overall concept of these
> patches.  Please review and give feedback :-)
>
> Thanks,
>
> 	Joerg
>
> Diffstat:
>
>   arch/x86/include/asm/kvm_host.h |   21 ++++++
>   arch/x86/kvm/mmu.c              |  152 ++++++++++++++++++++++++++++++---------
>   arch/x86/kvm/mmu.h              |    2 +
>   arch/x86/kvm/paging_tmpl.h      |   81 ++++++++++++++++++---
>   arch/x86/kvm/svm.c              |  126 +++++++++++++++++++++++++++-----
>   arch/x86/kvm/vmx.c              |    9 +++
>   arch/x86/kvm/x86.c              |   19 +++++-
>   include/linux/kvm.h             |    1 +
>   include/linux/kvm_host.h        |    5 ++
>   9 files changed, 354 insertions(+), 62 deletions(-)
>    

Okay, this looks excellent overall, it's nice to see how well this fits 
with the existing mmu infrastructure (only ~300 lines added).  The 
performance results are impressive.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-08  9:17   ` Avi Kivity
@ 2010-03-10 14:44     ` Joerg Roedel
  2010-03-10 14:53       ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-10 14:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This patch changes the tdp_enabled flag from its global
> >meaning to the mmu-context. This is necessary for Nested SVM
> >with emulation of Nested Paging where we need an extra MMU
> >context to shadow the Nested Nested Page Table.
> >
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >index ec891a2..e7bef19 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -254,6 +254,7 @@ struct kvm_mmu {
> >  	int root_level;
> >  	int shadow_root_level;
> >  	union kvm_mmu_page_role base_role;
> >+	bool tdp_enabled;
> >
> 
> This needs a different name, since the old one is still around.
> Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.

Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
The global variable indicates if tdp is _usable_ and we can _enable_ it
for a mmu context.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker
  2010-03-08  9:37   ` Avi Kivity
@ 2010-03-10 14:46     ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-10 14:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Mon, Mar 08, 2010 at 11:37:22AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This patch introduces a mmu-callback to translate gpa
> >addresses in the walk_addr code. This is later used to
> >translate l2_gpa addresses into l1_gpa addresses.
> >
> >Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
> >---
> >  arch/x86/include/asm/kvm_host.h |    1 +
> >  arch/x86/kvm/mmu.c              |    7 +++++++
> >  arch/x86/kvm/paging_tmpl.h      |   19 +++++++++++++++++++
> >  include/linux/kvm_host.h        |    5 +++++
> >  4 files changed, 32 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >index c0b5576..76c8b5f 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -250,6 +250,7 @@ struct kvm_mmu {
> >  	void (*free)(struct kvm_vcpu *vcpu);
> >  	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
> >  			    u32 *error);
> >+	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
> >  	void (*prefetch_page)(struct kvm_vcpu *vcpu,
> >  			      struct kvm_mmu_page *page);
> >  	int (*sync_page)(struct kvm_vcpu *vcpu,
> 
> I think placing this here means we will miss a few translations,
> namely when we do a physical access (say, reading PDPTEs or
> similar).
> 
> We need to do this on the level of kvm_read_guest() so we capture
> physical accesses:
> 
> kvm_read_guest_virt
>   -> walk_addr
>      -> kvm_read_guest_tdp
>          -> kvm_read_guest_virt
>             -> walk_addr
>                 -> kvm_read_guest_tdp
>                      -> kvm_read_guest
> 
> Of course, not all accesses will use kvm_read_guest_tdp; for example
> kvmclock accesses should still go untranslated.

Ok, doing the translation in kvm_read_guest is certainly the more
generic approach. I already fixed a bug related to loading the l2
pdptrs. Doing the translation in kvm_read_guest makes the code a lot
nicer.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED
  2010-03-08  9:39   ` Avi Kivity
@ 2010-03-10 14:46     ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-10 14:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Mon, Mar 08, 2010 at 11:39:31AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This capability shows userspace that it can trust the values
> >of cpuid[0x8000000A] that it gets from the kernel. Old
> >behavior was to just return the host cpuid values which is
> >broken because all additional svm-features need support in
> >the svm emulation code.
> >
> 
> I think we can simply fix the bug and push the fix to the various
> stable queues.

Ok, sounds good too. I have some more fixes queued up and will send this
one together with them.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-10 14:44     ` Joerg Roedel
@ 2010-03-10 14:53       ` Avi Kivity
  2010-03-10 15:26         ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-10 14:53 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/10/2010 04:44 PM, Joerg Roedel wrote:
> On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
>    
>> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
>>      
>>> This patch changes the tdp_enabled flag from its global
>>> meaning to the mmu-context. This is necessary for Nested SVM
>>> with emulation of Nested Paging where we need an extra MMU
>>> context to shadow the Nested Nested Page Table.
>>>
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index ec891a2..e7bef19 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -254,6 +254,7 @@ struct kvm_mmu {
>>>   	int root_level;
>>>   	int shadow_root_level;
>>>   	union kvm_mmu_page_role base_role;
>>> +	bool tdp_enabled;
>>>
>>>        
>> This needs a different name, since the old one is still around.
>> Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
>>      
> Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
> The global variable indicates if tdp is _usable_ and we can _enable_ it
> for a mmu context.
>    

I think of the global flags as host tdp, and the mmu as guest tdp (but 
maybe this is wrong?).  If that makes sense, the naming should reflect that.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-10 14:53       ` Avi Kivity
@ 2010-03-10 15:26         ` Joerg Roedel
  2010-03-11  6:47           ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-10 15:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Wed, Mar 10, 2010 at 04:53:29PM +0200, Avi Kivity wrote:
> On 03/10/2010 04:44 PM, Joerg Roedel wrote:
> >On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
> >>On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >>>This patch changes the tdp_enabled flag from its global
> >>>meaning to the mmu-context. This is necessary for Nested SVM
> >>>with emulation of Nested Paging where we need an extra MMU
> >>>context to shadow the Nested Nested Page Table.
> >>>
> >>>
> >>>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>index ec891a2..e7bef19 100644
> >>>--- a/arch/x86/include/asm/kvm_host.h
> >>>+++ b/arch/x86/include/asm/kvm_host.h
> >>>@@ -254,6 +254,7 @@ struct kvm_mmu {
> >>>  	int root_level;
> >>>  	int shadow_root_level;
> >>>  	union kvm_mmu_page_role base_role;
> >>>+	bool tdp_enabled;
> >>>
> >>This needs a different name, since the old one is still around.
> >>Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
> >Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
> >The global variable indicates if tdp is _usable_ and we can _enable_ it
> >for a mmu context.
> 
> I think of the global flags as host tdp, and the mmu as guest tdp
> (but maybe this is wrong?).  If that makes sense, the naming should
> reflect that.

The basic flow of the mmu state with npt-npt is:

	1. As long as the L1 is running the arch.mmu context is in tdp
	   mode and builds a direct-mapped page table.
	
	2. When vmrun is emulated and the nested vmcb enables nested
	   paging, arch.mmu is switched to a shadow-mmu mode which now
	   shadows the l1 nested page table.
	   So when the l2-guest runs with nested paging the
	   arch.mmu.tdp_enabled variable on the host is false.
	
	3. On a vmexit emulation the mmu is switched back to tdp
	   handling state.
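
In (made-up) code the switch boils down to something like this, just to
illustrate the flow above (the helper names are not from the patches):

	/* step 2: vmrun with nested paging enabled in the nested vmcb,
	 * arch.mmu now shadows the l1 nested page table
	 * (mmu.tdp_enabled goes to false) */
	static int enter_l2_npt_mmu(struct kvm_vcpu *vcpu)
	{
		return kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
	}

	/* step 3: vmexit emulation, rebuild the mmu for the l1 guest which
	 * brings back the direct-mapped tdp state (mmu.tdp_enabled == true) */
	static void leave_l2_npt_mmu(struct kvm_vcpu *vcpu)
	{
		kvm_mmu_reset_context(vcpu);
	}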

So the mmu.tdp_enabled parameter is about tdp being enabled for the
mmu context (so mmu.tdp_enabled means that we build a l1-direct-mapped
page table when true or shadow a l1-page-table when false). Thats why I
think the 'tdp_enabled' name makes sense in the mmu-context.
The global flag only shows if an mmu-context could be in tdp-state. So
tdp_usable may be a good name for it.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-10 15:26         ` Joerg Roedel
@ 2010-03-11  6:47           ` Avi Kivity
  2010-03-11 10:33             ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-11  6:47 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/10/2010 05:26 PM, Joerg Roedel wrote:
> On Wed, Mar 10, 2010 at 04:53:29PM +0200, Avi Kivity wrote:
>    
>> On 03/10/2010 04:44 PM, Joerg Roedel wrote:
>>      
>>> On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
>>>        
>>>> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
>>>>          
>>>>> This patch changes the tdp_enabled flag from its global
>>>>> meaning to the mmu-context. This is necessary for Nested SVM
>>>>> with emulation of Nested Paging where we need an extra MMU
>>>>> context to shadow the Nested Nested Page Table.
>>>>>
>>>>>
>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>> index ec891a2..e7bef19 100644
>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>> @@ -254,6 +254,7 @@ struct kvm_mmu {
>>>>>   	int root_level;
>>>>>   	int shadow_root_level;
>>>>>   	union kvm_mmu_page_role base_role;
>>>>> +	bool tdp_enabled;
>>>>>
>>>>>            
>>>> This needs a different name, since the old one is still around.
>>>> Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
>>>>          
>>> Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
>>> The global variable indicates if tdp is _usable_ and we can _enable_ it
>>> for a mmu context.
>>>        
>> I think of the global flags as host tdp, and the mmu as guest tdp
>> (but maybe this is wrong?).  If that makes sense, the naming should
>> reflect that.
>>      
> The basic flow of the mmu state with npt-npt is:
>
> 	1. As long as the L1 is running the arch.mmu context is in tdp
> 	   mode and builds a direct-mapped page table.
> 	
> 	2. When vmrun is emulated and the nested vmcb enables nested
> 	   paging, arch.mmu is switched to a shadow-mmu mode which now
> 	   shadows the l1 nested page table.
> 	   So when the l2-guest runs with nested paging the
> 	   arch.mmu.tdp_enabled variable on the host is false.
> 	
> 	3. On a vmexit emulation the mmu is switched back to tdp
> 	   handling state.
>
> So the mmu.tdp_enabled parameter is about tdp being enabled for the
> mmu context (so mmu.tdp_enabled means that we build a l1-direct-mapped
> page table when true or shadow a l1-page-table when false). Thats why I
> think the 'tdp_enabled' name makes sense in the mmu-context.
> The global flag only shows if an mmu-context could be in tdp-state. So
> tdp_usable may be a good name for it.
>
>    

tdp is still used in both cases, so that name is confusing.  We could 
call it mmu.direct_map (and set it for real mode?) or mmu.virtual_map 
(with the opposite sense).  Or something.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-03-11  6:47           ` Avi Kivity
@ 2010-03-11 10:33             ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-11 10:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Thu, Mar 11, 2010 at 08:47:21AM +0200, Avi Kivity wrote:
> 
> tdp is still used in both cases, so that name is confusing.  We
> could call it mmu.direct_map (and set it for real mode?) or
> mmu.virtual_map (with the opposite sense).  Or something.

I like the mmu.direct_map name. Its a good term too, I will change it in
the patch.

	Joerg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-04 15:58   ` Joerg Roedel
@ 2010-03-11 20:58     ` Marcelo Tosatti
  2010-03-12  7:36       ` Avi Kivity
  2010-03-12  7:41     ` Avi Kivity
  1 sibling, 1 reply; 49+ messages in thread
From: Marcelo Tosatti @ 2010-03-11 20:58 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Alexander Graf, kvm, linux-kernel

On Thu, Mar 04, 2010 at 04:58:20PM +0100, Joerg Roedel wrote:
> On Thu, Mar 04, 2010 at 11:42:55AM -0300, Marcelo Tosatti wrote:
> > On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> > > Hi,
> > > 
> > > here are the patches that implement nested paging support for nested
> > > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > > in the first round to get feedback about the general direction of the
> > > changes.  Nevertheless I am proud to report that with these patches the
> > > famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> > > in the l1 guest when l2 is single-processor. With SMP guests the
> > > situation is very different. The more vcpus the guest has the more is
> > > the performance drop from l1 to l2. 
> > > Anyway, this post is to get feedback about the overall concept of these
> > > patches.  Please review and give feedback :-)
> > 
> > Joerg,
> > 
> > What perf gain does this bring ? (i'm not aware of the current
> > overhead).
> 
> The benchmark was an allnoconfig kernel compile in tmpfs which took with
> the same guest image:
> 
> as l1-guest with npt:
> 		
> 		2m23s
> 
> as l2-guest with l1(nested)-l2(shadow):
> 	
> 		around 8-9 minutes
> 
> as l2-guest with l1(nested)-l2(shadow) without the recent msrpm
> optimization:
> 
> 		around 19 minutes
> 
> as l2-guest with l1(nested)-l2(nested) [this patchset]:
> 
> 		2m25s-2m30s
> 
> > Overall comments:
> > 
> > Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
> > pagetable, and pass that to the kvm tdp fault path (with the correct
> > context setup)?
> 
> If I understand your suggestion correctly, I think thats exactly whats
> done in the patches. Some words about the design:
> 
> For nested-nested we need to shadow the l1-nested-ptable on the host.
> This is done using the vcpu->arch.mmu context which holds the l1 paging
> modes while the l2 is running. On a npt-fault from the l2 we just
> instrument the shadow-ptable code. This is the common case. because it
> happens all the time while the l2 is running.

OK, makes sense now, I was missing the fact that the l1-nested-ptable   
needs to be shadowed and l1 translations to it must be write protected. 

You should disable out of sync shadow so that l1 guest writes to
l1-nested-ptables always trap. And in the trap case, you'd have to
invalidate l2 shadow pagetable entries that used the (now obsolete)
l1-nested-ptable entry. Does that happen automatically?

> The other thing is that vcpu->arch.mmu.gva_to_gpa is expected to still
> work and translate virtual addresses of the l2 into physical addresses
> of the l1 (so it can be accessed with kvm functions).
> 
> To do this we need to be aware of the L2 paging mode. It is stored in
> vcpu->arch.nested_mmu context. This context is only used for gva_to_gpa
> translations. It is not used to build shadow page tables or anything
> else. Thats the reason only the parts necessary for gva_to_gpa
> translations of the nested_mmu context are initialized.
> 
> Since we can not use mmu.gva_to_gpa to translate only between l2_gpa and
> l1_gpa because this function is required to translate l2_gva to l1_gpa
> by other parts of kvm, the function which does this translation is moved
> to nested_mmu.gva_to_gpa. So basically the gva_to_gpa function pointers
> are swapped between mmu and nested_mmu.
> 
> The nested_mmu.gva_to_gpa function is used in translate_gpa_nested which
> is assigned to the newly introduced translate_gpa callback of nested_mmu
> context.
> 
> This callback is used in the walk_addr function to translate every
> l2_gpa address we read from cr3 or the guest ptes into l1_gpa to read
> the next step from the guest memory.
> 
> In the old unnested case the translate_gpa callback would point to a
> function which just returns the gpa it is passed to it unmodified. The
> walk_addr function is generalized and now there are basically two
> versions of it:
> 
> 	* walk_addr which translates using vcpu->arch.mmu context
> 	* walk_addr_nested which translates using vcpu->arch.nested_mmu
> 	  context
> 
> Thats pretty much how these patches work.
> 
> > You probably need to include a flag in base_role to differentiate
> > between l1 / l2 shadow tables (say if they use the same cr3 value).
> 
> Not sure if this is necessary. It may be necessary when large pages come
> > into play. Otherwise the host npt pages are distinguished from the shadow
> npt pages by the direct-flag.
> 
> 	Joerg
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-11 20:58     ` Marcelo Tosatti
@ 2010-03-12  7:36       ` Avi Kivity
  2010-03-15  6:27         ` Marcelo Tosatti
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-12  7:36 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Joerg Roedel, Alexander Graf, kvm, linux-kernel

On 03/11/2010 10:58 PM, Marcelo Tosatti wrote:
>
>>> Can't you translate l2_gpa ->  l1_gpa walking the current l1 nested
>>> pagetable, and pass that to the kvm tdp fault path (with the correct
>>> context setup)?
>>>        
>> If I understand your suggestion correctly, I think thats exactly whats
>> done in the patches. Some words about the design:
>>
>> For nested-nested we need to shadow the l1-nested-ptable on the host.
>> This is done using the vcpu->arch.mmu context which holds the l1 paging
>> modes while the l2 is running. On a npt-fault from the l2 we just
>> instrument the shadow-ptable code. This is the common case. because it
>> happens all the time while the l2 is running.
>>      
> OK, makes sense now, I was missing the fact that the l1-nested-ptable
> needs to be shadowed and l1 translations to it must be write protected.
>    

Shadow converts (gva -> gpa -> hpa) to (gva -> hpa) or (ngpa -> gpa -> 
hpa) to (ngpa -> hpa) equally well.  In the second case npt still does 
(ngva -> ngpa).

> You should disable out of sync shadow so that l1 guest writes to
> l1-nested-ptables always trap.

Why?  The guest is under obligation to flush the tlb if it writes to a 
page table, and we will resync on that tlb flush.

Unsync makes just as much sense for nnpt.  Think of khugepaged in the 
guest eating a page table and spitting out a PDE.

> And in the trap case, you'd have to
> invalidate l2 shadow pagetable entries that used the (now obsolete)
> l1-nested-ptable entry. Does that happen automatically?
>    

What do you mean by 'l2 shadow ptable entries'?  There are the guest's 
page tables (ordinary direct mapped, unless the guest's guest is also 
running an npt-enabled hypervisor), and the host page tables.  When the 
guest writes to each page table, we invalidate the shadows.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-04 15:58   ` Joerg Roedel
  2010-03-11 20:58     ` Marcelo Tosatti
@ 2010-03-12  7:41     ` Avi Kivity
  1 sibling, 0 replies; 49+ messages in thread
From: Avi Kivity @ 2010-03-12  7:41 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/04/2010 05:58 PM, Joerg Roedel wrote:
>> You probably need to include a flag in base_role to differentiate
>> between l1 / l2 shadow tables (say if they use the same cr3 value).
>>      
> Not sure if this is necessary. It may be necessary when large pages come
> into play. Otherwise the host npt pages are distinguished from the shadow
> npt pages by the direct-flag.
>    

Hm, I think that direct maps for the same gfn can be shared.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-03 19:12 ` [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
@ 2010-03-15  4:30   ` Daniel K.
  2010-03-15 12:52     ` Joerg Roedel
  2010-03-15  7:36   ` Avi Kivity
  1 sibling, 1 reply; 49+ messages in thread
From: Daniel K. @ 2010-03-15  4:30 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

Joerg Roedel wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2883ce8..9f8b02d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -314,6 +314,19 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>  	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code)
>  }
>  
> +void kvm_propagate_fault(struct kvm_vcpu *vcpu, unsigned long addr, u32 error_code)
> +{
> +	u32 nested, error;
> +
> +	nested = error_code &  PFERR_NESTED_MASK;
> +	error  = error_code & ~PFERR_NESTED_MASK;
> +
> +	if (vcpu->arch.mmu.nested && !(error_code && PFERR_NESTED_MASK))

This looks incorrect, nested is unused.

At the very least it should be a binary & operation

	if (vcpu->arch.mmu.nested && !(error_code & PFERR_NESTED_MASK))

which can be simplified to

	if (vcpu->arch.mmu.nested && !nested)

but it seems wrong that the condition is that it is nested and not nested 
at the same time.


> +		vcpu->arch.nested_mmu.inject_page_fault(vcpu, addr, error);
> +	else
> +		vcpu->arch.mmu.inject_page_fault(vcpu, addr, error);
> +}
> +
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.nmi_pending = 1;


Daniel K.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-12  7:36       ` Avi Kivity
@ 2010-03-15  6:27         ` Marcelo Tosatti
  2010-03-15  7:34           ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Marcelo Tosatti @ 2010-03-15  6:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Alexander Graf, kvm, linux-kernel

On Fri, Mar 12, 2010 at 09:36:41AM +0200, Avi Kivity wrote:
> On 03/11/2010 10:58 PM, Marcelo Tosatti wrote:
> >
> >>>Can't you translate l2_gpa ->  l1_gpa walking the current l1 nested
> >>>pagetable, and pass that to the kvm tdp fault path (with the correct
> >>>context setup)?
> >>If I understand your suggestion correctly, I think thats exactly whats
> >>done in the patches. Some words about the design:
> >>
> >>For nested-nested we need to shadow the l1-nested-ptable on the host.
> >>This is done using the vcpu->arch.mmu context which holds the l1 paging
> >>modes while the l2 is running. On a npt-fault from the l2 we just
> >>instrument the shadow-ptable code. This is the common case. because it
> >>happens all the time while the l2 is running.
> >OK, makes sense now, I was missing the fact that the l1-nested-ptable
> >needs to be shadowed and l1 translations to it must be write protected.
> 
> Shadow converts (gva -> gpa -> hpa) to (gva -> hpa) or (ngpa -> gpa
> -> hpa) to (ngpa -> hpa) equally well.  In the second case npt still
> does (ngva -> ngpa).
> 
> >You should disable out of sync shadow so that l1 guest writes to
> >l1-nested-ptables always trap.
> 
> Why?  The guest is under obligation to flush the tlb if it writes to
> a page table, and we will resync on that tlb flush.

The guests hypervisor will not flush the tlb with invlpg for updates of
its NPT pagetables. It'll create a new ASID, and KVM will not trap
that.

> Unsync makes just as much sense for nnpt.  Think of khugepaged in
> the guest eating a page table and spitting out a PDE.
> 
> >And in the trap case, you'd have to
> >invalidate l2 shadow pagetable entries that used the (now obsolete)
> >l1-nested-ptable entry. Does that happen automatically?
> 
> What do you mean by 'l2 shadow ptable entries'?  There are the
> guest's page tables (ordinary direct mapped, unless the guest's
> guest is also running an npt-enabled hypervisor), and the host page
> tables.  When the guest writes to each page table, we invalidate the
> shadows.

With 'l2 shadow ptable entries' i mean the shadow pagetables that
translate GPA-L2 -> HPA.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
  2010-03-15  6:27         ` Marcelo Tosatti
@ 2010-03-15  7:34           ` Avi Kivity
  0 siblings, 0 replies; 49+ messages in thread
From: Avi Kivity @ 2010-03-15  7:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Joerg Roedel, Alexander Graf, kvm, linux-kernel

On 03/15/2010 08:27 AM, Marcelo Tosatti wrote:
>>
>>> You should disable out of sync shadow so that l1 guest writes to
>>> l1-nested-ptables always trap.
>>>        
>> Why?  The guest is under obligation to flush the tlb if it writes to
>> a page table, and we will resync on that tlb flush.
>>      
> The guests hypervisor will not flush the tlb with invlpg for updates of
> its NPT pagetables. It'll create a new ASID, and KVM will not trap
> that.
>    

We'll get a kvm_set_cr3() on the next vmrun.

>>> And in the trap case, you'd have to
>>> invalidate l2 shadow pagetable entries that used the (now obsolete)
>>> l1-nested-ptable entry. Does that happen automatically?
>>>        
>> What do you mean by 'l2 shadow ptable entries'?  There are the
>> guest's page tables (ordinary direct mapped, unless the guest's
>> guest is also running an npt-enabled hypervisor), and the host page
>> tables.  When the guest writes to each page table, we invalidate the
>> shadows.
>>      
> With 'l2 shadow ptable entries' i mean the shadow pagetables that
> translate GPA-L2 ->  HPA.
>    

kvm_mmu_pte_write() will invalidate those sptes and will also install 
new translations if possible.

Beautiful, isn't it?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-03 19:12 ` [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
  2010-03-15  4:30   ` Daniel K.
@ 2010-03-15  7:36   ` Avi Kivity
  2010-03-15  9:06     ` Joerg Roedel
  1 sibling, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-15  7:36 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This patch implements logic to make sure that either a
> page-fault/page-fault-vmexit or a nested-page-fault-vmexit
> is propagated back to the guest.
>
> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
> ---
>   arch/x86/kvm/mmu.h         |    1 +
>   arch/x86/kvm/paging_tmpl.h |    2 ++
>   arch/x86/kvm/x86.c         |   15 ++++++++++++++-
>   3 files changed, 17 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 64f619b..b42b27e 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -47,6 +47,7 @@
>   #define PFERR_USER_MASK (1U<<  2)
>   #define PFERR_RSVD_MASK (1U<<  3)
>   #define PFERR_FETCH_MASK (1U<<  4)
> +#define PFERR_NESTED_MASK (1U<<  31)
>    


Why is this needed?  Queue an ordinary page fault; the injection 
code should check the page fault intercept and #VMEXIT if needed.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-15  7:36   ` Avi Kivity
@ 2010-03-15  9:06     ` Joerg Roedel
  2010-03-15  9:23       ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Joerg Roedel @ 2010-03-15  9:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Mon, Mar 15, 2010 at 09:36:52AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
>> This patch implements logic to make sure that either a
>> page-fault/page-fault-vmexit or a nested-page-fault-vmexit
>> is propagated back to the guest.
>>
>> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
>> ---
>>   arch/x86/kvm/mmu.h         |    1 +
>>   arch/x86/kvm/paging_tmpl.h |    2 ++
>>   arch/x86/kvm/x86.c         |   15 ++++++++++++++-
>>   3 files changed, 17 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 64f619b..b42b27e 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -47,6 +47,7 @@
>>   #define PFERR_USER_MASK (1U<<  2)
>>   #define PFERR_RSVD_MASK (1U<<  3)
>>   #define PFERR_FETCH_MASK (1U<<  4)
>> +#define PFERR_NESTED_MASK (1U<<  31)
>>    
>
>
> Why is this needed?  Queue an ordinary page fault page; the injection  
> code should check the page fault intercept and #VMEXIT if needed.

This is needed because we could have either a nested page fault or an
ordinary page fault which needs to be propagated.

	Joerg
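
To spell out the two cases, here is a small standalone sketch of the routing
the flag enables (illustrative only -- this is not the patch's
kvm_propagate_fault(); apart from PFERR_NESTED_MASK itself, the names and
error-code values are made up). A fault raised while walking L2's own page
tables is queued as an ordinary #PF, which L1's page-fault intercept can turn
into a page-fault vmexit; a fault raised while walking L1's nested page table
has to surface as a nested-page-fault vmexit and must never be seen by L2.

/* Illustrative sketch of how the nested bit routes a fault; not KVM code. */
#include <stdint.h>
#include <stdio.h>

#define PFERR_NESTED_MASK (1U << 31)

static void inject_l2_page_fault(uint32_t error)
{
	/* ordinary #PF into L2; if L1 intercepts #PF this ends up as a
	 * page-fault vmexit to L1 */
	printf("#PF, error_code=0x%x\n", (unsigned)error);
}

static void nested_npf_vmexit(uint32_t error)
{
	/* fault hit while walking L1's nested page table, so L1 gets a
	 * nested-page-fault vmexit and L2 never sees it */
	printf("#NPF vmexit, error_code=0x%x\n", (unsigned)error);
}

static void propagate_fault(uint32_t error_code)
{
	uint32_t error = error_code & ~PFERR_NESTED_MASK;

	if (error_code & PFERR_NESTED_MASK)
		nested_npf_vmexit(error);
	else
		inject_l2_page_fault(error);
}

int main(void)
{
	propagate_fault(0x2);                      /* fault in L2's own tables */
	propagate_fault(0x2 | PFERR_NESTED_MASK);  /* fault in L1's NPT        */
	return 0;
}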



* Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-15  9:06     ` Joerg Roedel
@ 2010-03-15  9:23       ` Avi Kivity
  2010-03-15  9:41         ` Joerg Roedel
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2010-03-15  9:23 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On 03/15/2010 11:06 AM, Joerg Roedel wrote:
> On Mon, Mar 15, 2010 at 09:36:52AM +0200, Avi Kivity wrote:
>    
>> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
>>      
>>> This patch implements logic to make sure that either a
>>> page-fault/page-fault-vmexit or a nested-page-fault-vmexit
>>> is propagated back to the guest.
>>>
>>> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
>>> ---
>>>    arch/x86/kvm/mmu.h         |    1 +
>>>    arch/x86/kvm/paging_tmpl.h |    2 ++
>>>    arch/x86/kvm/x86.c         |   15 ++++++++++++++-
>>>    3 files changed, 17 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>>> index 64f619b..b42b27e 100644
>>> --- a/arch/x86/kvm/mmu.h
>>> +++ b/arch/x86/kvm/mmu.h
>>> @@ -47,6 +47,7 @@
>>>    #define PFERR_USER_MASK (1U<<   2)
>>>    #define PFERR_RSVD_MASK (1U<<   3)
>>>    #define PFERR_FETCH_MASK (1U<<   4)
>>> +#define PFERR_NESTED_MASK (1U<<   31)
>>>
>>>        
>>
>> Why is this needed?  Queue an ordinary page fault page; the injection
>> code should check the page fault intercept and #VMEXIT if needed.
>>      
> This is needed because we could have a nested page fault or an ordinary
> page fault which need to be propagated.
>    

Right.

Why is pio_copy_data() changed?  One would think that it would be an 
all-or-nothing affair.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-15  9:23       ` Avi Kivity
@ 2010-03-15  9:41         ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-15  9:41 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Marcelo Tosatti, Alexander Graf, kvm, linux-kernel

On Mon, Mar 15, 2010 at 11:23:07AM +0200, Avi Kivity wrote:
> On 03/15/2010 11:06 AM, Joerg Roedel wrote:
>> On Mon, Mar 15, 2010 at 09:36:52AM +0200, Avi Kivity wrote:
>>    
>>> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
>>>      
>>>> This patch implements logic to make sure that either a
>>>> page-fault/page-fault-vmexit or a nested-page-fault-vmexit
>>>> is propagated back to the guest.
>>>>
>>>> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
>>>> ---
>>>>    arch/x86/kvm/mmu.h         |    1 +
>>>>    arch/x86/kvm/paging_tmpl.h |    2 ++
>>>>    arch/x86/kvm/x86.c         |   15 ++++++++++++++-
>>>>    3 files changed, 17 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>>>> index 64f619b..b42b27e 100644
>>>> --- a/arch/x86/kvm/mmu.h
>>>> +++ b/arch/x86/kvm/mmu.h
>>>> @@ -47,6 +47,7 @@
>>>>    #define PFERR_USER_MASK (1U<<   2)
>>>>    #define PFERR_RSVD_MASK (1U<<   3)
>>>>    #define PFERR_FETCH_MASK (1U<<   4)
>>>> +#define PFERR_NESTED_MASK (1U<<   31)
>>>>
>>>>        
>>>
>>> Why is this needed?  Queue an ordinary page fault page; the injection
>>> code should check the page fault intercept and #VMEXIT if needed.
>>>      
>> This is needed because we could have a nested page fault or an ordinary
>> page fault which need to be propagated.
>>    
>
> Right.
>
> Why is pio_copy_data() changed?  One would think that it would be an  
> all-or-nothing affair.

It was the only place I found where the PROPAGATE_FAULT value was
checked and actually propagated.

	Joerg



* Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-03-15  4:30   ` Daniel K.
@ 2010-03-15 12:52     ` Joerg Roedel
  0 siblings, 0 replies; 49+ messages in thread
From: Joerg Roedel @ 2010-03-15 12:52 UTC (permalink / raw)
  To: Daniel K.
  Cc: Joerg Roedel, Avi Kivity, Marcelo Tosatti, Alexander Graf, kvm,
	linux-kernel

On Mon, Mar 15, 2010 at 04:30:47AM +0000, Daniel K. wrote:
> Joerg Roedel wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2883ce8..9f8b02d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -314,6 +314,19 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>>  	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
>>  }
>>
>> +void kvm_propagate_fault(struct kvm_vcpu *vcpu, unsigned long addr, u32 error_code)
>> +{
>> +	u32 nested, error;
>> +
>> +	nested = error_code &  PFERR_NESTED_MASK;
>> +	error  = error_code & ~PFERR_NESTED_MASK;
>> +
>> +	if (vcpu->arch.mmu.nested && !(error_code && PFERR_NESTED_MASK))
>
> This looks incorrect, nested is unused.
>
> At the very least it should be a binary & operation
>
> 	if (vcpu->arch.mmu.nested && !(error_code & PFERR_NESTED_MASK))
>
> which can be simplified to
>
> 	if (vcpu->arch.mmu.nested && !nested)
>
> but it seems wrong that the condition is that it is nested and not nested 
> at the same time.

Yes, this is already fixed in my local patch-stack. I found it during
further testing (while fixing another bug). But thanks for your feedback
:-)

	Joerg



end of thread

Thread overview: 49+ messages
2010-03-03 19:12 [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Joerg Roedel
2010-03-03 19:12 ` [PATCH 01/18] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
2010-03-03 19:12 ` [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
2010-03-08  9:17   ` Avi Kivity
2010-03-10 14:44     ` Joerg Roedel
2010-03-10 14:53       ` Avi Kivity
2010-03-10 15:26         ` Joerg Roedel
2010-03-11  6:47           ` Avi Kivity
2010-03-11 10:33             ` Joerg Roedel
2010-03-03 19:12 ` [PATCH 03/18] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu Joerg Roedel
2010-03-03 19:12 ` [PATCH 04/18] KVM: X86: Introduce a tdp_set_cr3 function Joerg Roedel
2010-03-03 19:12 ` [PATCH 05/18] KVM: MMU: Introduce get_cr3 function pointer Joerg Roedel
2010-03-03 19:12 ` [PATCH 06/18] KVM: MMU: Introduce inject_page_fault " Joerg Roedel
2010-03-03 19:12 ` [PATCH 07/18] KVM: SVM: Implement MMU helper functions for Nested Nested Paging Joerg Roedel
2010-03-03 19:12 ` [PATCH 08/18] KVM: MMU: Change init_kvm_softmmu to take a context as parameter Joerg Roedel
2010-03-03 19:12 ` [PATCH 09/18] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu Joerg Roedel
2010-03-03 19:12 ` [PATCH 10/18] KVM: MMU: Introduce generic walk_addr function Joerg Roedel
2010-03-03 19:12 ` [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
2010-03-08  9:37   ` Avi Kivity
2010-03-10 14:46     ` Joerg Roedel
2010-03-03 19:12 ` [PATCH 12/18] KVM: MMU: Implement nested gva_to_gpa functions Joerg Roedel
2010-03-03 19:12 ` [PATCH 13/18] KVM: MMU: Introduce Nested MMU context Joerg Roedel
2010-03-03 19:12 ` [PATCH 14/18] KVM: SVM: Initialize Nested Nested MMU context on VMRUN Joerg Roedel
2010-03-03 19:12 ` [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
2010-03-15  4:30   ` Daniel K.
2010-03-15 12:52     ` Joerg Roedel
2010-03-15  7:36   ` Avi Kivity
2010-03-15  9:06     ` Joerg Roedel
2010-03-15  9:23       ` Avi Kivity
2010-03-15  9:41         ` Joerg Roedel
2010-03-03 19:12 ` [PATCH 16/18] KVM: X86: Add callback to let modules decide over some supported cpuid bits Joerg Roedel
2010-03-03 19:12 ` [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
2010-03-03 23:37   ` Alexander Graf
2010-03-04 11:27     ` Joerg Roedel
2010-03-03 19:12 ` [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED Joerg Roedel
2010-03-08  9:39   ` Avi Kivity
2010-03-10 14:46     ` Joerg Roedel
2010-03-03 23:10 ` [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization) Jan Kiszka
2010-03-03 23:44 ` Alexander Graf
2010-03-04 11:29   ` Joerg Roedel
2010-03-04  0:35 ` Anthony Liguori
2010-03-04 14:42 ` Marcelo Tosatti
2010-03-04 15:58   ` Joerg Roedel
2010-03-11 20:58     ` Marcelo Tosatti
2010-03-12  7:36       ` Avi Kivity
2010-03-15  6:27         ` Marcelo Tosatti
2010-03-15  7:34           ` Avi Kivity
2010-03-12  7:41     ` Avi Kivity
2010-03-08  9:41 ` Avi Kivity
