* [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List)
@ 2010-09-06 15:55 Joerg Roedel
  2010-09-06 15:55 ` [PATCH 01/27] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
                   ` (27 more replies)
  0 siblings, 28 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Alexander Graf, joro, kvm, linux-kernel

(Now with the correct Cc-list. I accidentally copied the wrong line from
 MAINTAINERS in the first post of this series. Sorry for the double post.)

Hi Avi, Marcelo,

here is finally the third round of my NPT virtualization patches for KVM. It
took a while to get everything running (including KVM itself) on 32 bit again
to actually test it, but testing on a 32 bit host and with a 32 bit hypervisor
was a very good idea: I found some serious bugs and shortcomings in my code
that are now fixed in v3.

The patchset was tested in a number of combinations:

	host(64|32e)
		->kvm(shadow|npt)
			->guest(64|32e|32)
				->test(boot|kbuild)

	host(64|32e)
		->kvm(npt)
			->guest(64|32e|32)
				->kvm(shadow|npt)
					->guest(64|32e|32)
						->test(boot|kbuild)

Only the valid combinations were tested, of course, so no 64 bit on 32 bit
combinations. Apart from that I tested all of the above combinations and all
worked without any regressions.

Other changes since v2 are:

	* Addressed the review comments from v2:
		- Rebased everything to latest upstream code
		- renamed nested_mmu to walk_mmu to make its
		  meaning clearer
		- the gva_to_gpa functions are no longer swapped
		  between the two mmu states, which makes the code
		  more consistent
		- Moved the struct vcpu page fault data into a separate
		  sub-struct for better readability
		- Other minor stuff (coding style, typos)
		- Renamed the kvm_*_page_x86 functions to kvm_*_page_mmu so
		  that they can be made more generic later.
	* Made everything work on 32 bit
		- Introduced the mmu->lm_root pointer to let the softmmu
		  shadow 32 bit page tables with a long-mode page table.
		  The lm_root page-table root always just points to
		  mmu.pae_root, so this builds entirely on the pae-shadow
		  code (a rough sketch follows after this list).
		- Split mmu_alloc_roots into a shadow and a direct_map version
		  to simplify the code and to stop changes to that function
		  from breaking the direct_map paths.
	* Probably other changes I forgot about
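
To illustrate the lm_root idea mentioned above, here is a rough sketch of the
shadow root setup when a 32 bit guest is shadowed with a long-mode page table.
This is a simplification for readability and not the actual patch code (error
handling omitted); the real logic lives in the mmu_alloc_roots rework later in
the series:

	/* Sketch only: wrap the PAE shadow roots with one long-mode root */
	if (mmu->shadow_root_level == PT64_ROOT_LEVEL &&
	    mmu->root_level < PT64_ROOT_LEVEL) {
		if (mmu->lm_root == NULL)
			mmu->lm_root = (u64 *)get_zeroed_page(GFP_KERNEL);

		/* the single lm_root entry always points to the pae_root page */
		mmu->lm_root[0] = __pa(mmu->pae_root) | PT_PRESENT_MASK;

		mmu->root_hpa = __pa(mmu->lm_root);
	}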

This patchset applies on today's avi/master plus the three patches I sent at
the end of last week. These patches are necessary for some of the tests above
to run.

For the curious and impatient user I put everything into a branch on kernel.org.
If you want to test it, you can pull the tree from

	git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-kvm.git npt-virt-v3

Please review and/or apply these patches if you consider them good enough.
Otherwise I appreciate your feedback.

Thanks,

	Joerg




* [PATCH 01/27] KVM: MMU: Check for root_level instead of long mode
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 02/27] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

The walk_addr function checks for !is_long_mode in its 64
bit version, but what is really meant here is a check for
PAE paging. Change the condition to actually check for PAE
paging so that it also works with nested nested paging.
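
For context (a note from the editor, not part of the patch), the mmu
root_level constants roughly encode the guest paging mode:

	/*
	 * PT32_ROOT_LEVEL  (2) - legacy 32 bit paging
	 * PT32E_ROOT_LEVEL (3) - PAE paging
	 * PT64_ROOT_LEVEL  (4) - long-mode 4-level paging
	 *
	 * With nested NPT the walked page table can use PAE paging even
	 * though the vcpu (the L1 guest) is in long mode, which is why
	 * is_long_mode(vcpu) is the wrong check here.
	 */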

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/paging_tmpl.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index debe770..e4ad3dc 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -132,7 +132,7 @@ walk:
 	walker->level = vcpu->arch.mmu.root_level;
 	pte = vcpu->arch.cr3;
 #if PTTYPE == 64
-	if (!is_long_mode(vcpu)) {
+	if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte)) {
@@ -205,7 +205,7 @@ walk:
 				(PTTYPE == 64 || is_pse(vcpu))) ||
 		    ((walker->level == PT_PDPE_LEVEL) &&
 				is_large_pte(pte) &&
-				is_long_mode(vcpu))) {
+				vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL)) {
 			int lvl = walker->level;
 
 			walker->gfn = gpte_to_gfn_lvl(pte, lvl);
-- 
1.7.0.4




* [PATCH 02/27] KVM: MMU: Make tdp_enabled a mmu-context parameter
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
  2010-09-06 15:55 ` [PATCH 01/27] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 03/27] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu Joerg Roedel
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch moves the tdp_enabled flag from its global
meaning into the mmu context and renames it to direct_map
there. This is necessary for Nested SVM with emulation of
Nested Paging, where we need an extra MMU context to shadow
the Nested Nested Page Table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   20 ++++++++++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b30285..53cdf39 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -249,6 +249,7 @@ struct kvm_mmu {
 	int root_level;
 	int shadow_root_level;
 	union kvm_mmu_page_role base_role;
+	bool direct_map;
 
 	u64 *pae_root;
 	u64 rsvd_bits_mask[2][4];
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b2136f9..bfb3f23 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1448,7 +1448,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	if (role.direct)
 		role.cr4_pae = 0;
 	role.access = access;
-	if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
+	if (!vcpu->arch.mmu.direct_map
+	    && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
 		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
 		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
 		role.quadrant = quadrant;
@@ -1973,7 +1974,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		spte |= shadow_user_mask;
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
-	if (tdp_enabled)
+	if (vcpu->arch.mmu.direct_map)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));
 
@@ -1983,8 +1984,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	spte |= (u64)pfn << PAGE_SHIFT;
 
 	if ((pte_access & ACC_WRITE_MASK)
-	    || (!tdp_enabled && write_fault && !is_write_protection(vcpu)
-		&& !user_fault)) {
+	    || (!vcpu->arch.mmu.direct_map && write_fault
+		&& !is_write_protection(vcpu) && !user_fault)) {
 
 		if (level > PT_PAGE_TABLE_LEVEL &&
 		    has_wrprotected_page(vcpu->kvm, gfn, level)) {
@@ -1995,7 +1996,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 		spte |= PT_WRITABLE_MASK;
 
-		if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
+		if (!vcpu->arch.mmu.direct_map
+		    && !(pte_access & ACC_WRITE_MASK))
 			spte &= ~PT_USER_MASK;
 
 		/*
@@ -2371,7 +2373,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 		ASSERT(!VALID_PAGE(root));
 		if (mmu_check_root(vcpu, root_gfn))
 			return 1;
-		if (tdp_enabled) {
+		if (vcpu->arch.mmu.direct_map) {
 			direct = 1;
 			root_gfn = 0;
 		}
@@ -2406,7 +2408,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 				return 1;
 		} else if (vcpu->arch.mmu.root_level == 0)
 			root_gfn = 0;
-		if (tdp_enabled) {
+		if (vcpu->arch.mmu.direct_map) {
 			direct = 1;
 			root_gfn = i << 30;
 		}
@@ -2708,6 +2710,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->invlpg = nonpaging_invlpg;
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
+	context->direct_map = true;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2747,6 +2750,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
 	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
+	vcpu->arch.mmu.direct_map        = false;
 
 	return r;
 }
@@ -3060,7 +3064,7 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
 	gpa_t gpa;
 	int r;
 
-	if (tdp_enabled)
+	if (vcpu->arch.mmu.direct_map)
 		return 0;
 
 	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-- 
1.7.0.4




* [PATCH 03/27] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
  2010-09-06 15:55 ` [PATCH 01/27] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
  2010-09-06 15:55 ` [PATCH 02/27] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 04/27] KVM: X86: Introduce a tdp_set_cr3 function Joerg Roedel
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53cdf39..43c8db0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -236,6 +236,7 @@ struct kvm_pio_request {
  */
 struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
+	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bfb3f23..2ac3851 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2711,6 +2711,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = true;
+	context->set_cr3 = kvm_x86_ops->set_cr3;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2751,6 +2752,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
 	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
 	vcpu->arch.mmu.direct_map        = false;
+	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
 
 	return r;
 }
@@ -2794,7 +2796,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	if (r)
 		goto out;
 	/* set_cr3() should ensure TLB has been flushed */
-	kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
+	vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
 out:
 	return r;
 }
-- 
1.7.0.4




* [PATCH 04/27] KVM: X86: Introduce a tdp_set_cr3 function
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (2 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 03/27] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 05/27] KVM: MMU: Introduce get_cr3 function pointer Joerg Roedel
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tdp-enabled mmu
contexts. This allows removing some hacks from the svm code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/mmu.c              |    2 +-
 arch/x86/kvm/svm.c              |   23 ++++++++++++++---------
 arch/x86/kvm/vmx.c              |    2 ++
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 43c8db0..aeeea9c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -526,6 +526,8 @@ struct kvm_x86_ops {
 	bool (*rdtscp_supported)(void);
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment);
 
+	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
 
 	bool (*has_wbinvd_exit)(void);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2ac3851..543ec74 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2711,7 +2711,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = true;
-	context->set_cr3 = kvm_x86_ops->set_cr3;
+	context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6808f64..094df31 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3216,9 +3216,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	gs_selector = kvm_read_gs();
 	ldt_selector = kvm_read_ldt();
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;
-	/* required for live migration with NPT */
-	if (npt_enabled)
-		svm->vmcb->save.cr3 = vcpu->arch.cr3;
 
 	clgi();
 
@@ -3335,16 +3332,22 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (npt_enabled) {
-		svm->vmcb->control.nested_cr3 = root;
-		force_new_asid(vcpu);
-		return;
-	}
-
 	svm->vmcb->save.cr3 = root;
 	force_new_asid(vcpu);
 }
 
+static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.nested_cr3 = root;
+
+	/* Also sync guest cr3 here in case we live migrate */
+	svm->vmcb->save.cr3 = vcpu->arch.cr3;
+
+	force_new_asid(vcpu);
+}
+
 static int is_disabled(void)
 {
 	u64 vm_cr;
@@ -3571,6 +3574,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.write_tsc_offset = svm_write_tsc_offset,
 	.adjust_tsc_offset = svm_adjust_tsc_offset,
+
+	.set_tdp_cr3 = set_tdp_cr3,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 676555c..0e62d8a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4347,6 +4347,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
 	.write_tsc_offset = vmx_write_tsc_offset,
 	.adjust_tsc_offset = vmx_adjust_tsc_offset,
+
+	.set_tdp_cr3 = vmx_set_cr3,
 };
 
 static int __init vmx_init(void)
-- 
1.7.0.4




* [PATCH 05/27] KVM: MMU: Introduce get_cr3 function pointer
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (3 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 04/27] KVM: X86: Introduce a tdp_set_cr3 function Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 06/27] KVM: MMU: Introduce inject_page_fault " Joerg Roedel
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This function pointer in the MMU context is required to
implement Nested Nested Paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    9 ++++++++-
 arch/x86/kvm/paging_tmpl.h      |    4 ++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aeeea9c..ab708ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -237,6 +237,7 @@ struct kvm_pio_request {
 struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
+	unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 543ec74..d2213fa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2365,7 +2365,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	int direct = 0;
 	u64 pdptr;
 
-	root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+	root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
@@ -2561,6 +2561,11 @@ static void paging_new_cr3(struct kvm_vcpu *vcpu)
 	mmu_free_roots(vcpu);
 }
 
+static unsigned long get_cr3(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.cr3;
+}
+
 static void inject_page_fault(struct kvm_vcpu *vcpu,
 			      u64 addr,
 			      u32 err_code)
@@ -2712,6 +2717,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = true;
 	context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
+	context->get_cr3 = get_cr3;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2753,6 +2759,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
 	vcpu->arch.mmu.direct_map        = false;
 	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
+	vcpu->arch.mmu.get_cr3           = get_cr3;
 
 	return r;
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e4ad3dc..13d0c06 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -130,7 +130,7 @@ walk:
 	present = true;
 	eperm = rsvd_fault = false;
 	walker->level = vcpu->arch.mmu.root_level;
-	pte = vcpu->arch.cr3;
+	pte = vcpu->arch.mmu.get_cr3(vcpu);
 #if PTTYPE == 64
 	if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
@@ -143,7 +143,7 @@ walk:
 	}
 #endif
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-	       (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0);
+	       (vcpu->arch.mmu.get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
 	pt_access = ACC_ALL;
 
-- 
1.7.0.4




* [PATCH 06/27] KVM: MMU: Introduce inject_page_fault function pointer
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (4 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 05/27] KVM: MMU: Introduce get_cr3 function pointer Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 07/27] KVM: MMU: Introduce kvm_init_shadow_mmu helper function Joerg Roedel
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which is used to inject a page fault.
This will be needed later when Nested Nested Paging is
implemented.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/mmu.c              |    4 +++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ab708ee..3fefcd8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -239,6 +239,9 @@ struct kvm_mmu {
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
 	unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
+	void (*inject_page_fault)(struct kvm_vcpu *vcpu,
+				  unsigned long addr,
+				  u32 error_code);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
 			    u32 *error);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d2213fa..5b55451 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2570,7 +2570,7 @@ static void inject_page_fault(struct kvm_vcpu *vcpu,
 			      u64 addr,
 			      u32 err_code)
 {
-	kvm_inject_page_fault(vcpu, addr, err_code);
+	vcpu->arch.mmu.inject_page_fault(vcpu, addr, err_code);
 }
 
 static void paging_free(struct kvm_vcpu *vcpu)
@@ -2718,6 +2718,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->direct_map = true;
 	context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
 	context->get_cr3 = get_cr3;
+	context->inject_page_fault = kvm_inject_page_fault;
 
 	if (!is_paging(vcpu)) {
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2760,6 +2761,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.direct_map        = false;
 	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.mmu.get_cr3           = get_cr3;
+	vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
 
 	return r;
 }
-- 
1.7.0.4




* [PATCH 07/27] KVM: MMU: Introduce kvm_init_shadow_mmu helper function
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (5 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 06/27] KVM: MMU: Introduce inject_page_fault " Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 08/27] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu Joerg Roedel
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

Some logic of the init_kvm_softmmu function is required to
build the Nested Nested Paging context. So factor the
required logic into a separate function and export it.
Also make the whole init path suitable for more than one
mmu context.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c |   60 ++++++++++++++++++++++++++++++---------------------
 arch/x86/kvm/mmu.h |    1 +
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b55451..787540d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2532,10 +2532,9 @@ static void nonpaging_free(struct kvm_vcpu *vcpu)
 	mmu_free_roots(vcpu);
 }
 
-static int nonpaging_init_context(struct kvm_vcpu *vcpu)
+static int nonpaging_init_context(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu *context)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
-
 	context->new_cr3 = nonpaging_new_cr3;
 	context->page_fault = nonpaging_page_fault;
 	context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2594,9 +2593,10 @@ static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
 #include "paging_tmpl.h"
 #undef PTTYPE
 
-static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
+static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu *context,
+				  int level)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
 	int maxphyaddr = cpuid_maxphyaddr(vcpu);
 	u64 exb_bit_rsvd = 0;
 
@@ -2655,9 +2655,11 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 	}
 }
 
-static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
+static int paging64_init_context_common(struct kvm_vcpu *vcpu,
+					struct kvm_mmu *context,
+					int level)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
+	reset_rsvds_bits_mask(vcpu, context, level);
 
 	ASSERT(is_pae(vcpu));
 	context->new_cr3 = paging_new_cr3;
@@ -2673,17 +2675,17 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 	return 0;
 }
 
-static int paging64_init_context(struct kvm_vcpu *vcpu)
+static int paging64_init_context(struct kvm_vcpu *vcpu,
+				 struct kvm_mmu *context)
 {
-	reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
-	return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
+	return paging64_init_context_common(vcpu, context, PT64_ROOT_LEVEL);
 }
 
-static int paging32_init_context(struct kvm_vcpu *vcpu)
+static int paging32_init_context(struct kvm_vcpu *vcpu,
+				 struct kvm_mmu *context)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
+	reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
 
-	reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging32_page_fault;
 	context->gva_to_gpa = paging32_gva_to_gpa;
@@ -2697,10 +2699,10 @@ static int paging32_init_context(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int paging32E_init_context(struct kvm_vcpu *vcpu)
+static int paging32E_init_context(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu *context)
 {
-	reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
-	return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
+	return paging64_init_context_common(vcpu, context, PT32E_ROOT_LEVEL);
 }
 
 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
@@ -2724,15 +2726,15 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
-		reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
+		reset_rsvds_bits_mask(vcpu, context, PT64_ROOT_LEVEL);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 		context->root_level = PT64_ROOT_LEVEL;
 	} else if (is_pae(vcpu)) {
-		reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
+		reset_rsvds_bits_mask(vcpu, context, PT32E_ROOT_LEVEL);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 		context->root_level = PT32E_ROOT_LEVEL;
 	} else {
-		reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
+		reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
 		context->gva_to_gpa = paging32_gva_to_gpa;
 		context->root_level = PT32_ROOT_LEVEL;
 	}
@@ -2740,25 +2742,33 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
+int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
 	int r;
-
 	ASSERT(vcpu);
 	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
 	if (!is_paging(vcpu))
-		r = nonpaging_init_context(vcpu);
+		r = nonpaging_init_context(vcpu, context);
 	else if (is_long_mode(vcpu))
-		r = paging64_init_context(vcpu);
+		r = paging64_init_context(vcpu, context);
 	else if (is_pae(vcpu))
-		r = paging32E_init_context(vcpu);
+		r = paging32E_init_context(vcpu, context);
 	else
-		r = paging32_init_context(vcpu);
+		r = paging32_init_context(vcpu, context);
 
 	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
 	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
 	vcpu->arch.mmu.direct_map        = false;
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
+
+static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
+{
+	int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+
 	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.mmu.get_cr3           = get_cr3;
 	vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index f05a03d..7086ca8 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -49,6 +49,7 @@
 #define PFERR_FETCH_MASK (1U << 4)
 
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
+int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
-- 
1.7.0.4




* [PATCH 08/27] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (6 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 07/27] KVM: MMU: Introduce kvm_init_shadow_mmu helper function Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 09/27] KVM: MMU: Introduce generic walk_addr function Joerg Roedel
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch changes the is_rsvd_bits_set() function prototype
to take only a kvm_mmu context instead of a full vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c         |    6 +++---
 arch/x86/kvm/paging_tmpl.h |    7 ++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 787540d..9668f91 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2577,12 +2577,12 @@ static void paging_free(struct kvm_vcpu *vcpu)
 	nonpaging_free(vcpu);
 }
 
-static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
+static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
 {
 	int bit7;
 
 	bit7 = (gpte >> 7) & 1;
-	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
+	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
 #define PTTYPE 64
@@ -2857,7 +2857,7 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
 		return;
         }
 
-	if (is_rsvd_bits_set(vcpu, *(u64 *)new, PT_PAGE_TABLE_LEVEL))
+	if (is_rsvd_bits_set(&vcpu->arch.mmu, *(u64 *)new, PT_PAGE_TABLE_LEVEL))
 		return;
 
 	++vcpu->kvm->stat.mmu_pte_updated;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 13d0c06..68ee1b7 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -168,7 +168,7 @@ walk:
 			break;
 		}
 
-		if (is_rsvd_bits_set(vcpu, pte, walker->level)) {
+		if (is_rsvd_bits_set(&vcpu->arch.mmu, pte, walker->level)) {
 			rsvd_fault = true;
 			break;
 		}
@@ -327,6 +327,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 				u64 *sptep)
 {
 	struct kvm_mmu_page *sp;
+	struct kvm_mmu *mmu = &vcpu->arch.mmu;
 	pt_element_t *gptep = gw->prefetch_ptes;
 	u64 *spte;
 	int i;
@@ -358,7 +359,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		gpte = gptep[i];
 
 		if (!is_present_gpte(gpte) ||
-		      is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL)) {
+		      is_rsvd_bits_set(mmu, gpte, PT_PAGE_TABLE_LEVEL)) {
 			if (!sp->unsync)
 				__set_spte(spte, shadow_notrap_nonpresent_pte);
 			continue;
@@ -713,7 +714,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			return -EINVAL;
 
 		gfn = gpte_to_gfn(gpte);
-		if (is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL)
+		if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL)
 		      || gfn != sp->gfns[i] || !is_present_gpte(gpte)
 		      || !(gpte & PT_ACCESSED_MASK)) {
 			u64 nonpresent;
-- 
1.7.0.4




* [PATCH 09/27] KVM: MMU: Introduce generic walk_addr function
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (7 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 08/27] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This is the first patch in the series towards a generic
walk_addr implementation which will eventually be able to
walk two-dimensional page tables. In this first step the
walk_addr function is renamed to walk_addr_generic, which
takes an mmu context as an additional parameter.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/paging_tmpl.h |   26 ++++++++++++++++++--------
 1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 68ee1b7..f26fee9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -114,9 +114,10 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 /*
  * Fetch a guest pte for a guest virtual address
  */
-static int FNAME(walk_addr)(struct guest_walker *walker,
-			    struct kvm_vcpu *vcpu, gva_t addr,
-			    int write_fault, int user_fault, int fetch_fault)
+static int FNAME(walk_addr_generic)(struct guest_walker *walker,
+				    struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				    gva_t addr, int write_fault,
+				    int user_fault, int fetch_fault)
 {
 	pt_element_t pte;
 	gfn_t table_gfn;
@@ -129,10 +130,11 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 walk:
 	present = true;
 	eperm = rsvd_fault = false;
-	walker->level = vcpu->arch.mmu.root_level;
-	pte = vcpu->arch.mmu.get_cr3(vcpu);
+	walker->level = mmu->root_level;
+	pte           = mmu->get_cr3(vcpu);
+
 #if PTTYPE == 64
-	if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
+	if (walker->level == PT32E_ROOT_LEVEL) {
 		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte)) {
@@ -143,7 +145,7 @@ walk:
 	}
 #endif
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-	       (vcpu->arch.mmu.get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
+	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
 	pt_access = ACC_ALL;
 
@@ -205,7 +207,7 @@ walk:
 				(PTTYPE == 64 || is_pse(vcpu))) ||
 		    ((walker->level == PT_PDPE_LEVEL) &&
 				is_large_pte(pte) &&
-				vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL)) {
+				mmu->root_level == PT64_ROOT_LEVEL)) {
 			int lvl = walker->level;
 
 			walker->gfn = gpte_to_gfn_lvl(pte, lvl);
@@ -262,6 +264,14 @@ error:
 	return 0;
 }
 
+static int FNAME(walk_addr)(struct guest_walker *walker,
+			    struct kvm_vcpu *vcpu, gva_t addr,
+			    int write_fault, int user_fault, int fetch_fault)
+{
+	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.mmu, addr,
+					write_fault, user_fault, fetch_fault);
+}
+
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			      u64 *spte, const void *pte)
 {
-- 
1.7.0.4




* [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (8 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 09/27] KVM: MMU: Introduce generic walk_addr function Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 18:05   ` Avi Kivity
  2010-09-06 15:55 ` [PATCH 11/27] KVM: X86: Introduce pointer to mmu context used for gva_to_gpa Joerg Roedel
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces an mmu callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.
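
The callback introduced here is installed with an identity translation (see
the diff below); a later patch in the series replaces it for nested guests
with a variant that walks the L1 nested page table. A rough sketch of what
that nested variant looks like (names and details are an assumption, not part
of this patch):

	/* Sketch: an l2_gpa is just a gva from the point of view of the
	 * L1 nested page table, so the existing gva_to_gpa walker can be
	 * reused for the translation. */
	static gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa,
					  u32 *error)
	{
		/* nested page table walks are treated as user accesses */
		return vcpu->arch.mmu.gva_to_gpa(vcpu, gpa,
						 PFERR_USER_MASK, error);
	}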

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/x86.c              |    6 ++++++
 include/linux/kvm_host.h        |    5 +++++
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3fefcd8..af8cce3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -245,6 +245,7 @@ struct kvm_mmu {
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
 			    u32 *error);
+	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
 			      struct kvm_mmu_page *page);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f47db25..829efb0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3433,6 +3433,11 @@ void kvm_get_segment(struct kvm_vcpu *vcpu,
 	kvm_x86_ops->get_segment(vcpu, var, seg);
 }
 
+static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
+{
+	return gpa;
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -5644,6 +5649,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
+	vcpu->arch.mmu.translate_gpa = translate_gpa;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 	else
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f2ecdd5..f2989a7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
 	return (gpa_t)gfn << PAGE_SHIFT;
 }
 
+static inline gfn_t gpa_to_gfn(gpa_t gpa)
+{
+	return (gfn_t)gpa >> PAGE_SHIFT;
+}
+
 static inline hpa_t pfn_to_hpa(pfn_t pfn)
 {
 	return (hpa_t)pfn << PAGE_SHIFT;
-- 
1.7.0.4




* [PATCH 11/27] KVM: X86: Introduce pointer to mmu context used for gva_to_gpa
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (9 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 12/27] KVM: MMU: Implement nested gva_to_gpa functions Joerg Roedel
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces the walk_mmu pointer, which points to
the mmu context currently used for gva_to_gpa translations.
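
The intended setup, as far as this series goes, looks roughly like this
(a sketch of the invariant, not code added by this patch):

	/*
	 * Normal operation (no L2 guest behind nested paging):
	 *	vcpu->arch.walk_mmu == &vcpu->arch.mmu
	 *
	 * L2 guest running with nested NPT (set up in later patches):
	 *	vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu
	 */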

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   14 ++++++++++++++
 arch/x86/kvm/mmu.c              |   10 +++++-----
 arch/x86/kvm/x86.c              |   17 ++++++++++-------
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af8cce3..d797746 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -288,7 +288,21 @@ struct kvm_vcpu_arch {
 	u64 ia32_misc_enable_msr;
 	bool tpr_access_reporting;
 
+	/*
+	 * Paging state of the vcpu
+	 *
+	 * If the vcpu runs in guest mode with two level paging this still saves
+	 * the paging mode of the l1 guest. This context is always used to
+	 * handle faults.
+	 */
 	struct kvm_mmu mmu;
+
+	/*
+	 * Pointer to the mmu context currently used for
+	 * gva_to_gpa translations.
+	 */
+	struct kvm_mmu *walk_mmu;
+
 	/* only needed in kvm_pv_mmu_op() path, but it's hot so
 	 * put it here to avoid allocation */
 	struct kvm_pv_mmu_op_buffer mmu_op_buffer;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9668f91..a2cd2ce 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2707,7 +2707,7 @@ static int paging32E_init_context(struct kvm_vcpu *vcpu,
 
 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 {
-	struct kvm_mmu *context = &vcpu->arch.mmu;
+	struct kvm_mmu *context = vcpu->arch.walk_mmu;
 
 	context->new_cr3 = nonpaging_new_cr3;
 	context->page_fault = tdp_page_fault;
@@ -2767,11 +2767,11 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
-	int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+	int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
 
-	vcpu->arch.mmu.set_cr3           = kvm_x86_ops->set_cr3;
-	vcpu->arch.mmu.get_cr3           = get_cr3;
-	vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
+	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
 
 	return r;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 829efb0..e5dcf7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3441,27 +3441,27 @@ static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
-	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, error);
 }
 
  gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	access |= PFERR_FETCH_MASK;
-	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, error);
 }
 
 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	access |= PFERR_WRITE_MASK;
-	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, error);
 }
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
-	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error);
+	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, 0, error);
 }
 
 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
@@ -3472,7 +3472,8 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
-		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
+		gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access,
+							    error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -3527,8 +3528,9 @@ static int kvm_write_guest_virt_system(gva_t addr, void *val,
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
-		gpa_t gpa =  vcpu->arch.mmu.gva_to_gpa(vcpu, addr,
-						       PFERR_WRITE_MASK, error);
+		gpa_t gpa =  vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr,
+							     PFERR_WRITE_MASK,
+							     error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -5648,6 +5650,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	kvm = vcpu->kvm;
 
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
+	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 	vcpu->arch.mmu.translate_gpa = translate_gpa;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
-- 
1.7.0.4




* [PATCH 12/27] KVM: MMU: Implement nested gva_to_gpa functions
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (10 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 11/27] KVM: X86: Introduce pointer to mmu context used for gva_to_gpa Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 13/27] KVM: X86: Add kvm_read_guest_page_tdp function Joerg Roedel
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch adds the functions to do a nested l2_gva to
l1_gpa page table walk.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   10 ++++++++++
 arch/x86/kvm/mmu.c              |    8 ++++++++
 arch/x86/kvm/paging_tmpl.h      |   31 +++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |    5 +++++
 4 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d797746..9b9c096 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -298,6 +298,16 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu mmu;
 
 	/*
+	 * Paging state of an L2 guest (used for nested npt)
+	 *
+	 * This context will save all necessary information to walk page tables
+	 * of the an L2 guest. This context is only initialized for page table
+	 * walking and not for faulting since we never handle l2 page faults on
+	 * the host.
+	 */
+	struct kvm_mmu nested_mmu;
+
+	/*
 	 * Pointer to the mmu context currently used for
 	 * gva_to_gpa translations.
 	 */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a2cd2ce..1f425f3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2466,6 +2466,14 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
 	return vaddr;
 }
 
+static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
+					 u32 access, u32 *error)
+{
+	if (error)
+		*error = 0;
+	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, error);
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f26fee9..cd59af1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -272,6 +272,16 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 					write_fault, user_fault, fetch_fault);
 }
 
+static int FNAME(walk_addr_nested)(struct guest_walker *walker,
+				   struct kvm_vcpu *vcpu, gva_t addr,
+				   int write_fault, int user_fault,
+				   int fetch_fault)
+{
+	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
+					addr, write_fault, user_fault,
+					fetch_fault);
+}
+
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			      u64 *spte, const void *pte)
 {
@@ -656,6 +666,27 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
 	return gpa;
 }
 
+static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
+				      u32 access, u32 *error)
+{
+	struct guest_walker walker;
+	gpa_t gpa = UNMAPPED_GVA;
+	int r;
+
+	r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr,
+				    access & PFERR_WRITE_MASK,
+				    access & PFERR_USER_MASK,
+				    access & PFERR_FETCH_MASK);
+
+	if (r) {
+		gpa = gfn_to_gpa(walker.gfn);
+		gpa |= vaddr & ~PAGE_MASK;
+	} else if (error)
+		*error = walker.error_code;
+
+	return gpa;
+}
+
 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 				 struct kvm_mmu_page *sp)
 {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2d6385e..bf4dc2f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -50,6 +50,11 @@ static inline int is_long_mode(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
+}
+
 static inline int is_pae(struct kvm_vcpu *vcpu)
 {
 	return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
-- 
1.7.0.4




* [PATCH 13/27] KVM: X86: Add kvm_read_guest_page_tdp function
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (11 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 12/27] KVM: MMU: Implement nested gva_to_gpa functions Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking Joerg Roedel
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch adds a function which can read from the guest's
physical memory or from the guest's guest physical memory.
This will be used in the two-dimensional page table walker.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/x86.c              |   24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b9c096..38dc82e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -651,6 +651,9 @@ void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
 			   u32 error_code);
+int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			    gfn_t gfn, void *data, int offset, int len,
+			    u32 *error);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 
 int kvm_pic_set_irq(void *opaque, int irq, int level);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5dcf7f..f1bdf4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -369,6 +369,30 @@ bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl)
 EXPORT_SYMBOL_GPL(kvm_require_cpl);
 
 /*
+ * This function will be used to read from the physical memory of the currently
+ * running guest. The difference to kvm_read_guest_page is that this function
+ * can read from guest physical or from the guest's guest physical memory.
+ */
+int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			    gfn_t ngfn, void *data, int offset, int len,
+			    u32 *error)
+{
+	gfn_t real_gfn;
+	gpa_t ngpa;
+
+	*error   = 0;
+	ngpa     = gfn_to_gpa(ngfn);
+	real_gfn = mmu->translate_gpa(vcpu, ngpa, error);
+	if (real_gfn == UNMAPPED_GVA)
+		return -EFAULT;
+
+	real_gfn = gpa_to_gfn(real_gfn);
+
+	return kvm_read_guest_page(vcpu->kvm, real_gfn, data, offset, len);
+}
+EXPORT_SYMBOL_GPL(kvm_read_guest_page_mmu);
+
+/*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
-- 
1.7.0.4




* [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (12 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 13/27] KVM: X86: Add kvm_read_guest_page_tdp function Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-07 17:48   ` Marcelo Tosatti
  2010-09-06 15:55 ` [PATCH 15/27] KVM: MMU: Introduce kvm_read_guest_page_x86() Joerg Roedel
                   ` (13 subsequent siblings)
  27 siblings, 1 reply; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch uses kvm_read_guest_page_mmu() to make the
walk_addr_generic functions suitable for two-level page
table walking.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/paging_tmpl.h |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index cd59af1..a5b5759 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -124,6 +124,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	unsigned index, pt_access, uninitialized_var(pte_access);
 	gpa_t pte_gpa;
 	bool eperm, present, rsvd_fault;
+	int offset;
+	u32 error = 0;
 
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
@@ -153,12 +155,13 @@ walk:
 		index = PT_INDEX(addr, walker->level);
 
 		table_gfn = gpte_to_gfn(pte);
-		pte_gpa = gfn_to_gpa(table_gfn);
-		pte_gpa += index * sizeof(pt_element_t);
+		offset    = index * sizeof(pt_element_t);
+		pte_gpa   = gfn_to_gpa(table_gfn) + offset;
 		walker->table_gfn[walker->level - 1] = table_gfn;
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
-		if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) {
+		if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, offset,
+					    sizeof(pte), &error)) {
 			present = false;
 			break;
 		}
@@ -209,15 +212,25 @@ walk:
 				is_large_pte(pte) &&
 				mmu->root_level == PT64_ROOT_LEVEL)) {
 			int lvl = walker->level;
+			gpa_t real_gpa;
+			gfn_t gfn;
 
-			walker->gfn = gpte_to_gfn_lvl(pte, lvl);
-			walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl))
-					>> PAGE_SHIFT;
+			gfn = gpte_to_gfn_lvl(pte, lvl);
+			gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
 
 			if (PTTYPE == 32 &&
 			    walker->level == PT_DIRECTORY_LEVEL &&
 			    is_cpuid_PSE36())
-				walker->gfn += pse36_gfn_delta(pte);
+				gfn += pse36_gfn_delta(pte);
+
+			real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn),
+						      &error);
+			if (real_gpa == UNMAPPED_GVA) {
+				walker->error_code = error;
+				return 0;
+			}
+
+			walker->gfn = real_gpa >> PAGE_SHIFT;
 
 			break;
 		}
-- 
1.7.0.4




* [PATCH 15/27] KVM: MMU: Introduce kvm_read_guest_page_x86()
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (13 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 16/27] KVM: MMU: Introduce init_kvm_nested_mmu() Joerg Roedel
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces the kvm_read_nested_guest_page()
function which reads from the physical memory of the guest.
If the guest is running in guest mode itself with nested
paging enabled, it will read from the guest's guest physical
memory instead.
The patch also changes the code to use this function where
it is necessary.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/x86.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1bdf4e..1f5db75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -392,6 +392,13 @@ int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page_mmu);
 
+int kvm_read_nested_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
+			       void *data, int offset, int len, u32 *error)
+{
+	return kvm_read_guest_page_mmu(vcpu, vcpu->arch.walk_mmu, gfn,
+				       data, offset, len, error);
+}
+
 /*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
@@ -399,12 +406,13 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
 	unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
-	int i;
+	int i, error;
 	int ret;
 	u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
 
-	ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte,
-				  offset * sizeof(u64), sizeof(pdpte));
+	ret = kvm_read_nested_guest_page(vcpu, pdpt_gfn, pdpte,
+					 offset * sizeof(u64),
+					 sizeof(pdpte), &error);
 	if (ret < 0) {
 		ret = 0;
 		goto out;
@@ -433,6 +441,9 @@ static bool pdptrs_changed(struct kvm_vcpu *vcpu)
 {
 	u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
 	bool changed = true;
+	int offset;
+	u32 error;
+	gfn_t gfn;
 	int r;
 
 	if (is_long_mode(vcpu) || !is_pae(vcpu))
@@ -442,7 +453,10 @@ static bool pdptrs_changed(struct kvm_vcpu *vcpu)
 		      (unsigned long *)&vcpu->arch.regs_avail))
 		return true;
 
-	r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, sizeof(pdpte));
+	gfn = (vcpu->arch.cr3 & ~31u) >> PAGE_SHIFT;
+	offset = (vcpu->arch.cr3 & ~31u) & (PAGE_SIZE - 1);
+	r = kvm_read_nested_guest_page(vcpu, gfn, pdpte, offset,
+				       sizeof(pdpte), &error);
 	if (r < 0)
 		goto out;
 	changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0;
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 16/27] KVM: MMU: Introduce init_kvm_nested_mmu()
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (14 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 15/27] KVM: MMU: Introduce kvm_read_guest_page_x86() Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu Joerg Roedel
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces the init_kvm_nested_mmu() function
which is used to re-initialize the nested mmu when the l2
guest changes its paging mode.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
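A note for reviewers (not part of the commit): init_kvm_mmu() below
keys off mmu_is_nested(), which comes from an earlier patch in the
series. It presumably just checks whether the page table walker has
been switched over to the nested mmu, along the lines of this sketch
(the real helper may differ):

	static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
	{
		return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
	}

so the setup done here only takes effect while the vcpu runs an L2
guest with nested paging enabled.
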
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   34 +++++++++++++++++++++++++++++++++-
 arch/x86/kvm/mmu.h              |    1 +
 arch/x86/kvm/x86.c              |   20 ++++++++++++++++++++
 4 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 38dc82e..a338235 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,3 +805,4 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
 #endif /* _ASM_X86_KVM_HOST_H */
+
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1f425f3..7bc8d67 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2784,11 +2784,43 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;
+
+	g_context->get_cr3           = get_cr3;
+	g_context->inject_page_fault = kvm_inject_page_fault;
+
+	/*
+	 * Note that arch.mmu.gva_to_gpa translates l2_gva to l1_gpa. The
+	 * translation of l2_gpa to l1_gpa addresses is done using the
+	 * arch.nested_mmu.gva_to_gpa function. Basically the gva_to_gpa
+	 * functions between mmu and nested_mmu are swapped.
+	 */
+	if (!is_paging(vcpu)) {
+		g_context->root_level = 0;
+		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
+	} else if (is_long_mode(vcpu)) {
+		g_context->root_level = PT64_ROOT_LEVEL;
+		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
+	} else if (is_pae(vcpu)) {
+		g_context->root_level = PT32E_ROOT_LEVEL;
+		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
+	} else {
+		g_context->root_level = PT32_ROOT_LEVEL;
+		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
+	}
+
+	return 0;
+}
+
 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.update_pte.pfn = bad_pfn;
 
-	if (tdp_enabled)
+	if (mmu_is_nested(vcpu))
+		return init_kvm_nested_mmu(vcpu);
+	else if (tdp_enabled)
 		return init_kvm_tdp_mmu(vcpu);
 	else
 		return init_kvm_softmmu(vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7086ca8..513abbb 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -47,6 +47,7 @@
 #define PFERR_USER_MASK (1U << 2)
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
+#define PFERR_NESTED_MASK (1U << 31)
 
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f5db75..b2fe9e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3476,6 +3476,25 @@ static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
 	return gpa;
 }
 
+static gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
+{
+	gpa_t t_gpa;
+	u32 access;
+	u32 err;
+
+	BUG_ON(!mmu_is_nested(vcpu));
+
+	/* NPT walks are treated as user writes */
+	access = PFERR_WRITE_MASK | PFERR_USER_MASK;
+	t_gpa  = vcpu->arch.mmu.gva_to_gpa(vcpu, gpa, access, &err);
+	if (t_gpa == UNMAPPED_GVA) {
+		vcpu->arch.fault.address    = gpa;
+		vcpu->arch.fault.error_code = err | PFERR_NESTED_MASK;
+	}
+
+	return t_gpa;
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -5691,6 +5710,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 	vcpu->arch.mmu.translate_gpa = translate_gpa;
+	vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 	else
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (15 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 16/27] KVM: MMU: Introduce init_kvm_nested_mmu() Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 18:17   ` Avi Kivity
  2010-09-06 15:55 ` [PATCH 18/27] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch introduces a struct with two new fields in
vcpu_arch for x86:

	* fault.address
	* fault.error_code

This will be used to correctly propagate page faults back
into the guest when we could have either an ordinary page
fault or a nested page fault. In the case of a nested page
fault the fault-address is different from the original
address that should be walked. So we need to keep track
of the real fault-address.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
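A note for reviewers (not part of the commit), giving a concrete case
where the two addresses differ: the L2 guest touches gva X, the page
table walker translates it to the L2 guest physical address Y, and
the translation of Y through L1's nested page table faults. The fault
that has to be reported (an #NPF vmexit to L1, see the later patches
in this series) must carry Y -- which is what ends up in
fault.address -- and not X, the address the walk was originally
started with.
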
 arch/x86/include/asm/kvm_emulate.h |    1 -
 arch/x86/include/asm/kvm_host.h    |    9 +++++++++
 arch/x86/kvm/emulate.c             |   30 ++++++++++++++----------------
 arch/x86/kvm/paging_tmpl.h         |    4 ++++
 arch/x86/kvm/x86.c                 |    3 ++-
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 1bf1140..5187dd8 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -229,7 +229,6 @@ struct x86_emulate_ctxt {
 	int exception; /* exception that happens during emulation or -1 */
 	u32 error_code; /* error code for exception */
 	bool error_code_valid;
-	unsigned long cr2; /* faulted address in case of #PF */
 
 	/* decode cache */
 	struct decode_cache decode;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a338235..e5eb57c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -313,6 +313,15 @@ struct kvm_vcpu_arch {
 	 */
 	struct kvm_mmu *walk_mmu;
 
+	/*
+	 * This struct is filled with the necessary information to propagate a
+	 * page fault into the guest
+	 */
+	struct {
+		u64      address;
+		unsigned error_code;
+	} fault;
+
 	/* only needed in kvm_pv_mmu_op() path, but it's hot so
 	 * put it here to avoid allocation */
 	struct kvm_pv_mmu_op_buffer mmu_op_buffer;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 27d2c22..2b08b78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -487,11 +487,9 @@ static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
 	emulate_exception(ctxt, GP_VECTOR, err, true);
 }
 
-static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr,
-		       int err)
+static void emulate_pf(struct x86_emulate_ctxt *ctxt)
 {
-	ctxt->cr2 = addr;
-	emulate_exception(ctxt, PF_VECTOR, err, true);
+	emulate_exception(ctxt, PF_VECTOR, 0, true);
 }
 
 static void emulate_ud(struct x86_emulate_ctxt *ctxt)
@@ -834,7 +832,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt,
 		rc = ops->read_emulated(addr, mc->data + mc->end, n, &err,
 					ctxt->vcpu);
 		if (rc == X86EMUL_PROPAGATE_FAULT)
-			emulate_pf(ctxt, addr, err);
+			emulate_pf(ctxt);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		mc->end += n;
@@ -921,7 +919,7 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 	addr = dt.address + index * 8;
 	ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu,  &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT)
-		emulate_pf(ctxt, addr, err);
+		emulate_pf(ctxt);
 
        return ret;
 }
@@ -947,7 +945,7 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 	addr = dt.address + index * 8;
 	ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT)
-		emulate_pf(ctxt, addr, err);
+		emulate_pf(ctxt);
 
 	return ret;
 }
@@ -1117,7 +1115,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
 					&err,
 					ctxt->vcpu);
 		if (rc == X86EMUL_PROPAGATE_FAULT)
-			emulate_pf(ctxt, c->dst.addr.mem, err);
+			emulate_pf(ctxt);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		break;
@@ -1939,7 +1937,7 @@ static int task_switch_16(struct x86_emulate_ctxt *ctxt,
 			    &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT) {
 		/* FIXME: need to provide precise fault address */
-		emulate_pf(ctxt, old_tss_base, err);
+		emulate_pf(ctxt);
 		return ret;
 	}
 
@@ -1949,7 +1947,7 @@ static int task_switch_16(struct x86_emulate_ctxt *ctxt,
 			     &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT) {
 		/* FIXME: need to provide precise fault address */
-		emulate_pf(ctxt, old_tss_base, err);
+		emulate_pf(ctxt);
 		return ret;
 	}
 
@@ -1957,7 +1955,7 @@ static int task_switch_16(struct x86_emulate_ctxt *ctxt,
 			    &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT) {
 		/* FIXME: need to provide precise fault address */
-		emulate_pf(ctxt, new_tss_base, err);
+		emulate_pf(ctxt);
 		return ret;
 	}
 
@@ -1970,7 +1968,7 @@ static int task_switch_16(struct x86_emulate_ctxt *ctxt,
 				     ctxt->vcpu, &err);
 		if (ret == X86EMUL_PROPAGATE_FAULT) {
 			/* FIXME: need to provide precise fault address */
-			emulate_pf(ctxt, new_tss_base, err);
+			emulate_pf(ctxt);
 			return ret;
 		}
 	}
@@ -2081,7 +2079,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 			    &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT) {
 		/* FIXME: need to provide precise fault address */
-		emulate_pf(ctxt, old_tss_base, err);
+		emulate_pf(ctxt);
 		return ret;
 	}
 
@@ -2091,7 +2089,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 			     &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT) {
 		/* FIXME: need to provide precise fault address */
-		emulate_pf(ctxt, old_tss_base, err);
+		emulate_pf(ctxt);
 		return ret;
 	}
 
@@ -2099,7 +2097,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 			    &err);
 	if (ret == X86EMUL_PROPAGATE_FAULT) {
 		/* FIXME: need to provide precise fault address */
-		emulate_pf(ctxt, new_tss_base, err);
+		emulate_pf(ctxt);
 		return ret;
 	}
 
@@ -2112,7 +2110,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 				     ctxt->vcpu, &err);
 		if (ret == X86EMUL_PROPAGATE_FAULT) {
 			/* FIXME: need to provide precise fault address */
-			emulate_pf(ctxt, new_tss_base, err);
+			emulate_pf(ctxt);
 			return ret;
 		}
 	}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a5b5759..20fc815 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -273,6 +273,10 @@ error:
 		walker->error_code |= PFERR_FETCH_MASK;
 	if (rsvd_fault)
 		walker->error_code |= PFERR_RSVD_MASK;
+
+	vcpu->arch.fault.address    = addr;
+	vcpu->arch.fault.error_code = walker->error_code;
+
 	trace_kvm_mmu_walker_error(walker->error_code);
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2fe9e7..38d482d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4130,7 +4130,8 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
 	if (ctxt->exception == PF_VECTOR)
-		kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code);
+		kvm_inject_page_fault(vcpu, vcpu->arch.fault.address,
+					    vcpu->arch.fault.error_code);
 	else if (ctxt->error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code);
 	else
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 18/27] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (16 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:55 ` [PATCH 19/27] KVM: X86: Propagate fetch faults Joerg Roedel
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch implements logic to make sure that either a
page-fault/page-fault-vmexit or a nested-page-fault-vmexit
is propagated back to the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
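A note for reviewers (not part of the commit): the decision matrix
behind kvm_propagate_fault(), as the series wires up the two mmu
contexts (the SVM side of this comes in the later patches):

	mmu_is_nested  PFERR_NESTED  fault is delivered via
	-------------  ------------  ----------------------
	no             -             arch.mmu.inject_page_fault
	                             (ordinary #PF into the guest)
	yes            clear         arch.nested_mmu.inject_page_fault
	                             (#PF into the L2 guest; the fault
	                             happened during the L2 page table
	                             walk)
	yes            set           arch.mmu.inject_page_fault
	                             (#NPF vmexit to the L1 guest; the
	                             fault happened while translating an
	                             L2 gpa through L1's nested page
	                             table)
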
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/x86.c              |   19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e5eb57c..173834b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -663,6 +663,7 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
 int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gfn_t gfn, void *data, int offset, int len,
 			    u32 *error);
+void kvm_propagate_fault(struct kvm_vcpu *vcpu);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 
 int kvm_pic_set_irq(void *opaque, int irq, int level);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 38d482d..65b00f0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -337,6 +337,22 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
 }
 
+void kvm_propagate_fault(struct kvm_vcpu *vcpu)
+{
+	unsigned long address;
+	u32 nested, error;
+
+	address = vcpu->arch.fault.address;
+	error   = vcpu->arch.fault.error_code;
+	nested  = error &  PFERR_NESTED_MASK;
+	error   = error & ~PFERR_NESTED_MASK;
+
+	if (mmu_is_nested(vcpu) && !nested)
+		vcpu->arch.nested_mmu.inject_page_fault(vcpu, address, error);
+	else
+		vcpu->arch.mmu.inject_page_fault(vcpu, address, error);
+}
+
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.nmi_pending = 1;
@@ -4130,8 +4146,7 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
 	if (ctxt->exception == PF_VECTOR)
-		kvm_inject_page_fault(vcpu, vcpu->arch.fault.address,
-					    vcpu->arch.fault.error_code);
+		kvm_propagate_fault(vcpu);
 	else if (ctxt->error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code);
 	else
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 19/27] KVM: X86: Propagate fetch faults
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (17 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 18/27] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-07 18:43   ` Marcelo Tosatti
  2010-09-06 15:55 ` [PATCH 20/27] KVM: MMU: Add kvm_mmu parameter to load_pdptrs function Joerg Roedel
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

KVM currently ignores fetch faults in the instruction
emulator. With nested-npt we could have such faults. This
patch adds the code to handle these.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/emulate.c |    3 +++
 arch/x86/kvm/x86.c     |    4 ++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2b08b78..aead72e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1198,6 +1198,9 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
 	*(unsigned long *)dest =
 		(ctxt->eflags & ~change_mask) | (val & change_mask);
 
+	if (rc == X86EMUL_PROPAGATE_FAULT)
+		emulate_pf(ctxt);
+
 	return rc;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65b00f0..ca69dcc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4237,6 +4237,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		vcpu->arch.emulate_ctxt.perm_ok = false;
 
 		r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
+		if (r == X86EMUL_PROPAGATE_FAULT)
+			goto done;
+
 		trace_kvm_emulate_insn_start(vcpu);
 
 		/* Only allow emulation of specific instructions on #UD
@@ -4295,6 +4298,7 @@ restart:
 		return handle_emulation_failure(vcpu);
 	}
 
+done:
 	if (vcpu->arch.emulate_ctxt.exception >= 0) {
 		inject_emulated_exception(vcpu);
 		r = EMULATE_DONE;
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 20/27] KVM: MMU: Add kvm_mmu parameter to load_pdptrs function
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (18 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 19/27] KVM: X86: Propagate fetch faults Joerg Roedel
@ 2010-09-06 15:55 ` Joerg Roedel
  2010-09-06 15:56 ` [PATCH 21/27] KVM: MMU: Introduce kvm_pdptr_read_mmu Joerg Roedel
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:55 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This function needs to be able to load the pdptrs from any
mmu context currently in use, so change it to take a
kvm_mmu parameter.
As a side effect this patch also moves the cached pdptrs
from vcpu_arch into the kvm_mmu struct.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    5 +++--
 arch/x86/kvm/kvm_cache_regs.h   |    2 +-
 arch/x86/kvm/svm.c              |    2 +-
 arch/x86/kvm/vmx.c              |   16 ++++++++--------
 arch/x86/kvm/x86.c              |   26 ++++++++++++++------------
 5 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 173834b..1080c0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -259,6 +259,8 @@ struct kvm_mmu {
 
 	u64 *pae_root;
 	u64 rsvd_bits_mask[2][4];
+
+	u64 pdptrs[4]; /* pae */
 };
 
 struct kvm_vcpu_arch {
@@ -278,7 +280,6 @@ struct kvm_vcpu_arch {
 	unsigned long cr4_guest_owned_bits;
 	unsigned long cr8;
 	u32 hflags;
-	u64 pdptrs[4]; /* pae */
 	u64 efer;
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
@@ -594,7 +595,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
+int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 			  const void *val, int bytes);
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 6491ac8..a37abe2 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -42,7 +42,7 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 		      (unsigned long *)&vcpu->arch.regs_avail))
 		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
 
-	return vcpu->arch.pdptrs[index];
+	return vcpu->arch.walk_mmu->pdptrs[index];
 }
 
 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 094df31..a98ac52 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1010,7 +1010,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	switch (reg) {
 	case VCPU_EXREG_PDPTR:
 		BUG_ON(!npt_enabled);
-		load_pdptrs(vcpu, vcpu->arch.cr3);
+		load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
 		break;
 	default:
 		BUG();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e62d8a..0a70194 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1848,20 +1848,20 @@ static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
 		return;
 
 	if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
-		vmcs_write64(GUEST_PDPTR0, vcpu->arch.pdptrs[0]);
-		vmcs_write64(GUEST_PDPTR1, vcpu->arch.pdptrs[1]);
-		vmcs_write64(GUEST_PDPTR2, vcpu->arch.pdptrs[2]);
-		vmcs_write64(GUEST_PDPTR3, vcpu->arch.pdptrs[3]);
+		vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
+		vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
+		vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
+		vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);
 	}
 }
 
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
 {
 	if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
-		vcpu->arch.pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
-		vcpu->arch.pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
-		vcpu->arch.pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
-		vcpu->arch.pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
+		vcpu->arch.mmu.pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
+		vcpu->arch.mmu.pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
+		vcpu->arch.mmu.pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
+		vcpu->arch.mmu.pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
 	}
 
 	__set_bit(VCPU_EXREG_PDPTR,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca69dcc..337f59f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -418,17 +418,17 @@ int kvm_read_nested_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 /*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
+int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
 {
 	gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
 	unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
 	int i, error;
 	int ret;
-	u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
+	u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
 
-	ret = kvm_read_nested_guest_page(vcpu, pdpt_gfn, pdpte,
-					 offset * sizeof(u64),
-					 sizeof(pdpte), &error);
+	ret = kvm_read_guest_page_mmu(vcpu, mmu, pdpt_gfn, pdpte,
+				      offset * sizeof(u64),
+				      sizeof(pdpte), &error);
 	if (ret < 0) {
 		ret = 0;
 		goto out;
@@ -442,7 +442,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 	}
 	ret = 1;
 
-	memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs));
+	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
 	__set_bit(VCPU_EXREG_PDPTR,
 		  (unsigned long *)&vcpu->arch.regs_avail);
 	__set_bit(VCPU_EXREG_PDPTR,
@@ -455,7 +455,7 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 static bool pdptrs_changed(struct kvm_vcpu *vcpu)
 {
-	u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
+	u64 pdpte[ARRAY_SIZE(vcpu->arch.walk_mmu->pdptrs)];
 	bool changed = true;
 	int offset;
 	u32 error;
@@ -475,7 +475,7 @@ static bool pdptrs_changed(struct kvm_vcpu *vcpu)
 				       sizeof(pdpte), &error);
 	if (r < 0)
 		goto out;
-	changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0;
+	changed = memcmp(pdpte, vcpu->arch.walk_mmu->pdptrs, sizeof(pdpte)) != 0;
 out:
 
 	return changed;
@@ -514,7 +514,8 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 				return 1;
 		} else
 #endif
-		if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3))
+		if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu,
+						 vcpu->arch.cr3))
 			return 1;
 	}
 
@@ -603,7 +604,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	} else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
 		   && ((cr4 ^ old_cr4) & pdptr_bits)
-		   && !load_pdptrs(vcpu, vcpu->arch.cr3))
+		   && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3))
 		return 1;
 
 	if (cr4 & X86_CR4_VMXE)
@@ -636,7 +637,8 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		if (is_pae(vcpu)) {
 			if (cr3 & CR3_PAE_RESERVED_BITS)
 				return 1;
-			if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3))
+			if (is_paging(vcpu) &&
+			    !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
 				return 1;
 		}
 		/*
@@ -5412,7 +5414,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
 	kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
 	if (!is_long_mode(vcpu) && is_pae(vcpu)) {
-		load_pdptrs(vcpu, vcpu->arch.cr3);
+		load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
 		mmu_reset_needed = 1;
 	}
 
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 21/27] KVM: MMU: Introduce kvm_pdptr_read_mmu
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (19 preceding siblings ...)
  2010-09-06 15:55 ` [PATCH 20/27] KVM: MMU: Add kvm_mmu parameter to load_pdptrs function Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-06 15:56 ` [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function Joerg Roedel
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This function is implemented to load the pdptrs of the
currently running guest (l1 or l2 guest). It takes the
current paging mode into account and can read the pdptrs
out of l2 guest physical memory.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/kvm_cache_regs.h |    7 +++++++
 arch/x86/kvm/mmu.c            |    2 +-
 arch/x86/kvm/paging_tmpl.h    |    2 +-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index a37abe2..975bb45 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -45,6 +45,13 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 	return vcpu->arch.walk_mmu->pdptrs[index];
 }
 
+static inline u64 kvm_pdptr_read_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, int index)
+{
+	load_pdptrs(vcpu, mmu, mmu->get_cr3(vcpu));
+
+	return mmu->pdptrs[index];
+}
+
 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
 {
 	ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7bc8d67..3663d1c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2398,7 +2398,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 
 		ASSERT(!VALID_PAGE(root));
 		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
-			pdptr = kvm_pdptr_read(vcpu, i);
+			pdptr = kvm_pdptr_read_mmu(vcpu, &vcpu->arch.mmu, i);
 			if (!is_present_gpte(pdptr)) {
 				vcpu->arch.mmu.pae_root[i] = 0;
 				continue;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 20fc815..c0aac98 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -137,7 +137,7 @@ walk:
 
 #if PTTYPE == 64
 	if (walker->level == PT32E_ROOT_LEVEL) {
-		pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
+		pte = kvm_pdptr_read_mmu(vcpu, mmu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte)) {
 			present = false;
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (20 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 21/27] KVM: MMU: Introduce kvm_pdptr_read_mmu Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-07 20:39   ` Marcelo Tosatti
  2010-09-06 15:56 ` [PATCH 23/27] KVM: MMU: Allow long mode shadows for legacy page tables Joerg Roedel
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch factors out the direct-mapping paths of the
mmu_alloc_roots function into a separate function. This
makes it a lot easier to avoid all the unnecessary checks
done in the shadow path which may break when running direct.
In fact, this patch already fixes a problem when running PAE
guests on a PAE shadow page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c |   82 ++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3663d1c..e7e5527 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2357,42 +2357,77 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 	return ret;
 }
 
-static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
+static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu_page *sp;
+	int i;
+
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+		spin_lock(&vcpu->kvm->mmu_lock);
+		kvm_mmu_free_some_pages(vcpu);
+		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
+				      1, ACC_ALL, NULL);
+		++sp->root_count;
+		spin_unlock(&vcpu->kvm->mmu_lock);
+		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
+	} else if (vcpu->arch.mmu.shadow_root_level == PT32E_ROOT_LEVEL) {
+		for (i = 0; i < 4; ++i) {
+			hpa_t root = vcpu->arch.mmu.pae_root[i];
+
+			ASSERT(!VALID_PAGE(root));
+			spin_lock(&vcpu->kvm->mmu_lock);
+			kvm_mmu_free_some_pages(vcpu);
+			sp = kvm_mmu_get_page(vcpu, i << 30, i << 30,
+					      PT32_ROOT_LEVEL, 1, ACC_ALL,
+					      NULL);
+			root = __pa(sp->spt);
+			++sp->root_count;
+			spin_unlock(&vcpu->kvm->mmu_lock);
+			vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
+			vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
+		}
+	} else
+		BUG();
+
+	return 0;
+}
+
+static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
 	gfn_t root_gfn;
 	struct kvm_mmu_page *sp;
-	int direct = 0;
 	u64 pdptr;
 
 	root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+	if (mmu_check_root(vcpu, root_gfn))
+		return 1;
+
+	/*
+	 * Do we shadow a long mode page table? If so we need to
+	 * write-protect the guests page table root.
+	 */
+	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		ASSERT(!VALID_PAGE(root));
-		if (mmu_check_root(vcpu, root_gfn))
-			return 1;
-		if (vcpu->arch.mmu.direct_map) {
-			direct = 1;
-			root_gfn = 0;
-		}
+
 		spin_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_free_some_pages(vcpu);
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
-				      PT64_ROOT_LEVEL, direct,
-				      ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
+				      0, ACC_ALL, NULL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = root;
 		return 0;
 	}
-	direct = !is_paging(vcpu);
-
-	if (mmu_check_root(vcpu, root_gfn))
-		return 1;
 
+	/*
+	 * We shadow a 32 bit page table. This may be a legacy 2-level
+	 * or a PAE 3-level page table.
+	 */
 	for (i = 0; i < 4; ++i) {
 		hpa_t root = vcpu->arch.mmu.pae_root[i];
 
@@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 			root_gfn = pdptr >> PAGE_SHIFT;
 			if (mmu_check_root(vcpu, root_gfn))
 				return 1;
-		} else if (vcpu->arch.mmu.root_level == 0)
-			root_gfn = 0;
-		if (vcpu->arch.mmu.direct_map) {
-			direct = 1;
-			root_gfn = i << 30;
 		}
 		spin_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_free_some_pages(vcpu);
 		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL, direct,
+				      PT32_ROOT_LEVEL, 0,
 				      ACC_ALL, NULL);
 		root = __pa(sp->spt);
 		++sp->root_count;
@@ -2427,6 +2457,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.mmu.direct_map)
+		return mmu_alloc_direct_roots(vcpu);
+	else
+		return mmu_alloc_shadow_roots(vcpu);
+}
+
 static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 23/27] KVM: MMU: Allow long mode shadows for legacy page tables
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (21 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-06 15:56 ` [PATCH 24/27] KVM: SVM: Implement MMU helper functions for Nested Nested Paging Joerg Roedel
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

Currently the KVM softmmu implementation cannot shadow a 32
bit legacy or PAE page table with a long mode page table.
This is a required feature for nested paging emulation
because the nested page table must always be in host format.
So this patch implements the missing pieces to allow long
mode shadow page tables for these guest page table types.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
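A note for reviewers (not part of the commit): the resulting shadow
root layout when a 32 bit or PAE guest is shadowed with a long mode
page table is roughly:

	mmu.root_hpa --> lm_root   (level 4 page, only entry 0 used)
	lm_root[0]   --> pae_root  (the four PAE roots, as before)

lm_root is allocated on demand the first time such a shadow is built
and only funnels the walk into the existing pae_root handling, so the
32 bit shadow paths themselves stay unchanged.
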
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   60 +++++++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1080c0f..475fc70 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -258,6 +258,7 @@ struct kvm_mmu {
 	bool direct_map;
 
 	u64 *pae_root;
+	u64 *lm_root;
 	u64 rsvd_bits_mask[2][4];
 
 	u64 pdptrs[4]; /* pae */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e7e5527..ea8ed8b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1504,6 +1504,12 @@ static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 	iterator->addr = addr;
 	iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
 	iterator->level = vcpu->arch.mmu.shadow_root_level;
+
+	if (iterator->level == PT64_ROOT_LEVEL &&
+	    vcpu->arch.mmu.root_level < PT64_ROOT_LEVEL &&
+	    !vcpu->arch.mmu.direct_map)
+		--iterator->level;
+
 	if (iterator->level == PT32E_ROOT_LEVEL) {
 		iterator->shadow_addr
 			= vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
@@ -2314,7 +2320,9 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 	spin_lock(&vcpu->kvm->mmu_lock);
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL &&
+	    (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL ||
+	     vcpu->arch.mmu.direct_map)) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		sp = page_header(root);
@@ -2394,10 +2402,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
-	int i;
-	gfn_t root_gfn;
 	struct kvm_mmu_page *sp;
-	u64 pdptr;
+	u64 pdptr, pm_mask;
+	gfn_t root_gfn;
+	int i;
 
 	root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
@@ -2426,8 +2434,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 	/*
 	 * We shadow a 32 bit page table. This may be a legacy 2-level
-	 * or a PAE 3-level page table.
+	 * or a PAE 3-level page table. In either case we need to be aware that
+	 * the shadow page table may be a PAE or a long mode page table.
 	 */
+	pm_mask = PT_PRESENT_MASK;
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL)
+		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
+
 	for (i = 0; i < 4; ++i) {
 		hpa_t root = vcpu->arch.mmu.pae_root[i];
 
@@ -2451,9 +2464,35 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 
-		vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
+		vcpu->arch.mmu.pae_root[i] = root | pm_mask;
+		vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
 	}
-	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
+
+	/*
+	 * If we shadow a 32 bit page table with a long mode page
+	 * table we enter this path.
+	 */
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+		if (vcpu->arch.mmu.lm_root == NULL) {
+			/*
+			 * The additional page necessary for this is only
+			 * allocated on demand.
+			 */
+
+			u64 *lm_root;
+
+			lm_root = (void*)get_zeroed_page(GFP_KERNEL);
+			if (lm_root == NULL)
+				return 1;
+
+			lm_root[0] = __pa(vcpu->arch.mmu.pae_root) | pm_mask;
+
+			vcpu->arch.mmu.lm_root = lm_root;
+		}
+
+		vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.lm_root);
+	}
+
 	return 0;
 }
 
@@ -2470,9 +2509,12 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 	int i;
 	struct kvm_mmu_page *sp;
 
+	if (vcpu->arch.mmu.direct_map)
+		return;
+
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 		sp = page_header(root);
 		mmu_sync_children(vcpu, sp);
@@ -3250,6 +3292,8 @@ EXPORT_SYMBOL_GPL(kvm_disable_tdp);
 static void free_mmu_pages(struct kvm_vcpu *vcpu)
 {
 	free_page((unsigned long)vcpu->arch.mmu.pae_root);
+	if (vcpu->arch.mmu.lm_root != NULL)
+		free_page((unsigned long)vcpu->arch.mmu.lm_root);
 }
 
 static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 24/27] KVM: SVM: Implement MMU helper functions for Nested Nested Paging
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (22 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 23/27] KVM: MMU: Allow long mode shadows for legacy page tables Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-06 15:56 ` [PATCH 25/27] KVM: SVM: Initialize Nested Nested MMU context on VMRUN Joerg Roedel
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch adds the helper functions which will be used in
the mmu context for handling nested nested page faults.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a98ac52..6e72ba9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -104,6 +104,8 @@ struct nested_state {
 	u32 intercept_exceptions;
 	u64 intercept;
 
+	/* Nested Paging related state */
+	u64 nested_cr3;
 };
 
 #define MSRPM_OFFSETS	16
@@ -1600,6 +1602,36 @@ static int vmmcall_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->nested.nested_cr3;
+}
+
+static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
+				   unsigned long root)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.nested_cr3 = root;
+	force_new_asid(vcpu);
+}
+
+static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
+				       unsigned long addr,
+				       u32 error_code)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.exit_code = SVM_EXIT_NPF;
+	svm->vmcb->control.exit_code_hi = 0;
+	svm->vmcb->control.exit_info_1 = error_code;
+	svm->vmcb->control.exit_info_2 = addr;
+
+	nested_svm_vmexit(svm);
+}
+
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
 	if (!(svm->vcpu.arch.efer & EFER_SVME)
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 25/27] KVM: SVM: Initialize Nested Nested MMU context on VMRUN
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (23 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 24/27] KVM: SVM: Implement MMU helper functions for Nested Nested Paging Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-06 15:56 ` [PATCH 26/27] KVM: SVM: Expect two more candidates for exit_int_info Joerg Roedel
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch adds code to initialize the Nested Nested Paging
MMU context when the L1 guest executes a VMRUN instruction
and has nested paging enabled in its VMCB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
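A note for reviewers (not part of the commit), summarizing the mmu
setup after a VMRUN with nested paging enabled in the L1 VMCB:

	vcpu->arch.mmu       shadows L1's nested page table
	                     (l2_gpa -> l1_gpa); get_cr3, set_cr3 and
	                     inject_page_fault point to the nested SVM
	                     helpers from the previous patch
	vcpu->arch.walk_mmu  points to nested_mmu, which walks the L2
	                     guest's own page tables (gva -> l2_gpa) in
	                     whatever paging mode the L2 guest uses

On the next #VMEXIT walk_mmu is reset to &vcpu->arch.mmu and the mmu
context is rebuilt for the L1 guest.
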
 arch/x86/kvm/mmu.c |    1 +
 arch/x86/kvm/svm.c |   50 +++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ea8ed8b..cf4474b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2945,6 +2945,7 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
 	mmu_free_roots(vcpu);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_unload);
 
 static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu_page *sp,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6e72ba9..949e10d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -294,6 +294,15 @@ static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
 	force_new_asid(vcpu);
 }
 
+static int get_npt_level(void)
+{
+#ifdef CONFIG_X86_64
+	return PT64_ROOT_LEVEL;
+#else
+	return PT32E_ROOT_LEVEL;
+#endif
+}
+
 static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	vcpu->arch.efer = efer;
@@ -1632,6 +1641,26 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 	nested_svm_vmexit(svm);
 }
 
+static int nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+	int r;
+
+	r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+
+	vcpu->arch.mmu.set_cr3           = nested_svm_set_tdp_cr3;
+	vcpu->arch.mmu.get_cr3           = nested_svm_get_tdp_cr3;
+	vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
+	vcpu->arch.mmu.shadow_root_level = get_npt_level();
+	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
+
+	return r;
+}
+
+static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
 	if (!(svm->vcpu.arch.efer & EFER_SVME)
@@ -2000,6 +2029,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 	kvm_clear_exception_queue(&svm->vcpu);
 	kvm_clear_interrupt_queue(&svm->vcpu);
 
+	svm->nested.nested_cr3 = 0;
+
 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
 	svm->vmcb->save.cs = hsave->save.cs;
@@ -2026,6 +2057,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	nested_svm_unmap(page);
 
+	nested_svm_uninit_mmu_context(&svm->vcpu);
 	kvm_mmu_reset_context(&svm->vcpu);
 	kvm_mmu_load(&svm->vcpu);
 
@@ -2073,6 +2105,9 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 	if (vmcb->control.asid == 0)
 		return false;
 
+	if (vmcb->control.nested_ctl && !npt_enabled)
+		return false;
+
 	return true;
 }
 
@@ -2145,6 +2180,12 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	else
 		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
 
+	if (nested_vmcb->control.nested_ctl) {
+		kvm_mmu_unload(&svm->vcpu);
+		svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
+		nested_svm_init_mmu_context(&svm->vcpu);
+	}
+
 	/* Load the nested guest state */
 	svm->vmcb->save.es = nested_vmcb->save.es;
 	svm->vmcb->save.cs = nested_vmcb->save.cs;
@@ -3412,15 +3453,6 @@ static bool svm_cpu_has_accelerated_tpr(void)
 	return false;
 }
 
-static int get_npt_level(void)
-{
-#ifdef CONFIG_X86_64
-	return PT64_ROOT_LEVEL;
-#else
-	return PT32E_ROOT_LEVEL;
-#endif
-}
-
 static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	return 0;
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 26/27] KVM: SVM: Expect two more candidates for exit_int_info
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (24 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 25/27] KVM: SVM: Initialize Nested Nested MMU context on VMRUN Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-06 15:56 ` [PATCH 27/27] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
  2010-09-06 18:37 ` [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Avi Kivity
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch adds the INTR and NMI intercepts to the list of
intercepts expected to occur with exit_int_info set. While
this can't happen on bare metal, it is architecturally legal
and may happen with KVM's SVM emulation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 949e10d..932183e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2993,7 +2993,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 
 	if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
 	    exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
-	    exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH)
+	    exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
+	    exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
 		printk(KERN_ERR "%s: unexpected exit_ini_info 0x%x "
 		       "exit_code 0x%x\n",
 		       __func__, svm->vmcb->control.exit_int_info,
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 27/27] KVM: SVM: Report Nested Paging support to userspace
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (25 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 26/27] KVM: SVM: Expect two more candidates for exit_int_info Joerg Roedel
@ 2010-09-06 15:56 ` Joerg Roedel
  2010-09-06 18:37 ` [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Avi Kivity
  27 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alexander Graf, joro, kvm, linux-kernel, Joerg Roedel

This patch implements the reporting of the nested paging
feature support to userspace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 932183e..dd6c529 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3478,6 +3478,10 @@ static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 		if (svm_has(SVM_FEATURE_NRIP))
 			entry->edx |= SVM_FEATURE_NRIP;
 
+		/* Support NPT for the guest if enabled */
+		if (npt_enabled)
+			entry->edx |= SVM_FEATURE_NPT;
+
 		break;
 	}
 }
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker
  2010-09-06 15:55 ` [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
@ 2010-09-06 18:05   ` Avi Kivity
  2010-09-08  9:20     ` Roedel, Joerg
  0 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2010-09-06 18:05 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, joro, kvm, linux-kernel

  On 09/06/2010 06:55 PM, Joerg Roedel wrote:
> This patch introduces a mmu-callback to translate gpa
> addresses in the walk_addr code. This is later used to
> translate l2_gpa addresses into l1_gpa addresses.

> @@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
>   	return (gpa_t)gfn<<  PAGE_SHIFT;
>   }
>
> +static inline gfn_t gpa_to_gfn(gpa_t gpa)
> +{
> +	return (gfn_t)gpa>>  PAGE_SHIFT;
> +}
> +

That's a bug - gfn_t may be smaller than gpa_t, so you're truncating 
just before the shift.  Note the casts in the surrounding functions are 
widening, not narrowing.

However, gfn_t is u64 so the bug is only theoretical.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu
  2010-09-06 15:55 ` [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu Joerg Roedel
@ 2010-09-06 18:17   ` Avi Kivity
  0 siblings, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2010-09-06 18:17 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, joro, kvm, linux-kernel

  On 09/06/2010 06:55 PM, Joerg Roedel wrote:
> This patch introduces a struct with two new fields in
> vcpu_arch for x86:
>
> 	* fault.address
> 	* fault.error_code
>
> This will be used to correctly propagate page faults back
> into the guest when we could have either an ordinary page
> fault or a nested page fault. In the case of a nested page
> fault the fault-address is different from the original
> address that should be walked. So we need to keep track
> about the real fault-address.
>
>
>
> -static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr,
> -		       int err)
> +static void emulate_pf(struct x86_emulate_ctxt *ctxt)
>   {
> -	ctxt->cr2 = addr;
> -	emulate_exception(ctxt, PF_VECTOR, err, true);
> +	emulate_exception(ctxt, PF_VECTOR, 0, true);
>   }

What happened to the error code?

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b2fe9e7..38d482d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4130,7 +4130,8 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
>   {
>   	struct x86_emulate_ctxt *ctxt =&vcpu->arch.emulate_ctxt;
>   	if (ctxt->exception == PF_VECTOR)
> -		kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code);
> +		kvm_inject_page_fault(vcpu, vcpu->arch.fault.address,
> +					    vcpu->arch.fault.error_code);
>   	else if (ctxt->error_code_valid)
>   		kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code);
>   	else

Ah.  Not lovely, but it was ugly before as well.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List)
  2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
                   ` (26 preceding siblings ...)
  2010-09-06 15:56 ` [PATCH 27/27] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
@ 2010-09-06 18:37 ` Avi Kivity
  2010-09-07 16:35   ` Roedel, Joerg
  27 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2010-09-06 18:37 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Alexander Graf, joro, kvm, linux-kernel

  On 09/06/2010 06:55 PM, Joerg Roedel wrote:
> (Now with correct Cc-list. I accidentially copied the wrong line from
>   MAINTAINERS in the first post of this. Sorry for the double-post)
>
> Hi Avi, Marcelo,
>
> here is finally the third round of my NPT virtualization patches for KVM. It
> took a while to get everything running (including KVM itself) on 32 bit again
> to actually test it. But testing on 32 bit host and with a 32 bit hypervisor
> was a very good idea. I found some serious bugs and shortcomings in my code
> that are fixed now in v3.
>

<snip>

> This patchset applies on todays avi/master + the three patches I sent end of
> last week. These patches are necessary for some of the tests above to run.
>
> For the curious and impatient user I put everything in a branch on kernel.org.
> If you want to test it you can pull the tree from
>
> 	git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-kvm.git npt-virt-v3
>
> Please review and/or apply these patches if considered good enough. Otherwise I
> appreciate your feedback.

Very impressive patchset.  It's broken out so finely that the careful 
reader gets the feeling he understands every little detail, without 
noticing you've introduced recursion into the kvm mmu.

The little nit regarding patch 10 can be addressed in a follow-on patch.

Reviewed-by: Avi Kivity <avi@redhat.com>

Please also post a unit test that checks that nested page faults for l1 
ptes with bad NX, U, W, or reserved bits set are correctly intercepted 
and reported.  W should work already if you tested nested vga, but the 
rest are untested during normal operation and pose a security problem if 
they are incorrect.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List)
  2010-09-06 18:37 ` [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Avi Kivity
@ 2010-09-07 16:35   ` Roedel, Joerg
  0 siblings, 0 replies; 41+ messages in thread
From: Roedel, Joerg @ 2010-09-07 16:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, joro, kvm, linux-kernel

On Mon, Sep 06, 2010 at 02:37:33PM -0400, Avi Kivity wrote:
>   On 09/06/2010 06:55 PM, Joerg Roedel wrote:
> > (Now with correct Cc-list. I accidentially copied the wrong line from
> >   MAINTAINERS in the first post of this. Sorry for the double-post)
> >
> > Hi Avi, Marcelo,
> >
> > here is finally the third round of my NPT virtualization patches for KVM. It
> > took a while to get everything running (including KVM itself) on 32 bit again
> > to actually test it. But testing on 32 bit host and with a 32 bit hypervisor
> > was a very good idea. I found some serious bugs and shortcomings in my code
> > that are fixed now in v3.
> >
> 
> <snip>
> 
> > This patchset applies on todays avi/master + the three patches I sent end of
> > last week. These patches are necessary for some of the tests above to run.
> >
> > For the curious and impatient user I put everything in a branch on kernel.org.
> > If you want to test it you can pull the tree from
> >
> > 	git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-kvm.git npt-virt-v3
> >
> > Please review and/or apply these patches if considered good enough. Otherwise I
> > appreciate your feedback.
> 
> Very impressive patchset.  It's broken out so finely that the careful 
> reader gets the feeling he understands every little detail, without 
> noticing you've introduced recursion into the kvm mmu.

Thanks :-)

> Please also post a unit test that checks that nested page faults for l1 
> ptes with bad NX, U, W, or reserved bits set are correctly intercepted 
> and reported.  W should work already if you tested nested vga, but the 
> rest are untested during normal operation and pose a security problem if 
> they are incorrect.

Okay, I'll write a test for all these cases.
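
Just to sketch the structure I have in mind (this is an outline only;
the helpers and constants below are placeholders I made up, not
existing kvm-unit-tests code):

	struct npf_case {
		const char *name;
		u64	    set_bits;	/* bits to set in the l1 npt pte   */
		u64	    clear_bits;	/* bits to clear in the l1 npt pte */
		int	    access;	/* how l2 touches the test page    */
		u32	    expect_err;	/* expected #NPF error code bits   */
	};

	static const struct npf_case npf_cases[] = {
		{ "nx",    NPT_NX_BIT,    0,             ACC_FETCH, ERR_FETCH },
		{ "user",  0,             NPT_USER_BIT,  ACC_USER,  ERR_USER  },
		{ "write", 0,             NPT_WRITE_BIT, ACC_WRITE, ERR_WRITE },
		{ "rsvd",  NPT_RSVD_BITS, 0,             ACC_READ,  ERR_RSVD  },
	};

	static void test_nested_npf(void)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(npf_cases); i++) {
			const struct npf_case *c = &npf_cases[i];

			/* patch the l1 npt entry that maps the test page */
			set_l1_npt_pte(TEST_GPA, c->set_bits, c->clear_bits);
			/* let l2 perform the access on that page */
			run_l2_access(TEST_GVA, c->access);
			/* l1 must see a #NPF exit with the expected error code */
			report_npf(c->name, c->expect_err);
		}
	}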

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking
  2010-09-06 15:55 ` [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking Joerg Roedel
@ 2010-09-07 17:48   ` Marcelo Tosatti
  2010-09-08  9:12     ` Roedel, Joerg
  0 siblings, 1 reply; 41+ messages in thread
From: Marcelo Tosatti @ 2010-09-07 17:48 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Alexander Graf, joro, kvm, linux-kernel

On Mon, Sep 06, 2010 at 05:55:53PM +0200, Joerg Roedel wrote:
> This patch uses kvm_read_guest_page_tdp to make the
> walk_addr_generic functions suitable for two-level page
> table walking.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/kvm/paging_tmpl.h |   27 ++++++++++++++++++++-------
>  1 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index cd59af1..a5b5759 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -124,6 +124,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	unsigned index, pt_access, uninitialized_var(pte_access);
>  	gpa_t pte_gpa;
>  	bool eperm, present, rsvd_fault;
> +	int offset;
> +	u32 error = 0;
>  
>  	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
>  				     fetch_fault);
> @@ -153,12 +155,13 @@ walk:
>  		index = PT_INDEX(addr, walker->level);
>  
>  		table_gfn = gpte_to_gfn(pte);
> -		pte_gpa = gfn_to_gpa(table_gfn);
> -		pte_gpa += index * sizeof(pt_element_t);
> +		offset    = index * sizeof(pt_element_t);
> +		pte_gpa   = gfn_to_gpa(table_gfn) + offset;
>  		walker->table_gfn[walker->level - 1] = table_gfn;
>  		walker->pte_gpa[walker->level - 1] = pte_gpa;
>  
> -		if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) {
> +		if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, offset,
> +					    sizeof(pte), &error)) {
>  			present = false;
>  			break;
>  		}

If there is failure reading the nested page tables here, you fill
vcpu->arch.fault. But the nested fault error values will be overwritten
at the end of walk_addr() by the original fault values?

> @@ -209,15 +212,25 @@ walk:
>  				is_large_pte(pte) &&
>  				mmu->root_level == PT64_ROOT_LEVEL)) {
>  			int lvl = walker->level;
> +			gpa_t real_gpa;
> +			gfn_t gfn;
>  
> -			walker->gfn = gpte_to_gfn_lvl(pte, lvl);
> -			walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl))
> -					>> PAGE_SHIFT;
> +			gfn = gpte_to_gfn_lvl(pte, lvl);
> +			gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
>  
>  			if (PTTYPE == 32 &&
>  			    walker->level == PT_DIRECTORY_LEVEL &&
>  			    is_cpuid_PSE36())
> -				walker->gfn += pse36_gfn_delta(pte);
> +				gfn += pse36_gfn_delta(pte);
> +
> +			real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn),
> +						      &error);
> +			if (real_gpa == UNMAPPED_GVA) {
> +				walker->error_code = error;
> +				return 0;
> +			}
> +
> +			walker->gfn = real_gpa >> PAGE_SHIFT;
>  
>  			break;
>  		}
> -- 
> 1.7.0.4
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 19/27] KVM: X86: Propagate fetch faults
  2010-09-06 15:55 ` [PATCH 19/27] KVM: X86: Propagate fetch faults Joerg Roedel
@ 2010-09-07 18:43   ` Marcelo Tosatti
  2010-09-08  9:18     ` Roedel, Joerg
  0 siblings, 1 reply; 41+ messages in thread
From: Marcelo Tosatti @ 2010-09-07 18:43 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Alexander Graf, joro, kvm, linux-kernel

On Mon, Sep 06, 2010 at 05:55:58PM +0200, Joerg Roedel wrote:
> KVM currently ignores fetch faults in the instruction
> emulator. With nested-npt we could have such faults. This
> patch adds the code to handle these.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/kvm/emulate.c |    3 +++
>  arch/x86/kvm/x86.c     |    4 ++++
>  2 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 2b08b78..aead72e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1198,6 +1198,9 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
>  	*(unsigned long *)dest =
>  		(ctxt->eflags & ~change_mask) | (val & change_mask);
>  
> +	if (rc == X86EMUL_PROPAGATE_FAULT)
> +		emulate_pf(ctxt);
> +
>  	return rc;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 65b00f0..ca69dcc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4237,6 +4237,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  		vcpu->arch.emulate_ctxt.perm_ok = false;
>  
>  		r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
> +		if (r == X86EMUL_PROPAGATE_FAULT)
> +			goto done;
> +

x86_decode_insn returns -1 / 0 ? 

>  		trace_kvm_emulate_insn_start(vcpu);
>  
>  		/* Only allow emulation of specific instructions on #UD
> @@ -4295,6 +4298,7 @@ restart:
>  		return handle_emulation_failure(vcpu);
>  	}
>  
> +done:
>  	if (vcpu->arch.emulate_ctxt.exception >= 0) {
>  		inject_emulated_exception(vcpu);
>  		r = EMULATE_DONE;
> -- 
> 1.7.0.4
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function
  2010-09-06 15:56 ` [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function Joerg Roedel
@ 2010-09-07 20:39   ` Marcelo Tosatti
  2010-09-08  7:16     ` Avi Kivity
  0 siblings, 1 reply; 41+ messages in thread
From: Marcelo Tosatti @ 2010-09-07 20:39 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Alexander Graf, joro, kvm, linux-kernel

On Mon, Sep 06, 2010 at 05:56:01PM +0200, Joerg Roedel wrote:
> This patch factors out the direct-mapping paths of the
> mmu_alloc_roots function into a separate function. This
> makes it a lot easier to avoid all the unnecessary checks
> done in the shadow path which may break when running direct.
> In fact, this patch already fixes a problem when running PAE
> guests on a PAE shadow page table.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/kvm/mmu.c |   82 ++++++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 60 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 3663d1c..e7e5527 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c

> +
> +static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  {
>  	int i;
>  	gfn_t root_gfn;
>  	struct kvm_mmu_page *sp;
> -	int direct = 0;
>  	u64 pdptr;
>  
>  	root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
>  
> -	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
> +	if (mmu_check_root(vcpu, root_gfn))
> +		return 1;
> +
> +	/*
> +	 * Do we shadow a long mode page table? If so we need to
> +	 * write-protect the guests page table root.
> +	 */
> +	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
>  		hpa_t root = vcpu->arch.mmu.root_hpa;
>  
>  		ASSERT(!VALID_PAGE(root));
> -		if (mmu_check_root(vcpu, root_gfn))
> -			return 1;
> -		if (vcpu->arch.mmu.direct_map) {
> -			direct = 1;
> -			root_gfn = 0;
> -		}
> +
>  		spin_lock(&vcpu->kvm->mmu_lock);
>  		kvm_mmu_free_some_pages(vcpu);
> -		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
> -				      PT64_ROOT_LEVEL, direct,
> -				      ACC_ALL, NULL);
> +		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
> +				      0, ACC_ALL, NULL);
>  		root = __pa(sp->spt);
>  		++sp->root_count;
>  		spin_unlock(&vcpu->kvm->mmu_lock);
>  		vcpu->arch.mmu.root_hpa = root;
>  		return 0;
>  	}
> -	direct = !is_paging(vcpu);
> -
> -	if (mmu_check_root(vcpu, root_gfn))
> -		return 1;
>  
> +	/*
> +	 * We shadow a 32 bit page table. This may be a legacy 2-level
> +	 * or a PAE 3-level page table.
> +	 */
>  	for (i = 0; i < 4; ++i) {
>  		hpa_t root = vcpu->arch.mmu.pae_root[i];
>  
> @@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>  			root_gfn = pdptr >> PAGE_SHIFT;
>  			if (mmu_check_root(vcpu, root_gfn))
>  				return 1;
> -		} else if (vcpu->arch.mmu.root_level == 0)
> -			root_gfn = 0;
> -		if (vcpu->arch.mmu.direct_map) {
> -			direct = 1;
> -			root_gfn = i << 30;
>  		}
>  		spin_lock(&vcpu->kvm->mmu_lock);
>  		kvm_mmu_free_some_pages(vcpu);
>  		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
> -				      PT32_ROOT_LEVEL, direct,
> +				      PT32_ROOT_LEVEL, 0,
>  				      ACC_ALL, NULL);

Should not write protect the gfn for nonpaging mode.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function
  2010-09-07 20:39   ` Marcelo Tosatti
@ 2010-09-08  7:16     ` Avi Kivity
  2010-09-08  9:16       ` Roedel, Joerg
  0 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2010-09-08  7:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Joerg Roedel, Alexander Graf, joro, kvm, linux-kernel

  On 09/07/2010 11:39 PM, Marcelo Tosatti wrote:
>
>> @@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>>   			root_gfn = pdptr>>  PAGE_SHIFT;
>>   			if (mmu_check_root(vcpu, root_gfn))
>>   				return 1;
>> -		} else if (vcpu->arch.mmu.root_level == 0)
>> -			root_gfn = 0;
>> -		if (vcpu->arch.mmu.direct_map) {
>> -			direct = 1;
>> -			root_gfn = i<<  30;
>>   		}
>>   		spin_lock(&vcpu->kvm->mmu_lock);
>>   		kvm_mmu_free_some_pages(vcpu);
>>   		sp = kvm_mmu_get_page(vcpu, root_gfn, i<<  30,
>> -				      PT32_ROOT_LEVEL, direct,
>> +				      PT32_ROOT_LEVEL, 0,
>>   				      ACC_ALL, NULL);
> Should not write protect the gfn for nonpaging mode.
>

nonpaging mode should have direct_map set, so wouldn't enter this path 
at all.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking
  2010-09-07 17:48   ` Marcelo Tosatti
@ 2010-09-08  9:12     ` Roedel, Joerg
  0 siblings, 0 replies; 41+ messages in thread
From: Roedel, Joerg @ 2010-09-08  9:12 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Alexander Graf, joro, kvm, linux-kernel

On Tue, Sep 07, 2010 at 01:48:05PM -0400, Marcelo Tosatti wrote:
> On Mon, Sep 06, 2010 at 05:55:53PM +0200, Joerg Roedel wrote:
> > This patch uses kvm_read_guest_page_tdp to make the
> > walk_addr_generic functions suitable for two-level page
> > table walking.
> > 
> > Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> > ---
> >  arch/x86/kvm/paging_tmpl.h |   27 ++++++++++++++++++++-------
> >  1 files changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> > index cd59af1..a5b5759 100644
> > --- a/arch/x86/kvm/paging_tmpl.h
> > +++ b/arch/x86/kvm/paging_tmpl.h
> > @@ -124,6 +124,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> >  	unsigned index, pt_access, uninitialized_var(pte_access);
> >  	gpa_t pte_gpa;
> >  	bool eperm, present, rsvd_fault;
> > +	int offset;
> > +	u32 error = 0;
> >  
> >  	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
> >  				     fetch_fault);
> > @@ -153,12 +155,13 @@ walk:
> >  		index = PT_INDEX(addr, walker->level);
> >  
> >  		table_gfn = gpte_to_gfn(pte);
> > -		pte_gpa = gfn_to_gpa(table_gfn);
> > -		pte_gpa += index * sizeof(pt_element_t);
> > +		offset    = index * sizeof(pt_element_t);
> > +		pte_gpa   = gfn_to_gpa(table_gfn) + offset;
> >  		walker->table_gfn[walker->level - 1] = table_gfn;
> >  		walker->pte_gpa[walker->level - 1] = pte_gpa;
> >  
> > -		if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) {
> > +		if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, offset,
> > +					    sizeof(pte), &error)) {
> >  			present = false;
> >  			break;
> >  		}
> 
> If there is failure reading the nested page tables here, you fill
> vcpu->arch.fault. But the nested fault error values will be overwritten
> at the end of walk_addr() by the original fault values?

True, thanks for pointing that out. I will write a test-case for that
too. The tests I have implemented so far also show that the error code
is sometimes not reported correctly, so I decided to do a v4 of this
patch-set with all of the issues found so far fixed.
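
At the end of walk_addr() it probably just needs a guard along these
lines (the field names here are only illustrative, not necessarily
what v4 will end up using):

	/*
	 * Sketch: do not overwrite fault data that the nested walk
	 * has already recorded for l1.
	 */
	if (!vcpu->arch.fault.pending) {
		vcpu->arch.fault.address    = addr;
		vcpu->arch.fault.error_code = walker->error_code;
	}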

Thanks for your review.

	Joerg



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function
  2010-09-08  7:16     ` Avi Kivity
@ 2010-09-08  9:16       ` Roedel, Joerg
  0 siblings, 0 replies; 41+ messages in thread
From: Roedel, Joerg @ 2010-09-08  9:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, joro, kvm, linux-kernel

On Wed, Sep 08, 2010 at 03:16:59AM -0400, Avi Kivity wrote:
>   On 09/07/2010 11:39 PM, Marcelo Tosatti wrote:
> >
> >> @@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
> >>   			root_gfn = pdptr>>  PAGE_SHIFT;
> >>   			if (mmu_check_root(vcpu, root_gfn))
> >>   				return 1;
> >> -		} else if (vcpu->arch.mmu.root_level == 0)
> >> -			root_gfn = 0;
> >> -		if (vcpu->arch.mmu.direct_map) {
> >> -			direct = 1;
> >> -			root_gfn = i<<  30;
> >>   		}
> >>   		spin_lock(&vcpu->kvm->mmu_lock);
> >>   		kvm_mmu_free_some_pages(vcpu);
> >>   		sp = kvm_mmu_get_page(vcpu, root_gfn, i<<  30,
> >> -				      PT32_ROOT_LEVEL, direct,
> >> +				      PT32_ROOT_LEVEL, 0,
> >>   				      ACC_ALL, NULL);
> > Should not write protect the gfn for nonpaging mode.
> >
> 
> nonpaging mode should have direct_map set, so wouldn't enter this path 
> at all.

Hmm, actually the nonpaging path does not set direct_map. I'll fix this
too in v4. Thanks.
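
The change itself should be small, something along these lines when
the nonpaging context is set up (untested, and the exact place may
shift with the earlier patches in this series):

	/* rough idea only: mark the nonpaging context as direct-mapped
	 * when it is built, e.g. in nonpaging_init_context() */
	context->direct_map = true;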

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 19/27] KVM: X86: Propagate fetch faults
  2010-09-07 18:43   ` Marcelo Tosatti
@ 2010-09-08  9:18     ` Roedel, Joerg
  0 siblings, 0 replies; 41+ messages in thread
From: Roedel, Joerg @ 2010-09-08  9:18 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Alexander Graf, joro, kvm, linux-kernel

On Tue, Sep 07, 2010 at 02:43:16PM -0400, Marcelo Tosatti wrote:
> On Mon, Sep 06, 2010 at 05:55:58PM +0200, Joerg Roedel wrote:
> >  		r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
> > +		if (r == X86EMUL_PROPAGATE_FAULT)
> > +			goto done;
> > +
> 
> x86_decode_insn returns -1 / 0 ?

Yes. This looks like a left-over from v2 of the patch-set. I'll check
the path again and remove the check if it is no longer necessary.
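
If x86_decode_insn() really only returns -1/0 there, then the new
check is indeed dead code as posted (assuming X86EMUL_PROPAGATE_FAULT
is a distinct positive constant):

	r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
	/* r is -1 (decode failure) or 0 (success) at this point, so
	 * this comparison can never be true */
	if (r == X86EMUL_PROPAGATE_FAULT)
		goto done;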

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker
  2010-09-06 18:05   ` Avi Kivity
@ 2010-09-08  9:20     ` Roedel, Joerg
  0 siblings, 0 replies; 41+ messages in thread
From: Roedel, Joerg @ 2010-09-08  9:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, joro, kvm, linux-kernel

On Mon, Sep 06, 2010 at 02:05:35PM -0400, Avi Kivity wrote:
>   On 09/06/2010 06:55 PM, Joerg Roedel wrote:
> > This patch introduces a mmu-callback to translate gpa
> > addresses in the walk_addr code. This is later used to
> > translate l2_gpa addresses into l1_gpa addresses.
> 
> > @@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
> >   	return (gpa_t)gfn<<  PAGE_SHIFT;
> >   }
> >
> > +static inline gfn_t gpa_to_gfn(gpa_t gpa)
> > +{
> > +	return (gfn_t)gpa>>  PAGE_SHIFT;
> > +}
> > +
> 
> That's a bug - gfn_t may be smaller than gpa_t, so you're truncating 
> just before the shift.  Note the casts in the surrounding functions are 
> widening, not narrowing.
> 
> However, gfn_t is u64 so the bug is only theoretical.

Will fix that in v4 too. Thanks.
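
For reference, a stand-alone snippet that makes the difference visible
by modelling gfn_t as something narrower than gpa_t (with the real u64
gfn_t this stays theoretical, as you say):

	#include <stdio.h>
	#include <stdint.h>

	typedef uint64_t gpa_t;
	typedef uint32_t gfn_t;		/* narrowed only for this demo */
	#define PAGE_SHIFT 12

	/* buggy: narrows gpa to gfn_t before the shift, losing high bits */
	static gfn_t gpa_to_gfn_buggy(gpa_t gpa)
	{
		return (gfn_t)gpa >> PAGE_SHIFT;
	}

	/* fixed: shift in the wide type first, then narrow the result */
	static gfn_t gpa_to_gfn_fixed(gpa_t gpa)
	{
		return (gfn_t)(gpa >> PAGE_SHIFT);
	}

	int main(void)
	{
		gpa_t gpa = 0x1234567890ULL;	/* an address above 4GB */

		printf("buggy: %#x\n", gpa_to_gfn_buggy(gpa));	/* 0x34567   */
		printf("fixed: %#x\n", gpa_to_gfn_fixed(gpa));	/* 0x1234567 */
		return 0;
	}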

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 19/27] KVM: X86: Propagate fetch faults
  2010-09-06 15:01 [PATCH 0/27] Nested Paging Virtualization for KVM v3 Joerg Roedel
@ 2010-09-06 15:01 ` Joerg Roedel
  0 siblings, 0 replies; 41+ messages in thread
From: Joerg Roedel @ 2010-09-06 15:01 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: http://kvm.qumranet.com, linux-kernel, Joerg Roedel

KVM currently ignores fetch faults in the instruction
emulator. With nested-npt we could have such faults. This
patch adds the code to handle these.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/emulate.c |    3 +++
 arch/x86/kvm/x86.c     |    4 ++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2b08b78..aead72e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1198,6 +1198,9 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
 	*(unsigned long *)dest =
 		(ctxt->eflags & ~change_mask) | (val & change_mask);
 
+	if (rc == X86EMUL_PROPAGATE_FAULT)
+		emulate_pf(ctxt);
+
 	return rc;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65b00f0..ca69dcc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4237,6 +4237,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		vcpu->arch.emulate_ctxt.perm_ok = false;
 
 		r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
+		if (r == X86EMUL_PROPAGATE_FAULT)
+			goto done;
+
 		trace_kvm_emulate_insn_start(vcpu);
 
 		/* Only allow emulation of specific instructions on #UD
@@ -4295,6 +4298,7 @@ restart:
 		return handle_emulation_failure(vcpu);
 	}
 
+done:
 	if (vcpu->arch.emulate_ctxt.exception >= 0) {
 		inject_emulated_exception(vcpu);
 		r = EMULATE_DONE;
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 41+ messages in thread

Thread overview: 41+ messages
2010-09-06 15:55 [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Joerg Roedel
2010-09-06 15:55 ` [PATCH 01/27] KVM: MMU: Check for root_level instead of long mode Joerg Roedel
2010-09-06 15:55 ` [PATCH 02/27] KVM: MMU: Make tdp_enabled a mmu-context parameter Joerg Roedel
2010-09-06 15:55 ` [PATCH 03/27] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu Joerg Roedel
2010-09-06 15:55 ` [PATCH 04/27] KVM: X86: Introduce a tdp_set_cr3 function Joerg Roedel
2010-09-06 15:55 ` [PATCH 05/27] KVM: MMU: Introduce get_cr3 function pointer Joerg Roedel
2010-09-06 15:55 ` [PATCH 06/27] KVM: MMU: Introduce inject_page_fault " Joerg Roedel
2010-09-06 15:55 ` [PATCH 07/27] KVM: MMU: Introduce kvm_init_shadow_mmu helper function Joerg Roedel
2010-09-06 15:55 ` [PATCH 08/27] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu Joerg Roedel
2010-09-06 15:55 ` [PATCH 09/27] KVM: MMU: Introduce generic walk_addr function Joerg Roedel
2010-09-06 15:55 ` [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker Joerg Roedel
2010-09-06 18:05   ` Avi Kivity
2010-09-08  9:20     ` Roedel, Joerg
2010-09-06 15:55 ` [PATCH 11/27] KVM: X86: Introduce pointer to mmu context used for gva_to_gpa Joerg Roedel
2010-09-06 15:55 ` [PATCH 12/27] KVM: MMU: Implement nested gva_to_gpa functions Joerg Roedel
2010-09-06 15:55 ` [PATCH 13/27] KVM: X86: Add kvm_read_guest_page_tdp function Joerg Roedel
2010-09-06 15:55 ` [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking Joerg Roedel
2010-09-07 17:48   ` Marcelo Tosatti
2010-09-08  9:12     ` Roedel, Joerg
2010-09-06 15:55 ` [PATCH 15/27] KVM: MMU: Introduce kvm_read_guest_page_x86() Joerg Roedel
2010-09-06 15:55 ` [PATCH 16/27] KVM: MMU: Introduce init_kvm_nested_mmu() Joerg Roedel
2010-09-06 15:55 ` [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu Joerg Roedel
2010-09-06 18:17   ` Avi Kivity
2010-09-06 15:55 ` [PATCH 18/27] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa Joerg Roedel
2010-09-06 15:55 ` [PATCH 19/27] KVM: X86: Propagate fetch faults Joerg Roedel
2010-09-07 18:43   ` Marcelo Tosatti
2010-09-08  9:18     ` Roedel, Joerg
2010-09-06 15:55 ` [PATCH 20/27] KVM: MMU: Add kvm_mmu parameter to load_pdptrs function Joerg Roedel
2010-09-06 15:56 ` [PATCH 21/27] KVM: MMU: Introduce kvm_pdptr_read_mmu Joerg Roedel
2010-09-06 15:56 ` [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function Joerg Roedel
2010-09-07 20:39   ` Marcelo Tosatti
2010-09-08  7:16     ` Avi Kivity
2010-09-08  9:16       ` Roedel, Joerg
2010-09-06 15:56 ` [PATCH 23/27] KVM: MMU: Allow long mode shadows for legacy page tables Joerg Roedel
2010-09-06 15:56 ` [PATCH 24/27] KVM: SVM: Implement MMU helper functions for Nested Nested Paging Joerg Roedel
2010-09-06 15:56 ` [PATCH 25/27] KVM: SVM: Initialize Nested Nested MMU context on VMRUN Joerg Roedel
2010-09-06 15:56 ` [PATCH 26/27] KVM: SVM: Expect two more candiates for exit_int_info Joerg Roedel
2010-09-06 15:56 ` [PATCH 27/27] KVM: SVM: Report Nested Paging support to userspace Joerg Roedel
2010-09-06 18:37 ` [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List) Avi Kivity
2010-09-07 16:35   ` Roedel, Joerg
  -- strict thread matches above, loose matches on Subject: below --
2010-09-06 15:01 [PATCH 0/27] Nested Paging Virtualization for KVM v3 Joerg Roedel
2010-09-06 15:01 ` [PATCH 19/27] KVM: X86: Propagate fetch faults Joerg Roedel
