* [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots
@ 2022-05-21 13:16 Lai Jiangshan
  2022-05-21 13:16 ` [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault Lai Jiangshan
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
set up the special roots.  The initialization code is complex, and the
roots are not associated with struct kvm_mmu_page, which makes the code
even more complex.

So add new local shadow pages to simplify it.

The local shadow pages are associated with struct kvm_mmu_page and are
VCPU-local.

The local shadow pages are created and freed when the roots are changed
(i.e. they are one-off).  This could be optimized, but not in this
patchset, since re-creating them is lightweight: in the normal case only
the struct kvm_mmu_page needs to be re-allocated, not sp->spt, because
sp->spt is likely to be mmu->pae_root.
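
As a rough sketch of what the lightweight re-creation amounts to
(simplified from patch 4's kvm_mmu_alloc_local_shadow_page(); not the
exact code):

	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
	sp->role = role;
	/* sp->spt normally reuses the preallocated PAE root page. */
	if (role.level == PT32E_ROOT_LEVEL)
		sp->spt = vcpu->arch.mmu->pae_root;
	else
		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);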

Patch 1 of the patchset also fixes a possible bug described in:
https://lore.kernel.org/lkml/20220415103414.86555-1-jiangshanlai@gmail.com/

The fix is then simplified in patch 9 with the help of local shadow
pages.

Note:
using_local_root_page() can be implemented in two ways.

static bool using_local_root_page(struct kvm_mmu *mmu)
{
	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
}

static bool using_local_root_page(struct kvm_mmu *mmu)
{
	if (mmu->root_role.direct)
		return mmu->root_role.level == PT32E_ROOT_LEVEL;
	else
		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
}

I prefer the second way, but when I wrote the documentation for them I
couldn't explain the second way well enough.  Maybe I approached it from
the wrong angle, or my English isn't good enough to explain it.

So I put the first way in patch 2 and the second way in patch 3.
Patch 3 adds much more documentation and converts the first way into the
second way.  Patch 3 can be discarded.
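
Either way, the helper is used identically throughout the series; for
example, the fast_pgd_switch() hunk in patch 2 becomes (simplified from
patch 2's diff):

	/*
	 * Local roots can not be put into mmu->prev_roots[] because
	 * mmu->pae_root can not be shared for different roots at the
	 * same time.
	 */
	if (unlikely(using_local_root_page(mmu)))
		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);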

Changed from v2:
	Add documentation for using_local_root_page()
	Update many of the documents
	Address review comments
	Add a patch that fixes a possible bug (and split other patches for patch 9)

Changed from v1:
	Rebase to the newest kvm/queue.  Slightly update patch 4.

[V2]: https://lore.kernel.org/lkml/20220503150735.32723-1-jiangshanlai@gmail.com/
[V1]: https://lore.kernel.org/lkml/20220420132605.3813-1-jiangshanlai@gmail.com/


Lai Jiangshan (12):
  KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page
    fault
  KVM: X86/MMU: Add using_local_root_page()
  KVM: X86/MMU: Reduce a check in using_local_root_page() for common
    cases
  KVM: X86/MMU: Add local shadow pages
  KVM: X86/MMU: Link PAE root pagetable with its children
  KVM: X86/MMU: Activate local shadow pages and remove old logic
  KVM: X86/MMU: Remove the check of the return value of to_shadow_page()
  KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand
  KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch)
  KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT
  KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit
    host
  KVM: X86/MMU: Remove mmu_alloc_special_roots()

 arch/x86/include/asm/kvm_host.h |   5 +-
 arch/x86/kvm/mmu/mmu.c          | 575 ++++++++++++++------------------
 arch/x86/kvm/mmu/mmu_internal.h |  10 -
 arch/x86/kvm/mmu/paging_tmpl.h  |  51 ++-
 arch/x86/kvm/mmu/spte.c         |   7 +
 arch/x86/kvm/mmu/spte.h         |   1 +
 arch/x86/kvm/mmu/tdp_mmu.h      |   7 +-
 arch/x86/kvm/x86.c              |   4 +-
 8 files changed, 303 insertions(+), 357 deletions(-)

-- 
2.19.1.6.gb485710b



* [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 21:17   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page() Lai Jiangshan
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

When nested NPT is enabled and L1 uses PAE paging, mmu->get_pdptrs(),
which is nested_svm_get_tdp_pdptr(), reads the guest NPT's PDPTE from
memory unconditionally on each call.

The guest PAE root page is not write-protected.

The mmu->get_pdptrs() in FNAME(walk_addr_generic) might therefore
return a value different from previous calls, or different from the
return value of mmu->get_pdptrs() in mmu_alloc_shadow_roots().

This causes FNAME(fetch) to install the spte in a wrong sp, or to link
a sp to a wrong parent, if the return value of mmu->get_pdptrs() is not
verified to be unchanged, since FNAME(gpte_changed) can't detect this
kind of change.

Verify the return value of mmu->get_pdptrs() (only the gfn in it needs
to be checked) and do kvm_mmu_free_roots(), as load_pdptr() does, if the
gfn doesn't match.

Do the verification unconditionally when the guest uses PAE paging, no
matter whether it is nested NPT or not, to avoid complicating the code.

Commit e4e517b4be01 ("KVM: MMU: Do not unconditionally read PDPTE from
guest memory") fixes the same problem for the non-nested case by caching
the PDPTEs, which is also the way hardware caches the PDPTEs.

Under SVM, however, when the processor is in guest mode with PAE
enabled, the guest PDPTE entries are not cached or validated at this
point, but instead are loaded and checked on demand in the normal course
of address translation, just like page directory and page table entries.
Any reserved bit violations are detected at the point of use, and result
in a page-fault (#PF) exception rather than a general-protection (#GP)
exception.

So using caches cannot fix the problem when shadowing nested NPT for a
32-bit L1.

Fixes: e4e517b4be01 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 39 ++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index db80f7ccaa4e..6e3df84e8455 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -870,6 +870,44 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (is_page_fault_stale(vcpu, fault, mmu_seq))
 		goto out_unlock;
 
+	/*
+	 * When nested NPT enabled and L1 is PAE paging, mmu->get_pdptrs()
+	 * which is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE
+	 * from memory unconditionally for each call.
+	 *
+	 * The guest PAE root page is not write-protected.
+	 *
+	 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get a value
+	 * different from previous calls or different from the return value of
+	 * mmu->get_pdptrs() in mmu_alloc_shadow_roots().
+	 *
+	 * It will cause FNAME(fetch) installs the spte in a wrong sp or links
+	 * a sp to a wrong parent if the return value of mmu->get_pdptrs()
+	 * is not verified unchanged since FNAME(gpte_changed) can't check
+	 * this kind of change.
+	 *
+	 * Verify the return value of mmu->get_pdptrs() (only the gfn in it
+	 * needs to be checked) and do kvm_mmu_free_roots() like load_pdptr()
+	 * if the gfn isn't matched.
+	 *
+	 * Do the verifying unconditionally when the guest is PAE paging no
+	 * matter whether it is nested NPT or not to avoid complicated code.
+	 */
+	if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
+		u64 pdpte = vcpu->arch.mmu->pae_root[(fault->addr >> 30) & 3];
+		struct kvm_mmu_page *sp = NULL;
+
+		if (IS_VALID_PAE_ROOT(pdpte))
+			sp = to_shadow_page(pdpte & PT64_BASE_ADDR_MASK);
+
+		if (!sp || walker.table_gfn[PT32E_ROOT_LEVEL - 2] != sp->gfn) {
+			write_unlock(&vcpu->kvm->mmu_lock);
+			kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu,
+					   KVM_MMU_ROOT_CURRENT);
+			goto release_clean;
+		}
+	}
+
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
@@ -877,6 +915,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
+release_clean:
 	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
-- 
2.19.1.6.gb485710b



* [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page()
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
  2022-05-21 13:16 ` [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-05-26 21:28   ` David Matlack
  2022-07-19 22:03   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 03/12] KVM: X86/MMU: Reduce a check in using_local_root_page() for common cases Lai Jiangshan
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

In some cases, local root pages are used for the MMU.  The code often
uses to_shadow_page(mmu->root.hpa) to check whether local root pages are
in use.

Add using_local_root_page() to directly check whether local root pages
are used, or need to be used, even when mmu->root.hpa is not set.

This prepares for making to_shadow_page(mmu->root.hpa) return non-NULL
via the use of local shadow [root] pages.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e0..624b6d2473f7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
+/*
+ * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
+ * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
+ * 32bit L1 hypervisor.
+ *
+ * It includes cases:
+ *	nonpaging when !tdp_enabled				(direct paging)
+ *	shadow paging for 32 bit guest when !tdp_enabled	(shadow paging)
+ *	NPT in 32bit host (not shadowing nested NPT)		(direct paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
+ *
+ * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
+ * shadow pagetable is using PAE paging.
+ *
+ * For the last case, it is
+ * 	mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ * 	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be the last case.
+ *
+ * With the two conditions combined, the checking condition is:
+ * 	mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
+ *  already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
+ */
+static bool using_local_root_page(struct kvm_mmu *mmu)
+{
+	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
+	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
@@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
 {
 	/*
 	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
-	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
-	 * later if necessary.
+	 * having to deal with PDPTEs.  Local roots can not be put into
+	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
+	 * different roots at the same time.
 	 */
-	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
+	if (unlikely(using_local_root_page(mmu)))
 		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
 	if (VALID_PAGE(mmu->root.hpa))
-- 
2.19.1.6.gb485710b



* [PATCH V3 03/12] KVM: X86/MMU: Reduce a check in using_local_root_page() for common cases
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
  2022-05-21 13:16 ` [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault Lai Jiangshan
  2022-05-21 13:16 ` [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page() Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-05-21 13:16 ` [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages Lai Jiangshan
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

In most cases, mmu->root_role.direct is true and mmu->root_role.level
is not PT32E_ROOT_LEVEL, which means using_local_root_page() often has
to evaluate all three tests; that is not good for fast paths.

Morph the condition in using_local_root_page() into an equivalent one to
reduce a check.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c | 45 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 624b6d2473f7..240ebe589caf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1716,11 +1716,52 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
  *
  * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
  *  already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
+ *
+ * But mmu->root_role.direct is normally true and mmu->root_role.level is
+ * normally not PT32E_ROOT_LEVEL.  To reduce a check for the fast path of
+ * fast_pgd_switch() in the normal case, mmu->root_role.direct is checked first.
+ *
+ * The return value is:
+ * 	mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * =>
+ * 	(mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * 	(!mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * =>
+ * 	(mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * 	(!mmu->root_role.direct &&
+ * 	 (mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ * 	  mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL))
+ * => (for !direct, mmu->root_role.level == PT32E_ROOT_LEVEL implies
+ * 	mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * =>
+ * 	(mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * In other words:
+ *
+ * For the first and third cases, it is
+ * 	mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be one of the two cases.
+ *
+ * For the 2nd, 4th and 5th cases, it is
+ * 	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be one of the three cases although
+ * it is not so intuitive.  It can be split into:
+ * 	mmu->root_role.level == PT32E_ROOT_LEVEL &&
+ * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * which is for the 2nd and 4th cases and
+ * 	mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ * 	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * which is the last case.
  */
 static bool using_local_root_page(struct kvm_mmu *mmu)
 {
-	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
-	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+	if (mmu->root_role.direct)
+		return mmu->root_role.level == PT32E_ROOT_LEVEL;
+	else
+		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
 }
 
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
-- 
2.19.1.6.gb485710b



* [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (2 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 03/12] KVM: X86/MMU: Reduce a check in using_local_root_page() for common cases Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-05-26 21:38   ` David Matlack
                     ` (2 more replies)
  2022-05-21 13:16 ` [PATCH V3 05/12] KVM: X86/MMU: Link PAE root pagetable with its children Lai Jiangshan
                   ` (8 subsequent siblings)
  12 siblings, 3 replies; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Local shadow pages are shadow pages that hold PDPTEs for a 32-bit
guest, or higher-level shadow pages whose children are local shadow
pages when shadowing nested NPT for a 32-bit L1 in a 64-bit L0.

The current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root
to set up the local root pages.  The initialization code is complex, and
the root pages are not associated with struct kvm_mmu_page, which makes
the code even more complex.

Add kvm_mmu_alloc_local_shadow_page() and mmu_free_local_root_page() to
allocate and free local shadow pages, and prepare for using local shadow
pages to replace the current logic while sharing most of the logic with
non-local shadow pages.

The code is not activated yet, since using_local_root_page() returns
false at the point where it is inserted.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c | 109 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 240ebe589caf..c941a5931bc3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1764,6 +1764,76 @@ static bool using_local_root_page(struct kvm_mmu *mmu)
 		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
 }
 
+/*
+ * Local shadow pages are shadow pages to hold PDPTEs for 32bit guest or higher
+ * level shadow pages having children local shadow pages when shadowing nested
+ * NPT for 32bit L1 in 64 bit L0.
+ *
+ * Local shadow pages are often local shadow root pages (or local root pages for
+ * short) except when shadowing nested NPT for 32bit L1 in 64 bit L0 which has
+ * 2 or 3 levels of local shadow pages on top of non-local shadow pages.
+ *
+ * Local shadow pages are locally allocated.  If the local shadow page's level
+ * is PT32E_ROOT_LEVEL, it will use the preallocated mmu->pae_root for its
+ * sp->spt.  Because sp->spt may need to be put in the 32 bits CR3 (even in
+ * x86_64) or decrypted.  Using the preallocated one to handle these
+ * requirements makes the allocation simpler.
+ *
+ * Local shadow pages are only visible to local VCPU except through
+ * sp->parent_ptes rmap from their children, so they are not in the
+ * kvm->arch.active_mmu_pages nor in the hash.
+ *
+ * And they are neither accounted nor write-protected since they don't shadow a
+ * guest page table.
+ *
+ * Because of above, local shadow pages can not be freed nor zapped like
+ * non-local shadow pages.  They are freed directly when the local root page
+ * is freed, see mmu_free_local_root_page().
+ *
+ * Local root page can not be put on mmu->prev_roots because the comparison
+ * must use PDPTEs instead of CR3 and mmu->pae_root can not be shared for multi
+ * local root pages.
+ *
+ * Except above limitations, all the other abilities are the same as other
+ * shadow page, like link, parent rmap, sync, unsync etc.
+ *
+ * Local shadow pages can be obsoleted in a little different way other than
+ * the non-local shadow pages.  When the obsoleting process is done, all the
+ * obsoleted non-local shadow pages are unlinked from the local shadow pages
+ * by the help of the sp->parent_ptes rmap and the local shadow pages become
+ * theoretically valid again except sp->mmu_valid_gen may be still outdated.
+ * If there is no other event to cause a VCPU to free the local root page and
+ * the VCPU is being preempted by the host during two obsoleting processes,
+ * sp->mmu_valid_gen might become valid again and the VCPU can reuse it when
+ * the VCPU is back.  It is different from the non-local shadow pages which
+ * are always freed after obsoleted.
+ */
+static struct kvm_mmu_page *
+kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role role)
+{
+	struct kvm_mmu_page *sp;
+
+	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
+	sp->gfn = 0;
+	sp->role = role;
+	/*
+	 * Use the preallocated mmu->pae_root when the shadow page's
+	 * level is PT32E_ROOT_LEVEL which may need to be put in the 32 bits
+	 * CR3 (even in x86_64) or decrypted.  The preallocated one is prepared
+	 * for the requirements.
+	 */
+	if (role.level == PT32E_ROOT_LEVEL &&
+	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))
+		sp->spt = vcpu->arch.mmu->pae_root;
+	else
+		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+	/* sp->gfns is not used for local shadow page */
+	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
+	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
+
+	return sp;
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
@@ -2121,6 +2191,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	if (level <= vcpu->arch.mmu->cpu_role.base.level)
 		role.passthrough = 0;
 
+	if (unlikely(level >= PT32E_ROOT_LEVEL && using_local_root_page(vcpu->arch.mmu)))
+		return kvm_mmu_alloc_local_shadow_page(vcpu, role);
+
 	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
 		if (sp->gfn != gfn) {
@@ -3351,6 +3424,37 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	*root_hpa = INVALID_PAGE;
 }
 
+static void mmu_free_local_root_page(struct kvm *kvm, struct kvm_mmu *mmu)
+{
+	u64 spte = mmu->root.hpa;
+	struct kvm_mmu_page *sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+	int i;
+
+	/* Free level 5 or 4 roots for shadow NPT for 32 bit L1 */
+	while (sp->role.level > PT32E_ROOT_LEVEL)
+	{
+		spte = sp->spt[0];
+		mmu_page_zap_pte(kvm, sp, sp->spt + 0, NULL);
+		free_page((unsigned long)sp->spt);
+		kmem_cache_free(mmu_page_header_cache, sp);
+		if (!is_shadow_present_pte(spte))
+			return;
+		sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+	}
+
+	if (WARN_ON_ONCE(sp->role.level != PT32E_ROOT_LEVEL))
+		return;
+
+	/* Disconnect PAE root from the 4 PAE page directories */
+	for (i = 0; i < 4; i++)
+		mmu_page_zap_pte(kvm, sp, sp->spt + i, NULL);
+
+	if (sp->spt != mmu->pae_root)
+		free_page((unsigned long)sp->spt);
+
+	kmem_cache_free(mmu_page_header_cache, sp);
+}
+
 /* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
 void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 			ulong roots_to_free)
@@ -3384,7 +3488,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 
 	if (free_active_root) {
 		if (to_shadow_page(mmu->root.hpa)) {
-			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
+			if (using_local_root_page(mmu))
+				mmu_free_local_root_page(kvm, mmu);
+			else
+				mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
 		} else if (mmu->pae_root) {
 			for (i = 0; i < 4; ++i) {
 				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
-- 
2.19.1.6.gb485710b



* [PATCH V3 05/12] KVM: X86/MMU: Link PAE root pagetable with its children
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (3 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 22:21   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 06/12] KVM: X86/MMU: Activate local shadow pages and remove old logic Lai Jiangshan
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

When local shadow pages are activated, link_shadow_page() might link a
local shadow page, which is the PAE root for PAE paging, with its
children.

Add make_pae_pdpte() to handle it.

The code is not activated since local shadow pages are not activated
yet.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c  | 6 +++++-
 arch/x86/kvm/mmu/spte.c | 7 +++++++
 arch/x86/kvm/mmu/spte.h | 1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c941a5931bc3..e1a059dd9621 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2340,7 +2340,11 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
-	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
+	if (unlikely(sp->role.level == PT32_ROOT_LEVEL &&
+		     vcpu->arch.mmu->root_role.level == PT32E_ROOT_LEVEL))
+		spte = make_pae_pdpte(sp->spt);
+	else
+		spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
 
 	mmu_spte_set(sptep, spte);
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index b5960bbde7f7..5c31fa1d2b61 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -279,6 +279,13 @@ u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index)
 	return child_spte;
 }
 
+u64 make_pae_pdpte(u64 *child_pt)
+{
+	/* The only ignore bits in PDPTE are 11:9. */
+	BUILD_BUG_ON(!(GENMASK(11,9) & SPTE_MMU_PRESENT_MASK));
+	return __pa(child_pt) | PT_PRESENT_MASK | SPTE_MMU_PRESENT_MASK |
+		shadow_me_value;
+}
 
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 {
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 0127bb6e3c7d..2408ba1361d5 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -426,6 +426,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       u64 old_spte, bool prefetch, bool can_unsync,
 	       bool host_writable, u64 *new_spte);
 u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index);
+u64 make_pae_pdpte(u64 *child_pt);
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
 u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
 u64 mark_spte_for_access_track(u64 spte);
-- 
2.19.1.6.gb485710b



* [PATCH V3 06/12] KVM: X86/MMU: Activate local shadow pages and remove old logic
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (4 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 05/12] KVM: X86/MMU: Link PAE root pagetable with its children Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-05-21 13:16 ` [PATCH V3 07/12] KVM: X86/MMU: Remove the check of the return value of to_shadow_page() Lai Jiangshan
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Activate local shadow pages by allocating them in
mmu_alloc_direct_roots() and mmu_alloc_shadow_roots().

Make shadow walks start from the topmost shadow page even if it is a
local shadow page, so that it can be walked like a normal root and
shadowed PDPTEs can be made and installed on demand.

Walking from the topmost page means FNAME(fetch) needs to visit
high-level local shadow pages and allocate local shadow pages when
shadowing NPT for a 32-bit L1 in a 64-bit host, so change FNAME(fetch)
and FNAME(walk_addr_generic) to handle the affected code.

Sync from the topmost page in kvm_mmu_sync_roots(), which simplifies the
code.

Now all the root pages and the page tables pointed to by a present spte
in struct kvm_mmu are associated with a struct kvm_mmu_page, and
to_shadow_page() is guaranteed to be non-NULL.

The affected cases are those where using_local_root_page() returns true.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c         | 174 +++------------------------------
 arch/x86/kvm/mmu/paging_tmpl.h |  18 +++-
 2 files changed, 31 insertions(+), 161 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e1a059dd9621..684a0221aa4c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1691,9 +1691,9 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 }
 
 /*
- * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
- * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
- * 32bit L1 hypervisor.
+ * KVM uses the VCPU's local root page (kvm_mmu_alloc_local_shadow_page()) when
+ * either the shadow pagetable is using PAE paging or the host is shadowing
+ * nested NPT for 32bit L1 hypervisor.
  *
  * It includes cases:
  *	nonpaging when !tdp_enabled				(direct paging)
@@ -2277,26 +2277,6 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 	iterator->addr = addr;
 	iterator->shadow_addr = root;
 	iterator->level = vcpu->arch.mmu->root_role.level;
-
-	if (iterator->level >= PT64_ROOT_4LEVEL &&
-	    vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
-	    !vcpu->arch.mmu->root_role.direct)
-		iterator->level = PT32E_ROOT_LEVEL;
-
-	if (iterator->level == PT32E_ROOT_LEVEL) {
-		/*
-		 * prev_root is currently only used for 64-bit hosts. So only
-		 * the active root_hpa is valid here.
-		 */
-		BUG_ON(root != vcpu->arch.mmu->root.hpa);
-
-		iterator->shadow_addr
-			= vcpu->arch.mmu->pae_root[(addr >> 30) & 3];
-		iterator->shadow_addr &= PT64_BASE_ADDR_MASK;
-		--iterator->level;
-		if (!iterator->shadow_addr)
-			iterator->level = 0;
-	}
 }
 
 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
@@ -3491,21 +3471,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 					   &invalid_list);
 
 	if (free_active_root) {
-		if (to_shadow_page(mmu->root.hpa)) {
-			if (using_local_root_page(mmu))
-				mmu_free_local_root_page(kvm, mmu);
-			else
-				mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
-		} else if (mmu->pae_root) {
-			for (i = 0; i < 4; ++i) {
-				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
-					continue;
-
-				mmu_free_root_page(kvm, &mmu->pae_root[i],
-						   &invalid_list);
-				mmu->pae_root[i] = INVALID_PAE_ROOT;
-			}
-		}
+		if (using_local_root_page(mmu))
+			mmu_free_local_root_page(kvm, mmu);
+		else
+			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
 		mmu->root.hpa = INVALID_PAGE;
 		mmu->root.pgd = 0;
 	}
@@ -3570,7 +3539,6 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	u8 shadow_root_level = mmu->root_role.level;
 	hpa_t root;
-	unsigned i;
 	int r;
 
 	write_lock(&vcpu->kvm->mmu_lock);
@@ -3581,24 +3549,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	if (is_tdp_mmu_enabled(vcpu->kvm)) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
 		mmu->root.hpa = root;
-	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
+	} else if (shadow_root_level >= PT32E_ROOT_LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
 		mmu->root.hpa = root;
-	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
-		if (WARN_ON_ONCE(!mmu->pae_root)) {
-			r = -EIO;
-			goto out_unlock;
-		}
-
-		for (i = 0; i < 4; ++i) {
-			WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
-
-			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
-					      i << 30, PT32_ROOT_LEVEL, true);
-			mmu->pae_root[i] = root | PT_PRESENT_MASK |
-					   shadow_me_mask;
-		}
-		mmu->root.hpa = __pa(mmu->pae_root);
 	} else {
 		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
 		r = -EIO;
@@ -3676,10 +3629,8 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 pdptrs[4], pm_mask;
 	gfn_t root_gfn, root_pgd;
 	hpa_t root;
-	unsigned i;
 	int r;
 
 	root_pgd = mmu->get_guest_pgd(vcpu);
@@ -3688,21 +3639,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu_check_root(vcpu, root_gfn))
 		return 1;
 
-	/*
-	 * On SVM, reading PDPTRs might access guest memory, which might fault
-	 * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
-	 */
-	if (mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
-		for (i = 0; i < 4; ++i) {
-			pdptrs[i] = mmu->get_pdptr(vcpu, i);
-			if (!(pdptrs[i] & PT_PRESENT_MASK))
-				continue;
-
-			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
-				return 1;
-		}
-	}
-
 	r = mmu_first_shadow_root_alloc(vcpu->kvm);
 	if (r)
 		return r;
@@ -3712,70 +3648,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (r < 0)
 		goto out_unlock;
 
-	/*
-	 * Do we shadow a long mode page table? If so we need to
-	 * write-protect the guests page table root.
-	 */
-	if (mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, root_gfn, 0,
-				      mmu->root_role.level, false);
-		mmu->root.hpa = root;
-		goto set_root_pgd;
-	}
-
-	if (WARN_ON_ONCE(!mmu->pae_root)) {
-		r = -EIO;
-		goto out_unlock;
-	}
-
-	/*
-	 * We shadow a 32 bit page table. This may be a legacy 2-level
-	 * or a PAE 3-level page table. In either case we need to be aware that
-	 * the shadow page table may be a PAE or a long mode page table.
-	 */
-	pm_mask = PT_PRESENT_MASK | shadow_me_value;
-	if (mmu->root_role.level >= PT64_ROOT_4LEVEL) {
-		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
-
-		if (WARN_ON_ONCE(!mmu->pml4_root)) {
-			r = -EIO;
-			goto out_unlock;
-		}
-		mmu->pml4_root[0] = __pa(mmu->pae_root) | pm_mask;
-
-		if (mmu->root_role.level == PT64_ROOT_5LEVEL) {
-			if (WARN_ON_ONCE(!mmu->pml5_root)) {
-				r = -EIO;
-				goto out_unlock;
-			}
-			mmu->pml5_root[0] = __pa(mmu->pml4_root) | pm_mask;
-		}
-	}
-
-	for (i = 0; i < 4; ++i) {
-		WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
-
-		if (mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
-			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
-				mmu->pae_root[i] = INVALID_PAE_ROOT;
-				continue;
-			}
-			root_gfn = pdptrs[i] >> PAGE_SHIFT;
-		}
-
-		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL, false);
-		mmu->pae_root[i] = root | pm_mask;
-	}
-
-	if (mmu->root_role.level == PT64_ROOT_5LEVEL)
-		mmu->root.hpa = __pa(mmu->pml5_root);
-	else if (mmu->root_role.level == PT64_ROOT_4LEVEL)
-		mmu->root.hpa = __pa(mmu->pml4_root);
-	else
-		mmu->root.hpa = __pa(mmu->pae_root);
-
-set_root_pgd:
+	root = mmu_alloc_root(vcpu, root_gfn, 0,
+			      mmu->root_role.level, false);
+	mmu->root.hpa = root;
 	mmu->root.pgd = root_pgd;
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
@@ -3892,8 +3767,7 @@ static bool is_unsync_root(hpa_t root)
 
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
-	int i;
-	struct kvm_mmu_page *sp;
+	hpa_t root = vcpu->arch.mmu->root.hpa;
 
 	if (vcpu->arch.mmu->root_role.direct)
 		return;
@@ -3903,31 +3777,11 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 
 	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 
-	if (vcpu->arch.mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root.hpa;
-		sp = to_shadow_page(root);
-
-		if (!is_unsync_root(root))
-			return;
-
-		write_lock(&vcpu->kvm->mmu_lock);
-		mmu_sync_children(vcpu, sp, true);
-		write_unlock(&vcpu->kvm->mmu_lock);
+	if (!is_unsync_root(root))
 		return;
-	}
 
 	write_lock(&vcpu->kvm->mmu_lock);
-
-	for (i = 0; i < 4; ++i) {
-		hpa_t root = vcpu->arch.mmu->pae_root[i];
-
-		if (IS_VALID_PAE_ROOT(root)) {
-			root &= PT64_BASE_ADDR_MASK;
-			sp = to_shadow_page(root);
-			mmu_sync_children(vcpu, sp, true);
-		}
-	}
-
+	mmu_sync_children(vcpu, to_shadow_page(root), true);
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 6e3df84e8455..cd6032e1947c 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -316,6 +316,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	u16 errcode = 0;
 	gpa_t real_gpa;
 	gfn_t gfn;
+	int i;
 
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
@@ -323,6 +324,20 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	pte           = mmu->get_guest_pgd(vcpu);
 	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
+	/*
+	 * Initialize the guest walker with default values.  These values will
+	 * be used in cases where KVM shadows a guest page table structure
+	 * with more levels than what the guest uses.  For example, KVM shadows
+	 * 3-level nested NPT for 32 bit L1 with 5-level NPT paging.
+	 *
+	 * Note, the gfn is technically ignored for these local shadow pages,
+	 * but it's more consistent to always pass 0 to kvm_mmu_get_page().
+	 */
+	for (i = PT32_ROOT_LEVEL; i < PT_MAX_FULL_LEVELS; i++) {
+		walker->table_gfn[i] = 0;
+		walker->pt_access[i] = ACC_ALL;
+	}
+
 #if PTTYPE == 64
 	walk_nx_mask = 1ULL << PT64_NX_SHIFT;
 	if (walker->level == PT32E_ROOT_LEVEL) {
@@ -675,7 +690,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 * Verify that the gpte in the page we've just write
 		 * protected is still there.
 		 */
-		if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
+		if (it.level - 1 < top_level &&
+		    FNAME(gpte_changed)(vcpu, gw, it.level - 1))
 			goto out_gpte_changed;
 
 		if (sp)
-- 
2.19.1.6.gb485710b



* [PATCH V3 07/12] KVM: X86/MMU: Remove the check of the return value of to_shadow_page()
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (5 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 06/12] KVM: X86/MMU: Activate local shadow pages and remove old logic Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 22:42   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand Lai Jiangshan
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Remove the check of the return value of to_shadow_page() in
mmu_free_root_page(), kvm_mmu_free_guest_mode_roots(), is_unsync_root()
and is_tdp_mmu() because it cannot return NULL.

Remove the check of the return value of to_shadow_page() in
is_page_fault_stale() and is_obsolete_root() because it cannot return
NULL and the obsoleting of local shadow pages is already handled in a
different way.

When the obsoleting process is done, all the obsoleted non-local shadow
pages are already unlinked from the local shadow pages with the help of
the parent rmap of the children, and the local shadow pages become
theoretically valid again.  A local shadow page can be freed if
is_obsolete_sp() returns true, or reused if is_obsolete_sp() becomes
false.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c     | 44 +++-----------------------------------
 arch/x86/kvm/mmu/tdp_mmu.h |  7 +-----
 2 files changed, 4 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 684a0221aa4c..90b715eefe6a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3397,8 +3397,6 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 		return;
 
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
-	if (WARN_ON(!sp))
-		return;
 
 	if (is_tdp_mmu_page(sp))
 		kvm_tdp_mmu_put_root(kvm, sp, false);
@@ -3501,8 +3499,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 		if (!VALID_PAGE(root_hpa))
 			continue;
 
-		if (!to_shadow_page(root_hpa) ||
-			to_shadow_page(root_hpa)->role.guest_mode)
+		if (to_shadow_page(root_hpa)->role.guest_mode)
 			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 	}
 
@@ -3752,13 +3749,6 @@ static bool is_unsync_root(hpa_t root)
 	smp_rmb();
 	sp = to_shadow_page(root);
 
-	/*
-	 * PAE roots (somewhat arbitrarily) aren't backed by shadow pages, the
-	 * PDPTEs for a given PAE root need to be synchronized individually.
-	 */
-	if (WARN_ON_ONCE(!sp))
-		return false;
-
 	if (sp->unsync || sp->unsync_children)
 		return true;
 
@@ -4068,21 +4058,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault, int mmu_seq)
 {
-	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
-
-	/* Special roots, e.g. pae_root, are not backed by shadow pages. */
-	if (sp && is_obsolete_sp(vcpu->kvm, sp))
-		return true;
-
-	/*
-	 * Roots without an associated shadow page are considered invalid if
-	 * there is a pending request to free obsolete roots.  The request is
-	 * only a hint that the current root _may_ be obsolete and needs to be
-	 * reloaded, e.g. if the guest frees a PGD that KVM is tracking as a
-	 * previous root, then __kvm_mmu_prepare_zap_page() signals all vCPUs
-	 * to reload even if no vCPU is actively using the root.
-	 */
-	if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
+	if (is_obsolete_sp(vcpu->kvm, to_shadow_page(vcpu->arch.mmu->root.hpa)))
 		return true;
 
 	return fault->slot &&
@@ -5190,24 +5166,10 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 
 static bool is_obsolete_root(struct kvm *kvm, hpa_t root_hpa)
 {
-	struct kvm_mmu_page *sp;
-
 	if (!VALID_PAGE(root_hpa))
 		return false;
 
-	/*
-	 * When freeing obsolete roots, treat roots as obsolete if they don't
-	 * have an associated shadow page.  This does mean KVM will get false
-	 * positives and free roots that don't strictly need to be freed, but
-	 * such false positives are relatively rare:
-	 *
-	 *  (a) only PAE paging and nested NPT has roots without shadow pages
-	 *  (b) remote reloads due to a memslot update obsoletes _all_ roots
-	 *  (c) KVM doesn't track previous roots for PAE paging, and the guest
-	 *      is unlikely to zap an in-use PGD.
-	 */
-	sp = to_shadow_page(root_hpa);
-	return !sp || is_obsolete_sp(kvm, sp);
+	return is_obsolete_sp(kvm, to_shadow_page(root_hpa));
 }
 
 static void __kvm_mmu_free_obsolete_roots(struct kvm *kvm, struct kvm_mmu *mmu)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index c163f7cc23ca..5779a2a7161e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -78,13 +78,8 @@ static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
 	if (WARN_ON(!VALID_PAGE(hpa)))
 		return false;
 
-	/*
-	 * A NULL shadow page is legal when shadowing a non-paging guest with
-	 * PAE paging, as the MMU will be direct with root_hpa pointing at the
-	 * pae_root page, not a shadow page.
-	 */
 	sp = to_shadow_page(hpa);
-	return sp && is_tdp_mmu_page(sp) && sp->root_count;
+	return is_tdp_mmu_page(sp) && sp->root_count;
 }
 #else
 static inline int kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return 0; }
-- 
2.19.1.6.gb485710b



* [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (6 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 07/12] KVM: X86/MMU: Remove the check of the return value of to_shadow_page() Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 23:08   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 09/12] KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch) Lai Jiangshan
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

mmu->pae_root for non-PAE paging is allocated on demand, but
mmu->pae_root for PAE paging is allocated early, when struct kvm_mmu is
created.

Simplify the code that allocates mmu->pae_root for PAE paging and make
the allocation on-demand as well.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/mmu/mmu.c          | 101 +++++++++++++-------------------
 arch/x86/kvm/x86.c              |   4 +-
 3 files changed, 44 insertions(+), 63 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9cdc5bbd721f..fb9751dfc1a7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1615,7 +1615,7 @@ int kvm_mmu_vendor_module_init(void);
 void kvm_mmu_vendor_module_exit(void);
 
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
-int kvm_mmu_create(struct kvm_vcpu *vcpu);
+void kvm_mmu_create(struct kvm_vcpu *vcpu);
 int kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 90b715eefe6a..63c2b2c6122c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -668,6 +668,41 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
 	}
 }
 
+static int mmu_alloc_pae_root(struct kvm_vcpu *vcpu)
+{
+	struct page *page;
+
+	if (vcpu->arch.mmu->root_role.level != PT32E_ROOT_LEVEL)
+		return 0;
+	if (vcpu->arch.mmu->pae_root)
+		return 0;
+
+	/*
+	 * Allocate a page to hold the four PDPTEs for PAE paging when emulating
+	 * 32-bit mode.  CR3 is only 32 bits even on x86_64 in this case.
+	 * Therefore we need to allocate the PDP table in the first 4GB of
+	 * memory, which happens to fit the DMA32 zone.
+	 */
+	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_DMA32);
+	if (!page)
+		return -ENOMEM;
+	vcpu->arch.mmu->pae_root = page_address(page);
+
+	/*
+	 * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+	 * get the CPU to treat the PDPTEs as encrypted.  Decrypt the page so
+	 * that KVM's writes and the CPU's reads get along.  Note, this is
+	 * only necessary when using shadow paging, as 64-bit NPT can get at
+	 * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+	 * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+	 */
+	if (!tdp_enabled)
+		set_memory_decrypted((unsigned long)vcpu->arch.mmu->pae_root, 1);
+	else
+		WARN_ON_ONCE(shadow_me_value);
+	return 0;
+}
+
 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 {
 	int r;
@@ -5127,6 +5162,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
 	if (r)
 		goto out;
+	r = mmu_alloc_pae_root(vcpu);
+	if (r)
+		return r;
 	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
@@ -5591,63 +5629,18 @@ static void free_mmu_pages(struct kvm_mmu *mmu)
 	free_page((unsigned long)mmu->pml5_root);
 }
 
-static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+static void __kvm_mmu_create(struct kvm_mmu *mmu)
 {
-	struct page *page;
 	int i;
 
 	mmu->root.hpa = INVALID_PAGE;
 	mmu->root.pgd = 0;
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
-
-	/* vcpu->arch.guest_mmu isn't used when !tdp_enabled. */
-	if (!tdp_enabled && mmu == &vcpu->arch.guest_mmu)
-		return 0;
-
-	/*
-	 * When using PAE paging, the four PDPTEs are treated as 'root' pages,
-	 * while the PDP table is a per-vCPU construct that's allocated at MMU
-	 * creation.  When emulating 32-bit mode, cr3 is only 32 bits even on
-	 * x86_64.  Therefore we need to allocate the PDP table in the first
-	 * 4GB of memory, which happens to fit the DMA32 zone.  TDP paging
-	 * generally doesn't use PAE paging and can skip allocating the PDP
-	 * table.  The main exception, handled here, is SVM's 32-bit NPT.  The
-	 * other exception is for shadowing L1's 32-bit or PAE NPT on 64-bit
-	 * KVM; that horror is handled on-demand by mmu_alloc_special_roots().
-	 */
-	if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL)
-		return 0;
-
-	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_DMA32);
-	if (!page)
-		return -ENOMEM;
-
-	mmu->pae_root = page_address(page);
-
-	/*
-	 * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
-	 * get the CPU to treat the PDPTEs as encrypted.  Decrypt the page so
-	 * that KVM's writes and the CPU's reads get along.  Note, this is
-	 * only necessary when using shadow paging, as 64-bit NPT can get at
-	 * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
-	 * by 32-bit kernels (when KVM itself uses 32-bit NPT).
-	 */
-	if (!tdp_enabled)
-		set_memory_decrypted((unsigned long)mmu->pae_root, 1);
-	else
-		WARN_ON_ONCE(shadow_me_value);
-
-	for (i = 0; i < 4; ++i)
-		mmu->pae_root[i] = INVALID_PAE_ROOT;
-
-	return 0;
 }
 
-int kvm_mmu_create(struct kvm_vcpu *vcpu)
+void kvm_mmu_create(struct kvm_vcpu *vcpu)
 {
-	int ret;
-
 	vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
 	vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
 
@@ -5659,18 +5652,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu = &vcpu->arch.root_mmu;
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
-	ret = __kvm_mmu_create(vcpu, &vcpu->arch.guest_mmu);
-	if (ret)
-		return ret;
-
-	ret = __kvm_mmu_create(vcpu, &vcpu->arch.root_mmu);
-	if (ret)
-		goto fail_allocate_root;
-
-	return ret;
- fail_allocate_root:
-	free_mmu_pages(&vcpu->arch.guest_mmu);
-	return ret;
+	__kvm_mmu_create(&vcpu->arch.guest_mmu);
+	__kvm_mmu_create(&vcpu->arch.root_mmu);
 }
 
 #define BATCH_ZAP_PAGES	10
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04812eaaf61b..064aecb188dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11285,9 +11285,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	else
 		vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
 
-	r = kvm_mmu_create(vcpu);
-	if (r < 0)
-		return r;
+	kvm_mmu_create(vcpu);
 
 	if (irqchip_in_kernel(vcpu->kvm)) {
 		r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
-- 
2.19.1.6.gb485710b



* [PATCH V3 09/12] KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch)
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (7 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 23:21   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 10/12] KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT Lai Jiangshan
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

FNAME(page_fault) verifies the PDPTE for nested NPT in PAE paging mode
because nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from
memory unconditionally on each call.

The verification is complicated, and it works only if mmu->pae_root is
always used when the guest uses PAE paging.

Move the verification code into FNAME(fetch) and simplify it, since the
local shadow page is used and can be walked in FNAME(fetch) and unlinked
from its children via drop_spte().

This also allows mmu->pae_root NOT to be used when it is not required to
be put in a 32-bit CR3.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 72 ++++++++++++++++------------------
 1 file changed, 33 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index cd6032e1947c..67c419bce1e5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -659,6 +659,39 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		clear_sp_write_flooding_count(it.sptep);
 		drop_large_spte(vcpu, it.sptep);
 
+		/*
+		 * When nested NPT enabled and L1 is PAE paging,
+		 * mmu->get_pdptrs() which is nested_svm_get_tdp_pdptr() reads
+		 * the guest NPT's PDPTE from memory unconditionally for each
+		 * call.
+		 *
+		 * The guest PAE root page is not write-protected.
+		 *
+		 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get
+		 * a value different from previous calls or different from the
+		 * return value of mmu->get_pdptrs() in mmu_alloc_shadow_roots().
+		 *
+		 * It will cause the following code installs the spte in a wrong
+		 * sp or links a sp to a wrong parent if the return value of
+		 * mmu->get_pdptrs() is not verified unchanged since
+		 * FNAME(gpte_changed) can't check this kind of change.
+		 *
+		 * Verify the return value of mmu->get_pdptrs() (only the gfn
+		 * in it needs to be checked) and drop the spte if the gfn isn't
+		 * matched.
+		 *
+		 * Do the verifying unconditionally when the guest is PAE
+		 * paging no matter whether it is nested NPT or not to avoid
+		 * complicated code.
+		 */
+		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
+		    it.level == PT32E_ROOT_LEVEL &&
+		    is_shadow_present_pte(*it.sptep)) {
+			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);
+			if (gw->table_gfn[it.level - 2] != sp->gfn)
+				drop_spte(vcpu->kvm, it.sptep);
+		}
+
 		sp = NULL;
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
@@ -886,44 +919,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (is_page_fault_stale(vcpu, fault, mmu_seq))
 		goto out_unlock;
 
-	/*
-	 * When nested NPT enabled and L1 is PAE paging, mmu->get_pdptrs()
-	 * which is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE
-	 * from memory unconditionally for each call.
-	 *
-	 * The guest PAE root page is not write-protected.
-	 *
-	 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get a value
-	 * different from previous calls or different from the return value of
-	 * mmu->get_pdptrs() in mmu_alloc_shadow_roots().
-	 *
-	 * It will cause FNAME(fetch) installs the spte in a wrong sp or links
-	 * a sp to a wrong parent if the return value of mmu->get_pdptrs()
-	 * is not verified unchanged since FNAME(gpte_changed) can't check
-	 * this kind of change.
-	 *
-	 * Verify the return value of mmu->get_pdptrs() (only the gfn in it
-	 * needs to be checked) and do kvm_mmu_free_roots() like load_pdptr()
-	 * if the gfn isn't matched.
-	 *
-	 * Do the verifying unconditionally when the guest is PAE paging no
-	 * matter whether it is nested NPT or not to avoid complicated code.
-	 */
-	if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
-		u64 pdpte = vcpu->arch.mmu->pae_root[(fault->addr >> 30) & 3];
-		struct kvm_mmu_page *sp = NULL;
-
-		if (IS_VALID_PAE_ROOT(pdpte))
-			sp = to_shadow_page(pdpte & PT64_BASE_ADDR_MASK);
-
-		if (!sp || walker.table_gfn[PT32E_ROOT_LEVEL - 2] != sp->gfn) {
-			write_unlock(&vcpu->kvm->mmu_lock);
-			kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu,
-					   KVM_MMU_ROOT_CURRENT);
-			goto release_clean;
-		}
-	}
-
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
@@ -931,7 +926,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
-release_clean:
 	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 10/12] KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (8 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 09/12] KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch) Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 23:11   ` Sean Christopherson
  2022-05-21 13:16 ` [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host Lai Jiangshan
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

They are now unused: invalid PAE roots are simply written as 0ull like
other zero sptes and checked via is_shadow_present_pte().

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index bd2a26897b97..4feb1ac2742c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -20,16 +20,6 @@ extern bool dbg;
 #define MMU_WARN_ON(x) do { } while (0)
 #endif
 
-/*
- * Unlike regular MMU roots, PAE "roots", a.k.a. PDPTEs/PDPTRs, have a PRESENT
- * bit, and thus are guaranteed to be non-zero when valid.  And, when a guest
- * PDPTR is !PRESENT, its corresponding PAE root cannot be set to INVALID_PAGE,
- * as the CPU would treat that as PRESENT PDPTR with reserved bits set.  Use
- * '0' instead of INVALID_PAGE to indicate an invalid PAE root.
- */
-#define INVALID_PAE_ROOT	0
-#define IS_VALID_PAE_ROOT(x)	(!!(x))
-
 typedef u64 __rcu *tdp_ptep_t;
 
 struct kvm_mmu_page {
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (9 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 10/12] KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT Lai Jiangshan
@ 2022-05-21 13:16 ` Lai Jiangshan
  2022-07-19 23:26   ` Sean Christopherson
  2022-05-21 13:17 ` [PATCH V3 12/12] KVM: X86/MMU: Remove mmu_alloc_special_roots() Lai Jiangshan
  2022-05-26  8:49 ` [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:16 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Allocate the tables when allocating the local shadow page.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 63c2b2c6122c..73e6a8e1e1a9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1809,10 +1809,12 @@ static bool using_local_root_page(struct kvm_mmu *mmu)
  * 2 or 3 levels of local shadow pages on top of non-local shadow pages.
  *
  * Local shadow pages are locally allocated.  If the local shadow page's level
- * is PT32E_ROOT_LEVEL, it will use the preallocated mmu->pae_root for its
- * sp->spt.  Because sp->spt may need to be put in the 32 bits CR3 (even in
- * x86_64) or decrypted.  Using the preallocated one to handle these
- * requirements makes the allocation simpler.
+ * is PT32E_ROOT_LEVEL, and it is not shadowing nested NPT for 32-bit L1 in
+ * 64-bit L0 (in other words, when the shadow pagetable's level is
+ * PT32E_ROOT_LEVEL), it will use the preallocated mmu->pae_root for its
+ * sp->spt, because sp->spt needs to be put in the 32-bit CR3 (even in a
+ * 64-bit host) or decrypted.  Using the preallocated one to handle these
+ * requirements makes the allocation simpler.
  *
  * Local shadow pages are only visible to local VCPU except through
  * sp->parent_ptes rmap from their children, so they are not in the
@@ -1852,13 +1854,12 @@ kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role r
 	sp->gfn = 0;
 	sp->role = role;
 	/*
-	 * Use the preallocated mmu->pae_root when the shadow page's
-	 * level is PT32E_ROOT_LEVEL which may need to be put in the 32 bits
+	 * Use the preallocated mmu->pae_root when the shadow pagetable's
+	 * level is PT32E_ROOT_LEVEL, as the table needs to be put in a 32-bit
 	 * CR3 (even in x86_64) or decrypted.  The preallocated one is prepared
 	 * for the requirements.
 	 */
-	if (role.level == PT32E_ROOT_LEVEL &&
-	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))
+	if (vcpu->arch.mmu->root_role.level == PT32E_ROOT_LEVEL)
 		sp->spt = vcpu->arch.mmu->pae_root;
 	else
 		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 12/12] KVM: X86/MMU: Remove mmu_alloc_special_roots()
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (10 preceding siblings ...)
  2022-05-21 13:16 ` [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host Lai Jiangshan
@ 2022-05-21 13:17 ` Lai Jiangshan
  2022-05-26  8:49 ` [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
  12 siblings, 0 replies; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-21 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

mmu_alloc_special_roots() allocates mmu->pae_root for non-PAE paging
(e.g. for shadowing 32bit NPT on a 64-bit host), along with
mmu->pml4_root and mmu->pml5_root.

But mmu->pml4_root and mmu->pml5_root are no longer used, and neither is
mmu->pae_root for non-PAE paging.

So remove mmu_alloc_special_roots(), mmu->pml4_root and mmu->pml5_root.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/include/asm/kvm_host.h |  3 --
 arch/x86/kvm/mmu/mmu.c          | 77 ---------------------------------
 2 files changed, 80 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9751dfc1a7..ec44e6c3d5ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -458,9 +458,6 @@ struct kvm_mmu {
 	u8 permissions[16];
 
 	u64 *pae_root;
-	u64 *pml4_root;
-	u64 *pml5_root;
-
 	/*
 	 * check zero bits on shadow page table entries, these
 	 * bits include not only hardware reserved bits but also
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 73e6a8e1e1a9..b8eed217314d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3691,78 +3691,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
-static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	bool need_pml5 = mmu->root_role.level > PT64_ROOT_4LEVEL;
-	u64 *pml5_root = NULL;
-	u64 *pml4_root = NULL;
-	u64 *pae_root;
-
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at root creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
-	 */
-	if (mmu->root_role.direct ||
-	    mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
-	    mmu->root_role.level < PT64_ROOT_4LEVEL)
-		return 0;
-
-	/*
-	 * NPT, the only paging mode that uses this horror, uses a fixed number
-	 * of levels for the shadow page tables, e.g. all MMUs are 4-level or
-	 * all MMus are 5-level.  Thus, this can safely require that pml5_root
-	 * is allocated if the other roots are valid and pml5 is needed, as any
-	 * prior MMU would also have required pml5.
-	 */
-	if (mmu->pae_root && mmu->pml4_root && (!need_pml5 || mmu->pml5_root))
-		return 0;
-
-	/*
-	 * The special roots should always be allocated in concert.  Yell and
-	 * bail if KVM ends up in a state where only one of the roots is valid.
-	 */
-	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root ||
-			 (need_pml5 && mmu->pml5_root)))
-		return -EIO;
-
-	/*
-	 * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
-	 * doesn't need to be decrypted.
-	 */
-	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!pae_root)
-		return -ENOMEM;
-
-#ifdef CONFIG_X86_64
-	pml4_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!pml4_root)
-		goto err_pml4;
-
-	if (need_pml5) {
-		pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-		if (!pml5_root)
-			goto err_pml5;
-	}
-#endif
-
-	mmu->pae_root = pae_root;
-	mmu->pml4_root = pml4_root;
-	mmu->pml5_root = pml5_root;
-
-	return 0;
-
-#ifdef CONFIG_X86_64
-err_pml5:
-	free_page((unsigned long)pml4_root);
-err_pml4:
-	free_page((unsigned long)pae_root);
-	return -ENOMEM;
-#endif
-}
-
 static bool is_unsync_root(hpa_t root)
 {
 	struct kvm_mmu_page *sp;
@@ -5166,9 +5094,6 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	r = mmu_alloc_pae_root(vcpu);
 	if (r)
 		return r;
-	r = mmu_alloc_special_roots(vcpu);
-	if (r)
-		goto out;
 	if (vcpu->arch.mmu->root_role.direct)
 		r = mmu_alloc_direct_roots(vcpu);
 	else
@@ -5626,8 +5551,6 @@ static void free_mmu_pages(struct kvm_mmu *mmu)
 	if (!tdp_enabled && mmu->pae_root)
 		set_memory_encrypted((unsigned long)mmu->pae_root, 1);
 	free_page((unsigned long)mmu->pae_root);
-	free_page((unsigned long)mmu->pml4_root);
-	free_page((unsigned long)mmu->pml5_root);
 }
 
 static void __kvm_mmu_create(struct kvm_mmu *mmu)
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots
  2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
                   ` (11 preceding siblings ...)
  2022-05-21 13:17 ` [PATCH V3 12/12] KVM: X86/MMU: Remove mmu_alloc_special_roots() Lai Jiangshan
@ 2022-05-26  8:49 ` Lai Jiangshan
  2022-05-26 20:27   ` David Matlack
  12 siblings, 1 reply; 30+ messages in thread
From: Lai Jiangshan @ 2022-05-26  8:49 UTC (permalink / raw)
  To: LKML, open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips),
	Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022 at 9:16 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> Current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
> setup special roots.  The initialization code is complex and the roots
> are not associated with struct kvm_mmu_page which causes the code more
> complex.
>
> So add new local shadow pages to simplify it.
>
> The local shadow pages are associated with struct kvm_mmu_page and
> VCPU-local.
>
> The local shadow pages are created and freed when the roots are
> changed (or one-off) which can be optimized but not in the patchset
> since the re-creating is light way (in normal case only the struct
> kvm_mmu_page needs to be re-allocated and sp->spt doens't, because
> it is likely to be mmu->pae_root)
>
> The patchset also fixes a possible bug described in:
> https://lore.kernel.org/lkml/20220415103414.86555-1-jiangshanlai@gmail.com/
> as patch1.
>

Ping, and please ignore patch1 and patch9.  Dropping both of them together
would not cause any conflict in the rest of the series.

The fix is wrong (see the new discussion in the above link).  So the
eventual correct fix will not conflict with this patchset of one-off
local shadow pages.  I don't want to add extra stuff to this patchset
anymore.

Thanks
Lai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots
  2022-05-26  8:49 ` [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
@ 2022-05-26 20:27   ` David Matlack
  0 siblings, 0 replies; 30+ messages in thread
From: David Matlack @ 2022-05-26 20:27 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: LKML, open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips),
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Maxim Levitsky, Lai Jiangshan

On Thu, May 26, 2022 at 04:49:09PM +0800, Lai Jiangshan wrote:
> On Sat, May 21, 2022 at 9:16 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> >
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > Current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
> > setup special roots.  The initialization code is complex and the roots
> > are not associated with struct kvm_mmu_page which causes the code more
> > complex.
> >
> > So add new local shadow pages to simplify it.
> >
> > The local shadow pages are associated with struct kvm_mmu_page and
> > VCPU-local.
> >
> > The local shadow pages are created and freed when the roots are
> > changed (or one-off) which can be optimized but not in the patchset
> > since the re-creating is light way (in normal case only the struct
> > kvm_mmu_page needs to be re-allocated and sp->spt doens't, because
> > it is likely to be mmu->pae_root)
> >
> > The patchset also fixes a possible bug described in:
> > https://lore.kernel.org/lkml/20220415103414.86555-1-jiangshanlai@gmail.com/
> > as patch1.
> >
> 
> Ping, and please ignore patch1 and patch9.  Dropping both of them together
> would not cause any conflict in the rest of the series.
> 
> The fix is wrong (see the new discussion in the above link).  So the
> eventual correct fix will not conflict with this patchset of one-off
> local shadow pages.  I don't want to add extra stuff to this patchset
> anymore.

Yeah I agree with splitting this fix out to a separate patchset, and
ordering it after this cleanup so it can be done in one patch.

When you get around to it, can you also implement a kvm-unit-test to
demonstrate the bug? It would be good to have a regression test.

> 
> Thanks
> Lai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page()
  2022-05-21 13:16 ` [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page() Lai Jiangshan
@ 2022-05-26 21:28   ` David Matlack
  2022-05-26 21:38     ` Sean Christopherson
  2022-07-19 22:03   ` Sean Christopherson
  1 sibling, 1 reply; 30+ messages in thread
From: David Matlack @ 2022-05-26 21:28 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Maxim Levitsky, Lai Jiangshan

On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> In some cases, local root pages are used for MMU.  It is often using
> to_shadow_page(mmu->root.hpa) to check if local root pages are used.
> 
> Add using_local_root_page() to directly check if local root pages are
> used or needed to be used even mmu->root.hpa is not set.
> 
> Prepare for making to_shadow_page(mmu->root.hpa) returns non-NULL via
> using local shadow [root] pages.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..624b6d2473f7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
>  	mmu_spte_clear_no_track(parent_pte);
>  }
>  
> +/*
> + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> + * 32bit L1 hypervisor.

How about using the terms "private" and "shared" instead of "local" and
"non-local"? I think that more accurately conveys what is special about
these pages: they are private to the vCPU using them. And then "shared"
is more intuitive to understand than "non-local" (which is used
elsewhere in this series).

> + *
> + * It includes cases:
> + *	nonpaging when !tdp_enabled				(direct paging)
> + *	shadow paging for 32 bit guest when !tdp_enabled	(shadow paging)
> + *	NPT in 32bit host (not shadowing nested NPT)		(direct paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
> + *
> + * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
> + * shadow pagetable is using PAE paging.
> + *
> + * For the last case, it is
> + * 	mmu->root_role.level > PT32E_ROOT_LEVEL &&
> + * 	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
> + * And if this condition is true, it must be the last case.
> + *
> + * With the two conditions combined, the checking condition is:
> + * 	mmu->root_role.level == PT32E_ROOT_LEVEL ||
> + * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
> + *
> + * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
> + *  already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
> + */
> +static bool using_local_root_page(struct kvm_mmu *mmu)
> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs.  Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.

This comment ends up being a little confusing by the end of this series
because using_local_root_page() does not necessarily imply pae_root is
in use. i.e. case 5 (shadow nested NPT for 32bit L1 hypervisor in 64bit
host) does not use pae_root.

How about rewording this comment to say something like:

  If the vCPU is using a private root, it might be using pae_root, which
  cannot be shared for different roots at the same time.

>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))
>  		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>  
>  	if (VALID_PAGE(mmu->root.hpa))
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages
  2022-05-21 13:16 ` [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages Lai Jiangshan
@ 2022-05-26 21:38   ` David Matlack
  2022-05-26 22:01   ` David Matlack
  2022-07-20  0:35   ` Sean Christopherson
  2 siblings, 0 replies; 30+ messages in thread
From: David Matlack @ 2022-05-26 21:38 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Maxim Levitsky, Lai Jiangshan

On Sat, May 21, 2022 at 09:16:52PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> Local shadow pages are shadow pages to hold PDPTEs for 32bit guest or
> higher level shadow pages having children local shadow pages when
> shadowing nested NPT for 32bit L1 in 64 bit L0.
> 
> Current code use mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
> setup local root page.  The initialization code is complex and the root
> pages are not associated with struct kvm_mmu_page which causes the code
> more complex.
> 
> Add kvm_mmu_alloc_local_shadow_page() and mmu_free_local_root_page() to
> allocate and free local shadow pages and prepare for using local
> shadow pages to replace current logic and share the most logic with
> non-local shadow pages.
> 
> The code is not activated since using_local_root_page() is false in
> the place where it is inserted.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 109 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 240ebe589caf..c941a5931bc3 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1764,6 +1764,76 @@ static bool using_local_root_page(struct kvm_mmu *mmu)
>  		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
>  }
>  
> +/*
> + * Local shadow pages are shadow pages to hold PDPTEs for 32bit guest or higher
> + * level shadow pages having children local shadow pages when shadowing nested
> + * NPT for 32bit L1 in 64 bit L0.
> + *
> + * Local shadow pages are often local shadow root pages (or local root pages for
> + * short) except when shadowing nested NPT for 32bit L1 in 64 bit L0 which has
> + * 2 or 3 levels of local shadow pages on top of non-local shadow pages.
> + *
> + * Local shadow pages are locally allocated.  If the local shadow page's level

Can you clarify what you mean by "locally allocated"?

> + * is PT32E_ROOT_LEVEL, it will use the preallocated mmu->pae_root for its
> + * sp->spt.  Because sp->spt may need to be put in the 32 bits CR3 (even in
> + * x86_64) or decrypted.  Using the preallocated one to handle these
> + * requirements makes the allocation simpler.
> + *
> + * Local shadow pages are only visible to local VCPU except through
> + * sp->parent_ptes rmap from their children, so they are not in the
> + * kvm->arch.active_mmu_pages nor in the hash.
> + *
> + * And they are neither accounted nor write-protected since they don't shadow a
> + * guest page table.
> + *
> + * Because of above, local shadow pages can not be freed nor zapped like
> + * non-local shadow pages.  They are freed directly when the local root page
> + * is freed, see mmu_free_local_root_page().
> + *
> + * Local root page can not be put on mmu->prev_roots because the comparison
> + * must use PDPTEs instead of CR3 and mmu->pae_root can not be shared for multi
> + * local root pages.
> + *
> + * Except above limitations, all the other abilities are the same as other
> + * shadow page, like link, parent rmap, sync, unsync etc.
> + *
> + * Local shadow pages can be obsoleted in a little different way other than
> + * the non-local shadow pages.  When the obsoleting process is done, all the
> + * obsoleted non-local shadow pages are unlinked from the local shadow pages
> + * by the help of the sp->parent_ptes rmap and the local shadow pages become
> + * theoretically valid again except sp->mmu_valid_gen may be still outdated.
> + * If there is no other event to cause a VCPU to free the local root page and
> + * the VCPU is being preempted by the host during two obsoleting processes,
> + * sp->mmu_valid_gen might become valid again and the VCPU can reuse it when
> + * the VCPU is back.  It is different from the non-local shadow pages which
> + * are always freed after obsoleted.
> + */
> +static struct kvm_mmu_page *
> +kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role role)
> +{
> +	struct kvm_mmu_page *sp;
> +
> +	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> +	sp->gfn = 0;
> +	sp->role = role;
> +	/*
> +	 * Use the preallocated mmu->pae_root when the shadow page's
> +	 * level is PT32E_ROOT_LEVEL which may need to be put in the 32 bits
> +	 * CR3 (even in x86_64) or decrypted.  The preallocated one is prepared
> +	 * for the requirements.

Thanks for adding this comment. It helps a lot.

> +	 */
> +	if (role.level == PT32E_ROOT_LEVEL &&
> +	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))
> +		sp->spt = vcpu->arch.mmu->pae_root;
> +	else
> +		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> +	/* sp->gfns is not used for local shadow page */
> +	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> +	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> +
> +	return sp;
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -2121,6 +2191,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	if (level <= vcpu->arch.mmu->cpu_role.base.level)
>  		role.passthrough = 0;
>  
> +	if (unlikely(level >= PT32E_ROOT_LEVEL && using_local_root_page(vcpu->arch.mmu)))
> +		return kvm_mmu_alloc_local_shadow_page(vcpu, role);
> +
>  	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
>  	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
>  		if (sp->gfn != gfn) {
> @@ -3351,6 +3424,37 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
>  	*root_hpa = INVALID_PAGE;
>  }
>  
> +static void mmu_free_local_root_page(struct kvm *kvm, struct kvm_mmu *mmu)
> +{
> +	u64 spte = mmu->root.hpa;
> +	struct kvm_mmu_page *sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +	int i;
> +
> +	/* Free level 5 or 4 roots for shadow NPT for 32 bit L1 */
> +	while (sp->role.level > PT32E_ROOT_LEVEL)
> +	{
> +		spte = sp->spt[0];
> +		mmu_page_zap_pte(kvm, sp, sp->spt + 0, NULL);
> +		free_page((unsigned long)sp->spt);
> +		kmem_cache_free(mmu_page_header_cache, sp);
> +		if (!is_shadow_present_pte(spte))
> +			return;
> +		sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +	}
> +
> +	if (WARN_ON_ONCE(sp->role.level != PT32E_ROOT_LEVEL))
> +		return;
> +
> +	/* Disconnect PAE root from the 4 PAE page directories */
> +	for (i = 0; i < 4; i++)
> +		mmu_page_zap_pte(kvm, sp, sp->spt + i, NULL);
> +
> +	if (sp->spt != mmu->pae_root)
> +		free_page((unsigned long)sp->spt);
> +
> +	kmem_cache_free(mmu_page_header_cache, sp);
> +}
> +
>  /* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
>  void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
>  			ulong roots_to_free)
> @@ -3384,7 +3488,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
>  
>  	if (free_active_root) {
>  		if (to_shadow_page(mmu->root.hpa)) {
> -			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
> +			if (using_local_root_page(mmu))
> +				mmu_free_local_root_page(kvm, mmu);
> +			else
> +				mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);

What do you think about adding a separate patch to rename
mmu_free_root_page() to mmu_put_root_page()? I think that would make the
code much more clear when combined with my suggestion to use "private".
i.e. We'd end up with:

  if (using_private_root_page(mmu))
        mmu_free_private_root_page(mmu);
  else
        mmu_put_root_page(kvm, &mmu->root.hpa, &invalid_list);

This makes it clear that the vCPU owns private root pages, so can free
them directly. But for shared root pages (i.e. else clause), we are just
putting a reference and only freeing if the reference (root_count) goes
to 0.
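
For reference, a minimal sketch of the "put" semantics described above, roughly
what the renamed helper would keep doing (simplified; the TDP MMU path and the
sanity checks of the real mmu_free_root_page() are omitted):

	static void mmu_put_root_page(struct kvm *kvm, hpa_t *root_hpa,
				      struct list_head *invalid_list)
	{
		struct kvm_mmu_page *sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);

		/* Shared roots are refcounted; zap only once the last user is gone. */
		if (!--sp->root_count && sp->role.invalid)
			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);

		*root_hpa = INVALID_PAGE;
	}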

>  		} else if (mmu->pae_root) {
>  			for (i = 0; i < 4; ++i) {
>  				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page()
  2022-05-26 21:28   ` David Matlack
@ 2022-05-26 21:38     ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-05-26 21:38 UTC (permalink / raw)
  To: David Matlack
  Cc: Lai Jiangshan, linux-kernel, kvm, Paolo Bonzini,
	Vitaly Kuznetsov, Maxim Levitsky, Lai Jiangshan

On Thu, May 26, 2022, David Matlack wrote:
> On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > 
> > In some cases, local root pages are used for MMU.  It is often using
> > to_shadow_page(mmu->root.hpa) to check if local root pages are used.
> > 
> > Add using_local_root_page() to directly check if local root pages are
> > used or needed to be used even mmu->root.hpa is not set.
> > 
> > Prepare for making to_shadow_page(mmu->root.hpa) returns non-NULL via
> > using local shadow [root] pages.
> > 
> > Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 37 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index efe5a3dca1e0..624b6d2473f7 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
> >  	mmu_spte_clear_no_track(parent_pte);
> >  }
> >  
> > +/*
> > + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> > + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> > + * 32bit L1 hypervisor.
> 
> How about using the terms "private" and "shared" instead of "local" and
> "non-local"? I think that more accurately conveys what is special about
> these pages: they are private to the vCPU using them. And then "shared"
> is more intuitive to understand than "non-local" (which is used
> elsewhere in this series).

Please avoid "private" and "shared".  I haven't read the full context of the
discussion, but those terms have already been claimed by confidential VMs.

FWIW, I believe similar discussions happened around mm/ and kmap(), and they ended
up with thread_local and kmap_local().  Maybe "vCPU local" and "common"?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages
  2022-05-21 13:16 ` [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages Lai Jiangshan
  2022-05-26 21:38   ` David Matlack
@ 2022-05-26 22:01   ` David Matlack
  2022-07-20  0:35   ` Sean Christopherson
  2 siblings, 0 replies; 30+ messages in thread
From: David Matlack @ 2022-05-26 22:01 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Maxim Levitsky, Lai Jiangshan

On Sat, May 21, 2022 at 09:16:52PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> Local shadow pages are shadow pages to hold PDPTEs for 32bit guest or
> higher level shadow pages having children local shadow pages when
> shadowing nested NPT for 32bit L1 in 64 bit L0.
> 
> Current code use mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
> setup local root page.  The initialization code is complex and the root
> pages are not associated with struct kvm_mmu_page which causes the code
> more complex.
> 
> Add kvm_mmu_alloc_local_shadow_page() and mmu_free_local_root_page() to
> allocate and free local shadow pages and prepare for using local
> shadow pages to replace current logic and share the most logic with
> non-local shadow pages.
> 
> The code is not activated since using_local_root_page() is false in
> the place where it is inserted.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 109 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 240ebe589caf..c941a5931bc3 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1764,6 +1764,76 @@ static bool using_local_root_page(struct kvm_mmu *mmu)
>  		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
>  }
>  
> +/*
> + * Local shadow pages are shadow pages to hold PDPTEs for 32bit guest or higher
> + * level shadow pages having children local shadow pages when shadowing nested
> + * NPT for 32bit L1 in 64 bit L0.
> + *
> + * Local shadow pages are often local shadow root pages (or local root pages for
> + * short) except when shadowing nested NPT for 32bit L1 in 64 bit L0 which has
> + * 2 or 3 levels of local shadow pages on top of non-local shadow pages.
> + *
> + * Local shadow pages are locally allocated.  If the local shadow page's level
> + * is PT32E_ROOT_LEVEL, it will use the preallocated mmu->pae_root for its
> + * sp->spt.  Because sp->spt may need to be put in the 32 bits CR3 (even in
> + * x86_64) or decrypted.  Using the preallocated one to handle these
> + * requirements makes the allocation simpler.
> + *
> + * Local shadow pages are only visible to local VCPU except through
> + * sp->parent_ptes rmap from their children, so they are not in the
> + * kvm->arch.active_mmu_pages nor in the hash.
> + *
> + * And they are neither accounted nor write-protected since they don't shadow a
> + * guest page table.
> + *
> + * Because of above, local shadow pages can not be freed nor zapped like
> + * non-local shadow pages.  They are freed directly when the local root page
> + * is freed, see mmu_free_local_root_page().
> + *
> + * Local root page can not be put on mmu->prev_roots because the comparison
> + * must use PDPTEs instead of CR3 and mmu->pae_root can not be shared for multi
> + * local root pages.
> + *
> + * Except above limitations, all the other abilities are the same as other
> + * shadow page, like link, parent rmap, sync, unsync etc.
> + *
> + * Local shadow pages can be obsoleted in a little different way other than
> + * the non-local shadow pages.  When the obsoleting process is done, all the
> + * obsoleted non-local shadow pages are unlinked from the local shadow pages
> + * by the help of the sp->parent_ptes rmap and the local shadow pages become
> + * theoretically valid again except sp->mmu_valid_gen may be still outdated.
> + * If there is no other event to cause a VCPU to free the local root page and
> + * the VCPU is being preempted by the host during two obsoleting processes,
> + * sp->mmu_valid_gen might become valid again and the VCPU can reuse it when
> + * the VCPU is back.  It is different from the non-local shadow pages which
> + * are always freed after obsoleted.
> + */
> +static struct kvm_mmu_page *
> +kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role role)
> +{
> +	struct kvm_mmu_page *sp;
> +
> +	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> +	sp->gfn = 0;
> +	sp->role = role;
> +	/*
> +	 * Use the preallocated mmu->pae_root when the shadow page's
> +	 * level is PT32E_ROOT_LEVEL which may need to be put in the 32 bits
> +	 * CR3 (even in x86_64) or decrypted.  The preallocated one is prepared
> +	 * for the requirements.
> +	 */
> +	if (role.level == PT32E_ROOT_LEVEL &&
> +	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))
> +		sp->spt = vcpu->arch.mmu->pae_root;

FYI this (and a couple other parts of this series) conflict with Nested
MMU Eager Page Splitting, since it uses struct kvm_vcpu in kvm_mmu_get_page().

Hopefully Paolo can queue Nested MMU Eager Page Splitting for 5.20 so
you can apply this series on top. I think that'd be simpler than trying
to do it the other way around.

> +	else
> +		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> +	/* sp->gfns is not used for local shadow page */
> +	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> +	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> +
> +	return sp;
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -2121,6 +2191,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	if (level <= vcpu->arch.mmu->cpu_role.base.level)
>  		role.passthrough = 0;
>  
> +	if (unlikely(level >= PT32E_ROOT_LEVEL && using_local_root_page(vcpu->arch.mmu)))
> +		return kvm_mmu_alloc_local_shadow_page(vcpu, role);
> +
>  	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
>  	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
>  		if (sp->gfn != gfn) {
> @@ -3351,6 +3424,37 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
>  	*root_hpa = INVALID_PAGE;
>  }
>  
> +static void mmu_free_local_root_page(struct kvm *kvm, struct kvm_mmu *mmu)
> +{
> +	u64 spte = mmu->root.hpa;
> +	struct kvm_mmu_page *sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +	int i;
> +
> +	/* Free level 5 or 4 roots for shadow NPT for 32 bit L1 */
> +	while (sp->role.level > PT32E_ROOT_LEVEL)
> +	{
> +		spte = sp->spt[0];
> +		mmu_page_zap_pte(kvm, sp, sp->spt + 0, NULL);
> +		free_page((unsigned long)sp->spt);
> +		kmem_cache_free(mmu_page_header_cache, sp);
> +		if (!is_shadow_present_pte(spte))
> +			return;
> +		sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +	}
> +
> +	if (WARN_ON_ONCE(sp->role.level != PT32E_ROOT_LEVEL))
> +		return;
> +
> +	/* Disconnect PAE root from the 4 PAE page directories */
> +	for (i = 0; i < 4; i++)
> +		mmu_page_zap_pte(kvm, sp, sp->spt + i, NULL);
> +
> +	if (sp->spt != mmu->pae_root)
> +		free_page((unsigned long)sp->spt);
> +
> +	kmem_cache_free(mmu_page_header_cache, sp);
> +}
> +
>  /* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
>  void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
>  			ulong roots_to_free)
> @@ -3384,7 +3488,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
>  
>  	if (free_active_root) {
>  		if (to_shadow_page(mmu->root.hpa)) {
> -			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
> +			if (using_local_root_page(mmu))
> +				mmu_free_local_root_page(kvm, mmu);
> +			else
> +				mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
>  		} else if (mmu->pae_root) {
>  			for (i = 0; i < 4; ++i) {
>  				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault
  2022-05-21 13:16 ` [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault Lai Jiangshan
@ 2022-07-19 21:17   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 21:17 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> Fixes: e4e517b4be01 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory")
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/paging_tmpl.h | 39 ++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index db80f7ccaa4e..6e3df84e8455 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -870,6 +870,44 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	if (is_page_fault_stale(vcpu, fault, mmu_seq))
>  		goto out_unlock;
>  
> +	/*
> +	 * When nested NPT enabled and L1 is PAE paging, mmu->get_pdptrs()
> +	 * which is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE
> +	 * from memory unconditionally for each call.
> +	 *
> +	 * The guest PAE root page is not write-protected.

I think it's worth calling out that it's simply not feasible to write-protect
PDPTEs due to them not covering a full page.

And looking at this comment as a whole, while I love detailed comments, I think
it'd be better to avoid referring to mmu->get_pdptrs() and to use more generic
terminology when talking about KVM.

I think this is accurate?

	/*
	 * If KVM is shadowing nested NPT and L1 is using PAE paging, zap the
	 * root for the PDPTE if the cached value doesn't match the entry at the
	 * time of the page fault, and resume the guest to rebuild the root.
	 * This is effectively a variation of write-protection, where the target
	 * SPTE(s) is zapped on use instead of on write.
	 *
	 * Under SVM with NPT+PAE, the CPU does NOT cache PDPTEs and instead
	 * handles them as it would any other page table entry.  I.e. KVM can't
	 * cache PDPTEs at nested VMRUN without violating the SVM architecture.
	 *
	 * KVM doesn't write-protect PDPTEs because CR3 only needs to be 32-byte
	 * aligned and sized when using PAE paging, whereas write-protection
	 * works at page granularity.
	 */

> +	 *
> +	 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get a value
> +	 * different from previous calls or different from the return value of
> +	 * mmu->get_pdptrs() in mmu_alloc_shadow_roots().
> +	 *
> +	 * It will cause FNAME(fetch) installs the spte in a wrong sp or links
> +	 * a sp to a wrong parent if the return value of mmu->get_pdptrs()
> +	 * is not verified unchanged since FNAME(gpte_changed) can't check
> +	 * this kind of change.
> +	 *
> +	 * Verify the return value of mmu->get_pdptrs() (only the gfn in it
> +	 * needs to be checked) and do kvm_mmu_free_roots() like load_pdptr()
> +	 * if the gfn isn't matched.
> +	 *
> +	 * Do the verifying unconditionally when the guest is PAE paging no
> +	 * matter whether it is nested NPT or not to avoid complicated code.

Doing this unconditionally just trades one architecturally incorrect behavior
for another.  Does any real-world use case actually care?  Probably not.  But the
behavior is visible to software, and I don't think it costs us much to get it right.

There are a number of ways to handle this, e.g. set a flag in kvm_init_shadow_npt_mmu()
and consume it here.  We could probably even burn a bit in kvm_mmu_extended_role
since we have lots of bits to burn.  E.g.

	if (vcpu->arch.mmu->cpu_role.ext.npt_pae) {

	}
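
A rough sketch of that flag-based alternative, assuming a hypothetical npt_pae
bit in kvm_mmu_extended_role that is set only when initializing a shadow-NPT
MMU (names and exact placement below are illustrative, not existing KVM code):

	/* kvm_init_shadow_npt_mmu(): hypothetically flag a PAE L1 using nested NPT. */
	cpu_role.ext.npt_pae = cpu_role.base.level == PT32E_ROOT_LEVEL;

	/* FNAME(page_fault): verify the cached PDPTE's gfn only in that case. */
	if (vcpu->arch.mmu->cpu_role.ext.npt_pae) {
		/* compare walker.table_gfn[PT32E_ROOT_LEVEL - 2] against the root sp->gfn */
	}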


> +	 */
> +	if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
> +		u64 pdpte = vcpu->arch.mmu->pae_root[(fault->addr >> 30) & 3];
> +		struct kvm_mmu_page *sp = NULL;
> +
> +		if (IS_VALID_PAE_ROOT(pdpte))
> +			sp = to_shadow_page(pdpte & PT64_BASE_ADDR_MASK);
> +
> +		if (!sp || walker.table_gfn[PT32E_ROOT_LEVEL - 2] != sp->gfn) {
> +			write_unlock(&vcpu->kvm->mmu_lock);
> +			kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu,
> +					   KVM_MMU_ROOT_CURRENT);
> +			goto release_clean;
> +		}
> +	}
> +
>  	r = make_mmu_pages_available(vcpu);
>  	if (r)
>  		goto out_unlock;
> @@ -877,6 +915,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  
>  out_unlock:
>  	write_unlock(&vcpu->kvm->mmu_lock);
> +release_clean:
>  	kvm_release_pfn_clean(fault->pfn);
>  	return r;
>  }
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page()
  2022-05-21 13:16 ` [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page() Lai Jiangshan
  2022-05-26 21:28   ` David Matlack
@ 2022-07-19 22:03   ` Sean Christopherson
  1 sibling, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 22:03 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> +static bool using_local_root_page(struct kvm_mmu *mmu)

Hmm, I agree with David that "local" isn't the most intuitive terminology.  But
I also do want to avoid private vs. shared to avoid confusion with confidential VMs.

Luckily, I don't think we need to come up with new terminology, just be literal
and call 'em "per-vCPU root pages".  E.g.

  static bool kvm_mmu_has_per_vcpu_root_page()

That way readers don't have to understand what "local" means, and that also captures
that per-vCPU roots are an exception, i.e. that most roots are NOT per-vCPU.

> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs.  Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.
>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))

I don't know that I like using the local/per-vCPU helper.  The problem isn't _just_
that KVM is using a per-vCPU root; KVM is also deliberately punting on dealing with
PDPTRs.  E.g. the per-vCPU aspect doesn't explain why KVM doesn't allow reusing the
current root.  I don't like that using_local_root_page() obfuscates that check.

My preference for this would be to revert back to a streamlined variation of the
code prior to commit 5499ea73e7db ("KVM: x86/mmu: look for a cached PGD when going
from 32-bit to 64-bit").

KVM switched to the !to_shadow_page() check to _avoid_ consuming (what is now)
mmu->root_role because, at the time of the patch, mmu held the _old_ data, which
was wrong/stale for nested virtualization transitions.

In other words, I would prefer that explicitly do (in a separate patch):

	/*
	 * For now, limit the fast switch to 64-bit VMs in order to avoid having
	 * to deal with PDPTEs.  32-bit VMs can be supported later if necessary.
	 */
	if (new_role.level < PT64_ROOT_4LEVEL)
		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);

The "hosts+VMs" can be shortened to just "VMs", because running a 64-bit VM with
a 32-bit host just doesn't work for a variety of reasons, i.e. doesn't need to be
called out here.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 05/12] KVM: X86/MMU: Link PAE root pagetable with its children
  2022-05-21 13:16 ` [PATCH V3 05/12] KVM: X86/MMU: Link PAE root pagetable with its children Lai Jiangshan
@ 2022-07-19 22:21   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 22:21 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

The shortlog is very misleading.  This patch doesn't "Link PAE root pagetable with
its children", it adds support for creating PAE PDPTEs in order to link them into
shadow pages, but it doesn't do the actual linking.

  KVM: x86/mmu: Add support for linking PAE PDPTE shadow pages

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> When local shadow pages are activated, link_shadow_page() might link
> a local shadow pages which is the PAE root for PAE paging with its
> children.
> 
> Add make_pae_pdpte() to handle it.
> 
> The code is not activated since local shadow pages are not activated
> yet.

And though it's redundant with other context, IMO it's helpful to reiterate
why this will be used for per-vCPU (local) shadow pages, i.e. why it's _not_ used
right now.

  Add support for installing PDPTEs via link_shadow_page(), PDPTEs have
  different layouts than every other entry type and so need a dedicated
  helper to make them.

  This code will become active when a future patch activates per-vCPU
  shadow pages and stops using so called "special" roots (which are
  installed at root allocation, not via link_shadow_page()).

> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c  | 6 +++++-
>  arch/x86/kvm/mmu/spte.c | 7 +++++++
>  arch/x86/kvm/mmu/spte.h | 1 +
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c941a5931bc3..e1a059dd9621 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2340,7 +2340,11 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
>  
> -	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
> +	if (unlikely(sp->role.level == PT32_ROOT_LEVEL &&
> +		     vcpu->arch.mmu->root_role.level == PT32E_ROOT_LEVEL))
> +		spte = make_pae_pdpte(sp->spt);
> +	else
> +		spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
>  
>  	mmu_spte_set(sptep, spte);
>  
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index b5960bbde7f7..5c31fa1d2b61 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -279,6 +279,13 @@ u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index)
>  	return child_spte;
>  }
>  
> +u64 make_pae_pdpte(u64 *child_pt)
> +{
> +	/* The only ignore bits in PDPTE are 11:9. */

s/ignore/ignored, though it might be worth calling out that unlike 64-bit paging,
the upper bits are reserved (I always forget this).

	/*
	 * Only PDPTE bits 11:9 are ignored by hardware.  Unlike 64-bit paging,
	 * bits above the PA bits are reserved.
	 */

> +	BUILD_BUG_ON(!(GENMASK(11,9) & SPTE_MMU_PRESENT_MASK));
> +	return __pa(child_pt) | PT_PRESENT_MASK | SPTE_MMU_PRESENT_MASK |
> +		shadow_me_value;
> +}
>  
>  u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
>  {
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 0127bb6e3c7d..2408ba1361d5 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -426,6 +426,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  	       u64 old_spte, bool prefetch, bool can_unsync,
>  	       bool host_writable, u64 *new_spte);
>  u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index);
> +u64 make_pae_pdpte(u64 *child_pt);
>  u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
>  u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
>  u64 mark_spte_for_access_track(u64 spte);
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 07/12] KVM: X86/MMU: Remove the check of the return value of to_shadow_page()
  2022-05-21 13:16 ` [PATCH V3 07/12] KVM: X86/MMU: Remove the check of the return value of to_shadow_page() Lai Jiangshan
@ 2022-07-19 22:42   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 22:42 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> Remove the check of the return value of to_shadow_page() in
> mmu_free_root_page(), kvm_mmu_free_guest_mode_roots(), is_unsync_root()
> and is_tdp_mmu() because it can not return NULL.

s/can not/can no longer, to make it very clear that to_shadow_page() _could_ return
NULL in the past, but something changed.

And please explain _why_ it can no longer return NULL, even if that's just a
reference to kvm_mmu_alloc_local_shadow_page() or whatever it ends up being named.
That can also tie into the obsolete sp handling.

> Remove the check of the return value of to_shadow_page() in
> is_page_fault_stale() and is_obsolete_root() because it can not return
> NULL and the obsoleting for local shadow page is already handled by
> a different way.
> 
> When the obsoleting process is done, all the obsoleted non-local shadow
> pages are already unlinked from the local shadow pages by the help of
> the parent rmap from the children and the local shadow pages become
> theoretically valid again.  The local shadow page can be freed if
> is_obsolete_sp() return true, or be reused if is_obsolete_sp() becomes
> false.
> 
> Reviewed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand
  2022-05-21 13:16 ` [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand Lai Jiangshan
@ 2022-07-19 23:08   ` Sean Christopherson
  2022-07-20  0:07     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 23:08 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> mmu->pae_root for non-PAE paging is allocated on-demand, but
> mmu->pae_root for PAE paging is allocated early when struct kvm_mmu is
> being created.
> 
> Simplify the code to allocate mmu->pae_root for PAE paging and make
> it on-demand.

Hmm, I'm not convinced this simplifies things enough to justify the risk.  There's
a non-zero chance that the __GFP_DMA32 allocation was intentionally done during VM
creation in order to avoid OOM on low memory.

Maybe move this patch to the tail end of the series so that it has a higher chance
of reverting cleanly if on-demand allocation breaks someone's setup?

> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   2 +-
>  arch/x86/kvm/mmu/mmu.c          | 101 +++++++++++++-------------------
>  arch/x86/kvm/x86.c              |   4 +-
>  3 files changed, 44 insertions(+), 63 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9cdc5bbd721f..fb9751dfc1a7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1615,7 +1615,7 @@ int kvm_mmu_vendor_module_init(void);
>  void kvm_mmu_vendor_module_exit(void);
>  
>  void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
> -int kvm_mmu_create(struct kvm_vcpu *vcpu);
> +void kvm_mmu_create(struct kvm_vcpu *vcpu);
>  int kvm_mmu_init_vm(struct kvm *kvm);
>  void kvm_mmu_uninit_vm(struct kvm *kvm);
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 90b715eefe6a..63c2b2c6122c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -668,6 +668,41 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +static int mmu_alloc_pae_root(struct kvm_vcpu *vcpu)

Now that pae_root isn't the "full" root, just the page table, I think we should
rename pae_root to something else, and then name this accordingly.

pae_root_backing_page and mmu_alloc_pae_root_backing_page()?  I definitely don't
love those names, so feel free to suggest something better.

> +{
> +	struct page *page;
> +
> +	if (vcpu->arch.mmu->root_role.level != PT32E_ROOT_LEVEL)
> +		return 0;

I think I'd prefer to move this check to the caller, it's confusing to see an
unconditional call to a PAE-specific helper.

> +	if (vcpu->arch.mmu->pae_root)
> +		return 0;
> +
> +	/*
> +	 * Allocate a page to hold the four PDPTEs for PAE paging when emulating
> +	 * 32-bit mode.  CR3 is only 32 bits even on x86_64 in this case.
> +	 * Therefore we need to allocate the PDP table in the first 4GB of
> +	 * memory, which happens to fit the DMA32 zone.
> +	 */
> +	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_DMA32);

Leave off __GFP_ZERO, it's unnecessary in both cases, and actively misleading
when TDP is disabled.  KVM _must_ write the page after making it decrypted.  And
since I can't find any code that actually does initialize "pae_root", I suspect
this series is buggy.

But if there is a bug, it was introduced earlier in this series, either by

  KVM: X86/MMU: Add local shadow pages

or by

  KVM: X86/MMU: Activate local shadow pages and remove old logic

depending on whether you want to blame the function that is buggy, or the patch
that uses the buggy function.

The right place to initialize the root is kvm_mmu_alloc_local_shadow_page().
KVM sets __GFP_ZERO for mmu_shadow_page_cache, i.e. relies on new sp->spt pages
to be zeroed prior to "allocating" from the cache.

The PAE root backing page on the other hand is allocated once and then reused
over and over.

	if (role.level == PT32E_ROOT_LEVEL &&
	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root)) {
		sp->spt = vcpu->arch.mmu->pae_root;
		kvm_mmu_initialize_pae_root(sp->spt): <==== something like this
	} else {
		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
	}


> -	for (i = 0; i < 4; ++i)
> -		mmu->pae_root[i] = INVALID_PAE_ROOT;

Please remove this code in a separate patch.  I don't care if it is removed before
or after (I'm pretty sure the existing behavior is paranoia), but I don't want
multiple potentially-functional changes in this patch.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 10/12] KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT
  2022-05-21 13:16 ` [PATCH V3 10/12] KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT Lai Jiangshan
@ 2022-07-19 23:11   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 23:11 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> They are unused and replaced with 0ull like other zero sptes and
> is_shadow_present_pte().
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 09/12] KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch)
  2022-05-21 13:16 ` [PATCH V3 09/12] KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch) Lai Jiangshan
@ 2022-07-19 23:21   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 23:21 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> FNAME(page_fault) verifies PDPTE for nested NPT in PAE paging mode
> because nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from
> memory unconditionally for each call.
> 
> The verifying is complicated and it works only when mmu->pae_root
> is always used when the guest is PAE paging.

Why is this relevant?  It's not _that_ complicated, and even if it were, I don't
see how calling that out helps the reader understand the motivation for this patch.

> Move the verifying code in FNAME(fetch) and simplify it since the local
> shadow page is used and it can be walked in FNAME(fetch) and unlinked
> from children via drop_spte().
> 
> It also allows for mmu->pae_root NOT to be used when it is NOT required

Avoid leading with pronouns; "it" is ambiguous, e.g. at first I thought "it" meant
moving the code, but what "it" really means is using the iterator from the shadow
page walk instead of hardcoding a pae_root lookup.

And changing from pae_root to it.sptep needs to be explicitly called out.  It's
a subtle but important detail.  And if you call that out, then it's more obvious
why this patch is relevant to not having to use pae_root for a 64-bit host with NPT.

> to be put in a 32bit CR3.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/paging_tmpl.h | 72 ++++++++++++++++------------------
>  1 file changed, 33 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index cd6032e1947c..67c419bce1e5 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -659,6 +659,39 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  		clear_sp_write_flooding_count(it.sptep);
>  		drop_large_spte(vcpu, it.sptep);
>  
> +		/*
> +		 * When nested NPT enabled and L1 is PAE paging,
> +		 * mmu->get_pdptrs() which is nested_svm_get_tdp_pdptr() reads
> +		 * the guest NPT's PDPTE from memory unconditionally for each
> +		 * call.
> +		 *
> +		 * The guest PAE root page is not write-protected.
> +		 *
> +		 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get
> +		 * a value different from previous calls or different from the
> +		 * return value of mmu->get_pdptrs() in mmu_alloc_shadow_roots().
> +		 *
> +		 * It will cause the following code installs the spte in a wrong
> +		 * sp or links a sp to a wrong parent if the return value of
> +		 * mmu->get_pdptrs() is not verified unchanged since
> +		 * FNAME(gpte_changed) can't check this kind of change.
> +		 *
> +		 * Verify the return value of mmu->get_pdptrs() (only the gfn
> +		 * in it needs to be checked) and drop the spte if the gfn isn't
> +		 * matched.
> +		 *
> +		 * Do the verifying unconditionally when the guest is PAE
> +		 * paging no matter whether it is nested NPT or not to avoid
> +		 * complicated code.
> +		 */
> +		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
> +		    it.level == PT32E_ROOT_LEVEL &&
> +		    is_shadow_present_pte(*it.sptep)) {
> +			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);

For this patch, it's probably worth a

			WARN_ON_ONCE(sp->spt != vcpu->arch.mmu->pae_root);

Mostly so that when the future patch stops using pae_root for 64-bit NPT hosts,
there's a code change for this particular logic that is very much relevant to
that change.

> +			if (gw->table_gfn[it.level - 2] != sp->gfn)
> +				drop_spte(vcpu->kvm, it.sptep);
> +		}
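
For illustration, the hunk above with the suggested WARN folded in would look
something like this (sketch only, not a tested diff):

		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
		    it.level == PT32E_ROOT_LEVEL &&
		    is_shadow_present_pte(*it.sptep)) {
			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);

			/* The PAE PDPT is still expected to be backed by pae_root here. */
			WARN_ON_ONCE(sp->spt != vcpu->arch.mmu->pae_root);

			if (gw->table_gfn[it.level - 2] != sp->gfn)
				drop_spte(vcpu->kvm, it.sptep);
		}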

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host
  2022-05-21 13:16 ` [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host Lai Jiangshan
@ 2022-07-19 23:26   ` Sean Christopherson
  2022-07-19 23:27     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 23:26 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> Allocate the tables when allocating the local shadow page.

This absolutely needs a much more verbose changelog.

> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 63c2b2c6122c..73e6a8e1e1a9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1809,10 +1809,12 @@ static bool using_local_root_page(struct kvm_mmu *mmu)
>   * 2 or 3 levels of local shadow pages on top of non-local shadow pages.
>   *
>   * Local shadow pages are locally allocated.  If the local shadow page's level
> - * is PT32E_ROOT_LEVEL, it will use the preallocated mmu->pae_root for its
> - * sp->spt.  Because sp->spt may need to be put in the 32 bits CR3 (even in
> - * x86_64) or decrypted.  Using the preallocated one to handle these
> - * requirements makes the allocation simpler.
> + * is PT32E_ROOT_LEVEL, and it is not shadowing nested NPT for 32-bit L1 in
> + * 64-bit L0 (or said when the shadow pagetable's level is PT32E_ROOT_LEVEL),
> + * it will use the preallocated mmu->pae_root for its sp->spt.  Because sp->spt
> + * need to be put in the 32-bit CR3 (even in 64-bit host) or decrypted.  Using
> + * the preallocated one to handle these requirements makes the allocation
> + * simpler.
>   *
>   * Local shadow pages are only visible to local VCPU except through
>   * sp->parent_ptes rmap from their children, so they are not in the
> @@ -1852,13 +1854,12 @@ kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role r
>  	sp->gfn = 0;
>  	sp->role = role;
>  	/*
> -	 * Use the preallocated mmu->pae_root when the shadow page's
> -	 * level is PT32E_ROOT_LEVEL which may need to be put in the 32 bits
> +	 * Use the preallocated mmu->pae_root when the shadow pagetable's
> +	 * level is PT32E_ROOT_LEVEL which need to be put in the 32 bits
>  	 * CR3 (even in x86_64) or decrypted.  The preallocated one is prepared
>  	 * for the requirements.
>  	 */
> -	if (role.level == PT32E_ROOT_LEVEL &&
> -	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))

Why remove this WARN_ON_ONCE()?  And shouldn't this also interact with 

   KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand

Actually, I think the series is buggy.  That patch, which precedes this one, does

	if (vcpu->arch.mmu->root_role.level != PT32E_ROOT_LEVEL)
		return 0;

i.e. does NOT allocate pae_root for a 64-bit host, which means that running KVM
against the on-demand patch would result in the WARN firing and bad things happening.

> +	if (vcpu->arch.mmu->root_role.level == PT32E_ROOT_LEVEL)
>  		sp->spt = vcpu->arch.mmu->pae_root;
>  	else
>  		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host
  2022-07-19 23:26   ` Sean Christopherson
@ 2022-07-19 23:27     ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-19 23:27 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Tue, Jul 19, 2022, Sean Christopherson wrote:
> On Sat, May 21, 2022, Lai Jiangshan wrote:
> Actually, I think the series is buggy.  That patch, which precedes this one, does
> 
> 	if (vcpu->arch.mmu->root_role.level != PT32E_ROOT_LEVEL)
> 		return 0;
> 
> i.e. does NOT allocate pae_root for a 64-bit host, which means that running KVM
> against the on-demand patch would result in the WARN firing and bad things happening.

Gah, I take that back, pae_root is allocated by mmu_alloc_special_roots().

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand
  2022-07-19 23:08   ` Sean Christopherson
@ 2022-07-20  0:07     ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-20  0:07 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Tue, Jul 19, 2022, Sean Christopherson wrote:
> On Sat, May 21, 2022, Lai Jiangshan wrote:
> > +	/*
> > +	 * Allocate a page to hold the four PDPTEs for PAE paging when emulating
> > +	 * 32-bit mode.  CR3 is only 32 bits even on x86_64 in this case.
> > +	 * Therefore we need to allocate the PDP table in the first 4GB of
> > +	 * memory, which happens to fit the DMA32 zone.
> > +	 */
> > +	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_DMA32);
> 
> Leave off __GFP_ZERO, it's unnecesary in both cases, and actively misleading in
> when TDP is disabled.  KVM _must_ write the page after making it decrypted.  And
> since I can't find any code that actually does initialize "pae_root", I suspect
> this series is buggy.
> 
> But if there is a bug, it was introduced earlier in this series, either by
> 
>   KVM: X86/MMU: Add local shadow pages
> 
> or by
> 
>   KVM: X86/MMU: Activate local shadow pages and remove old logic
> 
> depending on whether you want to blame the function that is buggy, or the patch
> that uses the buggy function..
> 
> The right place to initialize the root is kvm_mmu_alloc_local_shadow_page().
> KVM sets __GFP_ZERO for mmu_shadow_page_cache, i.e. relies on new sp->spt pages
> to be zeroed prior to "allocating" from the cache.
> 
> The PAE root backing page on the other hand is allocated once and then reused
> over and over.
> 
> 	if (role.level == PT32E_ROOT_LEVEL &&
> 	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root)) {
> 		sp->spt = vcpu->arch.mmu->pae_root;
> 		kvm_mmu_initialize_pae_root(sp->spt): <==== something like this
> 	} else {
> 		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> 	}

Ah, I believe this is handled for the non-SME case in mmu_free_local_root_page().
But that won't play nice with the decryption path.  And either way, the PDPTEs
should be explicitly initialized/zeroed when the shadow page is "allocated".
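
Something along these lines for the kvm_mmu_initialize_pae_root() idea above
(the name and exact call site are just the earlier strawman, untested):

/* Reset the four PAE PDPTEs each time the preallocated backing page is reused. */
static void kvm_mmu_initialize_pae_root(u64 *pae_root)
{
	int i;

	for (i = 0; i < 4; i++)
		pae_root[i] = 0ull;
}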

> > -	for (i = 0; i < 4; ++i)
> > -		mmu->pae_root[i] = INVALID_PAE_ROOT;
> 
> Please remove this code in a separate patch.  I don't care if it is removed before
> or after (I'm pretty sure the existing behavior is paranoia), but I don't want
> multiple potentially-functional changes in this patch.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages
  2022-05-21 13:16 ` [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages Lai Jiangshan
  2022-05-26 21:38   ` David Matlack
  2022-05-26 22:01   ` David Matlack
@ 2022-07-20  0:35   ` Sean Christopherson
  2 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-07-20  0:35 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov,
	Maxim Levitsky, David Matlack, Lai Jiangshan

On Sat, May 21, 2022, Lai Jiangshan wrote:
> +static struct kvm_mmu_page *
> +kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role role)

Don't split the function name to a new line, even if it means running (well) over
the 80 char soft limit.

static struct kvm_mmu_page *kvm_mmu_alloc_per_vcpu_shadow_page(struct kvm_vcpu *vcpu,
							       union kvm_mmu_page_role role)

> +{
> +	struct kvm_mmu_page *sp;
> +
> +	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> +	sp->gfn = 0;

Why explicitly zero gfn but not gfns?  Either rely on __GFP_ZERO or don't; mixing
behaviors is confusing.  If there's an assumption that "gfn" must be zero, e.g. due
to masking, then that would be a good WARN candidate.

> +	sp->role = role;
> +	/*
> +	 * Use the preallocated mmu->pae_root when the shadow page's
> +	 * level is PT32E_ROOT_LEVEL which may need to be put in the 32 bits
> +	 * CR3 (even in x86_64) or decrypted.  The preallocated one is prepared
> +	 * for the requirements.
> +	 */
> +	if (role.level == PT32E_ROOT_LEVEL &&
> +	    !WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))
> +		sp->spt = vcpu->arch.mmu->pae_root;
> +	else
> +		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> +	/* sp->gfns is not used for local shadow page */

This comment isn't helpful as it doesn't provide any information as to _why_ gfns
isn't used.  For simple enforcement, a KVM_BUG_ON() is much more effective as it
documents the underlying assumption, e.g.

	KVM_BUG_ON(sp_has_gptes(sp), vcpu->kvm);

but I'm fairly confident that won't actually work, because sp_has_gptes() will
return true for pages that are backed by pae_root, i.e. are not passthrough.

In other words, this all subtly relies on the PDPTEs not being write-protected
and not being reachable through things like mmu_page_hash.  I don't know that we
need to add a dedicated flag for these pages, but we need _something_ to document
what's going on.

Hmm, but if we do add kvm_mmu_page_role.per_vcpu, it would allow for code
consolidation, and I think it will yield more intuitive code.  And sp_has_gptes()
is easy to fix.
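
E.g. assuming a new role.per_vcpu bit, and assuming sp_has_gptes() currently
rejects only direct and passthrough pages, the fix could be as small as (sketch):

static bool sp_has_gptes(struct kvm_mmu_page *sp)
{
	if (sp->role.direct)
		return false;

	if (sp->role.passthrough)
		return false;

	/* Per-vCPU (local) root pages don't shadow guest PTEs that need tracking. */
	if (sp->role.per_vcpu)
		return false;

	return true;
}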

> +	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> +	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;

I would prefer that kvm_mmu_alloc_per_vcpu_shadow_page() and kvm_mmu_alloc_page()
share common bits, and then add comments for the differences.  For example, this
path fails to invoke kvm_mod_used_mmu_pages(), which arguably it should do when
not using pae_root, i.e. when it actually "allocates" a page.

I've always found it annoying/odd that kvm_mmu_alloc_page() adds the page to
active_mmu_pages, but the caller adds the page to mmu_page_hash.  This is a good
excuse to fix that.

If role.per_vcpu is a thing, and is tracked in vcpu->arch.mmu->root_role, then
we can do:

	if (level < PT32E_ROOT_LEVEL)
		role.per_vcpu = 0;

	/* Per-vCPU roots are (obviously) not tracked in the per-VM lists. */
	if (unlikely(role.per_vcpu))
		return kvm_mmu_alloc_page(vcpu, role, gfn);

	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
		...
	}

	++vcpu->kvm->stat.mmu_cache_miss;

	sp = kvm_mmu_alloc_page(vcpu, role, gfn);


and kvm_mmu_alloc_page() becomes something like (completely untested, and I'm not
at all confident about the gfn logic).

static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
					       union kvm_mmu_page_role role,
					       gfn_t gfn)
{
	struct kvm_mmu_page *sp;

	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);

	/*
	 * Use the preallocated mmu->pae_root when the shadow page's level is
	 * PT32E_ROOT_LEVEL.  When using PAE paging, the backing page may need
	 * to have a 32-bit physical address (to go into a 32-bit CR3), and/or
	 * may need to be decrypted (!TDP + SME).  The preallocated pae_root
	 * is prepared for said requirements.
	 */
	if (role.per_vcpu && role.level == PT32E_ROOT_LEVEL) {
		sp->spt = vcpu->arch.mmu->pae_root;
		memset(sp->spt, 0, sizeof(u64) * 4);
	} else {
		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
	}

	if (sp_has_gptes(sp))
		sp->gfns = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);

	WARN_ON_ONCE(role.per_vcpu && gfn);
	sp->gfn = gfn;
	sp->role = role;

	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);

	/*
	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
	 * depends on valid pages being added to the head of the list.  See
	 * comments in kvm_zap_obsolete_pages().
	 */
	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
	return sp;
}


> +
> +	return sp;
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -2121,6 +2191,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	if (level <= vcpu->arch.mmu->cpu_role.base.level)
>  		role.passthrough = 0;
>  
> +	if (unlikely(level >= PT32E_ROOT_LEVEL && using_local_root_page(vcpu->arch.mmu)))
> +		return kvm_mmu_alloc_local_shadow_page(vcpu, role);
> +
>  	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
>  	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
>  		if (sp->gfn != gfn) {
> @@ -3351,6 +3424,37 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
>  	*root_hpa = INVALID_PAGE;
>  }
>  
> +static void mmu_free_local_root_page(struct kvm *kvm, struct kvm_mmu *mmu)
> +{
> +	u64 spte = mmu->root.hpa;
> +	struct kvm_mmu_page *sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +	int i;
> +
> +	/* Free level 5 or 4 roots for shadow NPT for 32 bit L1 */
> +	while (sp->role.level > PT32E_ROOT_LEVEL)

Maybe a for-loop?

	/* Free level 5 or 4 roots for shadow NPT for 32 bit L1 */
	for (sp = to_shadow_page(mmu->root.hpa & PT64_BASE_ADDR_MASK);
	     sp->role.level > PT32E_ROOT_LEVEL;
	     sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK)) {

> +	{
> +		spte = sp->spt[0];
> +		mmu_page_zap_pte(kvm, sp, sp->spt + 0, NULL);
> +		free_page((unsigned long)sp->spt);
> +		kmem_cache_free(mmu_page_header_cache, sp);

Probably worth a helper for free_page()+kmem_cache_free(), especially if the
!pae_root case is accounted.  And then we can combine with tdp_mmu_free_sp() if
we ever decide to fully account TDP MMU pages (to play nice with reclaim).

E.g.

static void __mmu_free_per_vcpu_root_page(struct kvm *kvm, struct kvm_mmu *mmu,
					  struct kvm_mmu_page *sp)
{
	if (sp->spt != mmu->pae_root) {
		free_page((unsigned long)sp->spt);
		kvm_mod_used_mmu_pages(kvm, -1);
	}

	kmem_cache_free(mmu_page_header_cache, sp);
}

static void mmu_free_per_vcpu_root_page(struct kvm *kvm, struct kvm_mmu *mmu)
{
	struct kvm_mmu_page *sp;
	u64 spte;
	int i;

	/* Free level 5 or 4 roots for shadow NPT for 32 bit L1 */
	for (sp = to_shadow_page(mmu->root.hpa & PT64_BASE_ADDR_MASK);
	     sp->role.level > PT32E_ROOT_LEVEL;
	     sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK)) {
		spte = sp->spt[0];
		mmu_page_zap_pte(kvm, sp, sp->spt + 0, NULL);

		__mmu_free_per_vcpu_root_page(kvm, mmu, sp);

		if (!is_shadow_present_pte(spte))
			return;
	}

	if (WARN_ON_ONCE(sp->role.level != PT32E_ROOT_LEVEL))
		return;

	/* Disconnect PAE root from the 4 PAE page directories */
	for (i = 0; i < 4; i++)
		mmu_page_zap_pte(kvm, sp, sp->spt + i, NULL);

	__mmu_free_per_vcpu_root_page(kvm, mmu, sp);
}


> +		if (!is_shadow_present_pte(spte))
> +			return;
> +		sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +	}

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-07-20  0:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-21 13:16 [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
2022-05-21 13:16 ` [PATCH V3 01/12] KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page fault Lai Jiangshan
2022-07-19 21:17   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 02/12] KVM: X86/MMU: Add using_local_root_page() Lai Jiangshan
2022-05-26 21:28   ` David Matlack
2022-05-26 21:38     ` Sean Christopherson
2022-07-19 22:03   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 03/12] KVM: X86/MMU: Reduce a check in using_local_root_page() for common cases Lai Jiangshan
2022-05-21 13:16 ` [PATCH V3 04/12] KVM: X86/MMU: Add local shadow pages Lai Jiangshan
2022-05-26 21:38   ` David Matlack
2022-05-26 22:01   ` David Matlack
2022-07-20  0:35   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 05/12] KVM: X86/MMU: Link PAE root pagetable with its children Lai Jiangshan
2022-07-19 22:21   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 06/12] KVM: X86/MMU: Activate local shadow pages and remove old logic Lai Jiangshan
2022-05-21 13:16 ` [PATCH V3 07/12] KVM: X86/MMU: Remove the check of the return value of to_shadow_page() Lai Jiangshan
2022-07-19 22:42   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 08/12] KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand Lai Jiangshan
2022-07-19 23:08   ` Sean Christopherson
2022-07-20  0:07     ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 09/12] KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch) Lai Jiangshan
2022-07-19 23:21   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 10/12] KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT Lai Jiangshan
2022-07-19 23:11   ` Sean Christopherson
2022-05-21 13:16 ` [PATCH V3 11/12] KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit host Lai Jiangshan
2022-07-19 23:26   ` Sean Christopherson
2022-07-19 23:27     ` Sean Christopherson
2022-05-21 13:17 ` [PATCH V3 12/12] KVM: X86/MMU: Remove mmu_alloc_special_roots() Lai Jiangshan
2022-05-26  8:49 ` [PATCH V3 00/12] KVM: X86/MMU: Use one-off local shadow page for special roots Lai Jiangshan
2022-05-26 20:27   ` David Matlack
