* [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes
@ 2021-03-02 18:45 Sean Christopherson
  2021-03-02 18:45 ` [PATCH 01/15] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
                   ` (14 more replies)
  0 siblings, 15 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Fix nested NPT (nSVM) with 32-bit L1 and SME with shadow paging, which
are completely broken.  Opportunistically fix theoretical bugs related to
prematurely reloading/unloading the MMU.

If nNPT is enabled, L1 can crash the host simply by using 32-bit NPT to
trigger a null pointer dereference on pae_root.

SME with shadow paging (including nNPT) fails to set the C-bit in the
shadow pages that don't go through standard MMU flows (PDPTRs and the
PML4 used by nNPT to shadow legacy NPT).  It also fails to account for
CR3[63:32], and thus the C-bit, being ignored outside of 64-bit mode.
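
For reference, the C-bit is a physical address bit (position reported by
CPUID 0x8000001F[EBX], bit 47 on typical EPYC parts) that must be set in
paging structure entries for memory the CPU is to treat as encrypted.
The kernel's helpers are roughly (paraphrased from
include/linux/mem_encrypt.h):

        /* sme_me_mask is BIT_ULL(<C-bit position>) when SME is active */
        #define __sme_set(x)    ((x) | sme_me_mask)
        #define __sme_clr(x)    ((x) & ~sme_me_mask)

Because the C-bit sits above bit 31, a 32-bit (or PAE) CR3 physically
cannot carry it, hence the CR3[63:32] subtlety above.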

Patches 01 and 02 fix the null pointer bugs.

Patches 03-07 fix mostly-benign related memory leaks.

Patches 08-10 fix the SME shadow paging bugs, which are also what led me to
the nNPT null pointer bugs.

Patches 11 and 12 fix theoretical bugs with EPTP switching and INVPCID that
I found when auditing flows that touch the MMU context.

Patches 13-15 do additional cleanup to hopefully make it harder to
introduce bugs in the future.

On the plus side, I finally understand why KVM supports shadowing 2-level
page tables with 4-level page tables...

Based on kvm/queue, commit fe5f0041c026 ("KVM/SVM: Move vmenter.S exception
fixups out of line").  The null pointer fixes cherry-pick cleanly onto
kvm/master; I haven't tried the other bug fixes (I doubt they're worth
backporting even though I tagged 'em with stable).

Sean Christopherson (15):
  KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
  KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with
    64-bit
  KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
  KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks
  KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE
    root
  KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs
  KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
  KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
  KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
  KVM: x86: Defer the MMU unload to the normal path on a global INVPCID
  KVM: x86/mmu: Unexport MMU load/unload functions
  KVM: x86/mmu: Sync roots after MMU load iff load is successful
  KVM: x86/mmu: WARN on NULL pae_root and bad shadow root level

 arch/x86/include/asm/kvm_host.h |   3 -
 arch/x86/kvm/mmu.h              |   4 +
 arch/x86/kvm/mmu/mmu.c          | 209 +++++++++++++++++++-------------
 arch/x86/kvm/mmu/tdp_mmu.c      |  23 +---
 arch/x86/kvm/svm/svm.c          |   9 +-
 arch/x86/kvm/vmx/nested.c       |   9 +-
 arch/x86/kvm/x86.c              |   2 +-
 7 files changed, 142 insertions(+), 117 deletions(-)

-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 01/15] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 02/15] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Override the shadow root level in the MMU context when configuring
NPT for shadowing nested NPT.  The level is always tied to the TDP level
of the host, not whatever level the guest happens to be using.
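
The ordering matters because shadow_mmu_init_context(), the common init
helper, derives shadow_root_level from the guest's paging mode.  A
sketch of the broken flow prior to this patch:

        context->shadow_root_level = new_role.base.level; /* host TDP level */
        ...
        if (new_role.as_u64 != context->mmu_role.as_u64)
                shadow_mmu_init_context(...); /* clobbers the level with the
                                               * guest's, whenever the role
                                               * changes */

Moving the override after the init call, as done below, keeps the host's
level in place.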

Fixes: 096586fda522 ("KVM: nSVM: Correctly set the shadow NPT root level in its MMU role")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c462062d36aa..0987cc1d53eb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4618,12 +4618,17 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
 	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
 
-	context->shadow_root_level = new_role.base.level;
-
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base, false, false);
 
-	if (new_role.as_u64 != context->mmu_role.as_u64)
+	if (new_role.as_u64 != context->mmu_role.as_u64) {
 		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
+
+		/*
+		 * Override the level set by the common init helper, nested TDP
+		 * always uses the host's TDP configuration.
+		 */
+		context->shadow_root_level = new_role.base.level;
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 02/15] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
  2021-03-02 18:45 ` [PATCH 01/15] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-03 17:28   ` Ben Gardon
  2021-03-02 18:45 ` [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Allocate the so-called pae_root page on-demand, along with the lm_root
page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a
32-bit L1.  KVM currently only allocates the page when NPT is disabled,
or when L0 is 32-bit (using PAE paging).

Note, there is an existing memory leak involving the MMU roots, as KVM
fails to free the PAE roots on failure.  This will be addressed in a
future commit.

Fixes: ee6268ba3a68 ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
Fixes: b6b80c78af83 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 44 ++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0987cc1d53eb..2ed3fac1244e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3187,14 +3187,14 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
 			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
-		} else {
+		} else if (mmu->pae_root) {
 			for (i = 0; i < 4; ++i)
 				if (mmu->pae_root[i] != 0)
 					mmu_free_root_page(kvm,
 							   &mmu->pae_root[i],
 							   &invalid_list);
-			mmu->root_hpa = INVALID_PAGE;
 		}
+		mmu->root_hpa = INVALID_PAGE;
 		mmu->root_pgd = 0;
 	}
 
@@ -3306,9 +3306,23 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL)
+	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
+		/*
+		 * Allocate the page for the PDPTEs when shadowing 32-bit NPT
+		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
+		 * need to be in low mem.  See also lm_root below.
+		 */
+		if (!vcpu->arch.mmu->pae_root) {
+			WARN_ON_ONCE(!tdp_enabled);
+
+			vcpu->arch.mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+			if (!vcpu->arch.mmu->pae_root)
+				return -ENOMEM;
+		}
+	}
+
 	for (i = 0; i < 4; ++i) {
 		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
 		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
@@ -3331,21 +3345,19 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
 
 	/*
-	 * If we shadow a 32 bit page table with a long mode page
-	 * table we enter this path.
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at MMU creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
+	 * handled above (to share logic with PAE), deal with the PML4 here.
 	 */
 	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		if (vcpu->arch.mmu->lm_root == NULL) {
-			/*
-			 * The additional page necessary for this is only
-			 * allocated on demand.
-			 */
-
 			u64 *lm_root;
 
 			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (lm_root == NULL)
-				return 1;
+			if (!lm_root)
+				return -ENOMEM;
 
 			lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
 
@@ -5248,9 +5260,11 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 	 * while the PDP table is a per-vCPU construct that's allocated at MMU
 	 * creation.  When emulating 32-bit mode, cr3 is only 32 bits even on
 	 * x86_64.  Therefore we need to allocate the PDP table in the first
-	 * 4GB of memory, which happens to fit the DMA32 zone.  Except for
-	 * SVM's 32-bit NPT support, TDP paging doesn't use PAE paging and can
-	 * skip allocating the PDP table.
+	 * 4GB of memory, which happens to fit the DMA32 zone.  TDP paging
+	 * generally doesn't use PAE paging and can skip allocating the PDP
+	 * table.  The main exception, handled here, is SVM's 32-bit NPT.  The
+	 * other exception is for shadowing L1's 32-bit or PAE NPT on 64-bit
+	 * KVM; that horror is handled on-demand by mmu_alloc_shadow_roots().
 	 */
 	if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL)
 		return 0;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
  2021-03-02 18:45 ` [PATCH 01/15] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
  2021-03-02 18:45 ` [PATCH 02/15] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-03  0:21   ` Ben Gardon
  2021-03-02 18:45 ` [PATCH 04/15] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Sean Christopherson
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Hold the mmu_lock for write for the entire duration of allocating and
initializing an MMU's roots.  This ensures there are MMU pages available
and thus prevents root allocations from failing.  That in turn fixes a
bug where KVM would fail to free valid PAE roots if one of the later
roots failed to allocate.

Note, KVM still leaks the PAE roots if the lm_root allocation fails.
This will be addressed in a future commit.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 41 ++++++++++++--------------------------
 arch/x86/kvm/mmu/tdp_mmu.c | 23 +++++----------------
 2 files changed, 18 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2ed3fac1244e..1f129001a30c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2398,6 +2398,9 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
 {
 	unsigned long avail = kvm_mmu_available_pages(vcpu->kvm);
 
+	/* Ensure all four PAE roots can be allocated in a single pass. */
+	BUILD_BUG_ON(KVM_MIN_FREE_MMU_PAGES < 4);
+
 	if (likely(avail >= KVM_MIN_FREE_MMU_PAGES))
 		return 0;
 
@@ -3220,16 +3223,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 {
 	struct kvm_mmu_page *sp;
 
-	write_lock(&vcpu->kvm->mmu_lock);
-
-	if (make_mmu_pages_available(vcpu)) {
-		write_unlock(&vcpu->kvm->mmu_lock);
-		return INVALID_PAGE;
-	}
 	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
 	++sp->root_count;
 
-	write_unlock(&vcpu->kvm->mmu_lock);
 	return __pa(sp->spt);
 }
 
@@ -3241,16 +3237,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 	if (is_tdp_mmu_enabled(vcpu->kvm)) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		vcpu->arch.mmu->root_hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
 				      true);
-
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		vcpu->arch.mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
@@ -3258,8 +3248,6 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
-			if (!VALID_PAGE(root))
-				return -ENOSPC;
 			vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
 		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
@@ -3294,8 +3282,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      vcpu->arch.mmu->shadow_root_level, false);
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		vcpu->arch.mmu->root_hpa = root;
 		goto set_root_pgd;
 	}
@@ -3325,6 +3311,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < 4; ++i) {
 		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+
 		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
 			pdptr = vcpu->arch.mmu->get_pdptr(vcpu, i);
 			if (!(pdptr & PT_PRESENT_MASK)) {
@@ -3338,8 +3325,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, false);
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		vcpu->arch.mmu->pae_root[i] = root | pm_mask;
 	}
 	vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
@@ -3373,14 +3358,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
-{
-	if (vcpu->arch.mmu->direct_map)
-		return mmu_alloc_direct_roots(vcpu);
-	else
-		return mmu_alloc_shadow_roots(vcpu);
-}
-
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -4822,7 +4799,15 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
 	if (r)
 		goto out;
-	r = mmu_alloc_roots(vcpu);
+	write_lock(&vcpu->kvm->mmu_lock);
+	if (make_mmu_pages_available(vcpu))
+		r = -ENOSPC;
+	else if (vcpu->arch.mmu->direct_map)
+		r = mmu_alloc_direct_roots(vcpu);
+	else
+		r = mmu_alloc_shadow_roots(vcpu);
+	write_unlock(&vcpu->kvm->mmu_lock);
+
 	kvm_mmu_sync_roots(vcpu);
 	if (r)
 		goto out;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 70226e0875fe..50ef757c5586 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -137,22 +137,21 @@ static struct kvm_mmu_page *alloc_tdp_mmu_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return sp;
 }
 
-static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
+hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 {
 	union kvm_mmu_page_role role;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
 	role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level);
 
-	write_lock(&kvm->mmu_lock);
-
 	/* Check for an existing root before allocating a new one. */
 	for_each_tdp_mmu_root(kvm, root) {
 		if (root->role.word == role.word) {
 			kvm_mmu_get_root(kvm, root);
-			write_unlock(&kvm->mmu_lock);
-			return root;
+			goto out;
 		}
 	}
 
@@ -161,19 +160,7 @@ static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
 
 	list_add(&root->link, &kvm->arch.tdp_mmu_roots);
 
-	write_unlock(&kvm->mmu_lock);
-
-	return root;
-}
-
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu_page *root;
-
-	root = get_tdp_mmu_vcpu_root(vcpu);
-	if (!root)
-		return INVALID_PAGE;
-
+out:
 	return __pa(root->spt);
 }
 
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 04/15] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 05/15] KVM: x86/mmu: Check PDPTRs " Sean Christopherson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Allocate lm_root before the PAE roots so that the PAE roots aren't
leaked if the memory allocation for the lm_root happens to fail.

Note, KVM can _still_ leak PAE roots if mmu_check_root() fails on a
guest's PDPTR.  That too will be fixed in a future commit.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 65 +++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1f129001a30c..e5c3701112f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3292,21 +3292,39 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-		/*
-		 * Allocate the page for the PDPTEs when shadowing 32-bit NPT
-		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
-		 * need to be in low mem.  See also lm_root below.
-		 */
-		if (!vcpu->arch.mmu->pae_root) {
-			WARN_ON_ONCE(!tdp_enabled);
+	/*
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at root creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM is very rare.  Unlike 32-bit
+	 * NPT, the PDP table doesn't need to be in low mem.  Preallocate the
+	 * pages so that the PAE roots aren't leaked on failure.
+	 */
+	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL &&
+	    (!vcpu->arch.mmu->pae_root || !vcpu->arch.mmu->lm_root)) {
+		u64 *lm_root, *pae_root;
 
-			vcpu->arch.mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!vcpu->arch.mmu->pae_root)
-				return -ENOMEM;
+		if (WARN_ON_ONCE(!tdp_enabled || vcpu->arch.mmu->pae_root ||
+				 vcpu->arch.mmu->lm_root))
+			return -EIO;
+
+		pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!pae_root)
+			return -ENOMEM;
+
+		lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!lm_root) {
+			free_page((unsigned long)pae_root);
+			return -ENOMEM;
 		}
+
+		vcpu->arch.mmu->pae_root = pae_root;
+		vcpu->arch.mmu->lm_root = lm_root;
+
+		lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
 	}
 
 	for (i = 0; i < 4; ++i) {
@@ -3327,30 +3345,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 				      PT32_ROOT_LEVEL, false);
 		vcpu->arch.mmu->pae_root[i] = root | pm_mask;
 	}
-	vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
-
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at MMU creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
-	 * handled above (to share logic with PAE), deal with the PML4 here.
-	 */
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
-		if (vcpu->arch.mmu->lm_root == NULL) {
-			u64 *lm_root;
-
-			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!lm_root)
-				return -ENOMEM;
-
-			lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
-
-			vcpu->arch.mmu->lm_root = lm_root;
-		}
 
+	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->lm_root);
-	}
+	else
+		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
 
 set_root_pgd:
 	vcpu->arch.mmu->root_pgd = root_pgd;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 05/15] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 04/15] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 06/15] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Sean Christopherson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Check the validity of the PDPTRs before allocating any of the PAE roots;
otherwise a bad PDPTR will cause KVM to leak any previously allocated
roots.
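
For context, mmu_check_root() rejects a root that doesn't point at a
visible memslot and queues a triple fault; roughly (paraphrased from
mmu.c of this era):

        static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
        {
                if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
                        kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
                        return 1;
                }
                return 0;
        }

Hoisting all four PDPTR checks ahead of the mmu_alloc_root() calls means
a bad PDPTR now bails out before any root pages have been allocated.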

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e5c3701112f8..aa20e8d32197 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3262,7 +3262,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
-	u64 pdptr, pm_mask;
+	u64 pdptrs[4], pm_mask;
 	gfn_t root_gfn, root_pgd;
 	hpa_t root;
 	int i;
@@ -3273,6 +3273,17 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu_check_root(vcpu, root_gfn))
 		return 1;
 
+	if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
+		for (i = 0; i < 4; ++i) {
+			pdptrs[i] = vcpu->arch.mmu->get_pdptr(vcpu, i);
+			if (!(pdptrs[i] & PT_PRESENT_MASK))
+				continue;
+
+			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+				return 1;
+		}
+	}
+
 	/*
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
@@ -3331,14 +3342,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
 
 		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
-			pdptr = vcpu->arch.mmu->get_pdptr(vcpu, i);
-			if (!(pdptr & PT_PRESENT_MASK)) {
+			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
 				vcpu->arch.mmu->pae_root[i] = 0;
 				continue;
 			}
-			root_gfn = pdptr >> PAGE_SHIFT;
-			if (mmu_check_root(vcpu, root_gfn))
-				return 1;
+			root_gfn = pdptrs[i] >> PAGE_SHIFT;
 		}
 
 		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 06/15] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (4 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 05/15] KVM: x86/mmu: Check PDPTRs " Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 07/15] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Exempt NULL PAE roots from the check to detect leaks, since
kvm_mmu_free_roots() doesn't set them back to INVALID_PAGE.  Stop hiding
the WARNs to detect PAE root leaks behind MMU_WARN_ON; the hidden WARNs
obviously didn't do their job given the hilarious number of bugs that
could lead to PAE roots being leaked, not to mention the above false
positive.

Opportunistically delete a warning on root_hpa being valid; there's
nothing special about 4/5-level shadow pages that warrants a WARN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aa20e8d32197..3ef7fb2a9878 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3244,7 +3244,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+			WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i] &&
+				     VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
@@ -3289,8 +3290,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * write-protect the guests page table root.
 	 */
 	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->root_hpa));
-
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      vcpu->arch.mmu->shadow_root_level, false);
 		vcpu->arch.mmu->root_hpa = root;
@@ -3339,7 +3338,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	}
 
 	for (i = 0; i < 4; ++i) {
-		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+		WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i] &&
+			     VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
 
 		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
 			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 07/15] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (5 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 06/15] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 08/15] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Sean Christopherson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Use '0' as the sole value to denote an invalid pae_root, instead of a
mix of '0' and INVALID_PAGE.
Unlike root_hpa, the pae_roots hold permission bits and thus are
guaranteed to be non-zero.  Having to deal with both values leads to
bugs, e.g. failing to set back to INVALID_PAGE, warning on the wrong
value, etc...
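
For reference, the two "invalid" encodings in play (from
arch/x86/include/asm/kvm_host.h):

        #define INVALID_PAGE    (~(hpa_t)0)
        #define VALID_PAGE(x)   ((x) != INVALID_PAGE)

i.e. '0' passes VALID_PAGE() even though it can never be a legitimate
PAE root, which is exactly how the dual-value scheme yielded both missed
frees and WARNs on the wrong value.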

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3ef7fb2a9878..59b1709a55b4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3191,11 +3191,14 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
 			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
 		} else if (mmu->pae_root) {
-			for (i = 0; i < 4; ++i)
-				if (mmu->pae_root[i] != 0)
-					mmu_free_root_page(kvm,
-							   &mmu->pae_root[i],
-							   &invalid_list);
+			for (i = 0; i < 4; ++i) {
+				if (!mmu->pae_root[i])
+					continue;
+
+				mmu_free_root_page(kvm, &mmu->pae_root[i],
+						   &invalid_list);
+				mmu->pae_root[i] = 0;
+			}
 		}
 		mmu->root_hpa = INVALID_PAGE;
 		mmu->root_pgd = 0;
@@ -3244,8 +3247,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i] &&
-				     VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+			WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i]);
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
@@ -3338,8 +3340,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	}
 
 	for (i = 0; i < 4; ++i) {
-		WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i] &&
-			     VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+		WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i]);
 
 		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
 			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
@@ -3412,7 +3413,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	for (i = 0; i < 4; ++i) {
 		hpa_t root = vcpu->arch.mmu->pae_root[i];
 
-		if (root && VALID_PAGE(root)) {
+		if (root && !WARN_ON_ONCE(!VALID_PAGE(root))) {
 			root &= PT64_BASE_ADDR_MASK;
 			sp = to_shadow_page(root);
 			mmu_sync_children(vcpu, sp);
@@ -5267,7 +5268,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 
 	mmu->pae_root = page_address(page);
 	for (i = 0; i < 4; ++i)
-		mmu->pae_root[i] = INVALID_PAGE;
+		mmu->pae_root[i] = 0;
 
 	return 0;
 }
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 08/15] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (6 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 07/15] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 09/15] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Set the C-bit in SPTEs that are set outside of the normal MMU flows,
specifically the PDPTRs and the handful of special-cased "LM root"
entries, all of which are shadow paging only.

Note, the direct-mapped-root PDPTR handling is needed for the scenario
where paging is disabled in the guest, in which case KVM uses a direct
mapped MMU even though TDP is disabled.

Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 59b1709a55b4..ddf1845f072e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3251,7 +3251,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
-			vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK;
+			vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK |
+						      shadow_me_mask;
 		}
 		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
 	} else
@@ -3303,7 +3304,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * or a PAE 3-level page table. In either case we need to be aware that
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
-	pm_mask = PT_PRESENT_MASK;
+	pm_mask = PT_PRESENT_MASK | shadow_me_mask;
 	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 09/15] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (7 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 08/15] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 10/15] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Sean Christopherson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Mark the PAE roots as decrypted to play nice with SME when KVM is
using shadow paging.  Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU.  The extra documentation is nice to have.

Note, there are several subtleties at play with NPT.  In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT.  However, 32-bit Linux, and thus KVM, doesn't support SME.  And
64-bit KVM can happily set the C-bit in CR3.  This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
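
For reference, set_memory_decrypted() rewrites the kernel's own mapping
of the page to clear the C-bit:

        /* asm/set_memory.h */
        int set_memory_decrypted(unsigned long addr, int numpages);
        int set_memory_encrypted(unsigned long addr, int numpages);

With the PAE root page decrypted, KVM's writes to the PDPTEs land in
plaintext, matching what the CPU's page walker reads through the 32-bit,
C-bit-less CR3.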

Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 24 ++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c |  7 +++++--
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ddf1845f072e..45fe97b3b25d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
 #include <asm/memtype.h>
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
+#include <asm/set_memory.h>
 #include <asm/vmx.h>
 #include <asm/kvm_page_track.h>
 #include "trace.h"
@@ -3313,8 +3314,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * tables are allocated and initialized at root creation as there is no
 	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
 	 * on demand, as running a 32-bit L1 VMM is very rare.  Unlike 32-bit
-	 * NPT, the PDP table doesn't need to be in low mem.  Preallocate the
-	 * pages so that the PAE roots aren't leaked on failure.
+	 * NPT, the PDP table doesn't need to be in low mem, and doesn't need
+	 * to be decrypted.  Preallocate the pages so that the PAE roots aren't
+	 * leaked on failure.
 	 */
 	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL &&
 	    (!vcpu->arch.mmu->pae_root || !vcpu->arch.mmu->lm_root)) {
@@ -5234,6 +5236,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 
 static void free_mmu_pages(struct kvm_mmu *mmu)
 {
+	if (!tdp_enabled && mmu->pae_root)
+		set_memory_encrypted((unsigned long)mmu->pae_root, 1);
 	free_page((unsigned long)mmu->pae_root);
 	free_page((unsigned long)mmu->lm_root);
 }
@@ -5271,6 +5275,22 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 	for (i = 0; i < 4; ++i)
 		mmu->pae_root[i] = 0;
 
+	/*
+	 * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+	 * get the CPU to treat the PDPTEs as encrypted.  Decrypt the page so
+	 * that KVM's writes and the CPU's reads get along.  Note, this is
+	 * only necessary when using shadow paging, as 64-bit NPT can get at
+	 * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+	 * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+	 */
+	if (!tdp_enabled)
+		set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+	else
+		WARN_ON_ONCE(shadow_me_mask);
+
+	for (i = 0; i < 4; ++i)
+		mmu->pae_root[i] = 0;
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 54610270f66a..4769cf8bf2fd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3908,15 +3908,18 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long cr3;
 
-	cr3 = __sme_set(root);
 	if (npt_enabled) {
-		svm->vmcb->control.nested_cr3 = cr3;
+		svm->vmcb->control.nested_cr3 = __sme_set(root);
 		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
 
 		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
 		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
 			return;
 		cr3 = vcpu->arch.cr3;
+	} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
+		cr3 = __sme_set(root);
+	} else {
+		cr3 = root;
 	}
 
 	svm->vmcb->save.cr3 = cr3;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 10/15] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (8 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 09/15] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 11/15] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch Sean Christopherson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Don't strip the C-bit from the faulting address on an intercepted #PF;
the address is a virtual address, not a physical address.
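
A worked example of the breakage (assuming the C-bit is bit 47, as on
typical EPYC parts): a guest #PF on the canonical address
0xffff888000000000 is reported in exit_info_2 verbatim, and

        __sme_clr(0xffff888000000000ULL) == 0xffff088000000000ULL

hands KVM's page fault handler a non-canonical, and thoroughly wrong,
address.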

Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4769cf8bf2fd..dfc8fe231e8b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1907,7 +1907,7 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	return kvm_handle_page_fault(vcpu, error_code, fault_address,
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 11/15] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (9 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 10/15] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 12/15] KVM: x86: Defer the MMU unload to the normal path on a global INVPCID Sean Christopherson
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Defer reloading the MMU after a successful EPTP switch.  The VMFUNC
instruction itself is executed in the previous EPTP context; any side
effects, e.g. updating RIP, should occur in the old context.  Practically
speaking, this bug is benign as VMX doesn't touch the MMU when skipping
an emulated instruction, nor does queuing a single-step #DB.  No other
post-switch side effects exist.
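
For context, the deferred request is serviced before the next entry to
the guest; roughly, in vcpu_enter_guest() (paraphrased from x86.c of
this era):

        if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
                kvm_mmu_unload(vcpu);
        ...
        r = kvm_mmu_reload(vcpu);       /* rebuild roots, bail on failure */

so RIP is updated in the old context, and a reload failure is handled by
the normal error paths instead of the old "let the next reload
potentially fail" approach.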

Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fdd80dd8e781..81f609886c8b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5473,16 +5473,11 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 		if (!nested_vmx_check_eptp(vcpu, new_eptp))
 			return 1;
 
-		kvm_mmu_unload(vcpu);
 		mmu->ept_ad = accessed_dirty;
 		mmu->mmu_role.base.ad_disabled = !accessed_dirty;
 		vmcs12->ept_pointer = new_eptp;
-		/*
-		 * TODO: Check what's the correct approach in case
-		 * mmu reload fails. Currently, we just let the next
-		 * reload potentially fail
-		 */
-		kvm_mmu_reload(vcpu);
+
+		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 	}
 
 	return 0;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 12/15] KVM: x86: Defer the MMU unload to the normal path on a global INVPCID
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (10 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 11/15] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 13/15] KVM: x86/mmu: Unexport MMU load/unload functions Sean Christopherson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Defer unloading the MMU after an INVPCID until the instruction emulation
has completed, i.e. until after RIP has been updated.

On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping
an emulated instruction.  However, on SVM, if nrip is disabled, the
emulator is used to skip an instruction, which would lead to fireworks
if the emulator were invoked without a valid MMU.
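
For context, the SVM hazard, roughly (paraphrased from svm.c's skip
helper):

        if (nrips && svm->vmcb->control.next_rip != 0)
                svm->next_rip = svm->vmcb->control.next_rip;

        if (!svm->next_rip) {
                /* No next-RIP assist: decode the instruction to skip it. */
                if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
                        return 0;
        } else {
                kvm_rip_write(vcpu, svm->next_rip);
        }

The EMULTYPE_SKIP path fetches and decodes the guest instruction, which
means walking guest memory, which means it needs a valid MMU.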

Fixes: eb4b248e152d ("kvm: vmx: Support INVPCID in shadow paging mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 828de7d65074..7b0adebec1ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11531,7 +11531,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 
 		fallthrough;
 	case INVPCID_TYPE_ALL_INCL_GLOBAL:
-		kvm_mmu_unload(vcpu);
+		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 		return kvm_skip_emulated_instruction(vcpu);
 
 	default:
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 13/15] KVM: x86/mmu: Unexport MMU load/unload functions
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (11 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 12/15] KVM: x86: Defer the MMU unload to the normal path on a global INVPCID Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 14/15] KVM: x86/mmu: Sync roots after MMU load iff load is successful Sean Christopherson
  2021-03-02 18:45 ` [PATCH 15/15] KVM: x86/mmu: WARN on NULL pae_root and bad shadow root level Sean Christopherson
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Unexport the MMU load and unload helpers now that they are no longer
used (incorrectly) in vendor code.

Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h,
it should not be exposed to vendor code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ---
 arch/x86/kvm/mmu.h              | 4 ++++
 arch/x86/kvm/mmu/mmu.c          | 2 --
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6db60ea8ee5b..2da6c9f5935a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1592,9 +1592,6 @@ void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
-int kvm_mmu_load(struct kvm_vcpu *vcpu);
-void kvm_mmu_unload(struct kvm_vcpu *vcpu);
-void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			ulong roots_to_free);
 gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 72b0f66073dc..67e8c7c7a6ce 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -74,6 +74,10 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				u64 fault_address, char *insn, int insn_len);
 
+int kvm_mmu_load(struct kvm_vcpu *vcpu);
+void kvm_mmu_unload(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
 	if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 45fe97b3b25d..86432d6a4092 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4827,7 +4827,6 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 out:
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_load);
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
@@ -4836,7 +4835,6 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
 	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_unload);
 
 static bool need_remote_flush(u64 old, u64 new)
 {
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 14/15] KVM: x86/mmu: Sync roots after MMU load iff load is successful
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 13/15] KVM: x86/mmu: Unexport MMU load/unload functions Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  2021-03-02 18:45 ` [PATCH 15/15] KVM: x86/mmu: WARN on NULL pae_root and bad shadow root level Sean Christopherson
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

For clarity, explicitly skip syncing roots if the MMU load failed
instead of relying on the !VALID_PAGE check in kvm_mmu_sync_roots().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 86432d6a4092..34eeb39ee0f9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4818,10 +4818,11 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	else
 		r = mmu_alloc_shadow_roots(vcpu);
 	write_unlock(&vcpu->kvm->mmu_lock);
+	if (r)
+		goto out;
 
 	kvm_mmu_sync_roots(vcpu);
-	if (r)
-		goto out;
+
 	kvm_mmu_load_pgd(vcpu);
 	static_call(kvm_x86_tlb_flush_current)(vcpu);
 out:
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH 15/15] KVM: x86/mmu: WARN on NULL pae_root and bad shadow root level
  2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (13 preceding siblings ...)
  2021-03-02 18:45 ` [PATCH 14/15] KVM: x86/mmu: Sync roots after MMU load iff load is successful Sean Christopherson
@ 2021-03-02 18:45 ` Sean Christopherson
  14 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-02 18:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

WARN if KVM is about to dereference a NULL pae_root when loading an MMU,
and convert the BUG() on a bad shadow_root_level into a WARN (now that
errors are handled cleanly).  With nested NPT, botching the level and
sending KVM down the wrong path is all too easy, and the on-demand
allocation of pae_root means bugs crash the host.  Obviously, KVM could
unconditionally allocate pae_root, but that's arguably a worse failure
mode as it would potentially corrupt the guest instead of crashing it.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 34eeb39ee0f9..35f89bb1f205 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3247,6 +3247,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 				      true);
 		vcpu->arch.mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
+		if (WARN_ON_ONCE(!vcpu->arch.mmu->pae_root))
+			return -EIO;
+
 		for (i = 0; i < 4; ++i) {
 			WARN_ON_ONCE(vcpu->arch.mmu->pae_root[i]);
 
@@ -3256,8 +3259,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 						      shadow_me_mask;
 		}
 		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
-	} else
-		BUG();
+	} else {
+		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
+		return -EIO;
+	}
 
 	/* root_pgd is ignored for direct MMUs. */
 	vcpu->arch.mmu->root_pgd = 0;
@@ -3340,6 +3345,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu->lm_root = lm_root;
 
 		lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
+	} else if (WARN_ON_ONCE(!vcpu->arch.mmu->pae_root)) {
+		return -EIO;
 	}
 
 	for (i = 0; i < 4; ++i) {
-- 
2.30.1.766.gb4fecdf3b7-goog



* Re: [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2021-03-02 18:45 ` [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
@ 2021-03-03  0:21   ` Ben Gardon
  2021-03-03 16:46     ` Sean Christopherson
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Gardon @ 2021-03-03  0:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Brijesh Singh, Tom Lendacky

On Tue, Mar 2, 2021 at 10:46 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Hold the mmu_lock for write for the entire duration of allocating and
> initializing an MMU's roots.  This ensures there are MMU pages available
> and thus prevents root allocations from failing.  That in turn fixes a
> bug where KVM would fail to free valid PAE roots if one of the later
> roots failed to allocate.
>
> Note, KVM still leaks the PAE roots if the lm_root allocation fails.
> This will be addressed in a future commit.
>
> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

Very tidy cleanup!

> ---
>  arch/x86/kvm/mmu/mmu.c     | 41 ++++++++++++--------------------------
>  arch/x86/kvm/mmu/tdp_mmu.c | 23 +++++----------------
>  2 files changed, 18 insertions(+), 46 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2ed3fac1244e..1f129001a30c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2398,6 +2398,9 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
>  {
>         unsigned long avail = kvm_mmu_available_pages(vcpu->kvm);
>
> +       /* Ensure all four PAE roots can be allocated in a single pass. */
> +       BUILD_BUG_ON(KVM_MIN_FREE_MMU_PAGES < 4);
> +

For a second I thought that this should be 5 since a page is needed to
hold the 4 PAE roots, but that page is allocated at vCPU creation and
reused, so no need to check for it here.
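
(For reference, that per-vCPU allocation is the one in
__kvm_mmu_create(), roughly:

        page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_DMA32);
        if (!page)
                return -ENOMEM;
        mmu->pae_root = page_address(page);

so only the four root shadow pages themselves come out of the budget
being checked here.)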

>         if (likely(avail >= KVM_MIN_FREE_MMU_PAGES))
>                 return 0;
>
> @@ -3220,16 +3223,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
>  {
>         struct kvm_mmu_page *sp;
>
> -       write_lock(&vcpu->kvm->mmu_lock);
> -
> -       if (make_mmu_pages_available(vcpu)) {
> -               write_unlock(&vcpu->kvm->mmu_lock);
> -               return INVALID_PAGE;
> -       }
>         sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
>         ++sp->root_count;
>
> -       write_unlock(&vcpu->kvm->mmu_lock);
>         return __pa(sp->spt);
>  }
>
> @@ -3241,16 +3237,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>
>         if (is_tdp_mmu_enabled(vcpu->kvm)) {
>                 root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
> -
> -               if (!VALID_PAGE(root))
> -                       return -ENOSPC;
>                 vcpu->arch.mmu->root_hpa = root;
>         } else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
>                 root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
>                                       true);
> -
> -               if (!VALID_PAGE(root))
> -                       return -ENOSPC;

There's so much going on in mmu_alloc_root that removing this check
makes me nervous, but I think it should be safe.
I checked through the function because I was worried it might yield
somewhere in there, which could result in the page cache being emptied
and the allocation failing, but I don't think mmu_alloc_root will yield.

>                 vcpu->arch.mmu->root_hpa = root;
>         } else if (shadow_root_level == PT32E_ROOT_LEVEL) {
>                 for (i = 0; i < 4; ++i) {
> @@ -3258,8 +3248,6 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>
>                         root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
>                                               i << 30, PT32_ROOT_LEVEL, true);
> -                       if (!VALID_PAGE(root))
> -                               return -ENOSPC;
>                         vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK;
>                 }
>                 vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
> @@ -3294,8 +3282,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>
>                 root = mmu_alloc_root(vcpu, root_gfn, 0,
>                                       vcpu->arch.mmu->shadow_root_level, false);
> -               if (!VALID_PAGE(root))
> -                       return -ENOSPC;
>                 vcpu->arch.mmu->root_hpa = root;
>                 goto set_root_pgd;
>         }
> @@ -3325,6 +3311,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>
>         for (i = 0; i < 4; ++i) {
>                 MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
> +
>                 if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
>                         pdptr = vcpu->arch.mmu->get_pdptr(vcpu, i);
>                         if (!(pdptr & PT_PRESENT_MASK)) {
> @@ -3338,8 +3325,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>
>                 root = mmu_alloc_root(vcpu, root_gfn, i << 30,
>                                       PT32_ROOT_LEVEL, false);
> -               if (!VALID_PAGE(root))
> -                       return -ENOSPC;
>                 vcpu->arch.mmu->pae_root[i] = root | pm_mask;
>         }
>         vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
> @@ -3373,14 +3358,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>         return 0;
>  }
>
> -static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
> -{
> -       if (vcpu->arch.mmu->direct_map)
> -               return mmu_alloc_direct_roots(vcpu);
> -       else
> -               return mmu_alloc_shadow_roots(vcpu);
> -}
> -
>  void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>  {
>         int i;
> @@ -4822,7 +4799,15 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>         r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
>         if (r)
>                 goto out;
> -       r = mmu_alloc_roots(vcpu);
> +       write_lock(&vcpu->kvm->mmu_lock);
> +       if (make_mmu_pages_available(vcpu))
> +               r = -ENOSPC;
> +       else if (vcpu->arch.mmu->direct_map)
> +               r = mmu_alloc_direct_roots(vcpu);
> +       else
> +               r = mmu_alloc_shadow_roots(vcpu);
> +       write_unlock(&vcpu->kvm->mmu_lock);
> +
>         kvm_mmu_sync_roots(vcpu);
>         if (r)
>                 goto out;
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 70226e0875fe..50ef757c5586 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -137,22 +137,21 @@ static struct kvm_mmu_page *alloc_tdp_mmu_page(struct kvm_vcpu *vcpu, gfn_t gfn,
>         return sp;
>  }
>
> -static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
> +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
>  {
>         union kvm_mmu_page_role role;
>         struct kvm *kvm = vcpu->kvm;
>         struct kvm_mmu_page *root;
>
> +       lockdep_assert_held_write(&kvm->mmu_lock);
> +
>         role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level);
>
> -       write_lock(&kvm->mmu_lock);
> -
>         /* Check for an existing root before allocating a new one. */
>         for_each_tdp_mmu_root(kvm, root) {
>                 if (root->role.word == role.word) {
>                         kvm_mmu_get_root(kvm, root);
> -                       write_unlock(&kvm->mmu_lock);
> -                       return root;
> +                       goto out;
>                 }
>         }
>
> @@ -161,19 +160,7 @@ static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
>
>         list_add(&root->link, &kvm->arch.tdp_mmu_roots);
>
> -       write_unlock(&kvm->mmu_lock);
> -
> -       return root;
> -}
> -
> -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
> -{
> -       struct kvm_mmu_page *root;
> -
> -       root = get_tdp_mmu_vcpu_root(vcpu);
> -       if (!root)
> -               return INVALID_PAGE;
> -
> +out:
>         return __pa(root->spt);
>  }
>
> --
> 2.30.1.766.gb4fecdf3b7-goog
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2021-03-03  0:21   ` Ben Gardon
@ 2021-03-03 16:46     ` Sean Christopherson
  0 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-03-03 16:46 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Brijesh Singh, Tom Lendacky

On Tue, Mar 02, 2021, Ben Gardon wrote:
> > @@ -3241,16 +3237,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
> >
> >         if (is_tdp_mmu_enabled(vcpu->kvm)) {
> >                 root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
> > -
> > -               if (!VALID_PAGE(root))
> > -                       return -ENOSPC;
> >                 vcpu->arch.mmu->root_hpa = root;
> >         } else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
> >                 root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
> >                                       true);
> > -
> > -               if (!VALID_PAGE(root))
> > -                       return -ENOSPC;
> 
> There's so much going on in mmu_alloc_root that removing this check
> makes me nervous, but I think it should be safe.

Just think of it as a variant of kvm_mmu_get_page(), then all your fears will
melt away. ;-)

> I checked through the function because I was worried it might yield
> somewhere in there, which could result in the page cache being emptied
> and the allocation failing, but I don't think mmu_alloc_root will
> yield.

Ugh, mmu_alloc_root() won't yield, but get_zeroed_page(), which is used to
allocate pae_root and lm_root on-demand in mmu_alloc_shadow_roots(), will.  The
two options are
(a) allocate the fake roots before taking the lock and (b) allocate them from
vcpu->arch.mmu_shadow_page_cache.  I probably prefer (a).  (b) will slide
directly into the existing code, but would require bumping the min number of
objects for mmu_shadow_page_cache in mmu_topup_memory_caches().  I'd prefer not
to have yet more code that has to deal with this insanity.
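
(A rough sketch of option (a), purely illustrative -- the helper name
and the exact checks are hypothetical, not lifted from any posted
patch:)

        /* Hypothetical helper: allocate the "fake" roots with plain
         * GFP_KERNEL_ACCOUNT before mmu_lock is taken, so nothing that
         * can sleep or fail runs under the lock. */
        static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
        {
                struct kvm_mmu *mmu = vcpu->arch.mmu;
                u64 *lm_root, *pae_root;

                /* Only needed when shadowing a 32-bit guest with
                 * 64-bit page tables; simplified, the real condition
                 * would also check mmu->root_level. */
                if (mmu->direct_map || mmu->pae_root ||
                    mmu->shadow_root_level != PT64_ROOT_4LEVEL)
                        return 0;

                pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
                if (!pae_root)
                        return -ENOMEM;

                lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
                if (!lm_root) {
                        free_page((unsigned long)pae_root);
                        return -ENOMEM;
                }

                /* The real code ORs in the full pm_mask (A/W/U bits
                 * too), not just the present bit. */
                lm_root[0] = __pa(pae_root) | PT_PRESENT_MASK;

                mmu->pae_root = pae_root;
                mmu->lm_root = lm_root;
                return 0;
        }

kvm_mmu_load() would then call the helper after mmu_topup_memory_caches()
and before write_lock(&vcpu->kvm->mmu_lock), keeping everything under the
lock unfailable.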

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 02/15] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit
  2021-03-02 18:45 ` [PATCH 02/15] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
@ 2021-03-03 17:28   ` Ben Gardon
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Gardon @ 2021-03-03 17:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Brijesh Singh, Tom Lendacky

On Tue, Mar 2, 2021 at 10:45 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Allocate the so-called pae_root page on-demand, along with the lm_root
> page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a
> 32-bit L1.  KVM currently only allocates the page when NPT is disabled,
> or when L0 is 32-bit (using PAE paging).
>
> Note, there is an existing memory leak involving the MMU roots, as KVM
> fails to free the PAE roots on failure.  This will be addressed in a
> future commit.
>
> Fixes: ee6268ba3a68 ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
> Fixes: b6b80c78af83 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 44 ++++++++++++++++++++++++++++--------------
>  1 file changed, 29 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0987cc1d53eb..2ed3fac1244e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3187,14 +3187,14 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>                 if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
>                     (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
>                         mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
> -               } else {
> +               } else if (mmu->pae_root) {
>                         for (i = 0; i < 4; ++i)
>                                 if (mmu->pae_root[i] != 0)

I was about to comment on how weird this check is, since a pae_root
entry can also be INVALID_PAGE (though that case is handled in
mmu_free_root_page())... but then I realized that you're already
addressing that problem in patch 7.
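
(Purely illustrative: pre-patch-7, "no root" has two spellings, which
is what makes the check read oddly.  A hypothetical predicate, not in
the series, would have to be:)

        /* An unused PAE root may currently be either 0 (never
         * allocated) or INVALID_PAGE (freed), so both encodings must
         * be treated as "no root". */
        static bool pae_root_unused(hpa_t root)
        {
                return root == 0 || root == INVALID_PAGE;
        }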

>                                         mmu_free_root_page(kvm,
>                                                            &mmu->pae_root[i],
>                                                            &invalid_list);
> -                       mmu->root_hpa = INVALID_PAGE;
>                 }
> +               mmu->root_hpa = INVALID_PAGE;
>                 mmu->root_pgd = 0;
>         }
>
> @@ -3306,9 +3306,23 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>          * the shadow page table may be a PAE or a long mode page table.
>          */
>         pm_mask = PT_PRESENT_MASK;
> -       if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL)
> +       if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
>                 pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
>
> +               /*
> +                * Allocate the page for the PDPTEs when shadowing 32-bit NPT
> +                * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
> +                * need to be in low mem.  See also lm_root below.
> +                */
> +               if (!vcpu->arch.mmu->pae_root) {
> +                       WARN_ON_ONCE(!tdp_enabled);
> +
> +                       vcpu->arch.mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> +                       if (!vcpu->arch.mmu->pae_root)
> +                               return -ENOMEM;
> +               }
> +       }
> +
>         for (i = 0; i < 4; ++i) {
>                 MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
>                 if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
> @@ -3331,21 +3345,19 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>         vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
>
>         /*
> -        * If we shadow a 32 bit page table with a long mode page
> -        * table we enter this path.
> +        * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
> +        * tables are allocated and initialized at MMU creation as there is no
> +        * equivalent level in the guest's NPT to shadow.  Allocate the tables
> +        * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
> +        * handled above (to share logic with PAE), deal with the PML4 here.
>          */
>         if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
>                 if (vcpu->arch.mmu->lm_root == NULL) {
> -                       /*
> -                        * The additional page necessary for this is only
> -                        * allocated on demand.
> -                        */
> -
>                         u64 *lm_root;
>
>                         lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> -                       if (lm_root == NULL)
> -                               return 1;
> +                       if (!lm_root)
> +                               return -ENOMEM;
>
>                         lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
>
> @@ -5248,9 +5260,11 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>          * while the PDP table is a per-vCPU construct that's allocated at MMU
>          * creation.  When emulating 32-bit mode, cr3 is only 32 bits even on
>          * x86_64.  Therefore we need to allocate the PDP table in the first
> -        * 4GB of memory, which happens to fit the DMA32 zone.  Except for
> -        * SVM's 32-bit NPT support, TDP paging doesn't use PAE paging and can
> -        * skip allocating the PDP table.
> +        * 4GB of memory, which happens to fit the DMA32 zone.  TDP paging
> +        * generally doesn't use PAE paging and can skip allocating the PDP
> +        * table.  The main exception, handled here, is SVM's 32-bit NPT.  The
> +        * other exception is for shadowing L1's 32-bit or PAE NPT on 64-bit
> +        * KVM; that horror is handled on-demand by mmu_alloc_shadow_roots().
>          */
>         if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL)
>                 return 0;
> --
> 2.30.1.766.gb4fecdf3b7-goog
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2021-03-02 18:45 [PATCH 00/15] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
2021-03-02 18:45 ` [PATCH 01/15] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
2021-03-02 18:45 ` [PATCH 02/15] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
2021-03-03 17:28   ` Ben Gardon
2021-03-02 18:45 ` [PATCH 03/15] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
2021-03-03  0:21   ` Ben Gardon
2021-03-03 16:46     ` Sean Christopherson
2021-03-02 18:45 ` [PATCH 04/15] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Sean Christopherson
2021-03-02 18:45 ` [PATCH 05/15] KVM: x86/mmu: Check PDPTRs " Sean Christopherson
2021-03-02 18:45 ` [PATCH 06/15] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Sean Christopherson
2021-03-02 18:45 ` [PATCH 07/15] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
2021-03-02 18:45 ` [PATCH 08/15] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Sean Christopherson
2021-03-02 18:45 ` [PATCH 09/15] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
2021-03-02 18:45 ` [PATCH 10/15] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Sean Christopherson
2021-03-02 18:45 ` [PATCH 11/15] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch Sean Christopherson
2021-03-02 18:45 ` [PATCH 12/15] KVM: x86: Defer the MMU unload to the normal path on a global INVPCID Sean Christopherson
2021-03-02 18:45 ` [PATCH 13/15] KVM: x86/mmu: Unexport MMU load/unload functions Sean Christopherson
2021-03-02 18:45 ` [PATCH 14/15] KVM: x86/mmu: Sync roots after MMU load iff load is successful Sean Christopherson
2021-03-02 18:45 ` [PATCH 15/15] KVM: x86/mmu: WARN on NULL pae_root and bad shadow root level Sean Christopherson
