* [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes
@ 2021-03-05  1:10 Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 01/17] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
                   ` (17 more replies)
  0 siblings, 18 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Fix nested NPT (nSVM) with 32-bit L1 and SME with shadow paging, which
are completely broken.  Opportunistically fix theoretical bugs related to
prematurely reloading/unloading the MMU.

If nNPT is enabled, L1 can crash the host simply by using 32-bit NPT to
trigger a null pointer dereference on pae_root.
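
To make the failure mode concrete, here is a minimal userspace model of the
pre-fix flow.  The names are illustrative stand-ins for kvm_mmu and
mmu_alloc_shadow_roots(), not the actual kernel code:

/* toy_pae_root.c - toy model of the pae_root NULL dereference. */
#include <stddef.h>

typedef unsigned long long hpa_t;

struct toy_mmu {
	hpa_t *pae_root;	/* pre-fix: never allocated when TDP (NPT) is enabled */
};

/* Models the loop that fills the four PAE roots when shadowing 32-bit NPT. */
static int toy_alloc_shadow_roots(struct toy_mmu *mmu)
{
	int i;

	for (i = 0; i < 4; i++)
		mmu->pae_root[i] = 0;	/* NULL dereference if pae_root was skipped */
	return 0;
}

int main(void)
{
	struct toy_mmu mmu = { .pae_root = NULL };	/* 64-bit host NPT, 32-bit L1 NPT */

	return toy_alloc_shadow_roots(&mmu);		/* crashes here */
}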

SME with shadow paging (including nNPT) fails to set the C-bit in the
shadow pages that don't go through standard MMU flows (the PDPTRs and the
PML4 used by nNPT to shadow legacy NPT).  It also fails to account for
CR3[63:32], and thus the C-bit, being ignored outside of 64-bit mode.
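
For reference, a tiny sketch of the CR3[63:32] problem; the C-bit position
(bit 47 below) is purely illustrative, as the real position comes from
CPUID 0x8000001F:

/* toy_cbit.c - the C-bit can't survive a 32-bit (non-64-bit mode) CR3. */
#include <stdint.h>
#include <stdio.h>

#define TOY_SME_C_BIT	(1ULL << 47)	/* assumed position, for illustration only */

int main(void)
{
	uint64_t root = 0x12345000ULL | TOY_SME_C_BIT;	/* 64-bit mode CR3 value */
	uint32_t legacy_cr3 = (uint32_t)root;	/* CR3[63:32] ignored outside 64-bit mode */

	printf("64-bit CR3: %#llx\n", (unsigned long long)root);
	printf("legacy CR3: %#x (C-bit silently dropped)\n", legacy_cr3);
	return 0;
}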

Patches 01 and 02 fix the null pointer bugs.

Patches 03-09 fix mostly-benign related memory leaks.

Patches 10-12 fix the SME shadow paging bugs, which are also what led me to
the nNPT null pointer bugs.

Patches 13 and 14 fix theoretical bugs with EPTP switching (via VMFUNC) and
INVPCID that I found when auditing flows that touch the MMU context.

Patches 15-17 do additional cleanup to hopefully make it harder to
introduce bugs in the future.

On the plus side, I finally understand why KVM supports shadowing 2-level
page tables with 4-level page tables...

Based on kvm/queue, commit fe5f0041c026 ("KVM/SVM: Move vmenter.S exception
fixups out of line").  The null pointer fixes cherry-pick cleanly onto
kvm/master; I haven't tried the other bug fixes (I doubt they're worth
backporting even though I tagged 'em with stable).

v2:
  - Collect a review from Ben (did not include his review of patch 03
    since the patch and its direct dependencies were changed).
  - Move pae_root and lm_root allocation to a separate helper to avoid
    sleeping via get_zeroed_page() while holding mmu_lock.
  - Add a patch to grab 'mmu' in a local variable.
  - Remove the BUILD_BUG_ON() in make_mmu_pages_available() since the
    final check wouldn't actually guarantee 4 pages were "available".
    Instead, add a comment about the limit being soft.

v1:
  - https://lkml.kernel.org/r/20210302184540.2829328-1-seanjc@google.com
 
Sean Christopherson (17):
  KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
  KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with
    64-bit
  KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots
  KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
  KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper
  KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks
  KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE
    root
  KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs
  KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
  KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
  KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
  KVM: x86: Defer the MMU unload to the normal path on an global INVPCID
  KVM: x86/mmu: Unexport MMU load/unload functions
  KVM: x86/mmu: Sync roots after MMU load iff load is successful
  KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root
    level

 arch/x86/include/asm/kvm_host.h |   3 -
 arch/x86/kvm/mmu.h              |   4 +
 arch/x86/kvm/mmu/mmu.c          | 273 ++++++++++++++++++++------------
 arch/x86/kvm/mmu/tdp_mmu.c      |  23 +--
 arch/x86/kvm/svm/svm.c          |   9 +-
 arch/x86/kvm/vmx/nested.c       |   9 +-
 arch/x86/kvm/x86.c              |   2 +-
 7 files changed, 192 insertions(+), 131 deletions(-)

-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 01/17] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 02/17] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Override the shadow root level in the MMU context when configuring
NPT for shadowing nested NPT.  The level is always tied to the TDP level
of the host, not whatever level the guest happens to be using.

Fixes: 096586fda522 ("KVM: nSVM: Correctly set the shadow NPT root level in its MMU role")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c462062d36aa..0987cc1d53eb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4618,12 +4618,17 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
 	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
 
-	context->shadow_root_level = new_role.base.level;
-
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base, false, false);
 
-	if (new_role.as_u64 != context->mmu_role.as_u64)
+	if (new_role.as_u64 != context->mmu_role.as_u64) {
 		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
+
+		/*
+		 * Override the level set by the common init helper, nested TDP
+		 * always uses the host's TDP configuration.
+		 */
+		context->shadow_root_level = new_role.base.level;
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 02/17] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 01/17] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 03/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Sean Christopherson
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Allocate the so-called pae_root page on-demand, along with the lm_root
page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a
32-bit L1.  KVM currently only allocates the page when NPT is disabled,
or when L0 is 32-bit (using PAE paging).

Note, there is an existing memory leak involving the MMU roots, as KVM
fails to free the PAE roots on failure.  This will be addressed in a
future commit.

Fixes: ee6268ba3a68 ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
Fixes: b6b80c78af83 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 44 ++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0987cc1d53eb..2ed3fac1244e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3187,14 +3187,14 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
 			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
-		} else {
+		} else if (mmu->pae_root) {
 			for (i = 0; i < 4; ++i)
 				if (mmu->pae_root[i] != 0)
 					mmu_free_root_page(kvm,
 							   &mmu->pae_root[i],
 							   &invalid_list);
-			mmu->root_hpa = INVALID_PAGE;
 		}
+		mmu->root_hpa = INVALID_PAGE;
 		mmu->root_pgd = 0;
 	}
 
@@ -3306,9 +3306,23 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL)
+	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
+		/*
+		 * Allocate the page for the PDPTEs when shadowing 32-bit NPT
+		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
+		 * need to be in low mem.  See also lm_root below.
+		 */
+		if (!vcpu->arch.mmu->pae_root) {
+			WARN_ON_ONCE(!tdp_enabled);
+
+			vcpu->arch.mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+			if (!vcpu->arch.mmu->pae_root)
+				return -ENOMEM;
+		}
+	}
+
 	for (i = 0; i < 4; ++i) {
 		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
 		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
@@ -3331,21 +3345,19 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
 
 	/*
-	 * If we shadow a 32 bit page table with a long mode page
-	 * table we enter this path.
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at MMU creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
+	 * handled above (to share logic with PAE), deal with the PML4 here.
 	 */
 	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		if (vcpu->arch.mmu->lm_root == NULL) {
-			/*
-			 * The additional page necessary for this is only
-			 * allocated on demand.
-			 */
-
 			u64 *lm_root;
 
 			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (lm_root == NULL)
-				return 1;
+			if (!lm_root)
+				return -ENOMEM;
 
 			lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
 
@@ -5248,9 +5260,11 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 	 * while the PDP table is a per-vCPU construct that's allocated at MMU
 	 * creation.  When emulating 32-bit mode, cr3 is only 32 bits even on
 	 * x86_64.  Therefore we need to allocate the PDP table in the first
-	 * 4GB of memory, which happens to fit the DMA32 zone.  Except for
-	 * SVM's 32-bit NPT support, TDP paging doesn't use PAE paging and can
-	 * skip allocating the PDP table.
+	 * 4GB of memory, which happens to fit the DMA32 zone.  TDP paging
+	 * generally doesn't use PAE paging and can skip allocating the PDP
+	 * table.  The main exception, handled here, is SVM's 32-bit NPT.  The
+	 * other exception is for shadowing L1's 32-bit or PAE NPT on 64-bit
+	 * KVM; that horror is handled on-demand by mmu_alloc_shadow_roots().
 	 */
 	if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL)
 		return 0;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 03/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 01/17] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 02/17] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 04/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Sean Christopherson
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Grab 'mmu' and do s/vcpu->arch.mmu/mmu to shorten line lengths and yield
smaller diffs when moving code around in future cleanup without forcing
the new code to use the same ugly pattern.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2ed3fac1244e..c4f8e59f596c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3235,7 +3235,8 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 
 static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 {
-	u8 shadow_root_level = vcpu->arch.mmu->shadow_root_level;
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u8 shadow_root_level = mmu->shadow_root_level;
 	hpa_t root;
 	unsigned i;
 
@@ -3244,42 +3245,43 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->root_hpa = root;
+		mmu->root_hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
 				      true);
 
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->root_hpa = root;
+		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+			MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
 			if (!VALID_PAGE(root))
 				return -ENOSPC;
-			vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK;
+			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
-		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
+		mmu->root_hpa = __pa(mmu->pae_root);
 	} else
 		BUG();
 
 	/* root_pgd is ignored for direct MMUs. */
-	vcpu->arch.mmu->root_pgd = 0;
+	mmu->root_pgd = 0;
 
 	return 0;
 }
 
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	u64 pdptr, pm_mask;
 	gfn_t root_gfn, root_pgd;
 	hpa_t root;
 	int i;
 
-	root_pgd = vcpu->arch.mmu->get_guest_pgd(vcpu);
+	root_pgd = mmu->get_guest_pgd(vcpu);
 	root_gfn = root_pgd >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
@@ -3289,14 +3291,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
 	 */
-	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->root_hpa));
+	if (mmu->root_level >= PT64_ROOT_4LEVEL) {
+		MMU_WARN_ON(VALID_PAGE(mmu->root_hpa));
 
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
-				      vcpu->arch.mmu->shadow_root_level, false);
+				      mmu->shadow_root_level, false);
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->root_hpa = root;
+		mmu->root_hpa = root;
 		goto set_root_pgd;
 	}
 
@@ -3306,7 +3308,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
 		/*
@@ -3314,21 +3316,21 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
 		 * need to be in low mem.  See also lm_root below.
 		 */
-		if (!vcpu->arch.mmu->pae_root) {
+		if (!mmu->pae_root) {
 			WARN_ON_ONCE(!tdp_enabled);
 
-			vcpu->arch.mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!vcpu->arch.mmu->pae_root)
+			mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+			if (!mmu->pae_root)
 				return -ENOMEM;
 		}
 	}
 
 	for (i = 0; i < 4; ++i) {
-		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
-		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
-			pdptr = vcpu->arch.mmu->get_pdptr(vcpu, i);
+		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
+		if (mmu->root_level == PT32E_ROOT_LEVEL) {
+			pdptr = mmu->get_pdptr(vcpu, i);
 			if (!(pdptr & PT_PRESENT_MASK)) {
-				vcpu->arch.mmu->pae_root[i] = 0;
+				mmu->pae_root[i] = 0;
 				continue;
 			}
 			root_gfn = pdptr >> PAGE_SHIFT;
@@ -3340,9 +3342,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 				      PT32_ROOT_LEVEL, false);
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->pae_root[i] = root | pm_mask;
+		mmu->pae_root[i] = root | pm_mask;
 	}
-	vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
+	mmu->root_hpa = __pa(mmu->pae_root);
 
 	/*
 	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
@@ -3351,24 +3353,24 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
 	 * handled above (to share logic with PAE), deal with the PML4 here.
 	 */
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
-		if (vcpu->arch.mmu->lm_root == NULL) {
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+		if (mmu->lm_root == NULL) {
 			u64 *lm_root;
 
 			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 			if (!lm_root)
 				return -ENOMEM;
 
-			lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
+			lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 
-			vcpu->arch.mmu->lm_root = lm_root;
+			mmu->lm_root = lm_root;
 		}
 
-		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->lm_root);
+		mmu->root_hpa = __pa(mmu->lm_root);
 	}
 
 set_root_pgd:
-	vcpu->arch.mmu->root_pgd = root_pgd;
+	mmu->root_pgd = root_pgd;
 
 	return 0;
 }
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 04/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 03/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 05/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Sean Christopherson
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Allocate lm_root before the PAE roots so that the PAE roots aren't
leaked if the memory allocation for the lm_root happens to fail.

Note, KVM can still leak PAE roots if mmu_check_root() fails on a guest's
PDPTR, or if mmu_alloc_root() fails due to MMU pages not being available.
Those issues will be fixed in future commits.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 64 ++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c4f8e59f596c..7cb5fb5d2d4d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3308,21 +3308,38 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-		/*
-		 * Allocate the page for the PDPTEs when shadowing 32-bit NPT
-		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
-		 * need to be in low mem.  See also lm_root below.
-		 */
-		if (!mmu->pae_root) {
-			WARN_ON_ONCE(!tdp_enabled);
+	/*
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at root creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM is very rare.  Unlike 32-bit
+	 * NPT, the PDP table doesn't need to be in low mem.  Preallocate the
+	 * pages so that the PAE roots aren't leaked on failure.
+	 */
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL &&
+	    (!mmu->pae_root || !mmu->lm_root)) {
+		u64 *lm_root, *pae_root;
 
-			mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!mmu->pae_root)
-				return -ENOMEM;
+		if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
+			return -EIO;
+
+		pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!pae_root)
+			return -ENOMEM;
+
+		lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!lm_root) {
+			free_page((unsigned long)pae_root);
+			return -ENOMEM;
 		}
+
+		mmu->pae_root = pae_root;
+		mmu->lm_root = lm_root;
+
+		lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 	}
 
 	for (i = 0; i < 4; ++i) {
@@ -3344,30 +3361,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 			return -ENOSPC;
 		mmu->pae_root[i] = root | pm_mask;
 	}
-	mmu->root_hpa = __pa(mmu->pae_root);
-
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at MMU creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
-	 * handled above (to share logic with PAE), deal with the PML4 here.
-	 */
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
-		if (mmu->lm_root == NULL) {
-			u64 *lm_root;
-
-			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!lm_root)
-				return -ENOMEM;
-
-			lm_root[0] = __pa(mmu->pae_root) | pm_mask;
-
-			mmu->lm_root = lm_root;
-		}
 
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		mmu->root_hpa = __pa(mmu->lm_root);
-	}
+	else
+		mmu->root_hpa = __pa(mmu->pae_root);
 
 set_root_pgd:
 	mmu->root_pgd = root_pgd;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 05/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 04/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05 17:34   ` Paolo Bonzini
  2021-03-05  1:10 ` [PATCH v2 06/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Move the on-demand allocation of the pae_root and lm_root pages, used by
nested NPT for 32-bit L1s, into a separate helper.  This will allow a
future patch to hold mmu_lock while allocating the non-special roots so
that make_mmu_pages_available() can be checked once at the start of root
allocation, and thus avoid having to deal with failure in the middle of
root allocation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 84 +++++++++++++++++++++++++++---------------
 1 file changed, 54 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7cb5fb5d2d4d..dd9d5cc13a46 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3308,38 +3308,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at root creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM is very rare.  Unlike 32-bit
-	 * NPT, the PDP table doesn't need to be in low mem.  Preallocate the
-	 * pages so that the PAE roots aren't leaked on failure.
-	 */
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL &&
-	    (!mmu->pae_root || !mmu->lm_root)) {
-		u64 *lm_root, *pae_root;
-
-		if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
-			return -EIO;
-
-		pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-		if (!pae_root)
-			return -ENOMEM;
-
-		lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-		if (!lm_root) {
-			free_page((unsigned long)pae_root);
-			return -ENOMEM;
-		}
-
-		mmu->pae_root = pae_root;
-		mmu->lm_root = lm_root;
-
-		lm_root[0] = __pa(mmu->pae_root) | pm_mask;
+		mmu->lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 	}
 
 	for (i = 0; i < 4; ++i) {
@@ -3373,6 +3345,55 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u64 *lm_root, *pae_root;
+
+	/*
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at root creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
+	 */
+	if (mmu->direct_map || mmu->root_level >= PT64_ROOT_4LEVEL ||
+	    mmu->shadow_root_level < PT64_ROOT_4LEVEL)
+		return 0;
+
+	/*
+	 * This mess only works with 4-level paging and needs to be updated to
+	 * work with 5-level paging.
+	 */
+	if (WARN_ON_ONCE(mmu->shadow_root_level != PT64_ROOT_4LEVEL))
+		return -EIO;
+
+	if (mmu->pae_root && mmu->lm_root)
+		return 0;
+
+	/*
+	 * The special roots should always be allocated in concert.  Yell and
+	 * bail if KVM ends up in a state where only one of the roots is valid.
+	 */
+	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
+		return -EIO;
+
+	/* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (!pae_root)
+		return -ENOMEM;
+
+	lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (!lm_root) {
+		free_page((unsigned long)pae_root);
+		return -ENOMEM;
+	}
+
+	mmu->pae_root = pae_root;
+	mmu->lm_root = lm_root;
+
+	return 0;
+}
+
 static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.mmu->direct_map)
@@ -4820,6 +4841,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	int r;
 
 	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+	if (r)
+		goto out;
+	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
 	r = mmu_alloc_roots(vcpu);
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 06/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (4 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 05/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Sean Christopherson
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Hold the mmu_lock for write for the entire duration of allocating and
initializing an MMU's roots.  This ensures there are MMU pages available
and thus prevents root allocations from failing.  That in turn fixes a
bug where KVM would fail to free valid PAE roots if one of the later
roots failed to allocate.

Add a comment to make_mmu_pages_available() to call out that the limit
is a soft limit, e.g. KVM will temporarily exceed the threshold if a
page fault allocates multiple shadow pages and there was only one page
"available".

Note, KVM _still_ leaks the PAE roots if the guest PDPTR checks fail.
This will be addressed in a future commit.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 50 +++++++++++++++-----------------------
 arch/x86/kvm/mmu/tdp_mmu.c | 23 ++++--------------
 2 files changed, 25 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dd9d5cc13a46..7ebfbc77b050 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2403,6 +2403,15 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
 
 	kvm_mmu_zap_oldest_mmu_pages(vcpu->kvm, KVM_REFILL_PAGES - avail);
 
+	/*
+	 * Note, this check is intentionally soft, it only guarantees that one
+	 * page is available, while the caller may end up allocating as many as
+	 * four pages, e.g. for PAE roots or for 5-level paging.  Temporarily
+	 * exceeding the (arbitrary by default) limit will not harm the host,
+	 * being too aggressive may unnecessarily kill the guest, and getting an
+	 * exact count is far more trouble than it's worth, especially in the
+	 * page fault paths.
+	 */
 	if (!kvm_mmu_available_pages(vcpu->kvm))
 		return -ENOSPC;
 	return 0;
@@ -3220,16 +3229,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 {
 	struct kvm_mmu_page *sp;
 
-	write_lock(&vcpu->kvm->mmu_lock);
-
-	if (make_mmu_pages_available(vcpu)) {
-		write_unlock(&vcpu->kvm->mmu_lock);
-		return INVALID_PAGE;
-	}
 	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
 	++sp->root_count;
 
-	write_unlock(&vcpu->kvm->mmu_lock);
 	return __pa(sp->spt);
 }
 
@@ -3242,16 +3244,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 	if (is_tdp_mmu_enabled(vcpu->kvm)) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		mmu->root_hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
-				      true);
-
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
+		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
 		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
@@ -3259,8 +3254,6 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
-			if (!VALID_PAGE(root))
-				return -ENOSPC;
 			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
 		mmu->root_hpa = __pa(mmu->pae_root);
@@ -3296,8 +3289,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      mmu->shadow_root_level, false);
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		mmu->root_hpa = root;
 		goto set_root_pgd;
 	}
@@ -3316,6 +3307,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < 4; ++i) {
 		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
+
 		if (mmu->root_level == PT32E_ROOT_LEVEL) {
 			pdptr = mmu->get_pdptr(vcpu, i);
 			if (!(pdptr & PT_PRESENT_MASK)) {
@@ -3329,8 +3321,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, false);
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
@@ -3394,14 +3384,6 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
-{
-	if (vcpu->arch.mmu->direct_map)
-		return mmu_alloc_direct_roots(vcpu);
-	else
-		return mmu_alloc_shadow_roots(vcpu);
-}
-
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -4846,7 +4828,15 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
-	r = mmu_alloc_roots(vcpu);
+	write_lock(&vcpu->kvm->mmu_lock);
+	if (make_mmu_pages_available(vcpu))
+		r = -ENOSPC;
+	else if (vcpu->arch.mmu->direct_map)
+		r = mmu_alloc_direct_roots(vcpu);
+	else
+		r = mmu_alloc_shadow_roots(vcpu);
+	write_unlock(&vcpu->kvm->mmu_lock);
+
 	kvm_mmu_sync_roots(vcpu);
 	if (r)
 		goto out;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 70226e0875fe..50ef757c5586 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -137,22 +137,21 @@ static struct kvm_mmu_page *alloc_tdp_mmu_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return sp;
 }
 
-static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
+hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 {
 	union kvm_mmu_page_role role;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
 	role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level);
 
-	write_lock(&kvm->mmu_lock);
-
 	/* Check for an existing root before allocating a new one. */
 	for_each_tdp_mmu_root(kvm, root) {
 		if (root->role.word == role.word) {
 			kvm_mmu_get_root(kvm, root);
-			write_unlock(&kvm->mmu_lock);
-			return root;
+			goto out;
 		}
 	}
 
@@ -161,19 +160,7 @@ static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
 
 	list_add(&root->link, &kvm->arch.tdp_mmu_roots);
 
-	write_unlock(&kvm->mmu_lock);
-
-	return root;
-}
-
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu_page *root;
-
-	root = get_tdp_mmu_vcpu_root(vcpu);
-	if (!root)
-		return INVALID_PAGE;
-
+out:
 	return __pa(root->spt);
 }
 
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (5 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 06/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-04-08 11:15   ` Wanpeng Li
  2021-03-05  1:10 ` [PATCH v2 08/17] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Sean Christopherson
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Check the validity of the PDPTRs before allocating any of the PAE roots,
otherwise a bad PDPTR will cause KVM to leak any previously allocated
roots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7ebfbc77b050..9fc2b46f8541 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3269,7 +3269,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 pdptr, pm_mask;
+	u64 pdptrs[4], pm_mask;
 	gfn_t root_gfn, root_pgd;
 	hpa_t root;
 	int i;
@@ -3280,6 +3280,17 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu_check_root(vcpu, root_gfn))
 		return 1;
 
+	if (mmu->root_level == PT32E_ROOT_LEVEL) {
+		for (i = 0; i < 4; ++i) {
+			pdptrs[i] = mmu->get_pdptr(vcpu, i);
+			if (!(pdptrs[i] & PT_PRESENT_MASK))
+				continue;
+
+			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+				return 1;
+		}
+	}
+
 	/*
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
@@ -3309,14 +3320,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
 		if (mmu->root_level == PT32E_ROOT_LEVEL) {
-			pdptr = mmu->get_pdptr(vcpu, i);
-			if (!(pdptr & PT_PRESENT_MASK)) {
+			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
 				mmu->pae_root[i] = 0;
 				continue;
 			}
-			root_gfn = pdptr >> PAGE_SHIFT;
-			if (mmu_check_root(vcpu, root_gfn))
-				return 1;
+			root_gfn = pdptrs[i] >> PAGE_SHIFT;
 		}
 
 		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 08/17] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (6 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Exempt NULL PAE roots from the check to detect leaks, since
kvm_mmu_free_roots() doesn't set them back to INVALID_PAGE.  Stop hiding
the WARNs to detect PAE root leaks behind MMU_WARN_ON, the hidden WARNs
obviously didn't do their job given the hilarious number of bugs that
could lead to PAE roots being leaked, not to mention the above false
positive.

Opportunistically delete a warning on root_hpa being valid; there's
nothing special about 4/5-level shadow pages that warrants a WARN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9fc2b46f8541..b82c1b0d6d6e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3250,7 +3250,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
+			WARN_ON_ONCE(mmu->pae_root[i] &&
+				     VALID_PAGE(mmu->pae_root[i]));
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
@@ -3296,8 +3297,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * write-protect the guests page table root.
 	 */
 	if (mmu->root_level >= PT64_ROOT_4LEVEL) {
-		MMU_WARN_ON(VALID_PAGE(mmu->root_hpa));
-
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      mmu->shadow_root_level, false);
 		mmu->root_hpa = root;
@@ -3317,7 +3316,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	}
 
 	for (i = 0; i < 4; ++i) {
-		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
+		WARN_ON_ONCE(mmu->pae_root[i] && VALID_PAGE(mmu->pae_root[i]));
 
 		if (mmu->root_level == PT32E_ROOT_LEVEL) {
 			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (7 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 08/17] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05 17:52   ` Paolo Bonzini
  2021-03-05  1:10 ` [PATCH v2 10/17] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Sean Christopherson
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Use '0' as the one and only sentinel for an invalid pae_root, instead of
a mix of '0' and INVALID_PAGE.
Unlike root_hpa, the pae_roots hold permission bits and thus are
guaranteed to be non-zero.  Having to deal with both values leads to
bugs, e.g. failing to set back to INVALID_PAGE, warning on the wrong
value, etc...
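
A sketch of why the single sentinel is safe; the masks mirror the kernel's
PT_PRESENT_MASK and INVALID_PAGE but the code itself is illustrative, not
KVM's:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_PT_PRESENT_MASK	(1ULL << 0)
#define TOY_INVALID_PAGE	(~0ULL)

/* After the patch: a populated PAE root always has at least the present
 * bit set, so it can never be 0 and a single check suffices. */
static bool pae_root_valid(uint64_t entry)
{
	return entry != 0;
}

/* Before the patch: both 0 and INVALID_PAGE denoted "invalid", so every
 * consumer had to remember to check for both encodings. */
static bool pae_root_valid_old(uint64_t entry)
{
	return entry && entry != TOY_INVALID_PAGE;
}

int main(void)
{
	uint64_t root = 0x1234000ULL | TOY_PT_PRESENT_MASK;

	printf("%d %d\n", pae_root_valid(root), pae_root_valid_old(root));
	printf("%d %d\n", pae_root_valid(0), pae_root_valid_old(TOY_INVALID_PAGE));
	return 0;
}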

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b82c1b0d6d6e..dbf7f0395e4b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3197,11 +3197,14 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
 			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
 		} else if (mmu->pae_root) {
-			for (i = 0; i < 4; ++i)
-				if (mmu->pae_root[i] != 0)
-					mmu_free_root_page(kvm,
-							   &mmu->pae_root[i],
-							   &invalid_list);
+			for (i = 0; i < 4; ++i) {
+				if (!mmu->pae_root[i])
+					continue;
+
+				mmu_free_root_page(kvm, &mmu->pae_root[i],
+						   &invalid_list);
+				mmu->pae_root[i] = 0;
+			}
 		}
 		mmu->root_hpa = INVALID_PAGE;
 		mmu->root_pgd = 0;
@@ -3250,8 +3253,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			WARN_ON_ONCE(mmu->pae_root[i] &&
-				     VALID_PAGE(mmu->pae_root[i]));
+			WARN_ON_ONCE(mmu->pae_root[i]);
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
@@ -3316,7 +3318,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	}
 
 	for (i = 0; i < 4; ++i) {
-		WARN_ON_ONCE(mmu->pae_root[i] && VALID_PAGE(mmu->pae_root[i]));
+		WARN_ON_ONCE(mmu->pae_root[i]);
 
 		if (mmu->root_level == PT32E_ROOT_LEVEL) {
 			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
@@ -3438,7 +3440,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	for (i = 0; i < 4; ++i) {
 		hpa_t root = vcpu->arch.mmu->pae_root[i];
 
-		if (root && VALID_PAGE(root)) {
+		if (root && !WARN_ON_ONCE(!VALID_PAGE(root))) {
 			root &= PT64_BASE_ADDR_MASK;
 			sp = to_shadow_page(root);
 			mmu_sync_children(vcpu, sp);
@@ -5296,7 +5298,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 
 	mmu->pae_root = page_address(page);
 	for (i = 0; i < 4; ++i)
-		mmu->pae_root[i] = INVALID_PAGE;
+		mmu->pae_root[i] = 0;
 
 	return 0;
 }
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 10/17] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (8 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Set the C-bit in SPTEs that are set outside of the normal MMU flows,
specifically the PDPTRs and the handful of special-cased "LM root"
entries, all of which are shadow paging only.

Note, the direct-mapped-root PDPTR handling is needed for the scenario
where paging is disabled in the guest, in which case KVM uses a direct
mapped MMU even though TDP is disabled.

Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dbf7f0395e4b..09310c35fcf4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3257,7 +3257,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
-			mmu->pae_root[i] = root | PT_PRESENT_MASK;
+			mmu->pae_root[i] = root | PT_PRESENT_MASK |
+					   shadow_me_mask;
 		}
 		mmu->root_hpa = __pa(mmu->pae_root);
 	} else
@@ -3310,7 +3311,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * or a PAE 3-level page table. In either case we need to be aware that
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
-	pm_mask = PT_PRESENT_MASK;
+	pm_mask = PT_PRESENT_MASK | shadow_me_mask;
 	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (9 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 10/17] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05 17:44   ` Paolo Bonzini
  2021-03-05  1:10 ` [PATCH v2 12/17] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Sean Christopherson
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Mark the PAE roots as decrypted to play nice with SME when KVM is
using shadow paging.  Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU.  The extra documentation is nice to have.

Note, there are several subtleties at play with NPT.  In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT.  However, 32-bit Linux, and thus KVM, doesn't support SME.  And
64-bit KVM can happily set the C-bit in CR3.  This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.

Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 24 +++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c |  7 +++++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 09310c35fcf4..fa1aca21f6eb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
 #include <asm/memtype.h>
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
+#include <asm/set_memory.h>
 #include <asm/vmx.h>
 #include <asm/kvm_page_track.h>
 #include "trace.h"
@@ -3377,7 +3378,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
 		return -EIO;
 
-	/* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+	/*
+	 * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+	 * doesn't need to be decrypted.
+	 */
 	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!pae_root)
 		return -ENOMEM;
@@ -5264,6 +5268,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 
 static void free_mmu_pages(struct kvm_mmu *mmu)
 {
+	if (!tdp_enabled && mmu->pae_root)
+		set_memory_encrypted((unsigned long)mmu->pae_root, 1);
 	free_page((unsigned long)mmu->pae_root);
 	free_page((unsigned long)mmu->lm_root);
 }
@@ -5301,6 +5307,22 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 	for (i = 0; i < 4; ++i)
 		mmu->pae_root[i] = 0;
 
+	/*
+	 * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+	 * get the CPU to treat the PDPTEs as encrypted.  Decrypt the page so
+	 * that KVM's writes and the CPU's reads get along.  Note, this is
+	 * only necessary when using shadow paging, as 64-bit NPT can get at
+	 * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+	 * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+	 */
+	if (!tdp_enabled)
+		set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+	else
+		WARN_ON_ONCE(shadow_me_mask);
+
+	for (i = 0; i < 4; ++i)
+		mmu->pae_root[i] = 0;
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 54610270f66a..4769cf8bf2fd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3908,15 +3908,18 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long cr3;
 
-	cr3 = __sme_set(root);
 	if (npt_enabled) {
-		svm->vmcb->control.nested_cr3 = cr3;
+		svm->vmcb->control.nested_cr3 = __sme_set(root);
 		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
 
 		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
 		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
 			return;
 		cr3 = vcpu->arch.cr3;
+	} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
+		cr3 = __sme_set(root);
+	} else {
+		cr3 = root;
 	}
 
 	svm->vmcb->save.cr3 = cr3;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 12/17] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (10 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 13/17] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch Sean Christopherson
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Don't strip the C-bit from the faulting address on an intercepted #PF,
the address is a virtual address, not a physical address.

Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4769cf8bf2fd..dfc8fe231e8b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1907,7 +1907,7 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	return kvm_handle_page_fault(vcpu, error_code, fault_address,
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 13/17] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (11 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 12/17] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 14/17] KVM: x86: Defer the MMU unload to the normal path on an global INVPCID Sean Christopherson
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Defer reloading the MMU after a successful EPTP switch.  The VMFUNC
instruction itself is executed in the previous EPTP context; any side
effects, e.g. updating RIP, should occur in the old context.  Practically
speaking, this bug is benign as VMX doesn't touch the MMU when skipping
an emulated instruction, nor does queuing a single-step #DB.  No other
post-switch side effects exist.

Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fdd80dd8e781..81f609886c8b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5473,16 +5473,11 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 		if (!nested_vmx_check_eptp(vcpu, new_eptp))
 			return 1;
 
-		kvm_mmu_unload(vcpu);
 		mmu->ept_ad = accessed_dirty;
 		mmu->mmu_role.base.ad_disabled = !accessed_dirty;
 		vmcs12->ept_pointer = new_eptp;
-		/*
-		 * TODO: Check what's the correct approach in case
-		 * mmu reload fails. Currently, we just let the next
-		 * reload potentially fail
-		 */
-		kvm_mmu_reload(vcpu);
+
+		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 	}
 
 	return 0;
-- 
2.30.1.766.gb4fecdf3b7-goog



* [PATCH v2 14/17] KVM: x86: Defer the MMU unload to the normal path on an global INVPCID
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 13/17] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:10 ` [PATCH v2 15/17] KVM: x86/mmu: Unexport MMU load/unload functions Sean Christopherson
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Defer unloading the MMU after an INVPCID until the instruction emulation
has completed, i.e. until after RIP has been updated.

On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping
an emulated instruction.  However, on SVM, if nrip is disabled, the
emulator is used to skip an instruction, which would lead to fireworks
if the emulator were invoked without a valid MMU.

Fixes: eb4b248e152d ("kvm: vmx: Support INVPCID in shadow paging mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 828de7d65074..7b0adebec1ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11531,7 +11531,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 
 		fallthrough;
 	case INVPCID_TYPE_ALL_INCL_GLOBAL:
-		kvm_mmu_unload(vcpu);
+		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 		return kvm_skip_emulated_instruction(vcpu);
 
 	default:
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 15/17] KVM: x86/mmu: Unexport MMU load/unload functions
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (13 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 14/17] KVM: x86: Defer the MMU unload to the normal path on a global INVPCID Sean Christopherson
@ 2021-03-05  1:10 ` Sean Christopherson
  2021-03-05  1:11 ` [PATCH v2 16/17] KVM: x86/mmu: Sync roots after MMU load iff load is successful Sean Christopherson
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

Unexport the MMU load and unload helpers now that they are no longer
used (incorrectly) in vendor code.

Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h,
as it should not be exposed to vendor code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ---
 arch/x86/kvm/mmu.h              | 4 ++++
 arch/x86/kvm/mmu/mmu.c          | 2 --
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6db60ea8ee5b..2da6c9f5935a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1592,9 +1592,6 @@ void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
-int kvm_mmu_load(struct kvm_vcpu *vcpu);
-void kvm_mmu_unload(struct kvm_vcpu *vcpu);
-void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			ulong roots_to_free);
 gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 72b0f66073dc..67e8c7c7a6ce 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -74,6 +74,10 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				u64 fault_address, char *insn, int insn_len);
 
+int kvm_mmu_load(struct kvm_vcpu *vcpu);
+void kvm_mmu_unload(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
 	if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa1aca21f6eb..4f66ca0f5f68 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4859,7 +4859,6 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 out:
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_load);
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
@@ -4868,7 +4867,6 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
 	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_unload);
 
 static bool need_remote_flush(u64 old, u64 new)
 {
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 16/17] KVM: x86/mmu: Sync roots after MMU load iff load is successful
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (14 preceding siblings ...)
  2021-03-05  1:10 ` [PATCH v2 15/17] KVM: x86/mmu: Unexport MMU load/unload functions Sean Christopherson
@ 2021-03-05  1:11 ` Sean Christopherson
  2021-03-05  1:11 ` [PATCH v2 17/17] KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root level Sean Christopherson
  2021-03-05 17:53 ` [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Paolo Bonzini
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

For clarity, explicitly skip syncing roots if the MMU load failed
instead of relying on the !VALID_PAGE check in kvm_mmu_sync_roots().
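
For reference, the implicit check being relied on is the early bail at the
top of kvm_mmu_sync_roots() (existing code, shown only for context):

	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
		return;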

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4f66ca0f5f68..bceff7d815c3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4850,10 +4850,11 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	else
 		r = mmu_alloc_shadow_roots(vcpu);
 	write_unlock(&vcpu->kvm->mmu_lock);
+	if (r)
+		goto out;
 
 	kvm_mmu_sync_roots(vcpu);
-	if (r)
-		goto out;
+
 	kvm_mmu_load_pgd(vcpu);
 	static_call(kvm_x86_tlb_flush_current)(vcpu);
 out:
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 17/17] KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root level
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (15 preceding siblings ...)
  2021-03-05  1:11 ` [PATCH v2 16/17] KVM: x86/mmu: Sync roots after MMU load iff load is successful Sean Christopherson
@ 2021-03-05  1:11 ` Sean Christopherson
  2021-03-05 17:53 ` [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Paolo Bonzini
  17 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05  1:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Brijesh Singh,
	Tom Lendacky

WARN if KVM is about to dereference a NULL pae_root or lm_root when
loading an MMU, and convert the BUG() on a bad shadow_root_level into a
WARN (now that errors are handled cleanly).  With nested NPT, botching
the level and sending KVM down the wrong path is all too easy, and the
on-demand allocation of pae_root and lm_root means bugs crash the host.
Obviously, KVM could unconditionally allocate the roots, but that's
arguably a worse failure mode as it would potentially corrupt the guest
instead of crashing it.
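
For reference, a rough sketch of how the new -EIO surfaces (existing code in
vcpu_enter_guest(), simplified, not part of this patch): the error propagates
out through kvm_mmu_reload() and fails the KVM_RUN ioctl rather than taking
down the host:

	r = kvm_mmu_reload(vcpu);
	if (unlikely(r))
		goto cancel_injection;	/* error is returned to userspace */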

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bceff7d815c3..eb9dd8144fa5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3253,6 +3253,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
 		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
+		if (WARN_ON_ONCE(!mmu->pae_root))
+			return -EIO;
+
 		for (i = 0; i < 4; ++i) {
 			WARN_ON_ONCE(mmu->pae_root[i]);
 
@@ -3262,8 +3265,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 					   shadow_me_mask;
 		}
 		mmu->root_hpa = __pa(mmu->pae_root);
-	} else
-		BUG();
+	} else {
+		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
+		return -EIO;
+	}
 
 	/* root_pgd is ignored for direct MMUs. */
 	mmu->root_pgd = 0;
@@ -3307,6 +3312,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		goto set_root_pgd;
 	}
 
+	if (WARN_ON_ONCE(!mmu->pae_root))
+		return -EIO;
+
 	/*
 	 * We shadow a 32 bit page table. This may be a legacy 2-level
 	 * or a PAE 3-level page table. In either case we need to be aware that
@@ -3316,6 +3324,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
+		if (WARN_ON_ONCE(!mmu->lm_root))
+			return -EIO;
+
 		mmu->lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 	}
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 05/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper
  2021-03-05  1:10 ` [PATCH v2 05/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Sean Christopherson
@ 2021-03-05 17:34   ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2021-03-05 17:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On 05/03/21 02:10, Sean Christopherson wrote:
> +	/*
> +	 * This mess only works with 4-level paging and needs to be updated to
> +	 * work with 5-level paging.
> +	 */

In preparation for this, it's probably a good idea to rename lm_root to 
pml4_root.  Can be done on top.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
  2021-03-05  1:10 ` [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
@ 2021-03-05 17:44   ` Paolo Bonzini
  2021-03-05 18:02     ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-03-05 17:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On 05/03/21 02:10, Sean Christopherson wrote:
> @@ -5301,6 +5307,22 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>   	for (i = 0; i < 4; ++i)
>   		mmu->pae_root[i] = 0;

I think this should be deleted, since you have another identical for 
loop below?

Paolo

> +	/*
> +	 * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
> +	 * get the CPU to treat the PDPTEs as encrypted.  Decrypt the page so
> +	 * that KVM's writes and the CPU's reads get along.  Note, this is
> +	 * only necessary when using shadow paging, as 64-bit NPT can get at
> +	 * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
> +	 * by 32-bit kernels (when KVM itself uses 32-bit NPT).
> +	 */
> +	if (!tdp_enabled)
> +		set_memory_decrypted((unsigned long)mmu->pae_root, 1);
> +	else
> +		WARN_ON_ONCE(shadow_me_mask);
> +
> +	for (i = 0; i < 4; ++i)
> +		mmu->pae_root[i] = 0;
> +


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root
  2021-03-05  1:10 ` [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
@ 2021-03-05 17:52   ` Paolo Bonzini
  2021-03-05 18:22     ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-03-05 17:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On 05/03/21 02:10, Sean Christopherson wrote:
> Use '0' to denote an invalid pae_root instead of '0' or INVALID_PAGE.
> Unlike root_hpa, the pae_roots hold permission bits and thus are
> guaranteed to be non-zero.  Having to deal with both values leads to
> bugs, e.g. failing to set back to INVALID_PAGE, warning on the wrong
> value, etc...

I don't dispute this is a good idea, but it deserves one or more comments.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes
  2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
                   ` (16 preceding siblings ...)
  2021-03-05  1:11 ` [PATCH v2 17/17] KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root level Sean Christopherson
@ 2021-03-05 17:53 ` Paolo Bonzini
  17 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2021-03-05 17:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On 05/03/21 02:10, Sean Christopherson wrote:
> Fix nested NPT (nSVM) with 32-bit L1 and SME with shadow paging, which
> are completely broken.  Opportunistically fix theoretical bugs related to
> prematurely reloading/unloading the MMU.
> 
> If nNPT is enabled, L1 can crash the host simply by using 32-bit NPT to
> trigger a null pointer dereference on pae_root.
> 
> SME with shadow paging (including nNPT) fails to set the C-bit in the
> shadow pages that don't go through standard MMU flows (PDPTRs and the
> PML4 used by nNPT to shadow legacy NPT).  It also fails to account for
> CR3[63:32], and thus the C-bit, being ignored outside of 64-bit mode.
> 
> Patches 01 and 02 fix the null pointer bugs.
> 
> Patches 03-09 fix mostly-benign related memory leaks.
> 
> Patches 10-12 fix the SME shadow paging bugs, which are also what led me to
> the nNPT null pointer bugs.
> 
> Patches 13 and 14 fix theoretical bugs with PTP_SWITCH and INVPCID that
> I found when auditing flows that touch the MMU context.
> 
> Patches 14-17 do additional clean up to hopefully make it harder to
> introduce bugs in the future.
> 
> On the plus side, I finally understand why KVM supports shadowing 2-level
> page tables with 4-level page tables...
> 
> Based on kvm/queue, commit fe5f0041c026 ("KVM/SVM: Move vmenter.S exception
> fixups out of line").  The null pointer fixes cherry-pick cleanly onto
> kvm/master, haven't tried the other bug fixes (I doubt they're worth
> backporting even though I tagged 'em with stable).
> 
> v2:
>    - Collect a review from Ben (did not include his review of patch 03
>      since the patch and its direct dependencies were changed).
>    - Move pae_root and lm_root allocation to a separate helper to avoid
>      sleeping via get_zeroed_page() while holding mmu_lock.
>    - Add a patch to grab 'mmu' in a local variable.
>    - Remove the BUILD_BUG_ON() in make_mmu_pages_available() since the
>      final check wouldn't actually guarantee 4 pages were "available".
>      Instead, add a comment about the limit being soft.
> 
> v1:
>    - https://lkml.kernel.org/r/20210302184540.2829328-1-seanjc@google.com
>   
> Sean Christopherson (17):
>    KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
>    KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with
>      64-bit
>    KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots
>    KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
>    KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper
>    KVM: x86/mmu: Ensure MMU pages are available when allocating roots
>    KVM: x86/mmu: Check PDPTRs before allocating PAE roots
>    KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks
>    KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE
>      root
>    KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs
>    KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
>    KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
>    KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
>    KVM: x86: Defer the MMU unload to the normal path on a global INVPCID
>    KVM: x86/mmu: Unexport MMU load/unload functions
>    KVM: x86/mmu: Sync roots after MMU load iff load is successful
>    KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root
>      level
> 
>   arch/x86/include/asm/kvm_host.h |   3 -
>   arch/x86/kvm/mmu.h              |   4 +
>   arch/x86/kvm/mmu/mmu.c          | 273 ++++++++++++++++++++------------
>   arch/x86/kvm/mmu/tdp_mmu.c      |  23 +--
>   arch/x86/kvm/svm/svm.c          |   9 +-
>   arch/x86/kvm/vmx/nested.c       |   9 +-
>   arch/x86/kvm/x86.c              |   2 +-
>   7 files changed, 192 insertions(+), 131 deletions(-)
> 

Queued all except 9 and 11, thanks.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
  2021-03-05 17:44   ` Paolo Bonzini
@ 2021-03-05 18:02     ` Sean Christopherson
  0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05 18:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On Fri, Mar 05, 2021, Paolo Bonzini wrote:
> On 05/03/21 02:10, Sean Christopherson wrote:
> > @@ -5301,6 +5307,22 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> >   	for (i = 0; i < 4; ++i)
> >   		mmu->pae_root[i] = 0;
> 
> I think this should be deleted, since you have another identical for loop
> below?

Yes, rebase gone awry.  And the zeroing needs to be done after decryption.
Good eyes!

> Paolo
> 
> > +	/*
> > +	 * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
> > +	 * get the CPU to treat the PDPTEs as encrypted.  Decrypt the page so
> > +	 * that KVM's writes and the CPU's reads get along.  Note, this is
> > +	 * only necessary when using shadow paging, as 64-bit NPT can get at
> > +	 * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
> > +	 * by 32-bit kernels (when KVM itself uses 32-bit NPT).
> > +	 */
> > +	if (!tdp_enabled)
> > +		set_memory_decrypted((unsigned long)mmu->pae_root, 1);
> > +	else
> > +		WARN_ON_ONCE(shadow_me_mask);
> > +
> > +	for (i = 0; i < 4; ++i)
> > +		mmu->pae_root[i] = 0;
> > +
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root
  2021-03-05 17:52   ` Paolo Bonzini
@ 2021-03-05 18:22     ` Sean Christopherson
  2021-03-05 18:23       ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-03-05 18:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On Fri, Mar 05, 2021, Paolo Bonzini wrote:
> On 05/03/21 02:10, Sean Christopherson wrote:
> > Use '0' to denote an invalid pae_root instead of '0' or INVALID_PAGE.
> > Unlike root_hpa, the pae_roots hold permission bits and thus are
> > guaranteed to be non-zero.  Having to deal with both values leads to
> > bugs, e.g. failing to set back to INVALID_PAGE, warning on the wrong
> > value, etc...
> 
> I don't dispute this is a good idea, but it deserves one or more comments.

Agreed.   What about adding macros?

/* Comment goes here. */
#define INVALID_PAE_ROOT	0
#define IS_VALID_PAE_ROOT(x)	(!!(x))


Also, I missed this pattern in mmu_audit.c's mmu_spte_walk():

	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu->pae_root[i];

		if (root && VALID_PAGE(root)) {
			root &= PT64_BASE_ADDR_MASK;
			sp = to_shadow_page(root);
			__mmu_spte_walk(vcpu, sp, fn, 2);
		}
	}
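
With macros along those lines, the check above could collapse to a single
test, e.g. (illustrative only):

	if (IS_VALID_PAE_ROOT(root)) {
		root &= PT64_BASE_ADDR_MASK;
		sp = to_shadow_page(root);
		__mmu_spte_walk(vcpu, sp, fn, 2);
	}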

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root
  2021-03-05 18:22     ` Sean Christopherson
@ 2021-03-05 18:23       ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2021-03-05 18:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Brijesh Singh, Tom Lendacky

On 05/03/21 19:22, Sean Christopherson wrote:
> On Fri, Mar 05, 2021, Paolo Bonzini wrote:
>> On 05/03/21 02:10, Sean Christopherson wrote:
>>> Use '0' to denote an invalid pae_root instead of '0' or INVALID_PAGE.
>>> Unlike root_hpa, the pae_roots hold permission bits and thus are
>>> guaranteed to be non-zero.  Having to deal with both values leads to
>>> bugs, e.g. failing to set back to INVALID_PAGE, warning on the wrong
>>> value, etc...
>>
>> I don't dispute this is a good idea, but it deserves one or more comments.
> 
> Agreed.   What about adding macros?
> 
> /* Comment goes here. */
> #define INVALID_PAE_ROOT	0
> #define IS_VALID_PAE_ROOT(x)	(!!(x))

Yep, that's a nice solution.

Paolo

> 
> 
> Also, I missed this pattern in mmu_audit.c's mmu_spte_walk():
> 
> 	for (i = 0; i < 4; ++i) {
> 		hpa_t root = vcpu->arch.mmu->pae_root[i];
> 
> 		if (root && VALID_PAGE(root)) {
> 			root &= PT64_BASE_ADDR_MASK;
> 			sp = to_shadow_page(root);
> 			__mmu_spte_walk(vcpu, sp, fn, 2);
> 		}
> 	}
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-03-05  1:10 ` [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Sean Christopherson
@ 2021-04-08 11:15   ` Wanpeng Li
  2021-04-08 12:09     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Wanpeng Li @ 2021-04-08 11:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Brijesh Singh, Tom Lendacky

On Fri, 5 Mar 2021 at 09:12, Sean Christopherson <seanjc@google.com> wrote:
>
> Check the validity of the PDPTRs before allocating any of the PAE roots,
> otherwise a bad PDPTR will cause KVM to leak any previously allocated
> roots.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 7ebfbc77b050..9fc2b46f8541 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3269,7 +3269,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  {
>         struct kvm_mmu *mmu = vcpu->arch.mmu;
> -       u64 pdptr, pm_mask;
> +       u64 pdptrs[4], pm_mask;
>         gfn_t root_gfn, root_pgd;
>         hpa_t root;
>         int i;
> @@ -3280,6 +3280,17 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>         if (mmu_check_root(vcpu, root_gfn))
>                 return 1;
>
> +       if (mmu->root_level == PT32E_ROOT_LEVEL) {
> +               for (i = 0; i < 4; ++i) {
> +                       pdptrs[i] = mmu->get_pdptr(vcpu, i);
> +                       if (!(pdptrs[i] & PT_PRESENT_MASK))
> +                               continue;
> +
> +                       if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
> +                               return 1;
> +               }
> +       }
> +

I saw this splatting:

 BUG: sleeping function called from invalid context at
arch/x86/kvm/kvm_cache_regs.h:115
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3090, name:
qemu-system-x86
 3 locks held by qemu-system-x86/3090:
  #0: ffff8d499f8d00d0 (&vcpu->mutex){+.+.}-{3:3}, at:
kvm_vcpu_ioctl+0x8e/0x680 [kvm]
  #1: ffffb1b540f873e8 (&kvm->srcu){....}-{0:0}, at:
vcpu_enter_guest+0xb30/0x1b10 [kvm]
  #2: ffffb1b540f7d018 (&(kvm)->mmu_lock){+.+.}-{2:2}, at:
kvm_mmu_load+0xb5/0x540 [kvm]
 Preemption disabled at:
 [<ffffffffc0787365>] kvm_mmu_load+0xb5/0x540 [kvm]
 CPU: 2 PID: 3090 Comm: qemu-system-x86 Tainted: G        W  OE
5.12.0-rc3+ #3
 Call Trace:
  dump_stack+0x87/0xb7
  ___might_sleep+0x202/0x250
  __might_sleep+0x4a/0x80
  kvm_pdptr_read+0x20/0x60 [kvm]
  kvm_mmu_load+0x3bd/0x540 [kvm]
  vcpu_enter_guest+0x1297/0x1b10 [kvm]
  kvm_arch_vcpu_ioctl_run+0x372/0x690 [kvm]
  kvm_vcpu_ioctl+0x3ca/0x680 [kvm]
  __x64_sys_ioctl+0x27a/0x800
  do_syscall_64+0x38/0x50
  entry_SYSCALL_64_after_hwframe+0x44/0xae

There is a might_sleep() in kvm_pdptr_read(); however, the original
commit didn't explain this further. I can send a formal patch if the
fix below is acceptable.
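
For reference, the might_sleep() in question sits at the top of
kvm_pdptr_read(); a trimmed sketch of the existing helper:

	static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
	{
		might_sleep();	/* on SVM, (re)reading PDPTRs may touch guest memory */
		...
		return vcpu->arch.walk_mmu->pdptrs[index];
	}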

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efb41f3..f33026f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3289,17 +3289,24 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
     if (mmu_check_root(vcpu, root_gfn))
         return 1;

+    write_unlock(&vcpu->kvm->mmu_lock);
     if (mmu->root_level == PT32E_ROOT_LEVEL) {
         for (i = 0; i < 4; ++i) {
             pdptrs[i] = mmu->get_pdptr(vcpu, i);
             if (!(pdptrs[i] & PT_PRESENT_MASK))
                 continue;

-            if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+            if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT)) {
+                write_lock(&vcpu->kvm->mmu_lock);
                 return 1;
+            }
         }
     }

+    write_lock(&vcpu->kvm->mmu_lock);
+    if (make_mmu_pages_available(vcpu))
+        return -ENOSPC;
+
     /*
      * Do we shadow a long mode page table? If so we need to
      * write-protect the guests page table root.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-04-08 11:15   ` Wanpeng Li
@ 2021-04-08 12:09     ` Paolo Bonzini
  2021-04-08 15:48       ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-04-08 12:09 UTC (permalink / raw)
  To: Wanpeng Li, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	LKML, Ben Gardon, Brijesh Singh, Tom Lendacky

On 08/04/21 13:15, Wanpeng Li wrote:
> I saw this splatting:
> 
>   BUG: sleeping function called from invalid context at
> arch/x86/kvm/kvm_cache_regs.h:115
>    kvm_pdptr_read+0x20/0x60 [kvm]
>    kvm_mmu_load+0x3bd/0x540 [kvm]
> 
> There is a might_sleep() in kvm_pdptr_read(), however, the original
> commit didn't explain more. I can send a formal one if the below fix
> is acceptable.

I think we can just push make_mmu_pages_available down into
kvm_mmu_load's callees.  This way the lock doesn't need to be taken
until after the PDPTR check:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d92a269c5fa..f92c3695bfeb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3244,6 +3244,12 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
  	u8 shadow_root_level = mmu->shadow_root_level;
  	hpa_t root;
  	unsigned i;
+	int r;
+
+	write_lock(&vcpu->kvm->mmu_lock);
+	r = make_mmu_pages_available(vcpu);
+	if (r < 0)
+		goto out_unlock;
  
  	if (is_tdp_mmu_enabled(vcpu->kvm)) {
  		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
@@ -3266,13 +3272,16 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
  		mmu->root_hpa = __pa(mmu->pae_root);
  	} else {
  		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
-		return -EIO;
+		r = -EIO;
  	}
  
+out_unlock:
+	write_unlock(&vcpu->kvm->mmu_lock);
+
  	/* root_pgd is ignored for direct MMUs. */
  	mmu->root_pgd = 0;
  
-	return 0;
+	return r;
  }
  
  static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
@@ -3282,6 +3291,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
  	gfn_t root_gfn, root_pgd;
  	hpa_t root;
  	int i;
+	int r;
  
  	root_pgd = mmu->get_guest_pgd(vcpu);
  	root_gfn = root_pgd >> PAGE_SHIFT;
@@ -3300,6 +3310,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
  		}
  	}
  
+	write_lock(&vcpu->kvm->mmu_lock);
+	r = make_mmu_pages_available(vcpu);
+	if (r < 0)
+		goto out_unlock;
+
  	/*
  	 * Do we shadow a long mode page table? If so we need to
  	 * write-protect the guests page table root.
@@ -3308,7 +3323,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
  		root = mmu_alloc_root(vcpu, root_gfn, 0,
  				      mmu->shadow_root_level, false);
  		mmu->root_hpa = root;
-		goto set_root_pgd;
+		goto out_unlock;
  	}
  
  	if (WARN_ON_ONCE(!mmu->pae_root))
@@ -3350,7 +3365,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
  	else
  		mmu->root_hpa = __pa(mmu->pae_root);
  
-set_root_pgd:
+out_unlock:
+	write_unlock(&vcpu->kvm->mmu_lock);
  	mmu->root_pgd = root_pgd;
  
  	return 0;
@@ -4852,14 +4868,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
  	r = mmu_alloc_special_roots(vcpu);
  	if (r)
  		goto out;
-	write_lock(&vcpu->kvm->mmu_lock);
-	if (make_mmu_pages_available(vcpu))
-		r = -ENOSPC;
-	else if (vcpu->arch.mmu->direct_map)
+	if (vcpu->arch.mmu->direct_map)
  		r = mmu_alloc_direct_roots(vcpu);
  	else
  		r = mmu_alloc_shadow_roots(vcpu);
-	write_unlock(&vcpu->kvm->mmu_lock);
  	if (r)
  		goto out;
  

Paolo


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-04-08 12:09     ` Paolo Bonzini
@ 2021-04-08 15:48       ` Sean Christopherson
  2021-04-08 15:57         ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-04-08 15:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Brijesh Singh, Tom Lendacky

On Thu, Apr 08, 2021, Paolo Bonzini wrote:
> On 08/04/21 13:15, Wanpeng Li wrote:
> > I saw this splatting:
> > 
> >   BUG: sleeping function called from invalid context at
> > arch/x86/kvm/kvm_cache_regs.h:115
> >    kvm_pdptr_read+0x20/0x60 [kvm]
> >    kvm_mmu_load+0x3bd/0x540 [kvm]
> > 
> > There is a might_sleep() in kvm_pdptr_read(), however, the original
> > commit didn't explain more. I can send a formal one if the below fix
> > is acceptable.

We don't want to drop mmu_lock, even temporarily.  The reason for holding it
across the entire sequence is to ensure the kvm_mmu_available_pages() invariant
isn't violated.

> I think we can just push make_mmu_pages_available down into
> kvm_mmu_load's callees.  This way it's not necessary to hold the lock
> until after the PDPTR check:

...

> @@ -4852,14 +4868,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	r = mmu_alloc_special_roots(vcpu);
>  	if (r)
>  		goto out;
> -	write_lock(&vcpu->kvm->mmu_lock);
> -	if (make_mmu_pages_available(vcpu))
> -		r = -ENOSPC;
> -	else if (vcpu->arch.mmu->direct_map)
> +	if (vcpu->arch.mmu->direct_map)
>  		r = mmu_alloc_direct_roots(vcpu);
>  	else
>  		r = mmu_alloc_shadow_roots(vcpu);
> -	write_unlock(&vcpu->kvm->mmu_lock);
>  	if (r)
>  		goto out;

Freaking PDPTRs.  I was really hoping we could keep the lock and pages_available()
logic outside of the helpers.  What if kvm_mmu_load() reads the PDPTRs and
passes them into mmu_alloc_shadow_roots()?  Or is that too ugly?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efb41f31e80a..e3c4938cd665 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3275,11 +3275,11 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
        return 0;
 }

-static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
+static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu, u64 pdptrs[4])
 {
        struct kvm_mmu *mmu = vcpu->arch.mmu;
-       u64 pdptrs[4], pm_mask;
        gfn_t root_gfn, root_pgd;
+       u64 pm_mask;
        hpa_t root;
        int i;

@@ -3291,11 +3291,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)

        if (mmu->root_level == PT32E_ROOT_LEVEL) {
                for (i = 0; i < 4; ++i) {
-                       pdptrs[i] = mmu->get_pdptr(vcpu, i);
-                       if (!(pdptrs[i] & PT_PRESENT_MASK))
-                               continue;
-
-                       if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+                       if ((pdptrs[i] & PT_PRESENT_MASK) &&
+                           mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
                                return 1;
                }
        }
@@ -4844,21 +4841,33 @@ EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);

 int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
-       int r;
+       struct kvm_mmu *mmu = vcpu->arch.mmu;
+       u64 pdptrs[4];
+       int r, i;

-       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+       r = mmu_topup_memory_caches(vcpu, !mmu->direct_map);
        if (r)
                goto out;
        r = mmu_alloc_special_roots(vcpu);
        if (r)
                goto out;
+
+       /*
+        * On SVM, reading PDPTRs might access guest memory, which might fault
+        * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
+        */
+       if (!mmu->direct_map && mmu->root_level == PT32E_ROOT_LEVEL) {
+               for (i = 0; i < 4; ++i)
+                       pdptrs[i] = mmu->get_pdptr(vcpu, i);
+       }
+
        write_lock(&vcpu->kvm->mmu_lock);
        if (make_mmu_pages_available(vcpu))
                r = -ENOSPC;
        else if (vcpu->arch.mmu->direct_map)
                r = mmu_alloc_direct_roots(vcpu);
        else
-               r = mmu_alloc_shadow_roots(vcpu);
+               r = mmu_alloc_shadow_roots(vcpu, pdptrs);
        write_unlock(&vcpu->kvm->mmu_lock);
        if (r)
                goto out;

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-04-08 15:48       ` Sean Christopherson
@ 2021-04-08 15:57         ` Paolo Bonzini
  2021-04-08 16:27           ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-04-08 15:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Brijesh Singh, Tom Lendacky

On 08/04/21 17:48, Sean Christopherson wrote:
> Freaking PDPTRs.  I was really hoping we could keep the lock and pages_available()
> logic outside of the helpers.  What if kvm_mmu_load() reads the PDPTRs and
> passes them into mmu_alloc_shadow_roots()?  Or is that too ugly?

The patch I have posted (though untested) tries to do that in a slightly 
less ugly way by pushing make_mmu_pages_available down to mmu_alloc_*_roots.

Paolo

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efb41f31e80a..e3c4938cd665 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3275,11 +3275,11 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>          return 0;
>   }
> 
> -static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
> +static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu, u64 pdptrs[4])
>   {
>          struct kvm_mmu *mmu = vcpu->arch.mmu;
> -       u64 pdptrs[4], pm_mask;
>          gfn_t root_gfn, root_pgd;
> +       u64 pm_mask;
>          hpa_t root;
>          int i;
> 
> @@ -3291,11 +3291,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
> 
>          if (mmu->root_level == PT32E_ROOT_LEVEL) {
>                  for (i = 0; i < 4; ++i) {
> -                       pdptrs[i] = mmu->get_pdptr(vcpu, i);
> -                       if (!(pdptrs[i] & PT_PRESENT_MASK))
> -                               continue;
> -
> -                       if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
> +                       if ((pdptrs[i] & PT_PRESENT_MASK) &&
> +                           mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
>                                  return 1;
>                  }
>          }
> @@ -4844,21 +4841,33 @@ EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
> 
>   int kvm_mmu_load(struct kvm_vcpu *vcpu)
>   {
> -       int r;
> +       struct kvm_mmu *mmu = vcpu->arch.mmu;
> +       u64 pdptrs[4];
> +       int r, i;
> 
> -       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
> +       r = mmu_topup_memory_caches(vcpu, !mmu->direct_map);
>          if (r)
>                  goto out;
>          r = mmu_alloc_special_roots(vcpu);
>          if (r)
>                  goto out;
> +
> +       /*
> +        * On SVM, reading PDPTRs might access guest memory, which might fault
> +        * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
> +        */
> +       if (!mmu->direct_map && mmu->root_level == PT32E_ROOT_LEVEL) {
> +               for (i = 0; i < 4; ++i)
> +                       pdptrs[i] = mmu->get_pdptr(vcpu, i);
> +       }
> +
>          write_lock(&vcpu->kvm->mmu_lock);
>          if (make_mmu_pages_available(vcpu))
>                  r = -ENOSPC;
>          else if (vcpu->arch.mmu->direct_map)
>                  r = mmu_alloc_direct_roots(vcpu);
>          else
> -               r = mmu_alloc_shadow_roots(vcpu);
> +               r = mmu_alloc_shadow_roots(vcpu, pdptrs);
>          write_unlock(&vcpu->kvm->mmu_lock);
>          if (r)
>                  goto out;
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-04-08 15:57         ` Paolo Bonzini
@ 2021-04-08 16:27           ` Sean Christopherson
  2021-04-08 16:30             ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-04-08 16:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Brijesh Singh, Tom Lendacky

On Thu, Apr 08, 2021, Paolo Bonzini wrote:
> On 08/04/21 17:48, Sean Christopherson wrote:
> > Freaking PDPTRs.  I was really hoping we could keep the lock and pages_available()
> > logic outside of the helpers.  What if kvm_mmu_load() reads the PDPTRs and
> > passes them into mmu_alloc_shadow_roots()?  Or is that too ugly?
> 
> The patch I have posted (though untested) tries to do that in a slightly
> less ugly way by pushing make_mmu_pages_available down to mmu_alloc_*_roots.

Yeah, I agree it's less ugly.  It would be nice to not duplicate that code, but
it's probably not worth the ugliness.  :-/

For your approach, can we put the out label after the success path?  Setting
mmu->root_pgd isn't wrong per se, but doing so might mislead future readers into
thinking that it's functionally necessary. 


diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efb41f31e80a..93f97d0a9e2e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3244,6 +3244,13 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
        u8 shadow_root_level = mmu->shadow_root_level;
        hpa_t root;
        unsigned i;
+       int r;
+
+       write_lock(&vcpu->kvm->mmu_lock);
+
+       r = make_mmu_pages_available(vcpu);
+       if (r)
+               goto out_unlock;

        if (is_tdp_mmu_enabled(vcpu->kvm)) {
                root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
@@ -3252,8 +3259,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
                root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
                mmu->root_hpa = root;
        } else if (shadow_root_level == PT32E_ROOT_LEVEL) {
-               if (WARN_ON_ONCE(!mmu->pae_root))
-                       return -EIO;
+               if (WARN_ON_ONCE(!mmu->pae_root)) {
+                       r = -EIO;
+                       goto out_unlock;
+               }

                for (i = 0; i < 4; ++i) {
                        WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
@@ -3266,13 +3275,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
                mmu->root_hpa = __pa(mmu->pae_root);
        } else {
                WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
-               return -EIO;
+               r = -EIO;
+               goto out_unlock;
        }

        /* root_pgd is ignored for direct MMUs. */
        mmu->root_pgd = 0;
-
-       return 0;
+out_unlock:
+       write_unlock(&vcpu->kvm->mmu_lock);
+       return r;
 }

 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
@@ -3281,7 +3292,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
        u64 pdptrs[4], pm_mask;
        gfn_t root_gfn, root_pgd;
        hpa_t root;
-       int i;
+       int i, r;

        root_pgd = mmu->get_guest_pgd(vcpu);
        root_gfn = root_pgd >> PAGE_SHIFT;
@@ -3289,6 +3300,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
        if (mmu_check_root(vcpu, root_gfn))
                return 1;

+       /*
+        * On SVM, reading PDPTRs might access guest memory, which might fault
+        * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
+        */
        if (mmu->root_level == PT32E_ROOT_LEVEL) {
                for (i = 0; i < 4; ++i) {
                        pdptrs[i] = mmu->get_pdptr(vcpu, i);
@@ -3300,6 +3315,12 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                }
        }

+       write_lock(&vcpu->kvm->mmu_lock);
+
+       r = make_mmu_pages_available(vcpu);
+       if (r)
+               goto out_unlock;
+
        /*
         * Do we shadow a long mode page table? If so we need to
         * write-protect the guests page table root.
@@ -3311,8 +3332,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                goto set_root_pgd;
        }

-       if (WARN_ON_ONCE(!mmu->pae_root))
-               return -EIO;
+       if (WARN_ON_ONCE(!mmu->pae_root)) {
+               r = -EIO;
+               goto out_unlock;
+       }

        /*
         * We shadow a 32 bit page table. This may be a legacy 2-level
@@ -3323,8 +3346,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
        if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
                pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;

-               if (WARN_ON_ONCE(!mmu->lm_root))
-                       return -EIO;
+               if (WARN_ON_ONCE(!mmu->lm_root)) {
+                       r = -EIO;
+                       goto out_unlock;
+               }

                mmu->lm_root[0] = __pa(mmu->pae_root) | pm_mask;
        }
@@ -3352,8 +3377,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)

 set_root_pgd:
        mmu->root_pgd = root_pgd;
-
-       return 0;
+out_unlock:
+       write_unlock(&vcpu->kvm->mmu_lock);
+       return r;
 }

 static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
@@ -4852,14 +4878,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
        r = mmu_alloc_special_roots(vcpu);
        if (r)
                goto out;
-       write_lock(&vcpu->kvm->mmu_lock);
-       if (make_mmu_pages_available(vcpu))
-               r = -ENOSPC;
-       else if (vcpu->arch.mmu->direct_map)
+       if (vcpu->arch.mmu->direct_map)
                r = mmu_alloc_direct_roots(vcpu);
        else
                r = mmu_alloc_shadow_roots(vcpu);
-       write_unlock(&vcpu->kvm->mmu_lock);
        if (r)
                goto out;



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2021-04-08 16:27           ` Sean Christopherson
@ 2021-04-08 16:30             ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2021-04-08 16:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Brijesh Singh, Tom Lendacky

On 08/04/21 18:27, Sean Christopherson wrote:
> For your approach, can we put the out label after the success path?  Setting
> mmu->root_pgd isn't wrong per se, but doing so might mislead future readers into
> thinking that it's functionally necessary.

Indeed, thanks for the speedy review.  I'll get it queued tomorrow.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-04-08 16:30 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-05  1:10 [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 01/17] KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 02/17] KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 03/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 04/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 05/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Sean Christopherson
2021-03-05 17:34   ` Paolo Bonzini
2021-03-05  1:10 ` [PATCH v2 06/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Sean Christopherson
2021-04-08 11:15   ` Wanpeng Li
2021-04-08 12:09     ` Paolo Bonzini
2021-04-08 15:48       ` Sean Christopherson
2021-04-08 15:57         ` Paolo Bonzini
2021-04-08 16:27           ` Sean Christopherson
2021-04-08 16:30             ` Paolo Bonzini
2021-03-05  1:10 ` [PATCH v2 08/17] KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 09/17] KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Sean Christopherson
2021-03-05 17:52   ` Paolo Bonzini
2021-03-05 18:22     ` Sean Christopherson
2021-03-05 18:23       ` Paolo Bonzini
2021-03-05  1:10 ` [PATCH v2 10/17] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 11/17] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Sean Christopherson
2021-03-05 17:44   ` Paolo Bonzini
2021-03-05 18:02     ` Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 12/17] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 13/17] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 14/17] KVM: x86: Defer the MMU unload to the normal path on a global INVPCID Sean Christopherson
2021-03-05  1:10 ` [PATCH v2 15/17] KVM: x86/mmu: Unexport MMU load/unload functions Sean Christopherson
2021-03-05  1:11 ` [PATCH v2 16/17] KVM: x86/mmu: Sync roots after MMU load iff load is successful Sean Christopherson
2021-03-05  1:11 ` [PATCH v2 17/17] KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root level Sean Christopherson
2021-03-05 17:53 ` [PATCH v2 00/17] KVM: x86/mmu: Lots of bug fixes Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).