kvm.vger.kernel.org archive mirror
* [PATCH v2 0/3] SVM 5-level page table support
@ 2021-08-08 19:26 Wei Huang
  2021-08-08 19:26 ` [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level Wei Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Wei Huang @ 2021-08-08 19:26 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, seanjc, vkuznets, wanpengli, jmattson,
	joro, tglx, mingo, bp, x86, hpa

This patch set adds 5-level page table support for AMD SVM. When the
5-level page table is enabled on the host OS, the nested page tables for
guest VMs will use the same format as the host OS (i.e., 5-level NPT).
These patches were tested with various combinations of settings and test
cases (nested/regular VMs, AMD64/i686 kernels, kvm-unit-tests, etc.)

v1->v2:
 * Remove v1's arch-specific get_tdp_level() and add a new parameter,
   tdp_forced_root_level, to allow forced TDP level (Sean)
 * Add additional comment on tdp_root table chaining trick and change the
   PML root table allocation code (Sean)
 * Revise Patch 1's commit msg (Sean and Jim)

Thanks,
-Wei

Wei Huang (3):
  KVM: x86: Allow CPU to force vendor-specific TDP level
  KVM: x86: Handle the case of 5-level shadow page table
  KVM: SVM: Add 5-level page table support for SVM

 arch/x86/include/asm/kvm_host.h |  6 ++--
 arch/x86/kvm/mmu/mmu.c          | 58 ++++++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.c          | 13 ++++----
 arch/x86/kvm/vmx/vmx.c          |  3 +-
 4 files changed, 50 insertions(+), 30 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-08 19:26 [PATCH v2 0/3] SVM 5-level page table support Wei Huang
@ 2021-08-08 19:26 ` Wei Huang
  2021-08-09  3:58   ` Yu Zhang
  2021-08-08 19:26 ` [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table Wei Huang
  2021-08-08 19:26 ` [PATCH v2 3/3] KVM: SVM: Add 5-level page table support for SVM Wei Huang
  2 siblings, 1 reply; 19+ messages in thread
From: Wei Huang @ 2021-08-08 19:26 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, seanjc, vkuznets, wanpengli, jmattson,
	joro, tglx, mingo, bp, x86, hpa

Future AMD CPUs will require a 5-level NPT if host CR4.LA57 is set.
To prevent kvm_mmu_get_tdp_level() from incorrectly changing the NPT
level on behalf of such CPUs, add a new parameter to kvm_configure_mmu()
to force a fixed TDP level.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  5 ++---
 arch/x86/kvm/mmu/mmu.c          | 10 ++++++++--
 arch/x86/kvm/svm/svm.c          |  4 +++-
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 974cbfb1eefe..6d16f75cc8da 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
 
 	u64 reserved_gpa_bits;
 	int maxphyaddr;
-	int max_tdp_level;
 
 	/* emulate context */
 
@@ -1747,8 +1746,8 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
 
-void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
-		       int tdp_huge_page_level);
+void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
+		       int tdp_max_root_level, int tdp_huge_page_level);
 
 static inline u16 kvm_read_ldt(void)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 66f7f5bc3482..c11ee4531f6d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
 bool tdp_enabled = false;
 
 static int max_huge_page_level __read_mostly;
+static int tdp_root_level __read_mostly;
 static int max_tdp_level __read_mostly;
 
 enum {
@@ -4562,6 +4563,10 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
 
 static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 {
+	/* tdp_root_level is architecture forced level, use it if nonzero */
+	if (tdp_root_level)
+		return tdp_root_level;
+
 	/* Use 5-level TDP if and only if it's useful/necessary. */
 	if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
 		return 4;
@@ -5253,10 +5258,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 	 */
 }
 
-void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
-		       int tdp_huge_page_level)
+void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
+		       int tdp_max_root_level, int tdp_huge_page_level)
 {
 	tdp_enabled = enable_tdp;
+	tdp_root_level = tdp_forced_root_level;
 	max_tdp_level = tdp_max_root_level;
 
 	/*
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e8ccab50ebf6..f361d466e18e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1015,7 +1015,9 @@ static __init int svm_hardware_setup(void)
 	if (!boot_cpu_has(X86_FEATURE_NPT))
 		npt_enabled = false;
 
-	kvm_configure_mmu(npt_enabled, get_max_npt_level(), PG_LEVEL_1G);
+	/* Force VM NPT level equal to the host's max NPT level */
+	kvm_configure_mmu(npt_enabled, get_max_npt_level(),
+			  get_max_npt_level(), PG_LEVEL_1G);
 	pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
 
 	/* Note, SEV setup consumes npt_enabled. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 927a552393b9..034e1397c7d5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7803,7 +7803,8 @@ static __init int hardware_setup(void)
 		ept_lpage_level = PG_LEVEL_2M;
 	else
 		ept_lpage_level = PG_LEVEL_4K;
-	kvm_configure_mmu(enable_ept, vmx_get_max_tdp_level(), ept_lpage_level);
+	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
+			  ept_lpage_level);
 
 	/*
 	 * Only enable PML when hardware supports PML feature, and both EPT
-- 
2.31.1


* [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table
  2021-08-08 19:26 [PATCH v2 0/3] SVM 5-level page table support Wei Huang
  2021-08-08 19:26 ` [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level Wei Huang
@ 2021-08-08 19:26 ` Wei Huang
  2021-08-09 15:17   ` Sean Christopherson
  2021-08-08 19:26 ` [PATCH v2 3/3] KVM: SVM: Add 5-level page table support for SVM Wei Huang
  2 siblings, 1 reply; 19+ messages in thread
From: Wei Huang @ 2021-08-08 19:26 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, seanjc, vkuznets, wanpengli, jmattson,
	joro, tglx, mingo, bp, x86, hpa

When the 5-level page table CPU flag is exposed, KVM needs to handle
this case by pointing mmu->root_hpa at a properly constructed 5-level
page table.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          | 48 +++++++++++++++++++++------------
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d16f75cc8da..5d66a94ca428 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -447,6 +447,7 @@ struct kvm_mmu {
 
 	u64 *pae_root;
 	u64 *pml4_root;
+	u64 *pml5_root;
 
 	/*
 	 * check zero bits on shadow page table entries, these
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c11ee4531f6d..9985488c9524 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3430,7 +3430,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK | shadow_me_mask;
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
 		if (WARN_ON_ONCE(!mmu->pml4_root)) {
@@ -3457,10 +3457,19 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
+	/*
+	 * Depending on the shadow_root_level, build the root_hpa table by
+	 * chaining either pml5->pml4->pae or pml4->pae.
+	 */
+	mmu->root_hpa = __pa(mmu->pae_root);
+	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
+		mmu->pml4_root[0] = mmu->root_hpa | pm_mask;
 		mmu->root_hpa = __pa(mmu->pml4_root);
-	else
-		mmu->root_hpa = __pa(mmu->pae_root);
+	}
+	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL) {
+		mmu->pml5_root[0] = mmu->root_hpa | pm_mask;
+		mmu->root_hpa = __pa(mmu->pml5_root);
+	}
 
 set_root_pgd:
 	mmu->root_pgd = root_pgd;
@@ -3473,7 +3482,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 *pml4_root, *pae_root;
+	u64 *pml5_root, *pml4_root, *pae_root;
 
 	/*
 	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
@@ -3485,21 +3494,15 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 	    mmu->shadow_root_level < PT64_ROOT_4LEVEL)
 		return 0;
 
-	/*
-	 * This mess only works with 4-level paging and needs to be updated to
-	 * work with 5-level paging.
-	 */
-	if (WARN_ON_ONCE(mmu->shadow_root_level != PT64_ROOT_4LEVEL))
-		return -EIO;
-
-	if (mmu->pae_root && mmu->pml4_root)
+	if (mmu->pae_root && mmu->pml4_root && mmu->pml5_root)
 		return 0;
 
 	/*
 	 * The special roots should always be allocated in concert.  Yell and
 	 * bail if KVM ends up in a state where only one of the roots is valid.
 	 */
-	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root))
+	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root ||
+			 mmu->pml5_root))
 		return -EIO;
 
 	/*
@@ -3511,15 +3514,25 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 		return -ENOMEM;
 
 	pml4_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!pml4_root) {
-		free_page((unsigned long)pae_root);
-		return -ENOMEM;
+	if (!pml4_root)
+		goto err_pml4;
+
+	if (mmu->shadow_root_level > PT64_ROOT_4LEVEL) {
+		pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!pml5_root)
+			goto err_pml5;
 	}
 
 	mmu->pae_root = pae_root;
 	mmu->pml4_root = pml4_root;
+	mmu->pml5_root = pml5_root;
 
 	return 0;
+err_pml5:
+	free_page((unsigned long)pml4_root);
+err_pml4:
+	free_page((unsigned long)pae_root);
+	return -ENOMEM;
 }
 
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
@@ -5338,6 +5351,7 @@ static void free_mmu_pages(struct kvm_mmu *mmu)
 		set_memory_encrypted((unsigned long)mmu->pae_root, 1);
 	free_page((unsigned long)mmu->pae_root);
 	free_page((unsigned long)mmu->pml4_root);
+	free_page((unsigned long)mmu->pml5_root);
 }
 
 static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
-- 
2.31.1


* [PATCH v2 3/3] KVM: SVM: Add 5-level page table support for SVM
  2021-08-08 19:26 [PATCH v2 0/3] SVM 5-level page table support Wei Huang
  2021-08-08 19:26 ` [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level Wei Huang
  2021-08-08 19:26 ` [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table Wei Huang
@ 2021-08-08 19:26 ` Wei Huang
  2 siblings, 0 replies; 19+ messages in thread
From: Wei Huang @ 2021-08-08 19:26 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, seanjc, vkuznets, wanpengli, jmattson,
	joro, tglx, mingo, bp, x86, hpa

When the 5-level page table is enabled on the host OS, the nested page
table for guest VMs must use 5 levels as well. Update the
get_max_npt_level() function to reflect this requirement. Meanwhile,
remove the code that prevents the kvm-amd driver from being loaded when
a 5-level page table is detected.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f361d466e18e..dfb864f2674b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -261,7 +261,9 @@ u32 svm_msrpm_offset(u32 msr)
 static int get_max_npt_level(void)
 {
 #ifdef CONFIG_X86_64
-	return PT64_ROOT_4LEVEL;
+	bool la57 = (cr4_read_shadow() & X86_CR4_LA57) != 0;
+
+	return la57 ? PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 #else
 	return PT32E_ROOT_LEVEL;
 #endif
@@ -462,11 +464,6 @@ static int has_svm(void)
 		return 0;
 	}
 
-	if (pgtable_l5_enabled()) {
-		pr_info("KVM doesn't yet support 5-level paging on AMD SVM\n");
-		return 0;
-	}
-
 	return 1;
 }
 
-- 
2.31.1


* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-08 19:26 ` [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level Wei Huang
@ 2021-08-09  3:58   ` Yu Zhang
  2021-08-09  4:11     ` Wei Huang
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2021-08-09  3:58 UTC (permalink / raw)
  To: Wei Huang
  Cc: kvm, linux-kernel, pbonzini, seanjc, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa

On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.

Sorry, but why? NPT is not indexed by HVA. 

> To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level
> on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force
> a fixed TDP level.
> 
> Signed-off-by: Wei Huang <wei.huang2@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  5 ++---
>  arch/x86/kvm/mmu/mmu.c          | 10 ++++++++--
>  arch/x86/kvm/svm/svm.c          |  4 +++-
>  arch/x86/kvm/vmx/vmx.c          |  3 ++-
>  4 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 974cbfb1eefe..6d16f75cc8da 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
>  
>  	u64 reserved_gpa_bits;
>  	int maxphyaddr;
> -	int max_tdp_level;
>  
>  	/* emulate context */
>  
> @@ -1747,8 +1746,8 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
>  void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
>  
> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> -		       int tdp_huge_page_level);
> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> +		       int tdp_max_root_level, int tdp_huge_page_level);
>  
>  static inline u16 kvm_read_ldt(void)
>  {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 66f7f5bc3482..c11ee4531f6d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
>  bool tdp_enabled = false;
>  
>  static int max_huge_page_level __read_mostly;
> +static int tdp_root_level __read_mostly;

I think this is a broken design - meaning KVM can only use 5-level or
4-level NPT for all VMs.

B.R.
Yu

>  static int max_tdp_level __read_mostly;
>  
>  enum {
> @@ -4562,6 +4563,10 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
>  
>  static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
>  {
> +	/* tdp_root_level is architecture forced level, use it if nonzero */
> +	if (tdp_root_level)
> +		return tdp_root_level;
> +
>  	/* Use 5-level TDP if and only if it's useful/necessary. */
>  	if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
>  		return 4;
> @@ -5253,10 +5258,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
>  	 */
>  }
>  
> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> -		       int tdp_huge_page_level)
> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> +		       int tdp_max_root_level, int tdp_huge_page_level)
>  {
>  	tdp_enabled = enable_tdp;
> +	tdp_root_level = tdp_forced_root_level;
>  	max_tdp_level = tdp_max_root_level;
>  
>  	/*
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e8ccab50ebf6..f361d466e18e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1015,7 +1015,9 @@ static __init int svm_hardware_setup(void)
>  	if (!boot_cpu_has(X86_FEATURE_NPT))
>  		npt_enabled = false;
>  
> -	kvm_configure_mmu(npt_enabled, get_max_npt_level(), PG_LEVEL_1G);
> +	/* Force VM NPT level equal to the host's max NPT level */
> +	kvm_configure_mmu(npt_enabled, get_max_npt_level(),
> +			  get_max_npt_level(), PG_LEVEL_1G);
>  	pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
>  
>  	/* Note, SEV setup consumes npt_enabled. */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 927a552393b9..034e1397c7d5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7803,7 +7803,8 @@ static __init int hardware_setup(void)
>  		ept_lpage_level = PG_LEVEL_2M;
>  	else
>  		ept_lpage_level = PG_LEVEL_4K;
> -	kvm_configure_mmu(enable_ept, vmx_get_max_tdp_level(), ept_lpage_level);
> +	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
> +			  ept_lpage_level);
>  
>  	/*
>  	 * Only enable PML when hardware supports PML feature, and both EPT
> -- 
> 2.31.1
> 

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09  3:58   ` Yu Zhang
@ 2021-08-09  4:11     ` Wei Huang
  2021-08-09  4:27       ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Wei Huang @ 2021-08-09  4:11 UTC (permalink / raw)
  To: Yu Zhang
  Cc: kvm, linux-kernel, pbonzini, seanjc, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa



On 8/8/21 10:58 PM, Yu Zhang wrote:
> On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
>> AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> 
> Sorry, but why? NPT is not indexed by HVA.

NPT is not indexed by HVA - it is always indexed by GPA. What I meant is 
that the NPT page table level has to be the same as the host OS page 
table level: if the 5-level page table is enabled in the host OS 
(CR4.LA57=1), the guest NPT has to be 5-level too.

> 
>> To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level
>> on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force
>> a fixed TDP level.
>>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  5 ++---
>>   arch/x86/kvm/mmu/mmu.c          | 10 ++++++++--
>>   arch/x86/kvm/svm/svm.c          |  4 +++-
>>   arch/x86/kvm/vmx/vmx.c          |  3 ++-
>>   4 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 974cbfb1eefe..6d16f75cc8da 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
>>   
>>   	u64 reserved_gpa_bits;
>>   	int maxphyaddr;
>> -	int max_tdp_level;
>>   
>>   	/* emulate context */
>>   
>> @@ -1747,8 +1746,8 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>>   void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
>>   void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
>>   
>> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
>> -		       int tdp_huge_page_level);
>> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>> +		       int tdp_max_root_level, int tdp_huge_page_level);
>>   
>>   static inline u16 kvm_read_ldt(void)
>>   {
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 66f7f5bc3482..c11ee4531f6d 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
>>   bool tdp_enabled = false;
>>   
>>   static int max_huge_page_level __read_mostly;
>> +static int tdp_root_level __read_mostly;
> 
> I think this is a broken design - meaning KVM can only use 5-level or
> 4-level NPT for all VMs.

Broken normally means non-functional or buggy, which doesn't apply here. 
A good TLB design should be able to offset the potential overhead of a 
5-level page table in most cases.

> 
> B.R.
> Yu
> 
>>   static int max_tdp_level __read_mostly;
>>   
>>   enum {
>> @@ -4562,6 +4563,10 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
>>   
>>   static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
>>   {
>> +	/* tdp_root_level is architecture forced level, use it if nonzero */
>> +	if (tdp_root_level)
>> +		return tdp_root_level;
>> +
>>   	/* Use 5-level TDP if and only if it's useful/necessary. */
>>   	if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
>>   		return 4;
>> @@ -5253,10 +5258,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
>>   	 */
>>   }
>>   
>> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
>> -		       int tdp_huge_page_level)
>> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>> +		       int tdp_max_root_level, int tdp_huge_page_level)
>>   {
>>   	tdp_enabled = enable_tdp;
>> +	tdp_root_level = tdp_forced_root_level;
>>   	max_tdp_level = tdp_max_root_level;
>>   
>>   	/*
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index e8ccab50ebf6..f361d466e18e 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1015,7 +1015,9 @@ static __init int svm_hardware_setup(void)
>>   	if (!boot_cpu_has(X86_FEATURE_NPT))
>>   		npt_enabled = false;
>>   
>> -	kvm_configure_mmu(npt_enabled, get_max_npt_level(), PG_LEVEL_1G);
>> +	/* Force VM NPT level equal to the host's max NPT level */
>> +	kvm_configure_mmu(npt_enabled, get_max_npt_level(),
>> +			  get_max_npt_level(), PG_LEVEL_1G);
>>   	pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
>>   
>>   	/* Note, SEV setup consumes npt_enabled. */
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 927a552393b9..034e1397c7d5 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7803,7 +7803,8 @@ static __init int hardware_setup(void)
>>   		ept_lpage_level = PG_LEVEL_2M;
>>   	else
>>   		ept_lpage_level = PG_LEVEL_4K;
>> -	kvm_configure_mmu(enable_ept, vmx_get_max_tdp_level(), ept_lpage_level);
>> +	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
>> +			  ept_lpage_level);
>>   
>>   	/*
>>   	 * Only enable PML when hardware supports PML feature, and both EPT
>> -- 
>> 2.31.1
>>

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09  4:11     ` Wei Huang
@ 2021-08-09  4:27       ` Yu Zhang
  2021-08-09  4:33         ` Wei Huang
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2021-08-09  4:27 UTC (permalink / raw)
  To: Wei Huang
  Cc: kvm, linux-kernel, pbonzini, seanjc, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa

On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
> 
> 
> On 8/8/21 10:58 PM, Yu Zhang wrote:
> > On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> > > AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> > 
> > Sorry, but why? NPT is not indexed by HVA.
> 
> NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
> page table level has to be the same as the host OS page table: if 5-level
> page table is enabled in host OS (CR4.LA57=1), guest NPT has to 5-level too.

I know what you meant. But may I ask why? 

B.R.
Yu

> 
> > 
> > > To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level
> > > on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force
> > > a fixed TDP level.
> > > 
> > > Signed-off-by: Wei Huang <wei.huang2@amd.com>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h |  5 ++---
> > >   arch/x86/kvm/mmu/mmu.c          | 10 ++++++++--
> > >   arch/x86/kvm/svm/svm.c          |  4 +++-
> > >   arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > >   4 files changed, 15 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 974cbfb1eefe..6d16f75cc8da 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
> > >   	u64 reserved_gpa_bits;
> > >   	int maxphyaddr;
> > > -	int max_tdp_level;
> > >   	/* emulate context */
> > > @@ -1747,8 +1746,8 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > >   void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
> > >   void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
> > > -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> > > -		       int tdp_huge_page_level);
> > > +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> > > +		       int tdp_max_root_level, int tdp_huge_page_level);
> > >   static inline u16 kvm_read_ldt(void)
> > >   {
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 66f7f5bc3482..c11ee4531f6d 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> > >   bool tdp_enabled = false;
> > >   static int max_huge_page_level __read_mostly;
> > > +static int tdp_root_level __read_mostly;
> > 
> > I think this is a broken design - meaning KVM can only use 5-level or
> > 4-level NPT for all VMs.
> 
> Broken normally means non-functional or buggy, which doesn't apply here. A
> good TLB design should be able to offset the potential overhead of 5-level
> page table for most cases.
> 
> > 
> > B.R.
> > Yu
> > 
> > >   static int max_tdp_level __read_mostly;
> > >   enum {
> > > @@ -4562,6 +4563,10 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
> > >   static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> > >   {
> > > +	/* tdp_root_level is architecture forced level, use it if nonzero */
> > > +	if (tdp_root_level)
> > > +		return tdp_root_level;
> > > +
> > >   	/* Use 5-level TDP if and only if it's useful/necessary. */
> > >   	if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
> > >   		return 4;
> > > @@ -5253,10 +5258,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
> > >   	 */
> > >   }
> > > -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> > > -		       int tdp_huge_page_level)
> > > +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> > > +		       int tdp_max_root_level, int tdp_huge_page_level)
> > >   {
> > >   	tdp_enabled = enable_tdp;
> > > +	tdp_root_level = tdp_forced_root_level;
> > >   	max_tdp_level = tdp_max_root_level;
> > >   	/*
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index e8ccab50ebf6..f361d466e18e 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -1015,7 +1015,9 @@ static __init int svm_hardware_setup(void)
> > >   	if (!boot_cpu_has(X86_FEATURE_NPT))
> > >   		npt_enabled = false;
> > > -	kvm_configure_mmu(npt_enabled, get_max_npt_level(), PG_LEVEL_1G);
> > > +	/* Force VM NPT level equal to the host's max NPT level */
> > > +	kvm_configure_mmu(npt_enabled, get_max_npt_level(),
> > > +			  get_max_npt_level(), PG_LEVEL_1G);
> > >   	pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
> > >   	/* Note, SEV setup consumes npt_enabled. */
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 927a552393b9..034e1397c7d5 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -7803,7 +7803,8 @@ static __init int hardware_setup(void)
> > >   		ept_lpage_level = PG_LEVEL_2M;
> > >   	else
> > >   		ept_lpage_level = PG_LEVEL_4K;
> > > -	kvm_configure_mmu(enable_ept, vmx_get_max_tdp_level(), ept_lpage_level);
> > > +	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
> > > +			  ept_lpage_level);
> > >   	/*
> > >   	 * Only enable PML when hardware supports PML feature, and both EPT
> > > -- 
> > > 2.31.1
> > > 

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09  4:27       ` Yu Zhang
@ 2021-08-09  4:33         ` Wei Huang
  2021-08-09  6:42           ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Wei Huang @ 2021-08-09  4:33 UTC (permalink / raw)
  To: Yu Zhang
  Cc: kvm, linux-kernel, pbonzini, seanjc, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa



On 8/8/21 11:27 PM, Yu Zhang wrote:
> On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
>>
>>
>> On 8/8/21 10:58 PM, Yu Zhang wrote:
>>> On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
>>>> AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
>>>
>>> Sorry, but why? NPT is not indexed by HVA.
>>
>> NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
>> page table level has to be the same as the host OS page table: if 5-level
>> page table is enabled in host OS (CR4.LA57=1), guest NPT has to 5-level too.
> 
> I know what you meant. But may I ask why?

I don't have a good answer for it. From what I know, the VMCB doesn't 
have a field to indicate the guest page table level. As a result, 
hardware relies on the host CR4 to infer the NPT level.

> 
> B.R.
> Yu
> 
>>

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09  4:33         ` Wei Huang
@ 2021-08-09  6:42           ` Yu Zhang
  2021-08-09 15:30             ` Sean Christopherson
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2021-08-09  6:42 UTC (permalink / raw)
  To: Wei Huang
  Cc: kvm, linux-kernel, pbonzini, seanjc, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa

On Sun, Aug 08, 2021 at 11:33:44PM -0500, Wei Huang wrote:
> 
> 
> On 8/8/21 11:27 PM, Yu Zhang wrote:
> > On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
> > > 
> > > 
> > > On 8/8/21 10:58 PM, Yu Zhang wrote:
> > > > On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> > > > > AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> > > > 
> > > > Sorry, but why? NPT is not indexed by HVA.
> > > 
> > > NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
> > > page table level has to be the same as the host OS page table: if 5-level
> > > page table is enabled in host OS (CR4.LA57=1), guest NPT has to 5-level too.
> > 
> > I know what you meant. But may I ask why?
> 
> I don't have a good answer for it. From what I know, VMCB doesn't have a
> field to indicate guest page table level. As a result, hardware relies on
> host CR4 to infer NPT level.

I guess you mean not even in the N_CR3 field of VMCB? 

Then it's not a broken design - it's a limitation of SVM. :)

B.R.
Yu

* Re: [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table
  2021-08-08 19:26 ` [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table Wei Huang
@ 2021-08-09 15:17   ` Sean Christopherson
  2021-08-09 17:03     ` Wei Huang
  0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2021-08-09 15:17 UTC (permalink / raw)
  To: Wei Huang
  Cc: kvm, linux-kernel, pbonzini, vkuznets, wanpengli, jmattson, joro,
	tglx, mingo, bp, x86, hpa

On Sun, Aug 08, 2021, Wei Huang wrote:
> @@ -3457,10 +3457,19 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  		mmu->pae_root[i] = root | pm_mask;
>  	}
>  
> -	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
> +	/*
> +	 * Depending on the shadow_root_level, build the root_hpa table by
> +	 * chaining either pml5->pml4->pae or pml4->pae.
> +	 */
> +	mmu->root_hpa = __pa(mmu->pae_root);
> +	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
> +		mmu->pml4_root[0] = mmu->root_hpa | pm_mask;
>  		mmu->root_hpa = __pa(mmu->pml4_root);
> -	else
> -		mmu->root_hpa = __pa(mmu->pae_root);
> +	}
> +	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL) {
> +		mmu->pml5_root[0] = mmu->root_hpa | pm_mask;
> +		mmu->root_hpa = __pa(mmu->pml5_root);
> +	}

I still really dislike this approach, it requires visually connecting multiple
statements to understand the chain.  I don't see any advantage (the 6-level paging
comment was 99.9% a joke) of rewriting root_hpa other than that's how it's done today.

In the future, please give reviewers ample opportunity to respond before sending
a new version if there's disagreement, otherwise the conversation gets carried
over into a different thread and loses the original context.

>  
>  set_root_pgd:
>  	mmu->root_pgd = root_pgd;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09  6:42           ` Yu Zhang
@ 2021-08-09 15:30             ` Sean Christopherson
  2021-08-09 21:49               ` Jim Mattson
  2021-08-10  7:40               ` Yu Zhang
  0 siblings, 2 replies; 19+ messages in thread
From: Sean Christopherson @ 2021-08-09 15:30 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Wei Huang, kvm, linux-kernel, pbonzini, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa

On Mon, Aug 09, 2021, Yu Zhang wrote:
> On Sun, Aug 08, 2021 at 11:33:44PM -0500, Wei Huang wrote:
> > 
> > On 8/8/21 11:27 PM, Yu Zhang wrote:
> > > On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
> > > > 
> > > > 
> > > > On 8/8/21 10:58 PM, Yu Zhang wrote:
> > > > > On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> > > > > > AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> > > > > 
> > > > > Sorry, but why? NPT is not indexed by HVA.
> > > > 
> > > > NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
> > > > page table level has to be the same as the host OS page table: if 5-level
> > > > > page table is enabled in host OS (CR4.LA57=1), guest NPT has to be 5-level too.
> > > 
> > > I know what you meant. But may I ask why?
> > 
> > I don't have a good answer for it. From what I know, VMCB doesn't have a
> > field to indicate guest page table level. As a result, hardware relies on
> > host CR4 to infer NPT level.
> 
> I guess you mean not even in the N_CR3 field of VMCB? 

Correct, nCR3 is basically a pure representation of a regular CR3.

> Then it's not a broken design - it's a limitation of SVM. :)

That's just a polite way of saying it's a broken design ;-)

Joking aside, NPT opted for a semblance of backwards compatibility at the cost of
having to carry all the baggage that comes with a legacy design.  Keeping the core
functionality from IA32 paging presumably minimizes design and hardware costs, and
required minimal enabling in hypervisors.  The downside is that it's less flexible
than EPT and has a few warts, e.g. shadowing NPT is gross because the host can't
easily mirror L1's desired paging mode.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table
  2021-08-09 15:17   ` Sean Christopherson
@ 2021-08-09 17:03     ` Wei Huang
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Huang @ 2021-08-09 17:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, vkuznets, wanpengli, jmattson, joro,
	tglx, mingo, bp, x86, hpa



On 8/9/21 10:17 AM, Sean Christopherson wrote:
> On Sun, Aug 08, 2021, Wei Huang wrote:
>> @@ -3457,10 +3457,19 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>  		mmu->pae_root[i] = root | pm_mask;
>>  	}
>>  
>> -	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
>> +	/*
>> +	 * Depending on the shadow_root_level, build the root_hpa table by
>> +	 * chaining either pml5->pml4->pae or pml4->pae.
>> +	 */
>> +	mmu->root_hpa = __pa(mmu->pae_root);
>> +	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
>> +		mmu->pml4_root[0] = mmu->root_hpa | pm_mask;
>>  		mmu->root_hpa = __pa(mmu->pml4_root);
>> -	else
>> -		mmu->root_hpa = __pa(mmu->pae_root);
>> +	}
>> +	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL) {
>> +		mmu->pml5_root[0] = mmu->root_hpa | pm_mask;
>> +		mmu->root_hpa = __pa(mmu->pml5_root);
>> +	}
> 
> I still really dislike this approach, it requires visually connecting multiple
> statements to understand the chain.  I don't see any advantage (the 6-level paging
> comment was 99.9% a joke) of rewriting root_hpa other than that's how it's done today.
> 

I can change this part in v3, unless there are different comments from other
reviewers.

> In the future, please give reviewers ample opportunity to respond before sending
> a new version if there's disagreement, otherwise the conversation gets carried
> over into a different thread and loses the original context.
> 
>>  
>>  set_root_pgd:
>>  	mmu->root_pgd = root_pgd;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09 15:30             ` Sean Christopherson
@ 2021-08-09 21:49               ` Jim Mattson
  2021-08-10  9:23                 ` Paolo Bonzini
  2021-08-10  7:40               ` Yu Zhang
  1 sibling, 1 reply; 19+ messages in thread
From: Jim Mattson @ 2021-08-09 21:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yu Zhang, Wei Huang, kvm, linux-kernel, pbonzini, vkuznets,
	wanpengli, joro, tglx, mingo, bp, x86, hpa

On Mon, Aug 9, 2021 at 8:30 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Aug 09, 2021, Yu Zhang wrote:
> > On Sun, Aug 08, 2021 at 11:33:44PM -0500, Wei Huang wrote:
> > >
> > > On 8/8/21 11:27 PM, Yu Zhang wrote:
> > > > On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
> > > > >
> > > > >
> > > > > On 8/8/21 10:58 PM, Yu Zhang wrote:
> > > > > > On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> > > > > > > AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> > > > > >
> > > > > > Sorry, but why? NPT is not indexed by HVA.
> > > > >
> > > > > NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
> > > > > page table level has to be the same as the host OS page table: if 5-level
> > > > > page table is enabled in host OS (CR4.LA57=1), guest NPT has to be 5-level too.
> > > >
> > > > I know what you meant. But may I ask why?
> > >
> > > I don't have a good answer for it. From what I know, VMCB doesn't have a
> > > field to indicate guest page table level. As a result, hardware relies on
> > > host CR4 to infer NPT level.
> >
> > I guess you mean not even in the N_CR3 field of VMCB?
>
> Correct, nCR3 is basically a pure representation of a regular CR3.
>
> > Then it's not a broken design - it's a limitation of SVM. :)
>
> That's just a polite way of saying it's a broken design ;-)

Doesn't this break legacy type 2 hypervisors that don't know anything
about 5-level NPT and don't have any control over whether or not the
host uses 5-level paging?

> Joking aside, NPT opted for a semblance of backwards compatibility at the cost of
> having to carry all the baggage that comes with a legacy design.  Keeping the core
> functionality from IA32 paging presumably minimizes design and hardware costs, and
> required minimal enabling in hypervisors.  The downside is that it's less flexible
> than EPT and has a few warts, e.g. shadowing NPT is gross because the host can't
> easily mirror L1's desired paging mode.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09 15:30             ` Sean Christopherson
  2021-08-09 21:49               ` Jim Mattson
@ 2021-08-10  7:40               ` Yu Zhang
  2021-08-10  9:25                 ` Paolo Bonzini
  1 sibling, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2021-08-10  7:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wei Huang, kvm, linux-kernel, pbonzini, vkuznets, wanpengli,
	jmattson, joro, tglx, mingo, bp, x86, hpa

On Mon, Aug 09, 2021 at 03:30:08PM +0000, Sean Christopherson wrote:
> On Mon, Aug 09, 2021, Yu Zhang wrote:
> > On Sun, Aug 08, 2021 at 11:33:44PM -0500, Wei Huang wrote:
> > > 
> > > On 8/8/21 11:27 PM, Yu Zhang wrote:
> > > > On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
> > > > > 
> > > > > 
> > > > > On 8/8/21 10:58 PM, Yu Zhang wrote:
> > > > > > On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> > > > > > > AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> > > > > > 
> > > > > > Sorry, but why? NPT is not indexed by HVA.
> > > > > 
> > > > > NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
> > > > > page table level has to be the same as the host OS page table: if 5-level
> > > > > page table is enabled in host OS (CR4.LA57=1), guest NPT has to be 5-level too.
> > > > 
> > > > I know what you meant. But may I ask why?
> > > 
> > > I don't have a good answer for it. From what I know, VMCB doesn't have a
> > > field to indicate guest page table level. As a result, hardware relies on
> > > host CR4 to infer NPT level.
> > 
> > I guess you mean not even in the N_CR3 field of VMCB? 
> 
> Correct, nCR3 is basically a pure representation of a regular CR3.
> 
> > Then it's not a broken design - it's a limitation of SVM. :)
> 
> That's just a polite way of saying it's a broken design ;-)
> 
> Joking aside, NPT opted for a semblance of backwards compatibility at the cost of
> having to carry all the baggage that comes with a legacy design.  Keeping the core
> functionality from IA32 paging presumably minimizes design and hardware costs, and
> required minimal enabling in hypervisors.  The downside is that it's less flexible
> than EPT and has a few warts, e.g. shadowing NPT is gross because the host can't
> easily mirror L1's desired paging mode.

Thanks for your explanation, Sean. Everything has a cost, and now it's time to pay
the price. :)

As to the max level, it is calculated in kvm_init(). Though I do not see any chance
that the host paging mode will be changed after kvm_init(), or any case where Linux
uses different paging levels on different pCPUs, I am wondering whether we should do
something, e.g., document this as an open question?

About "host can't easily mirror L1's desired paging mode", could you please elaborate?
Thanks!

B.R.
Yu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-09 21:49               ` Jim Mattson
@ 2021-08-10  9:23                 ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-08-10  9:23 UTC (permalink / raw)
  To: Jim Mattson, Sean Christopherson
  Cc: Yu Zhang, Wei Huang, kvm, linux-kernel, vkuznets, wanpengli,
	joro, tglx, mingo, bp, x86, hpa

On 09/08/21 23:49, Jim Mattson wrote:
> Doesn't this break legacy type 2 hypervisors that don't know anything
> about 5-level NPT and don't have any control over whether or not the
> host uses 5-level paging?

Yes, where "legacy" probably means "all released versions of all of 
them", including KVM.  Host support for LA57 was merged in 4.13, while 
KVM started supporting 5-level page tables in EPT in 4.14 and even then 
just returned PT64_ROOT_LEVEL (i.e. 4) for the maximum NPT level.

So all Linux versions up to 5.13, which has "KVM: x86: Prevent KVM SVM 
from loading on kernels with 5-level paging", will break horribly. 
Better backport that patch to stable...

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-10  7:40               ` Yu Zhang
@ 2021-08-10  9:25                 ` Paolo Bonzini
  2021-08-10 11:00                   ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-08-10  9:25 UTC (permalink / raw)
  To: Yu Zhang, Sean Christopherson
  Cc: Wei Huang, kvm, linux-kernel, vkuznets, wanpengli, jmattson,
	joro, tglx, mingo, bp, x86, hpa

On 10/08/21 09:40, Yu Zhang wrote:
> About "host can't easily mirror L1's desired paging mode", could you please elaborate?
> Thanks!

Shadow page tables in KVM will always have 3 levels on 32-bit machines 
and 4/5 levels on 64-bit machines.  L1 instead might have any number of 
levels from 2 to 5 (though of course not more than the host has).

Therefore, when shadowing 32-bit NPT page tables, KVM has to add extra 
fixed levels on top of those that it's shadowing.  See 
mmu_alloc_direct_roots for the code.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-10  9:25                 ` Paolo Bonzini
@ 2021-08-10 11:00                   ` Yu Zhang
  2021-08-10 12:47                     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2021-08-10 11:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Wei Huang, kvm, linux-kernel, vkuznets,
	wanpengli, jmattson, joro, tglx, mingo, bp, x86, hpa

On Tue, Aug 10, 2021 at 11:25:27AM +0200, Paolo Bonzini wrote:
> On 10/08/21 09:40, Yu Zhang wrote:
> > About "host can't easily mirror L1's desired paging mode", could you please elaborate?
> > Thanks!
> 
> Shadow page tables in KVM will always have 3 levels on 32-bit machines and
> 4/5 levels on 64-bit machines.  L1 instead might have any number of levels
> from 2 to 5 (though of course not more than the host has).

Thanks Paolo.

I guess it's because, unlike EPT, which has either 4 or 5 levels, NPT's
level can range from 2 to 5, depending on the host paging mode...

> 
> Therefore, when shadowing 32-bit NPT page tables, KVM has to add extra fixed
> levels on top of those that it's shadowing.  See mmu_alloc_direct_roots for
> the code.

So when shadowing NPTs (which can be 2/3 levels, depending on the paging mode in L1),
if L0 Linux is running in 4/5-level mode, extra levels of paging structures
are needed in the shadow NPT.

But shadow EPT does not have such an annoyance. Is the above understanding correct?

B.R.
Yu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-10 11:00                   ` Yu Zhang
@ 2021-08-10 12:47                     ` Paolo Bonzini
  2021-08-10 14:37                       ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-08-10 12:47 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Sean Christopherson, Wei Huang, kvm, linux-kernel, vkuznets,
	wanpengli, jmattson, joro, tglx, mingo, bp, x86, hpa

On 10/08/21 13:00, Yu Zhang wrote:
> I guess it's because, unlike EPT, which has either 4 or 5 levels, NPT's
> level can range from 2 to 5, depending on the host paging mode...

Yes, on Linux that will be one of 3/4/5 based on host paging mode, and 
it will apply to all N_CR3...

> But shadow EPT does not have such annoyance. Is above understanding correct?

... Right, because shadow EPT cannot have less than 4 levels, and it can 
always use 4 levels if that's what L1 uses.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level
  2021-08-10 12:47                     ` Paolo Bonzini
@ 2021-08-10 14:37                       ` Yu Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Zhang @ 2021-08-10 14:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Wei Huang, kvm, linux-kernel, vkuznets,
	wanpengli, jmattson, joro, tglx, mingo, bp, x86, hpa

On Tue, Aug 10, 2021 at 02:47:27PM +0200, Paolo Bonzini wrote:
> On 10/08/21 13:00, Yu Zhang wrote:
> > I guess it's because, unlike EPT, which has either 4 or 5 levels, NPT's
> > level can range from 2 to 5, depending on the host paging mode...
> 
> Yes, on Linux that will be one of 3/4/5 based on host paging mode, and it
> will apply to all N_CR3...
> 
> > But shadow EPT does not have such annoyance. Is above understanding correct?
> 
> ... Right, because shadow EPT cannot have less than 4 levels, and it can
> always use 4 levels if that's what L1 uses.

Interesting. :) Thanks a lot for the explanation!

B.R.
Yu

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-08-10 14:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-08 19:26 [PATCH v2 0/3] SVM 5-level page table support Wei Huang
2021-08-08 19:26 ` [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level Wei Huang
2021-08-09  3:58   ` Yu Zhang
2021-08-09  4:11     ` Wei Huang
2021-08-09  4:27       ` Yu Zhang
2021-08-09  4:33         ` Wei Huang
2021-08-09  6:42           ` Yu Zhang
2021-08-09 15:30             ` Sean Christopherson
2021-08-09 21:49               ` Jim Mattson
2021-08-10  9:23                 ` Paolo Bonzini
2021-08-10  7:40               ` Yu Zhang
2021-08-10  9:25                 ` Paolo Bonzini
2021-08-10 11:00                   ` Yu Zhang
2021-08-10 12:47                     ` Paolo Bonzini
2021-08-10 14:37                       ` Yu Zhang
2021-08-08 19:26 ` [PATCH v2 2/3] KVM: x86: Handle the case of 5-level shadow page table Wei Huang
2021-08-09 15:17   ` Sean Christopherson
2021-08-09 17:03     ` Wei Huang
2021-08-08 19:26 ` [PATCH v2 3/3] KVM: SVM: Add 5-level page table support for SVM Wei Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).