kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] TDP MMU: several minor fixes or improvements
@ 2021-05-05  9:37 Kai Huang
  2021-05-05  9:37 ` [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() Kai Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-05  9:37 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, bgardon, seanjc, vkuznets, wanpengli, jmattson, joro,
	Kai Huang

Pathc 1 and 2 are basically v2 of below patch, after discussion with Ben:

https://lore.kernel.org/kvm/b6d23d9fd8e526e5c7c1a968e2018d13c5433547.camel@intel.com/T/#t

Patch 3 is a new one.

I didn't do lots of tests, especially I didn't do stress test, but only tested
by creating couple of VMs (16 vcpus, 4G memory) and killing them, and everything
seems fine.

Kai Huang (3):
  KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
  KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  KVM: x86/mmu: Fix TDP MMU page table level

 arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
  2021-05-05  9:37 [PATCH 0/3] TDP MMU: several minor fixes or improvements Kai Huang
@ 2021-05-05  9:37 ` Kai Huang
  2021-05-05 16:00   ` Sean Christopherson
  2021-05-05  9:37 ` [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count " Kai Huang
  2021-05-05  9:37 ` [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level Kai Huang
  2 siblings, 1 reply; 21+ messages in thread
From: Kai Huang @ 2021-05-05  9:37 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, bgardon, seanjc, vkuznets, wanpengli, jmattson, joro,
	Kai Huang

Currently tdp_mmu_map_handle_target_level() returns 0, which is
RET_PF_RETRY, when page fault is actually fixed.  This makes
kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
RET_PF_FIXED.  Fix by initializing ret to RET_PF_FIXED.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index ff0ae030004d..1cad4c9f7c34 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -905,7 +905,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 					  kvm_pfn_t pfn, bool prefault)
 {
 	u64 new_spte;
-	int ret = 0;
+	int ret = RET_PF_FIXED;
 	int make_spte_ret = 0;
 
 	if (unlikely(is_noslot_pfn(pfn)))
-- 
2.31.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-05  9:37 [PATCH 0/3] TDP MMU: several minor fixes or improvements Kai Huang
  2021-05-05  9:37 ` [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() Kai Huang
@ 2021-05-05  9:37 ` Kai Huang
  2021-05-05 16:11   ` Ben Gardon
  2021-05-05 16:29   ` Sean Christopherson
  2021-05-05  9:37 ` [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level Kai Huang
  2 siblings, 2 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-05  9:37 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, bgardon, seanjc, vkuznets, wanpengli, jmattson, joro,
	Kai Huang

Currently pf_fixed is increased even when page fault requires emulation,
or fault is spurious.  Fix by only increasing it when return value is
RET_PF_FIXED.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cad4c9f7c34..debe8c3ec844 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 				       rcu_dereference(iter->sptep));
 	}
 
-	if (!prefault)
+	if (!prefault && ret == RET_PF_FIXED)
 		vcpu->stat.pf_fixed++;
 
 	return ret;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-05  9:37 [PATCH 0/3] TDP MMU: several minor fixes or improvements Kai Huang
  2021-05-05  9:37 ` [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() Kai Huang
  2021-05-05  9:37 ` [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count " Kai Huang
@ 2021-05-05  9:37 ` Kai Huang
  2021-05-05 16:28   ` Ben Gardon
  2 siblings, 1 reply; 21+ messages in thread
From: Kai Huang @ 2021-05-05  9:37 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, bgardon, seanjc, vkuznets, wanpengli, jmattson, joro,
	Kai Huang

TDP MMU iterator's level is identical to page table's actual level.  For
instance, for the last level page table (whose entry points to one 4K
page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
the iter->level is mmu->shadow_root_level, which is 5.  However, struct
kvm_mmu_page's level currently is not set correctly when it is allocated
in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
allocate a new child page table, currently iter->level, which is the
level of the page table where the non-present SPTE belongs to, is used.
This results in struct kvm_mmu_page's level always having its parent's
level (excpet root table's level, which is initialized explicitly using
mmu->shadow_root_level).  This is kinda wrong, and not consistent with
existing non TDP MMU code.  Fortuantely the sp->role.level is only used
in handle_removed_tdp_mmu_page(), which apparently is already aware of
this, and handles correctly.  However to make it consistent with non TDP
MMU code (and fix the issue that both root page table and any child of
it having shadow_root_level), fix this by using iter->level - 1 in
kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
such change.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index debe8c3ec844..bcfb87e1c06e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
 		sptep = rcu_dereference(pt) + i;
-		gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
+		gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
 
 		if (shared) {
 			/*
@@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
 			WRITE_ONCE(*sptep, REMOVED_SPTE);
 		}
 		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-				    old_child_spte, REMOVED_SPTE, level - 1,
+				    old_child_spte, REMOVED_SPTE, level,
 				    shared);
 	}
 
 	kvm_flush_remote_tlbs_with_address(kvm, gfn,
-					   KVM_PAGES_PER_HPAGE(level));
+					   KVM_PAGES_PER_HPAGE(level + 1));
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
@@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		}
 
 		if (!is_shadow_present_pte(iter.old_spte)) {
-			sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
+			sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
 			child_pt = sp->spt;
 
 			new_spte = make_nonleaf_spte(child_pt,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
  2021-05-05  9:37 ` [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() Kai Huang
@ 2021-05-05 16:00   ` Sean Christopherson
  2021-05-05 16:04     ` Ben Gardon
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-05-05 16:00 UTC (permalink / raw)
  To: Kai Huang; +Cc: kvm, pbonzini, bgardon, vkuznets, wanpengli, jmattson, joro

On Wed, May 05, 2021, Kai Huang wrote:
> Currently tdp_mmu_map_handle_target_level() returns 0, which is
> RET_PF_RETRY, when page fault is actually fixed.  This makes
> kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
> RET_PF_FIXED.  Fix by initializing ret to RET_PF_FIXED.

Probably worth adding a blurb to call out that the bad return value is benign
since kvm_mmu_page_fault() resumes the guest on RET_PF_RETRY or RET_PF_FIXED.
And for good measure, a Fixes without stable@.

  Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")

Reviewed-by: Sean Christopherson <seanjc@google.com> 


Side topic...

Ben, does the TDP MMU intentionally not prefetch pages?  If so, why not?

> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index ff0ae030004d..1cad4c9f7c34 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -905,7 +905,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>  					  kvm_pfn_t pfn, bool prefault)
>  {
>  	u64 new_spte;
> -	int ret = 0;
> +	int ret = RET_PF_FIXED;
>  	int make_spte_ret = 0;
>  
>  	if (unlikely(is_noslot_pfn(pfn)))
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
  2021-05-05 16:00   ` Sean Christopherson
@ 2021-05-05 16:04     ` Ben Gardon
  2021-05-06  1:56       ` Kai Huang
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2021-05-05 16:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, May 5, 2021 at 9:00 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 05, 2021, Kai Huang wrote:
> > Currently tdp_mmu_map_handle_target_level() returns 0, which is
> > RET_PF_RETRY, when page fault is actually fixed.  This makes
> > kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
> > RET_PF_FIXED.  Fix by initializing ret to RET_PF_FIXED.
>
> Probably worth adding a blurb to call out that the bad return value is benign
> since kvm_mmu_page_fault() resumes the guest on RET_PF_RETRY or RET_PF_FIXED.
> And for good measure, a Fixes without stable@.
>
>   Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>

Haha I was just about to add the same two comments. Besides those,
this patch looks good to me as well.

Reviewed-by: Ben Gardon <bgardon@google.com>

>
>
> Side topic...
>
> Ben, does the TDP MMU intentionally not prefetch pages?  If so, why not?

It does not currently prefetch pages, but there's no reason it
shouldn't. It simply hasn't been implemented yet.

>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index ff0ae030004d..1cad4c9f7c34 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -905,7 +905,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> >                                         kvm_pfn_t pfn, bool prefault)
> >  {
> >       u64 new_spte;
> > -     int ret = 0;
> > +     int ret = RET_PF_FIXED;
> >       int make_spte_ret = 0;
> >
> >       if (unlikely(is_noslot_pfn(pfn)))
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-05  9:37 ` [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count " Kai Huang
@ 2021-05-05 16:11   ` Ben Gardon
  2021-05-06  7:51     ` Kai Huang
  2021-05-05 16:29   ` Sean Christopherson
  1 sibling, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2021-05-05 16:11 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
>
> Currently pf_fixed is increased even when page fault requires emulation,
> or fault is spurious.  Fix by only increasing it when return value is
> RET_PF_FIXED.

Revisiting __direct_map and mmu_set_spte, there are cases in the
legacy MMU where RET_PF_EMULATE is returned but pf_fixed is still
incremented.
Perhaps it would make more sense to do the increment in the success
case of tdp_mmu_set_spte_atomic as you suggested before. Sorry I
didn't catch that earlier.

It would probably also be worth putting a comment on pf_fixed so that
people in the future know what it's supposed to mean and we don't get
into archeology, reverse engineering the meaning of the stat again.

>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1cad4c9f7c34..debe8c3ec844 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>                                        rcu_dereference(iter->sptep));
>         }
>
> -       if (!prefault)
> +       if (!prefault && ret == RET_PF_FIXED)
>                 vcpu->stat.pf_fixed++;
>
>         return ret;
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-05  9:37 ` [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level Kai Huang
@ 2021-05-05 16:28   ` Ben Gardon
  2021-05-05 17:01     ` Ben Gardon
  2021-05-06  8:00     ` Kai Huang
  0 siblings, 2 replies; 21+ messages in thread
From: Ben Gardon @ 2021-05-05 16:28 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
>
> TDP MMU iterator's level is identical to page table's actual level.  For
> instance, for the last level page table (whose entry points to one 4K
> page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> kvm_mmu_page's level currently is not set correctly when it is allocated
> in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> allocate a new child page table, currently iter->level, which is the
> level of the page table where the non-present SPTE belongs to, is used.
> This results in struct kvm_mmu_page's level always having its parent's
> level (excpet root table's level, which is initialized explicitly using
> mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> in handle_removed_tdp_mmu_page(), which apparently is already aware of
> this, and handles correctly.  However to make it consistent with non TDP
> MMU code (and fix the issue that both root page table and any child of
> it having shadow_root_level), fix this by using iter->level - 1 in
> kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> such change.

Ugh. Thank you for catching this. This is going to take me a bit to
review as I should audit the code more broadly for this problem in the
TDP MMU.
It would probably also be a good idea to add a comment on the level
field to say that it represents the level of the SPTEs in the
associated page, not the level of the SPTE that links to the
associated page.
Hopefully that will prevent similar future misunderstandings.

>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index debe8c3ec844..bcfb87e1c06e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
>
>         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
>                 sptep = rcu_dereference(pt) + i;
> -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
>
>                 if (shared) {
>                         /*
> @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
>                         WRITE_ONCE(*sptep, REMOVED_SPTE);
>                 }
>                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> -                                   old_child_spte, REMOVED_SPTE, level - 1,
> +                                   old_child_spte, REMOVED_SPTE, level,
>                                     shared);
>         }
>
>         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> -                                          KVM_PAGES_PER_HPAGE(level));
> +                                          KVM_PAGES_PER_HPAGE(level + 1));
>
>         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
>  }
> @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>                 }
>
>                 if (!is_shadow_present_pte(iter.old_spte)) {
> -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
>                         child_pt = sp->spt;
>
>                         new_spte = make_nonleaf_spte(child_pt,
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-05  9:37 ` [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count " Kai Huang
  2021-05-05 16:11   ` Ben Gardon
@ 2021-05-05 16:29   ` Sean Christopherson
  2021-05-05 17:16     ` Sean Christopherson
  1 sibling, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-05-05 16:29 UTC (permalink / raw)
  To: Kai Huang; +Cc: kvm, pbonzini, bgardon, vkuznets, wanpengli, jmattson, joro

On Wed, May 05, 2021, Kai Huang wrote:
> Currently pf_fixed is increased even when page fault requires emulation,
> or fault is spurious.  Fix by only increasing it when return value is
> RET_PF_FIXED.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1cad4c9f7c34..debe8c3ec844 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>  				       rcu_dereference(iter->sptep));
>  	}
>  
> -	if (!prefault)
> +	if (!prefault && ret == RET_PF_FIXED)
>  		vcpu->stat.pf_fixed++;

Both this patch and the existing code are wrong to check prefault, and they both
deviate from the legacy MMU (both TDP and shadow) for RET_PF_EMULATE.

For prefault==true, KVM should indeed bump pf_fixed since "prefault" really means
"async page fault completed".  In that case, the original page fault from the
guest was morphed into an async page fault and stat.pf_fixed was _not_ incremented
because KVM hasn't actually fixed the fault.  The "prefault" flag is there
purely so that KVM doesn't injected a #PF into the guest in the case where the
guest unmapped the gpa while the async page fault was being handled.

For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
translation that accelerates future emulated MMIO, so it's kinda sorta fixing
the page fault.  On the other handle, future emulated MMIO still page faults, it
just gets handled without going through the full page fault handler.

If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
be done for all flavors of MMU.  For this patch, the correct code is:

	if (ret != RET_PF_SPURIOUS)
		vcpu->stat.pf_fixed++;

which works because "ret" cannot be RET_PF_RETRY.

Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
make sense to handle stat.pf_fixed in a common location.

The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
but it's pretty much guaranteed to be a waste of time since prefetching only
installs SPTEs if there is a backing memslot and PFN.

Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
into a separate mini-series to address the other issues I pointed out.  I was
hoping I could paste patches for them inline and let you carry them, but moving
stat.pf_fixed handling to a common location requires additional code shuffling
because of async page faults :-/

Thanks!

>  	return ret;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-05 16:28   ` Ben Gardon
@ 2021-05-05 17:01     ` Ben Gardon
  2021-05-05 20:19       ` Kai Huang
  2021-05-06  8:00     ` Kai Huang
  1 sibling, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2021-05-05 17:01 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, May 5, 2021 at 9:28 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> >
> > TDP MMU iterator's level is identical to page table's actual level.  For
> > instance, for the last level page table (whose entry points to one 4K
> > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > kvm_mmu_page's level currently is not set correctly when it is allocated
> > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > allocate a new child page table, currently iter->level, which is the
> > level of the page table where the non-present SPTE belongs to, is used.
> > This results in struct kvm_mmu_page's level always having its parent's
> > level (excpet root table's level, which is initialized explicitly using
> > mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> > existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> > in handle_removed_tdp_mmu_page(), which apparently is already aware of
> > this, and handles correctly.  However to make it consistent with non TDP
> > MMU code (and fix the issue that both root page table and any child of
> > it having shadow_root_level), fix this by using iter->level - 1 in
> > kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> > such change.
>
> Ugh. Thank you for catching this. This is going to take me a bit to
> review as I should audit the code more broadly for this problem in the
> TDP MMU.
> It would probably also be a good idea to add a comment on the level
> field to say that it represents the level of the SPTEs in the
> associated page, not the level of the SPTE that links to the
> associated page.
> Hopefully that will prevent similar future misunderstandings.

I went through and manually audited the code. I think the only case
that needs to be added to this is for nx recovery:

--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -31,7 +31,7 @@ static inline bool kvm_tdp_mmu_zap_gfn_range(struct
kvm *kvm, int as_id,
 }
 static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
+       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1);

        /*
         * Don't allow yielding, as the caller may have a flush pending.  Note,

Otherwise we won't zap the full page with this change, resulting in
ineffective or less reliable NX recovery.

>
> >
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index debe8c3ec844..bcfb87e1c06e 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> >
> >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> >                 sptep = rcu_dereference(pt) + i;
> > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> >
> >                 if (shared) {
> >                         /*
> > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> >                 }
> >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > +                                   old_child_spte, REMOVED_SPTE, level,
> >                                     shared);
> >         }
> >
> >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > -                                          KVM_PAGES_PER_HPAGE(level));
> > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> >
> >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> >  }
> > @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> >                 }
> >
> >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> >                         child_pt = sp->spt;
> >
> >                         new_spte = make_nonleaf_spte(child_pt,
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-05 16:29   ` Sean Christopherson
@ 2021-05-05 17:16     ` Sean Christopherson
  2021-05-06  1:51       ` Kai Huang
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-05-05 17:16 UTC (permalink / raw)
  To: Kai Huang; +Cc: kvm, pbonzini, bgardon, vkuznets, wanpengli, jmattson, joro

On Wed, May 05, 2021, Sean Christopherson wrote:
> On Wed, May 05, 2021, Kai Huang wrote:
> > Currently pf_fixed is increased even when page fault requires emulation,
> > or fault is spurious.  Fix by only increasing it when return value is
> > RET_PF_FIXED.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1cad4c9f7c34..debe8c3ec844 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> >  				       rcu_dereference(iter->sptep));
> >  	}
> >  
> > -	if (!prefault)
> > +	if (!prefault && ret == RET_PF_FIXED)
> >  		vcpu->stat.pf_fixed++;
> For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> the page fault.  On the other handle, future emulated MMIO still page faults, it
> just gets handled without going through the full page fault handler.

Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
page fault, but installs a read-only SPTE on a write fault because the page is
marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
pf_fixed is arguably correct in that case since KVM did fault in a page and from
the guest's perspective the page fault was fixed, it's just that "fixing" the
fault involved a bit of instruction emulation.

> If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> be done for all flavors of MMU.  For this patch, the correct code is:
> 
> 	if (ret != RET_PF_SPURIOUS)
> 		vcpu->stat.pf_fixed++;
> 
> which works because "ret" cannot be RET_PF_RETRY.
> 
> Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> make sense to handle stat.pf_fixed in a common location.

Blech.  My original thought was to move the stat.pf_fixed logic all the way out
to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
That fast path handles the case where the guest gets a !WRITABLE page fault on
an PRESENT SPTE that KVM is write tracking.  *sigh*.

I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
type, which would allow more precise pf_fixed adjustments and would also let us
clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
detecting MMIO page faults.

> The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> but it's pretty much guaranteed to be a waste of time since prefetching only
> installs SPTEs if there is a backing memslot and PFN.
> 
> Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> into a separate mini-series to address the other issues I pointed out.  I was
> hoping I could paste patches for them inline and let you carry them, but moving
> stat.pf_fixed handling to a common location requires additional code shuffling
> because of async page faults :-/

Cancel that idea, given the twisty mess of RET_PF_EMULATE it's probably best for
you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
can scratch a refactoring itch :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-05 17:01     ` Ben Gardon
@ 2021-05-05 20:19       ` Kai Huang
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-05 20:19 UTC (permalink / raw)
  To: Ben Gardon
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, 2021-05-05 at 10:01 -0700, Ben Gardon wrote:
> On Wed, May 5, 2021 at 9:28 AM Ben Gardon <bgardon@google.com> wrote:
> > 
> > On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > > 
> > > TDP MMU iterator's level is identical to page table's actual level.  For
> > > instance, for the last level page table (whose entry points to one 4K
> > > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > > kvm_mmu_page's level currently is not set correctly when it is allocated
> > > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > > allocate a new child page table, currently iter->level, which is the
> > > level of the page table where the non-present SPTE belongs to, is used.
> > > This results in struct kvm_mmu_page's level always having its parent's
> > > level (excpet root table's level, which is initialized explicitly using
> > > mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> > > existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> > > in handle_removed_tdp_mmu_page(), which apparently is already aware of
> > > this, and handles correctly.  However to make it consistent with non TDP
> > > MMU code (and fix the issue that both root page table and any child of
> > > it having shadow_root_level), fix this by using iter->level - 1 in
> > > kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> > > such change.
> > 
> > Ugh. Thank you for catching this. This is going to take me a bit to
> > review as I should audit the code more broadly for this problem in the
> > TDP MMU.
> > It would probably also be a good idea to add a comment on the level
> > field to say that it represents the level of the SPTEs in the
> > associated page, not the level of the SPTE that links to the
> > associated page.
> > Hopefully that will prevent similar future misunderstandings.
> 
> I went through and manually audited the code. I think the only case
> that needs to be added to this is for nx recovery:
> 
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -31,7 +31,7 @@ static inline bool kvm_tdp_mmu_zap_gfn_range(struct
> kvm *kvm, int as_id,
>  }
>  static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
> +       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1);
> 
>         /*
>          * Don't allow yielding, as the caller may have a flush pending.  Note,
> 
> Otherwise we won't zap the full page with this change, resulting in
> ineffective or less reliable NX recovery.

Oh correct. I missed this. I'll add this in next version. Thanks.

> 
> > 
> > > 
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index debe8c3ec844..bcfb87e1c06e 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > > 
> > >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> > >                 sptep = rcu_dereference(pt) + i;
> > > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> > > 
> > >                 if (shared) {
> > >                         /*
> > > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> > >                 }
> > >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > > +                                   old_child_spte, REMOVED_SPTE, level,
> > >                                     shared);
> > >         }
> > > 
> > >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > > -                                          KVM_PAGES_PER_HPAGE(level));
> > > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> > > 
> > >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > >  }
> > > @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > >                 }
> > > 
> > >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> > >                         child_pt = sp->spt;
> > > 
> > >                         new_spte = make_nonleaf_spte(child_pt,
> > > --
> > > 2.31.1
> > > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-05 17:16     ` Sean Christopherson
@ 2021-05-06  1:51       ` Kai Huang
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-06  1:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, bgardon, vkuznets, wanpengli, jmattson, joro


> > > 
> > > -	if (!prefault)
> > > +	if (!prefault && ret == RET_PF_FIXED)
> > >  		vcpu->stat.pf_fixed++;
> > For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> > translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> > the page fault.  On the other handle, future emulated MMIO still page faults, it
> > just gets handled without going through the full page fault handler.
> 
> Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
> page fault, but installs a read-only SPTE on a write fault because the page is
> marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
> pf_fixed is arguably correct in that case since KVM did fault in a page and from
> the guest's perspective the page fault was fixed, it's just that "fixing" the
> fault involved a bit of instruction emulation.

Yes this is exactly the case for video ram :)

> 
> > If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> > be done for all flavors of MMU.  For this patch, the correct code is:
> > 
> > 	if (ret != RET_PF_SPURIOUS)
> > 		vcpu->stat.pf_fixed++;
> > 
> > which works because "ret" cannot be RET_PF_RETRY.
> > 
> > Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> > page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> > make sense to handle stat.pf_fixed in a common location.
> 
> Blech.  My original thought was to move the stat.pf_fixed logic all the way out
> to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
> bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
> That fast path handles the case where the guest gets a !WRITABLE page fault on
> an PRESENT SPTE that KVM is write tracking.  *sigh*.
> 
> I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
> type, which would allow more precise pf_fixed adjustments and would also let us
> clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
> detecting MMIO page faults.
> 
> > The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> > but it's pretty much guaranteed to be a waste of time since prefetching only
> > installs SPTEs if there is a backing memslot and PFN.
> > 
> > Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> > into a separate mini-series to address the other issues I pointed out.  I was
> > hoping I could paste patches for them inline and let you carry them, but moving
> > stat.pf_fixed handling to a common location requires additional code shuffling
> > because of async page faults :-/
> 
> Cancel that idea, given the twisty mess of RET_PF_EMULATE it's probably best for
> you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
> match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
> can scratch a refactoring itch :-)

OK. Either way is fine to me. I'll send a new one with your suggestion:

	if (ret != RET_PF_SPURIOUS)
 		vcpu->stat.pf_fixed++;

Thanks!


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
  2021-05-05 16:04     ` Ben Gardon
@ 2021-05-06  1:56       ` Kai Huang
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-06  1:56 UTC (permalink / raw)
  To: Ben Gardon, Sean Christopherson
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On Wed, 2021-05-05 at 09:04 -0700, Ben Gardon wrote:
> On Wed, May 5, 2021 at 9:00 AM Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Wed, May 05, 2021, Kai Huang wrote:
> > > Currently tdp_mmu_map_handle_target_level() returns 0, which is
> > > RET_PF_RETRY, when page fault is actually fixed.  This makes
> > > kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
> > > RET_PF_FIXED.  Fix by initializing ret to RET_PF_FIXED.
> > 
> > Probably worth adding a blurb to call out that the bad return value is benign
> > since kvm_mmu_page_fault() resumes the guest on RET_PF_RETRY or RET_PF_FIXED.
> > And for good measure, a Fixes without stable@.
> > 
> >   Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
> > 
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
> 
> Haha I was just about to add the same two comments. Besides those,
> this patch looks good to me as well.
> 
> Reviewed-by: Ben Gardon <bgardon@google.com>
> 
> 

Thanks Sean and Ben. I'll add Sean's suggestion to commit message, and add a Fixes:...



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-05 16:11   ` Ben Gardon
@ 2021-05-06  7:51     ` Kai Huang
  2021-05-06 15:29       ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Kai Huang @ 2021-05-06  7:51 UTC (permalink / raw)
  To: Ben Gardon
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > Currently pf_fixed is increased even when page fault requires emulation,
> > or fault is spurious.  Fix by only increasing it when return value is
> > RET_PF_FIXED.
> 
> Revisiting __direct_map and mmu_set_spte, there are cases in the
> legacy MMU where RET_PF_EMULATE is returned but pf_fixed is still
> incremented.
> Perhaps it would make more sense to do the increment in the success
> case of tdp_mmu_set_spte_atomic as you suggested before. Sorry I
> didn't catch that earlier.

If I understand correctly, Sean's suggestion:

        if (ret != RET_PF_SPURIOUS)
                vcpu->stat.pf_fixed++;   

can handle things correctly. The spurious fault check in existing code should work
correctly -- it detects spurious fault early, but later it overwrites if emulation is
required. So with above code, it should work consistently with legacy MMU behavior.

Or did I miss anything?

> 
> It would probably also be worth putting a comment on pf_fixed so that
> people in the future know what it's supposed to mean and we don't get
> into archeology, reverse engineering the meaning of the stat again.

It seems the legacy MMU code path is a better place to add the comment to explain when
pf_fixed should be increased.  However I am not sure whether it is necessary for this
patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
emulation :)).  Or perhaps Sean can write a patch to add comment to legacy MMU :)

I ended up with  below, by adding a comment in TDP MMU saying "to make it consistent with
legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
found Sean's explanation is quite useful. When people are interested in, they can do a git
blame and find the commit msg of this change -- although it is not as straightforward as
having comment directly.

Is this OK to you?

And Sean?

------------------------------------------------------------------------

Currently pf_fixed is not increased when prefault is true.  This is not
correct, since prefault here really means "async page fault completed".
In that case, the original page fault from the guest was morphed into as
async page fault and pf_fixed was not increased.  So when prefault
indicates async page fault is completed, pf_fixed should be increased.

Additionally, currently pf_fixed is also increased even when page fault
is spurious, while legacy MMU increases pf_fixed when page fault returns
RET_PF_EMULATE or RET_PF_FIXED.

To fix above two issues, change to increase pf_fixed when return value
is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by
reaching here).

More information:
https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5

Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cad4c9f7c34..5e28fbabcd35 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -942,7 +942,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int
write,
                                       rcu_dereference(iter->sptep));
        }

-       if (!prefault)
+       /*
+        * Increase pf_fixed in both RET_PF_EMULATE and RET_PF_FIXED to be
+        * consistent with legacy MMU behavior.
+        */
+       if (ret != RET_PF_SPURIOUS)
                vcpu->stat.pf_fixed++;

        return ret;
-- 
2.31.1

 
> 
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1cad4c9f7c34..debe8c3ec844 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> >                                        rcu_dereference(iter->sptep));
> >         }
> > 
> > -       if (!prefault)
> > +       if (!prefault && ret == RET_PF_FIXED)
> >                 vcpu->stat.pf_fixed++;
> > 
> >         return ret;
> > --
> > 2.31.1
> > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-05 16:28   ` Ben Gardon
  2021-05-05 17:01     ` Ben Gardon
@ 2021-05-06  8:00     ` Kai Huang
  2021-05-06 16:22       ` Ben Gardon
  1 sibling, 1 reply; 21+ messages in thread
From: Kai Huang @ 2021-05-06  8:00 UTC (permalink / raw)
  To: Ben Gardon
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, 2021-05-05 at 09:28 -0700, Ben Gardon wrote:
> On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > TDP MMU iterator's level is identical to page table's actual level.  For
> > instance, for the last level page table (whose entry points to one 4K
> > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > kvm_mmu_page's level currently is not set correctly when it is allocated
> > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > allocate a new child page table, currently iter->level, which is the
> > level of the page table where the non-present SPTE belongs to, is used.
> > This results in struct kvm_mmu_page's level always having its parent's
> > level (excpet root table's level, which is initialized explicitly using
> > mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> > existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> > in handle_removed_tdp_mmu_page(), which apparently is already aware of
> > this, and handles correctly.  However to make it consistent with non TDP
> > MMU code (and fix the issue that both root page table and any child of
> > it having shadow_root_level), fix this by using iter->level - 1 in
> > kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> > such change.
> 
> Ugh. Thank you for catching this. This is going to take me a bit to
> review as I should audit the code more broadly for this problem in the
> TDP MMU.
> It would probably also be a good idea to add a comment on the level
> field to say that it represents the level of the SPTEs in the
> associated page, not the level of the SPTE that links to the
> associated page.
> Hopefully that will prevent similar future misunderstandings.

Regarding to adding  a comment, sorry I had a hard time to figure out where to add. Did
you mean level field of 'struct kvm_mmu_page_role', or 'struct tdp_iter'? If it is the
former, to me not quite useful. 

I ended up with below. Is it OK to you?

If you still think a comment of level should be added, would you be more specific so that
I can add it?

------------------------------------------------------------------------

TDP MMU iterator's level is identical to page table's actual level.  For
instance, for the last level page table (whose entry points to one 4K
page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
the iter->level is mmu->shadow_root_level, which is 5.  However, struct
kvm_mmu_page's level currently is not set correctly when it is allocated
in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
allocate a new child page table, currently iter->level, which is the
level of the page table where the non-present SPTE belongs to, is used.
This results in struct kvm_mmu_page's level always having its parent's
level (excpet root table's level, which is initialized explicitly using
mmu->shadow_root_level).

This is kinda wrong, and not consistent with existing non TDP MMU code.
Fortuantely sp->role.level is only used in handle_removed_tdp_mmu_page()
and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave
correctly.  However to make it consistent with legacy MMU code (and fix
the issue that both root page table and its child page table have
shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change
handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
 arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 5e28fbabcd35..45fb889f6a94 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
pt,

        for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
                sptep = rcu_dereference(pt) + i;
-               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
+               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);

                if (shared) {
                        /*
@@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
pt,
                        WRITE_ONCE(*sptep, REMOVED_SPTE);
                }
                handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-                                   old_child_spte, REMOVED_SPTE, level - 1,
+                                   old_child_spte, REMOVED_SPTE, level,
                                    shared);
        }

        kvm_flush_remote_tlbs_with_address(kvm, gfn,
-                                          KVM_PAGES_PER_HPAGE(level));
+                                          KVM_PAGES_PER_HPAGE(level + 1));

        call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
@@ -1013,7 +1013,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32
error_code,
                }

                if (!is_shadow_present_pte(iter.old_spte)) {
-                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
+                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
                        child_pt = sp->spt;

                        new_spte = make_nonleaf_spte(child_pt,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..7f9974c5d0b4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -31,7 +31,7 @@ static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
 }
 static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
+       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1);

        /*
         * Don't allow yielding, as the caller may have a flush pending.  Note,
-- 
2.31.1

 
> 
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index debe8c3ec844..bcfb87e1c06e 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > 
> >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> >                 sptep = rcu_dereference(pt) + i;
> > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> > 
> >                 if (shared) {
> >                         /*
> > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> >                 }
> >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > +                                   old_child_spte, REMOVED_SPTE, level,
> >                                     shared);
> >         }
> > 
> >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > -                                          KVM_PAGES_PER_HPAGE(level));
> > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> > 
> >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> >  }
> > @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> >                 }
> > 
> >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> >                         child_pt = sp->spt;
> > 
> >                         new_spte = make_nonleaf_spte(child_pt,
> > --
> > 2.31.1
> > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-06  7:51     ` Kai Huang
@ 2021-05-06 15:29       ` Sean Christopherson
  2021-05-06 22:21         ` Kai Huang
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-05-06 15:29 UTC (permalink / raw)
  To: Kai Huang
  Cc: Ben Gardon, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Thu, May 06, 2021, Kai Huang wrote:
> On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> > It would probably also be worth putting a comment on pf_fixed so that
> > people in the future know what it's supposed to mean and we don't get
> > into archeology, reverse engineering the meaning of the stat again.
> 
> It seems the legacy MMU code path is a better place to add the comment to explain when
> pf_fixed should be increased.  However I am not sure whether it is necessary for this
> patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
> emulation :)).  Or perhaps Sean can write a patch to add comment to legacy MMU :)

Ya, I think it makes sense to hold off on documenting the existing behavior in
the TDP MMU.  As is often the case in KVM, just because KVM has always done
something one way, doesn't mean it's correct/ideal.  But, bikeshedding over what
faults exactly should count towards pf_fixed is best left to a separate patch.

> I ended up with  below, by adding a comment in TDP MMU saying "to make it consistent with
> legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
> found Sean's explanation is quite useful. When people are interested in, they can do a git
> blame and find the commit msg of this change -- although it is not as straightforward as
> having comment directly.
> 
> Is this OK to you?
> 
> And Sean?

Yep, works for me.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-06  8:00     ` Kai Huang
@ 2021-05-06 16:22       ` Ben Gardon
  2021-05-06 16:23         ` Ben Gardon
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2021-05-06 16:22 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Thu, May 6, 2021 at 1:00 AM Kai Huang <kai.huang@intel.com> wrote:
>
> On Wed, 2021-05-05 at 09:28 -0700, Ben Gardon wrote:
> > On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > >
> > > TDP MMU iterator's level is identical to page table's actual level.  For
> > > instance, for the last level page table (whose entry points to one 4K
> > > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > > kvm_mmu_page's level currently is not set correctly when it is allocated
> > > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > > allocate a new child page table, currently iter->level, which is the
> > > level of the page table where the non-present SPTE belongs to, is used.
> > > This results in struct kvm_mmu_page's level always having its parent's
> > > level (excpet root table's level, which is initialized explicitly using
> > > mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> > > existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> > > in handle_removed_tdp_mmu_page(), which apparently is already aware of
> > > this, and handles correctly.  However to make it consistent with non TDP
> > > MMU code (and fix the issue that both root page table and any child of
> > > it having shadow_root_level), fix this by using iter->level - 1 in
> > > kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> > > such change.
> >
> > Ugh. Thank you for catching this. This is going to take me a bit to
> > review as I should audit the code more broadly for this problem in the
> > TDP MMU.
> > It would probably also be a good idea to add a comment on the level
> > field to say that it represents the level of the SPTEs in the
> > associated page, not the level of the SPTE that links to the
> > associated page.
> > Hopefully that will prevent similar future misunderstandings.
>
> Regarding to adding  a comment, sorry I had a hard time to figure out where to add. Did
> you mean level field of 'struct kvm_mmu_page_role', or 'struct tdp_iter'? If it is the
> former, to me not quite useful.

I meant the level field of 'struct kvm_mmu_page_role', but if you
don't think it makes sense to add one there, I don't feel strongly
either way.

>
> I ended up with below. Is it OK to you?

Yeah, it looks good to me.

>
> If you still think a comment of level should be added, would you be more specific so that
> I can add it?

struct {
+       /*
+       * The level of the SPT tracked by this SP, as opposed to the
level of the
+       * parent SPTE linking this SPT.
+        */
        unsigned level:4;

... I guess that does sound kind of unnecessary.

>
> ------------------------------------------------------------------------
>
> TDP MMU iterator's level is identical to page table's actual level.  For
> instance, for the last level page table (whose entry points to one 4K
> page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> kvm_mmu_page's level currently is not set correctly when it is allocated
> in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> allocate a new child page table, currently iter->level, which is the
> level of the page table where the non-present SPTE belongs to, is used.
> This results in struct kvm_mmu_page's level always having its parent's
> level (excpet root table's level, which is initialized explicitly using
> mmu->shadow_root_level).
>
> This is kinda wrong, and not consistent with existing non TDP MMU code.
> Fortuantely sp->role.level is only used in handle_removed_tdp_mmu_page()
> and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave
> correctly.  However to make it consistent with legacy MMU code (and fix
> the issue that both root page table and its child page table have
> shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change
> handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
>  arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 5e28fbabcd35..45fb889f6a94 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
> pt,
>
>         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
>                 sptep = rcu_dereference(pt) + i;
> -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
>
>                 if (shared) {
>                         /*
> @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
> pt,
>                         WRITE_ONCE(*sptep, REMOVED_SPTE);
>                 }
>                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> -                                   old_child_spte, REMOVED_SPTE, level - 1,
> +                                   old_child_spte, REMOVED_SPTE, level,
>                                     shared);
>         }
>
>         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> -                                          KVM_PAGES_PER_HPAGE(level));
> +                                          KVM_PAGES_PER_HPAGE(level + 1));
>
>         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
>  }
> @@ -1013,7 +1013,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32
> error_code,
>                 }
>
>                 if (!is_shadow_present_pte(iter.old_spte)) {
> -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
>                         child_pt = sp->spt;
>
>                         new_spte = make_nonleaf_spte(child_pt,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 5fdf63090451..7f9974c5d0b4 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -31,7 +31,7 @@ static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
>  }
>  static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
> +       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1);
>
>         /*
>          * Don't allow yielding, as the caller may have a flush pending.  Note,
> --
> 2.31.1
>
>
> >
> > >
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index debe8c3ec844..bcfb87e1c06e 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > >
> > >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> > >                 sptep = rcu_dereference(pt) + i;
> > > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> > >
> > >                 if (shared) {
> > >                         /*
> > > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> > >                 }
> > >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > > +                                   old_child_spte, REMOVED_SPTE, level,
> > >                                     shared);
> > >         }
> > >
> > >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > > -                                          KVM_PAGES_PER_HPAGE(level));
> > > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> > >
> > >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > >  }
> > > @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > >                 }
> > >
> > >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> > >                         child_pt = sp->spt;
> > >
> > >                         new_spte = make_nonleaf_spte(child_pt,
> > > --
> > > 2.31.1
> > >
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-06 16:22       ` Ben Gardon
@ 2021-05-06 16:23         ` Ben Gardon
  2021-05-06 22:19           ` Kai Huang
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2021-05-06 16:23 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Thu, May 6, 2021 at 9:22 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Thu, May 6, 2021 at 1:00 AM Kai Huang <kai.huang@intel.com> wrote:
> >
> > On Wed, 2021-05-05 at 09:28 -0700, Ben Gardon wrote:
> > > On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > > >
> > > > TDP MMU iterator's level is identical to page table's actual level.  For
> > > > instance, for the last level page table (whose entry points to one 4K
> > > > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > > > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > > > kvm_mmu_page's level currently is not set correctly when it is allocated
> > > > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > > > allocate a new child page table, currently iter->level, which is the
> > > > level of the page table where the non-present SPTE belongs to, is used.
> > > > This results in struct kvm_mmu_page's level always having its parent's
> > > > level (excpet root table's level, which is initialized explicitly using
> > > > mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> > > > existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> > > > in handle_removed_tdp_mmu_page(), which apparently is already aware of
> > > > this, and handles correctly.  However to make it consistent with non TDP
> > > > MMU code (and fix the issue that both root page table and any child of
> > > > it having shadow_root_level), fix this by using iter->level - 1 in
> > > > kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> > > > such change.
> > >
> > > Ugh. Thank you for catching this. This is going to take me a bit to
> > > review as I should audit the code more broadly for this problem in the
> > > TDP MMU.
> > > It would probably also be a good idea to add a comment on the level
> > > field to say that it represents the level of the SPTEs in the
> > > associated page, not the level of the SPTE that links to the
> > > associated page.
> > > Hopefully that will prevent similar future misunderstandings.
> >
> > Regarding to adding  a comment, sorry I had a hard time to figure out where to add. Did
> > you mean level field of 'struct kvm_mmu_page_role', or 'struct tdp_iter'? If it is the
> > former, to me not quite useful.
>
> I meant the level field of 'struct kvm_mmu_page_role', but if you
> don't think it makes sense to add one there, I don't feel strongly
> either way.
>
> >
> > I ended up with below. Is it OK to you?
>
> Yeah, it looks good to me.
>
> >
> > If you still think a comment of level should be added, would you be more specific so that
> > I can add it?
>
> struct {
> +       /*
> +       * The level of the SPT tracked by this SP, as opposed to the
> level of the
> +       * parent SPTE linking this SPT.
> +        */
>         unsigned level:4;
>
> ... I guess that does sound kind of unnecessary.
>
> >
> > ------------------------------------------------------------------------
> >
> > TDP MMU iterator's level is identical to page table's actual level.  For
> > instance, for the last level page table (whose entry points to one 4K
> > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > kvm_mmu_page's level currently is not set correctly when it is allocated
> > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > allocate a new child page table, currently iter->level, which is the
> > level of the page table where the non-present SPTE belongs to, is used.
> > This results in struct kvm_mmu_page's level always having its parent's
> > level (excpet root table's level, which is initialized explicitly using
> > mmu->shadow_root_level).
> >
> > This is kinda wrong, and not consistent with existing non TDP MMU code.
> > Fortuantely sp->role.level is only used in handle_removed_tdp_mmu_page()
> > and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave
> > correctly.  However to make it consistent with legacy MMU code (and fix
> > the issue that both root page table and its child page table have
> > shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change
> > handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly.
> >
> > Signed-off-by: Kai Huang <kai.huang@intel.com>

Ooops, I meant to add:

Reviewed-by: Ben Gardon <bgardon@google.com>

> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> >  arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 5e28fbabcd35..45fb889f6a94 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
> > pt,
> >
> >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> >                 sptep = rcu_dereference(pt) + i;
> > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> >
> >                 if (shared) {
> >                         /*
> > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
> > pt,
> >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> >                 }
> >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > +                                   old_child_spte, REMOVED_SPTE, level,
> >                                     shared);
> >         }
> >
> >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > -                                          KVM_PAGES_PER_HPAGE(level));
> > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> >
> >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> >  }
> > @@ -1013,7 +1013,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32
> > error_code,
> >                 }
> >
> >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> >                         child_pt = sp->spt;
> >
> >                         new_spte = make_nonleaf_spte(child_pt,
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> > index 5fdf63090451..7f9974c5d0b4 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.h
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> > @@ -31,7 +31,7 @@ static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
> >  }
> >  static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >  {
> > -       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
> > +       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1);
> >
> >         /*
> >          * Don't allow yielding, as the caller may have a flush pending.  Note,
> > --
> > 2.31.1
> >
> >
> > >
> > > >
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > > index debe8c3ec844..bcfb87e1c06e 100644
> > > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > > >
> > > >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> > > >                 sptep = rcu_dereference(pt) + i;
> > > > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > > > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> > > >
> > > >                 if (shared) {
> > > >                         /*
> > > > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > > >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> > > >                 }
> > > >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > > > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > > > +                                   old_child_spte, REMOVED_SPTE, level,
> > > >                                     shared);
> > > >         }
> > > >
> > > >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > > > -                                          KVM_PAGES_PER_HPAGE(level));
> > > > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> > > >
> > > >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > > >  }
> > > > @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > > >                 }
> > > >
> > > >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > > > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > > > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> > > >                         child_pt = sp->spt;
> > > >
> > > >                         new_spte = make_nonleaf_spte(child_pt,
> > > > --
> > > > 2.31.1
> > > >
> >
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level
  2021-05-06 16:23         ` Ben Gardon
@ 2021-05-06 22:19           ` Kai Huang
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-06 22:19 UTC (permalink / raw)
  To: Ben Gardon
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Thu, 2021-05-06 at 09:23 -0700, Ben Gardon wrote:
> On Thu, May 6, 2021 at 9:22 AM Ben Gardon <bgardon@google.com> wrote:
> > 
> > On Thu, May 6, 2021 at 1:00 AM Kai Huang <kai.huang@intel.com> wrote:
> > > 
> > > On Wed, 2021-05-05 at 09:28 -0700, Ben Gardon wrote:
> > > > On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > > > > 
> > > > > TDP MMU iterator's level is identical to page table's actual level.  For
> > > > > instance, for the last level page table (whose entry points to one 4K
> > > > > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > > > > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > > > > kvm_mmu_page's level currently is not set correctly when it is allocated
> > > > > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > > > > allocate a new child page table, currently iter->level, which is the
> > > > > level of the page table where the non-present SPTE belongs to, is used.
> > > > > This results in struct kvm_mmu_page's level always having its parent's
> > > > > level (excpet root table's level, which is initialized explicitly using
> > > > > mmu->shadow_root_level).  This is kinda wrong, and not consistent with
> > > > > existing non TDP MMU code.  Fortuantely the sp->role.level is only used
> > > > > in handle_removed_tdp_mmu_page(), which apparently is already aware of
> > > > > this, and handles correctly.  However to make it consistent with non TDP
> > > > > MMU code (and fix the issue that both root page table and any child of
> > > > > it having shadow_root_level), fix this by using iter->level - 1 in
> > > > > kvm_tdp_mmu_map().  Also modify handle_removed_tdp_mmu_page() to handle
> > > > > such change.
> > > > 
> > > > Ugh. Thank you for catching this. This is going to take me a bit to
> > > > review as I should audit the code more broadly for this problem in the
> > > > TDP MMU.
> > > > It would probably also be a good idea to add a comment on the level
> > > > field to say that it represents the level of the SPTEs in the
> > > > associated page, not the level of the SPTE that links to the
> > > > associated page.
> > > > Hopefully that will prevent similar future misunderstandings.
> > > 
> > > Regarding to adding  a comment, sorry I had a hard time to figure out where to add. Did
> > > you mean level field of 'struct kvm_mmu_page_role', or 'struct tdp_iter'? If it is the
> > > former, to me not quite useful.
> > 
> > I meant the level field of 'struct kvm_mmu_page_role', but if you
> > don't think it makes sense to add one there, I don't feel strongly
> > either way.
> > 
> > > 
> > > I ended up with below. Is it OK to you?
> > 
> > Yeah, it looks good to me.
> > 
> > > 
> > > If you still think a comment of level should be added, would you be more specific so that
> > > I can add it?
> > 
> > struct {
> > +       /*
> > +       * The level of the SPT tracked by this SP, as opposed to the
> > level of the
> > +       * parent SPTE linking this SPT.
> > +        */
> >         unsigned level:4;
> > 
> > ... I guess that does sound kind of unnecessary.

Thanks for explanation.  It looks a little bit unnecessary for me.  And if necessary,
perhaps a separate patch :)

> > 
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > TDP MMU iterator's level is identical to page table's actual level.  For
> > > instance, for the last level page table (whose entry points to one 4K
> > > page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging,
> > > the iter->level is mmu->shadow_root_level, which is 5.  However, struct
> > > kvm_mmu_page's level currently is not set correctly when it is allocated
> > > in kvm_tdp_mmu_map().  When iterator hits non-present SPTE and needs to
> > > allocate a new child page table, currently iter->level, which is the
> > > level of the page table where the non-present SPTE belongs to, is used.
> > > This results in struct kvm_mmu_page's level always having its parent's
> > > level (excpet root table's level, which is initialized explicitly using
> > > mmu->shadow_root_level).
> > > 
> > > This is kinda wrong, and not consistent with existing non TDP MMU code.
> > > Fortuantely sp->role.level is only used in handle_removed_tdp_mmu_page()
> > > and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave
> > > correctly.  However to make it consistent with legacy MMU code (and fix
> > > the issue that both root page table and its child page table have
> > > shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change
> > > handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly.
> > > 
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> 
> Ooops, I meant to add:
> 
> Reviewed-by: Ben Gardon <bgardon@google.com>

Thank you!

> 
> > > ---
> > >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> > >  arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
> > >  2 files changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index 5e28fbabcd35..45fb889f6a94 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
> > > pt,
> > > 
> > >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> > >                 sptep = rcu_dereference(pt) + i;
> > > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> > > 
> > >                 if (shared) {
> > >                         /*
> > > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t
> > > pt,
> > >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> > >                 }
> > >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > > +                                   old_child_spte, REMOVED_SPTE, level,
> > >                                     shared);
> > >         }
> > > 
> > >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > > -                                          KVM_PAGES_PER_HPAGE(level));
> > > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> > > 
> > >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > >  }
> > > @@ -1013,7 +1013,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32
> > > error_code,
> > >                 }
> > > 
> > >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> > >                         child_pt = sp->spt;
> > > 
> > >                         new_spte = make_nonleaf_spte(child_pt,
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> > > index 5fdf63090451..7f9974c5d0b4 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.h
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> > > @@ -31,7 +31,7 @@ static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
> > >  }
> > >  static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > >  {
> > > -       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
> > > +       gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1);
> > > 
> > >         /*
> > >          * Don't allow yielding, as the caller may have a flush pending.  Note,
> > > --
> > > 2.31.1
> > > 
> > > 
> > > > 
> > > > > 
> > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > ---
> > > > >  arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
> > > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > > > index debe8c3ec844..bcfb87e1c06e 100644
> > > > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > > > @@ -335,7 +335,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > > > > 
> > > > >         for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> > > > >                 sptep = rcu_dereference(pt) + i;
> > > > > -               gfn = base_gfn + (i * KVM_PAGES_PER_HPAGE(level - 1));
> > > > > +               gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
> > > > > 
> > > > >                 if (shared) {
> > > > >                         /*
> > > > > @@ -377,12 +377,12 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
> > > > >                         WRITE_ONCE(*sptep, REMOVED_SPTE);
> > > > >                 }
> > > > >                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> > > > > -                                   old_child_spte, REMOVED_SPTE, level - 1,
> > > > > +                                   old_child_spte, REMOVED_SPTE, level,
> > > > >                                     shared);
> > > > >         }
> > > > > 
> > > > >         kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > > > > -                                          KVM_PAGES_PER_HPAGE(level));
> > > > > +                                          KVM_PAGES_PER_HPAGE(level + 1));
> > > > > 
> > > > >         call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > > > >  }
> > > > > @@ -1009,7 +1009,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > > > >                 }
> > > > > 
> > > > >                 if (!is_shadow_present_pte(iter.old_spte)) {
> > > > > -                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
> > > > > +                       sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);
> > > > >                         child_pt = sp->spt;
> > > > > 
> > > > >                         new_spte = make_nonleaf_spte(child_pt,
> > > > > --
> > > > > 2.31.1
> > > > > 
> > > 
> > > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()
  2021-05-06 15:29       ` Sean Christopherson
@ 2021-05-06 22:21         ` Kai Huang
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Huang @ 2021-05-06 22:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ben Gardon, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Thu, 2021-05-06 at 15:29 +0000, Sean Christopherson wrote:
> On Thu, May 06, 2021, Kai Huang wrote:
> > On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> > > It would probably also be worth putting a comment on pf_fixed so that
> > > people in the future know what it's supposed to mean and we don't get
> > > into archeology, reverse engineering the meaning of the stat again.
> > 
> > It seems the legacy MMU code path is a better place to add the comment to explain when
> > pf_fixed should be increased.  However I am not sure whether it is necessary for this
> > patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
> > emulation :)).  Or perhaps Sean can write a patch to add comment to legacy MMU :)
> 
> Ya, I think it makes sense to hold off on documenting the existing behavior in
> the TDP MMU.  As is often the case in KVM, just because KVM has always done
> something one way, doesn't mean it's correct/ideal.  But, bikeshedding over what
> faults exactly should count towards pf_fixed is best left to a separate patch.
> 
> > I ended up with  below, by adding a comment in TDP MMU saying "to make it consistent with
> > legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
> > found Sean's explanation is quite useful. When people are interested in, they can do a git
> > blame and find the commit msg of this change -- although it is not as straightforward as
> > having comment directly.
> > 
> > Is this OK to you?
> > 
> > And Sean?
> 
> Yep, works for me.

Thanks!


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-05-06 22:21 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05  9:37 [PATCH 0/3] TDP MMU: several minor fixes or improvements Kai Huang
2021-05-05  9:37 ` [PATCH 1/3] KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() Kai Huang
2021-05-05 16:00   ` Sean Christopherson
2021-05-05 16:04     ` Ben Gardon
2021-05-06  1:56       ` Kai Huang
2021-05-05  9:37 ` [PATCH 2/3] KVM: x86/mmu: Fix pf_fixed count " Kai Huang
2021-05-05 16:11   ` Ben Gardon
2021-05-06  7:51     ` Kai Huang
2021-05-06 15:29       ` Sean Christopherson
2021-05-06 22:21         ` Kai Huang
2021-05-05 16:29   ` Sean Christopherson
2021-05-05 17:16     ` Sean Christopherson
2021-05-06  1:51       ` Kai Huang
2021-05-05  9:37 ` [PATCH 3/3] KVM: x86/mmu: Fix TDP MMU page table level Kai Huang
2021-05-05 16:28   ` Ben Gardon
2021-05-05 17:01     ` Ben Gardon
2021-05-05 20:19       ` Kai Huang
2021-05-06  8:00     ` Kai Huang
2021-05-06 16:22       ` Ben Gardon
2021-05-06 16:23         ` Ben Gardon
2021-05-06 22:19           ` Kai Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).