All of lore.kernel.org
 help / color / mirror / Atom feed
* A question of TDP unloading.
@ 2021-07-27 16:19 Yu Zhang
  2021-07-27 18:07 ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2021-07-27 16:19 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Ben Gardon; +Cc: kvm

Hi all,

  I'd like to ask a question about kvm_reset_context(): is there any
  reason that we must alway unload TDP root in kvm_mmu_reset_context()?
  
  As you know, KVM MMU needs to track guest paging mode changes, to
  recalculate the mmu roles and reset callback routines(e.g., guest
  page table walker). These are done in kvm_mmu_reset_context(). Also,
  entering SMM, cpuid updates, and restoring L1 VMM's host state will
  trigger kvm_mmu_reset_context() too.
  
  Meanwhile, another job done by kvm_mmu_reset_context() is to unload
  the KVM MMU:
  
  - For shadow & legacy TDP, it means to unload the root shadow/TDP
    page and reconstruct another one in kvm_mmu_reload(), before
    entering guest. Old shadow/TDP pages will probably be reused later,
    after future guest paging mode switches.
  
  - For TDP MMU, it is even more aggressive, all TDP pages will be
    zapped, meaning a whole new TDP page table will be recontrustred,
    with each paging mode change in the guest. I witnessed dozens of
    rebuildings of TDP when booting a Linux guest(besides the ones
    caused by memslots rearrangement).
  
  However, I am wondering, why do we need the unloading, if GPA->HPA
  relationship is not changed? And if this is not a must, could we
  find a way to refactor kvm_mmu_reset_context(), so that unloading
  of TDP root is only performed when necessary(e.g, SMM switches and
  maybe after cpuid updates which may change the level of TDP)? 
  
  I tried to add a parameter in kvm_mmu_reset_context(), to make the
  unloading optional:  

+void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
 {
-       kvm_mmu_unload(vcpu);
+       if (!tdp_enabled || force_tdp_unload)
+               kvm_mmu_unload(vcpu);
+
        kvm_init_mmu(vcpu);
 }

  But this change brings another problem - if we keep the TDP root, the
  role of existing SPs will be obsolete after guest paging mode changes.
  Altough I guess most role flags are irrelevant in TDP, I am not sure
  if this could cause any trouble.
  
  Is there anyone looking at this issue? Or do you have any suggestion?
  Thanks!
  
B.R.
Yu


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-27 16:19 A question of TDP unloading Yu Zhang
@ 2021-07-27 18:07 ` Sean Christopherson
  2021-07-28  6:56   ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2021-07-27 18:07 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Wed, Jul 28, 2021, Yu Zhang wrote:
> Hi all,
> 
>   I'd like to ask a question about kvm_reset_context(): is there any
>   reason that we must alway unload TDP root in kvm_mmu_reset_context()?

The short answer is that mmu_role is changing, thus a new root shadow page is
needed.

>   As you know, KVM MMU needs to track guest paging mode changes, to
>   recalculate the mmu roles and reset callback routines(e.g., guest
>   page table walker). These are done in kvm_mmu_reset_context(). Also,
>   entering SMM, cpuid updates, and restoring L1 VMM's host state will
>   trigger kvm_mmu_reset_context() too.
>   
>   Meanwhile, another job done by kvm_mmu_reset_context() is to unload
>   the KVM MMU:
>   
>   - For shadow & legacy TDP, it means to unload the root shadow/TDP
>     page and reconstruct another one in kvm_mmu_reload(), before
>     entering guest. Old shadow/TDP pages will probably be reused later,
>     after future guest paging mode switches.
>   
>   - For TDP MMU, it is even more aggressive, all TDP pages will be
>     zapped, meaning a whole new TDP page table will be recontrustred,
>     with each paging mode change in the guest. I witnessed dozens of
>     rebuildings of TDP when booting a Linux guest(besides the ones
>     caused by memslots rearrangement).
>   
>   However, I am wondering, why do we need the unloading, if GPA->HPA
>   relationship is not changed? And if this is not a must, could we
>   find a way to refactor kvm_mmu_reset_context(), so that unloading
>   of TDP root is only performed when necessary(e.g, SMM switches and
>   maybe after cpuid updates which may change the level of TDP)? 
>   
>   I tried to add a parameter in kvm_mmu_reset_context(), to make the
>   unloading optional:  
> 
> +void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
>  {
> -       kvm_mmu_unload(vcpu);
> +       if (!tdp_enabled || force_tdp_unload)
> +               kvm_mmu_unload(vcpu);
> +
>         kvm_init_mmu(vcpu);
>  }
> 
>   But this change brings another problem - if we keep the TDP root, the
>   role of existing SPs will be obsolete after guest paging mode changes.
>   Altough I guess most role flags are irrelevant in TDP, I am not sure
>   if this could cause any trouble.
>   
>   Is there anyone looking at this issue? Or do you have any suggestion?

What's the problem you're trying to solve?  kvm_mmu_reset_context() is most
definitely a big hammer, e.g. kvm_post_set_cr0() and kvm_post_set_cr4() in
particular could be reworked to do something like kvm_mmu_new_pgd() + kvm_init_mmu(),
but modifying mmu_role bits in CR0/CR4 should be a rare event, i.e. there hasn't
sufficient motivation to optimize CR0/CR4 changes.

Note, most CR4 bits and CR0.PG are tracked in kvm_mmu_extended_role, not
kvm_mmu_page_role, which adds a minor wrinkle to the logic.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-27 18:07 ` Sean Christopherson
@ 2021-07-28  6:56   ` Yu Zhang
  2021-07-28  7:25     ` Yan Zhao
  2021-07-28 18:37     ` Sean Christopherson
  0 siblings, 2 replies; 22+ messages in thread
From: Yu Zhang @ 2021-07-28  6:56 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ben Gardon, kvm

Thanks a lot for your reply, Sean.

On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> On Wed, Jul 28, 2021, Yu Zhang wrote:
> > Hi all,
> > 
> >   I'd like to ask a question about kvm_reset_context(): is there any
> >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> 
> The short answer is that mmu_role is changing, thus a new root shadow page is
> needed.

I saw the mmu_role is recalculated, but I have not figured out how this
change would affect TDP. May I ask a favor to give an example? Thanks!

I realized that if we only recalculate the mmu role, but do not unload
the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
be inconsistent with the mmu context. But I do not understand why this
shall affect TDP. 

> 
> >   As you know, KVM MMU needs to track guest paging mode changes, to
> >   recalculate the mmu roles and reset callback routines(e.g., guest
> >   page table walker). These are done in kvm_mmu_reset_context(). Also,
> >   entering SMM, cpuid updates, and restoring L1 VMM's host state will
> >   trigger kvm_mmu_reset_context() too.
> >   
> >   Meanwhile, another job done by kvm_mmu_reset_context() is to unload
> >   the KVM MMU:
> >   
> >   - For shadow & legacy TDP, it means to unload the root shadow/TDP
> >     page and reconstruct another one in kvm_mmu_reload(), before
> >     entering guest. Old shadow/TDP pages will probably be reused later,
> >     after future guest paging mode switches.
> >   
> >   - For TDP MMU, it is even more aggressive, all TDP pages will be
> >     zapped, meaning a whole new TDP page table will be recontrustred,
> >     with each paging mode change in the guest. I witnessed dozens of
> >     rebuildings of TDP when booting a Linux guest(besides the ones
> >     caused by memslots rearrangement).
> >   
> >   However, I am wondering, why do we need the unloading, if GPA->HPA
> >   relationship is not changed? And if this is not a must, could we
> >   find a way to refactor kvm_mmu_reset_context(), so that unloading
> >   of TDP root is only performed when necessary(e.g, SMM switches and
> >   maybe after cpuid updates which may change the level of TDP)? 
> >   
> >   I tried to add a parameter in kvm_mmu_reset_context(), to make the
> >   unloading optional:  
> > 
> > +void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
> >  {
> > -       kvm_mmu_unload(vcpu);
> > +       if (!tdp_enabled || force_tdp_unload)
> > +               kvm_mmu_unload(vcpu);
> > +
> >         kvm_init_mmu(vcpu);
> >  }
> > 
> >   But this change brings another problem - if we keep the TDP root, the
> >   role of existing SPs will be obsolete after guest paging mode changes.
> >   Altough I guess most role flags are irrelevant in TDP, I am not sure
> >   if this could cause any trouble.
> >   
> >   Is there anyone looking at this issue? Or do you have any suggestion?
> 
> What's the problem you're trying to solve?  kvm_mmu_reset_context() is most
> definitely a big hammer, e.g. kvm_post_set_cr0() and kvm_post_set_cr4() in
> particular could be reworked to do something like kvm_mmu_new_pgd() + kvm_init_mmu(),
> but modifying mmu_role bits in CR0/CR4 should be a rare event, i.e. there hasn't
> sufficient motivation to optimize CR0/CR4 changes.

Well, I noticed this when I was trying to find the reason why a single GFN
can have multiple rmaps in TDP(not the SMM case, which uses different memslot).
And the fact that guest paging mode change will cause the unloading of TDP
looks counter-intuitive. I know I must have missed something important, and
I have a strong desire to figure out why. :)

Then I tried with TDP MMU, yet to find the unloading is performed even more
aggressively... 

> 
> Note, most CR4 bits and CR0.PG are tracked in kvm_mmu_extended_role, not
> kvm_mmu_page_role, which adds a minor wrinkle to the logic.
> 

The extended role is another pain point for me when reading KVM MMU code.
I can understand it is useful in shadow, but does it also matters in TDP?

B.R.
Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28  6:56   ` Yu Zhang
@ 2021-07-28  7:25     ` Yan Zhao
  2021-07-28 16:23       ` Ben Gardon
  2021-07-28 18:37     ` Sean Christopherson
  1 sibling, 1 reply; 22+ messages in thread
From: Yan Zhao @ 2021-07-28  7:25 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Sean Christopherson, Paolo Bonzini, Ben Gardon, kvm

On Wed, Jul 28, 2021 at 02:56:05PM +0800, Yu Zhang wrote:
> Thanks a lot for your reply, Sean.
> 
> On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > Hi all,
> > > 
> > >   I'd like to ask a question about kvm_reset_context(): is there any
> > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> > 
> > The short answer is that mmu_role is changing, thus a new root shadow page is
> > needed.
> 
> I saw the mmu_role is recalculated, but I have not figured out how this
> change would affect TDP. May I ask a favor to give an example? Thanks!
> 
> I realized that if we only recalculate the mmu role, but do not unload
> the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> be inconsistent with the mmu context. But I do not understand why this
> shall affect TDP. 
> 
> > 
> > >   As you know, KVM MMU needs to track guest paging mode changes, to
> > >   recalculate the mmu roles and reset callback routines(e.g., guest
> > >   page table walker). These are done in kvm_mmu_reset_context(). Also,
> > >   entering SMM, cpuid updates, and restoring L1 VMM's host state will
> > >   trigger kvm_mmu_reset_context() too.
> > >   
> > >   Meanwhile, another job done by kvm_mmu_reset_context() is to unload
> > >   the KVM MMU:
> > >   
> > >   - For shadow & legacy TDP, it means to unload the root shadow/TDP
> > >     page and reconstruct another one in kvm_mmu_reload(), before
> > >     entering guest. Old shadow/TDP pages will probably be reused later,
> > >     after future guest paging mode switches.
> > >   
> > >   - For TDP MMU, it is even more aggressive, all TDP pages will be
> > >     zapped, meaning a whole new TDP page table will be recontrustred,
> > >     with each paging mode change in the guest. I witnessed dozens of
> > >     rebuildings of TDP when booting a Linux guest(besides the ones
> > >     caused by memslots rearrangement).
> > >   
> > >   However, I am wondering, why do we need the unloading, if GPA->HPA
> > >   relationship is not changed? And if this is not a must, could we
> > >   find a way to refactor kvm_mmu_reset_context(), so that unloading
> > >   of TDP root is only performed when necessary(e.g, SMM switches and
> > >   maybe after cpuid updates which may change the level of TDP)? 
> > >   
> > >   I tried to add a parameter in kvm_mmu_reset_context(), to make the
> > >   unloading optional:  
> > > 
> > > +void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
> > >  {
> > > -       kvm_mmu_unload(vcpu);
> > > +       if (!tdp_enabled || force_tdp_unload)
> > > +               kvm_mmu_unload(vcpu);
> > > +
> > >         kvm_init_mmu(vcpu);
> > >  }
> > > 
> > >   But this change brings another problem - if we keep the TDP root, the
> > >   role of existing SPs will be obsolete after guest paging mode changes.
> > >   Altough I guess most role flags are irrelevant in TDP, I am not sure
> > >   if this could cause any trouble.
> > >   
> > >   Is there anyone looking at this issue? Or do you have any suggestion?
> > 
> > What's the problem you're trying to solve?  kvm_mmu_reset_context() is most
> > definitely a big hammer, e.g. kvm_post_set_cr0() and kvm_post_set_cr4() in
> > particular could be reworked to do something like kvm_mmu_new_pgd() + kvm_init_mmu(),
> > but modifying mmu_role bits in CR0/CR4 should be a rare event, i.e. there hasn't
> > sufficient motivation to optimize CR0/CR4 changes.
> 
> Well, I noticed this when I was trying to find the reason why a single GFN
> can have multiple rmaps in TDP(not the SMM case, which uses different memslot).
> And the fact that guest paging mode change will cause the unloading of TDP
> looks counter-intuitive. I know I must have missed something important, and
> I have a strong desire to figure out why. :)
> 
> Then I tried with TDP MMU, yet to find the unloading is performed even more
> aggressively... 
> 
> > 
> > Note, most CR4 bits and CR0.PG are tracked in kvm_mmu_extended_role, not
> > kvm_mmu_page_role, which adds a minor wrinkle to the logic.
> > 
> 
> The extended role is another pain point for me when reading KVM MMU code.
> I can understand it is useful in shadow, but does it also matters in TDP?
> 

hi
I noticed that shadow page's role is of type union kvm_mmu_page_role.
so, can I understand that we actually only need to do kvm_mmu_unload() when
base roles of new mmu role and the old context mmu role are different?

Thanks
Yan



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28  7:25     ` Yan Zhao
@ 2021-07-28 16:23       ` Ben Gardon
  2021-07-28 17:23         ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-07-28 16:23 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Yu Zhang, Sean Christopherson, Paolo Bonzini, kvm

On Wed, Jul 28, 2021 at 12:40 AM Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> On Wed, Jul 28, 2021 at 02:56:05PM +0800, Yu Zhang wrote:
> > Thanks a lot for your reply, Sean.
> >
> > On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > > Hi all,
> > > >
> > > >   I'd like to ask a question about kvm_reset_context(): is there any
> > > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?

I just realized I sent my response to Yu yesterday without reply-all.
Sending it again here for posterity. I'll add comments on the
discussion inline below too.

Hi Yu,

I think the short answer here is no, there's no reason we can't keep
the root around for later.

When developing the TDP MMU, we were primarily concerned about
performance post-boot, especially during migration or when
re-populating memory for demand paging. In these scenarios the guest
role doesn't really change and so the TDP MMU's shadow page tables
aren't torn down. In my initial testing, I thought I only ever
observed two TDP MMU roots be allocated over the life of the VM, but I
could be wrong.

For the TDP MMU root to be torn down, there also has to be no vCPU
using it. This probably happens in transitions to SMM and guest root
level changes, but I suspected that there would usually be at least
one vCPU in some "normal" mode, post boot. That may have been an
incorrect assumption.

I think the easiest solution to this would be to just have the TDP MMU
roots track the life of the VM by adding an extra increment to their
reference count on allocation and an extra decrement when the VM is
torn down. However this introduces a problem because it increases the
amount of memory the TDP MMU is likely to be using for its page
tables. (It could use the memory either way but it would require some
surprising guest behavior.)

I have a few questions about these unnecessary tear-downs during boot:
1. How many teardowns did you observe, and how many different roles
did they represent? Just thrashing between two roles, or 12 different
roles?
2. When the TDP MMU's page tables got torn down, how much memory did
they map / how big were they?
3. If you hacked in the extra refcount increment I suggested above,
how much of a difference in boot time does it make?

For 2 and 3 I ask because if the guest hasn't accessed much of it's
memory early in boot, the paging structure won't be very large and
tearing it down / rebuilding it is pretty cheap.

We may find that we need some kind of page quota for the TDP MMU after
all, if we want to have a bunch of roots at the same time. If that's
the case, perhaps we should spawn another email thread to discuss how
that should work.

Thanks for raising this issue!
Ben

> > >
> > > The short answer is that mmu_role is changing, thus a new root shadow page is
> > > needed.
> >
> > I saw the mmu_role is recalculated, but I have not figured out how this
> > change would affect TDP. May I ask a favor to give an example? Thanks!

One really simple example is if the guest started using SMM. In that
case it's a totally different address space, so we need a new EPT.

> >
> > I realized that if we only recalculate the mmu role, but do not unload
> > the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> > be inconsistent with the mmu context. But I do not understand why this
> > shall affect TDP.

It might not always cause problems since TDP is less sensitive to this
kind of thing than shadow paging, but getting all the details right is
hard so we just took the conservative approach of handling all role
changes with a new root.

> >
> > >
> > > >   As you know, KVM MMU needs to track guest paging mode changes, to
> > > >   recalculate the mmu roles and reset callback routines(e.g., guest
> > > >   page table walker). These are done in kvm_mmu_reset_context(). Also,
> > > >   entering SMM, cpuid updates, and restoring L1 VMM's host state will
> > > >   trigger kvm_mmu_reset_context() too.
> > > >
> > > >   Meanwhile, another job done by kvm_mmu_reset_context() is to unload
> > > >   the KVM MMU:
> > > >
> > > >   - For shadow & legacy TDP, it means to unload the root shadow/TDP
> > > >     page and reconstruct another one in kvm_mmu_reload(), before
> > > >     entering guest. Old shadow/TDP pages will probably be reused later,
> > > >     after future guest paging mode switches.
> > > >
> > > >   - For TDP MMU, it is even more aggressive, all TDP pages will be
> > > >     zapped, meaning a whole new TDP page table will be recontrustred,
> > > >     with each paging mode change in the guest. I witnessed dozens of
> > > >     rebuildings of TDP when booting a Linux guest(besides the ones
> > > >     caused by memslots rearrangement).
> > > >
> > > >   However, I am wondering, why do we need the unloading, if GPA->HPA
> > > >   relationship is not changed? And if this is not a must, could we
> > > >   find a way to refactor kvm_mmu_reset_context(), so that unloading
> > > >   of TDP root is only performed when necessary(e.g, SMM switches and
> > > >   maybe after cpuid updates which may change the level of TDP)?
> > > >
> > > >   I tried to add a parameter in kvm_mmu_reset_context(), to make the
> > > >   unloading optional:
> > > >
> > > > +void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
> > > >  {
> > > > -       kvm_mmu_unload(vcpu);
> > > > +       if (!tdp_enabled || force_tdp_unload)
> > > > +               kvm_mmu_unload(vcpu);
> > > > +
> > > >         kvm_init_mmu(vcpu);
> > > >  }
> > > >
> > > >   But this change brings another problem - if we keep the TDP root, the
> > > >   role of existing SPs will be obsolete after guest paging mode changes.
> > > >   Altough I guess most role flags are irrelevant in TDP, I am not sure
> > > >   if this could cause any trouble.
> > > >
> > > >   Is there anyone looking at this issue? Or do you have any suggestion?
> > >
> > > What's the problem you're trying to solve?  kvm_mmu_reset_context() is most
> > > definitely a big hammer, e.g. kvm_post_set_cr0() and kvm_post_set_cr4() in
> > > particular could be reworked to do something like kvm_mmu_new_pgd() + kvm_init_mmu(),
> > > but modifying mmu_role bits in CR0/CR4 should be a rare event, i.e. there hasn't
> > > sufficient motivation to optimize CR0/CR4 changes.
> >
> > Well, I noticed this when I was trying to find the reason why a single GFN
> > can have multiple rmaps in TDP(not the SMM case, which uses different memslot).
> > And the fact that guest paging mode change will cause the unloading of TDP
> > looks counter-intuitive. I know I must have missed something important, and
> > I have a strong desire to figure out why. :)

Your best bet there might be to just log all the roles calculated over
the life of the VM and then reverse-engineer from there.
I would naively assume that that would give a pretty good picture of
what changes necessitated new TDP roots.

> >
> > Then I tried with TDP MMU, yet to find the unloading is performed even more
> > aggressively...
> >
> > >
> > > Note, most CR4 bits and CR0.PG are tracked in kvm_mmu_extended_role, not
> > > kvm_mmu_page_role, which adds a minor wrinkle to the logic.
> > >
> >
> > The extended role is another pain point for me when reading KVM MMU code.
> > I can understand it is useful in shadow, but does it also matters in TDP?
> >
>
> hi
> I noticed that shadow page's role is of type union kvm_mmu_page_role.
> so, can I understand that we actually only need to do kvm_mmu_unload() when
> base roles of new mmu role and the old context mmu role are different?
>
> Thanks
> Yan
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28 16:23       ` Ben Gardon
@ 2021-07-28 17:23         ` Yu Zhang
  2021-07-28 17:55           ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2021-07-28 17:23 UTC (permalink / raw)
  To: Ben Gardon; +Cc: Yan Zhao, Sean Christopherson, Paolo Bonzini, kvm

On Wed, Jul 28, 2021 at 09:23:53AM -0700, Ben Gardon wrote:
> On Wed, Jul 28, 2021 at 12:40 AM Yan Zhao <yan.y.zhao@intel.com> wrote:
> >
> > On Wed, Jul 28, 2021 at 02:56:05PM +0800, Yu Zhang wrote:
> > > Thanks a lot for your reply, Sean.
> > >
> > > On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > > > Hi all,
> > > > >
> > > > >   I'd like to ask a question about kvm_reset_context(): is there any
> > > > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> 
> I just realized I sent my response to Yu yesterday without reply-all.
> Sending it again here for posterity. I'll add comments on the
> discussion inline below too.

Thanks Ben, I copied my reply here. With some gramar fixes. :)

> 
> Hi Yu,
> 
> I think the short answer here is no, there's no reason we can't keep
> the root around for later.
> 
> When developing the TDP MMU, we were primarily concerned about
> performance post-boot, especially during migration or when
> re-populating memory for demand paging. In these scenarios the guest
> role doesn't really change and so the TDP MMU's shadow page tables
> aren't torn down. In my initial testing, I thought I only ever
> observed two TDP MMU roots be allocated over the life of the VM, but I
> could be wrong.

Well I observed more, may be because I am using OVMF? Note, the vCPU
number is just one in my test.

> 
> For the TDP MMU root to be torn down, there also has to be no vCPU
> using it. This probably happens in transitions to SMM and guest root
> level changes, but I suspected that there would usually be at least
> one vCPU in some "normal" mode, post boot. That may have been an
> incorrect assumption.
> 
> I think the easiest solution to this would be to just have the TDP MMU
> roots track the life of the VM by adding an extra increment to their
> reference count on allocation and an extra decrement when the VM is
> torn down. However this introduces a problem because it increases the
> amount of memory the TDP MMU is likely to be using for its page
> tables. (It could use the memory either way but it would require some
> surprising guest behavior.)

So your suggestion is, once allocated, do not free the root page until
the VM is destroyed?

> 
> I have a few questions about these unnecessary tear-downs during boot:
> 1. How many teardowns did you observe, and how many different roles
> did they represent? Just thrashing between two roles, or 12 different
> roles?

I saw 106 reloadings of the root TDP. Among them, 14 are caused by memslot
changes. Remaining ones are caused by the context reset from CR0/CR4/EFER
changes(85 for CR0 changes). And I believe most are using the same roles,
because in legacy TDP, only 4 different TDP roots are allocated due to the
context reset(and several more are caused by memslot updating). But in TDP
MMU, that means 106 times of TDP root being torn down and reallocated.

> 2. When the TDP MMU's page tables got torn down, how much memory did
> they map / how big were they?

I did not collect this in TDP MMU, but I once tried with legacy TDP. IIRC,
there are only several SPs allocated in one TDP table when the context resets.

> 3. If you hacked in the extra refcount increment I suggested above,
> how much of a difference in boot time does it make?

I have not tried this, but I think that proposal is let TDP MMU try to
reuse previous root page with same mmu role with current context, just
like the legacy TDP does?

Actually I am curious, why would the root needs to be unloaded at all(even
in the legacy TDP code)? Sean's reply mentioned that change of the mmu role
is the reason, but I do not understand yet.

> 
> For 2 and 3 I ask because if the guest hasn't accessed much of it's
> memory early in boot, the paging structure won't be very large and
> tearing it down / rebuilding it is pretty cheap.

Agree. But I am a bit surprised to see so many CR0 changes in the boot time.

> 
> We may find that we need some kind of page quota for the TDP MMU after
> all, if we want to have a bunch of roots at the same time. If that's
> the case, perhaps we should spawn another email thread to discuss how
> that should work.

Could we find a way to obviate the requirement of unloading(if unnecessary)?

> 
> Thanks for raising this issue!
> Ben
> 
> > > >
> > > > The short answer is that mmu_role is changing, thus a new root shadow page is
> > > > needed.
> > >
> > > I saw the mmu_role is recalculated, but I have not figured out how this
> > > change would affect TDP. May I ask a favor to give an example? Thanks!
> 
> One really simple example is if the guest started using SMM. In that
> case it's a totally different address space, so we need a new EPT.

Yes. I admit unloading the MMU is necessary for SMM. But what about the other
scenarios? :)

> 
> > >
> > > I realized that if we only recalculate the mmu role, but do not unload
> > > the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> > > be inconsistent with the mmu context. But I do not understand why this
> > > shall affect TDP.
> 
> It might not always cause problems since TDP is less sensitive to this
> kind of thing than shadow paging, but getting all the details right is
> hard so we just took the conservative approach of handling all role
> changes with a new root.

I have the same feeling, but I doubt if it will *never* cause any problem. :)

Another impact I can think of is: without unloading, the root_hpa will not
be set to INVALID_PAGE, hence the kvm_mmu_load() will not be called before
vcpu entry(which may reload guest CR3/PDPTRs as well). But I have no idea
if this could cause any trouble or not.

B.R.
Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28 17:23         ` Yu Zhang
@ 2021-07-28 17:55           ` Ben Gardon
  2021-07-29  3:00             ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-07-28 17:55 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Yan Zhao, Sean Christopherson, Paolo Bonzini, kvm

On Wed, Jul 28, 2021 at 10:23 AM Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>
> On Wed, Jul 28, 2021 at 09:23:53AM -0700, Ben Gardon wrote:
> > On Wed, Jul 28, 2021 at 12:40 AM Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >
> > > On Wed, Jul 28, 2021 at 02:56:05PM +0800, Yu Zhang wrote:
> > > > Thanks a lot for your reply, Sean.
> > > >
> > > > On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > > > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > > > > Hi all,
> > > > > >
> > > > > >   I'd like to ask a question about kvm_reset_context(): is there any
> > > > > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> >
> > I just realized I sent my response to Yu yesterday without reply-all.
> > Sending it again here for posterity. I'll add comments on the
> > discussion inline below too.
>
> Thanks Ben, I copied my reply here. With some gramar fixes. :)
>
> >
> > Hi Yu,
> >
> > I think the short answer here is no, there's no reason we can't keep
> > the root around for later.
> >
> > When developing the TDP MMU, we were primarily concerned about
> > performance post-boot, especially during migration or when
> > re-populating memory for demand paging. In these scenarios the guest
> > role doesn't really change and so the TDP MMU's shadow page tables
> > aren't torn down. In my initial testing, I thought I only ever
> > observed two TDP MMU roots be allocated over the life of the VM, but I
> > could be wrong.
>
> Well I observed more, may be because I am using OVMF? Note, the vCPU
> number is just one in my test.

Having just one vCPU is likely to lead to more thrash, but I'm
guessing a lot of these transitions happen before the guest starts
using multiple vCPUs anyway.

>
> >
> > For the TDP MMU root to be torn down, there also has to be no vCPU
> > using it. This probably happens in transitions to SMM and guest root
> > level changes, but I suspected that there would usually be at least
> > one vCPU in some "normal" mode, post boot. That may have been an
> > incorrect assumption.
> >
> > I think the easiest solution to this would be to just have the TDP MMU
> > roots track the life of the VM by adding an extra increment to their
> > reference count on allocation and an extra decrement when the VM is
> > torn down. However this introduces a problem because it increases the
> > amount of memory the TDP MMU is likely to be using for its page
> > tables. (It could use the memory either way but it would require some
> > surprising guest behavior.)
>
> So your suggestion is, once allocated, do not free the root page until
> the VM is destroyed?

Yeah, this wouldn't be a great solution for a production setting but
if you just want to quantify the impact of teardown it's an easy hack.

>
> >
> > I have a few questions about these unnecessary tear-downs during boot:
> > 1. How many teardowns did you observe, and how many different roles
> > did they represent? Just thrashing between two roles, or 12 different
> > roles?
>
> I saw 106 reloadings of the root TDP. Among them, 14 are caused by memslot
> changes. Remaining ones are caused by the context reset from CR0/CR4/EFER
> changes(85 for CR0 changes). And I believe most are using the same roles,
> because in legacy TDP, only 4 different TDP roots are allocated due to the
> context reset(and several more are caused by memslot updating). But in TDP
> MMU, that means 106 times of TDP root being torn down and reallocated.
>
> > 2. When the TDP MMU's page tables got torn down, how much memory did
> > they map / how big were they?
>
> I did not collect this in TDP MMU, but I once tried with legacy TDP. IIRC,
> there are only several SPs allocated in one TDP table when the context resets.

Ooof that's a lot of resets, though if there are only a handful of
pages mapped, it might not be a noticeable performance impact. I think
it'd be worth collecting some performance data to quantify the impact.

>
> > 3. If you hacked in the extra refcount increment I suggested above,
> > how much of a difference in boot time does it make?
>
> I have not tried this, but I think that proposal is let TDP MMU try to
> reuse previous root page with same mmu role with current context, just
> like the legacy TDP does?

Yeah, exactly. The TDP MMU is actually already designed to do this,
but it depends on another vCPU keeping the root's refcount elevated to
keep the root around.

>
> Actually I am curious, why would the root needs to be unloaded at all(even
> in the legacy TDP code)? Sean's reply mentioned that change of the mmu role
> is the reason, but I do not understand yet.

Will follow up on this below.

>
> >
> > For 2 and 3 I ask because if the guest hasn't accessed much of it's
> > memory early in boot, the paging structure won't be very large and
> > tearing it down / rebuilding it is pretty cheap.
>
> Agree. But I am a bit surprised to see so many CR0 changes in the boot time.
>
> >
> > We may find that we need some kind of page quota for the TDP MMU after
> > all, if we want to have a bunch of roots at the same time. If that's
> > the case, perhaps we should spawn another email thread to discuss how
> > that should work.
>
> Could we find a way to obviate the requirement of unloading(if unnecessary)?

Could you be more specific about what you mean by unloading? Do you
mean just not using the current paging structure for a bit or tearing
down the whole paging structure?

>
> >
> > Thanks for raising this issue!
> > Ben
> >
> > > > >
> > > > > The short answer is that mmu_role is changing, thus a new root shadow page is
> > > > > needed.
> > > >
> > > > I saw the mmu_role is recalculated, but I have not figured out how this
> > > > change would affect TDP. May I ask a favor to give an example? Thanks!
> >
> > One really simple example is if the guest started using SMM. In that
> > case it's a totally different address space, so we need a new EPT.
>
> Yes. I admit unloading the MMU is necessary for SMM. But what about the other
> scenarios? :)

I think there are certainly scenarios where unloading is not needed
but we do it anyway. There are probably many cases where there's room
for optimization here.
Knowing when it's okay to avoid the reload can be super complicated so
it's probably mostly a case of people not having had the time / perf
motivation to track down the details.
A reload once, early in boot, is probably cheap enough that you'd only
notice it doing a very directed microbenchmark. If not it'd certainly
be worth fixing.

>
> >
> > > >
> > > > I realized that if we only recalculate the mmu role, but do not unload
> > > > the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> > > > be inconsistent with the mmu context. But I do not understand why this
> > > > shall affect TDP.
> >
> > It might not always cause problems since TDP is less sensitive to this
> > kind of thing than shadow paging, but getting all the details right is
> > hard so we just took the conservative approach of handling all role
> > changes with a new root.
>
> I have the same feeling, but I doubt if it will *never* cause any problem. :)
>
> Another impact I can think of is: without unloading, the root_hpa will not
> be set to INVALID_PAGE, hence the kvm_mmu_load() will not be called before
> vcpu entry(which may reload guest CR3/PDPTRs as well). But I have no idea
> if this could cause any trouble or not.
>
> B.R.
> Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28  6:56   ` Yu Zhang
  2021-07-28  7:25     ` Yan Zhao
@ 2021-07-28 18:37     ` Sean Christopherson
  2021-07-29  3:22       ` Yu Zhang
  1 sibling, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2021-07-28 18:37 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Wed, Jul 28, 2021, Yu Zhang wrote:
> Thanks a lot for your reply, Sean.
> 
> On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > Hi all,
> > > 
> > >   I'd like to ask a question about kvm_reset_context(): is there any
> > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> > 
> > The short answer is that mmu_role is changing, thus a new root shadow page is
> > needed.
> 
> I saw the mmu_role is recalculated, but I have not figured out how this
> change would affect TDP. May I ask a favor to give an example? Thanks!
> 
> I realized that if we only recalculate the mmu role, but do not unload
> the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> be inconsistent with the mmu context. But I do not understand why this
> shall affect TDP. 

The SPTEs themselves are not affected if the base mmu_role doesn't change; note,
this holds true for shadow paging, too.  What changes is all of the kvm_mmu
knowledge about how to walk the guest PTEs, e.g. if a guest toggles CR4.SMAP,
then KVM needs to recalculate the #PF permissions for guest accesses so that
emulating instructions at CPL=0 does the right thing.

As for EFER.NX and CR0.WP, they are in the base page role because they need to
be there for shadow paging, e.g. if the guest toggles EFER.NX, then the reserved
bit and executable permissions change, and reusing shadow paging for the old
EFER.NX could result in missed reserved #PF and/or incorrect executable #PF
behavior.

For simplicitly, it's far, far eaiser to reuse the same page role struct for
TDP paging (both legacy and TDP MMUs) and shadow paging.

However, I think we can safely ignore NX, WP, SMEP, and SMAP in direct shadow
pages, which would allow reusing a TDP root across changes.  This is only a baby
step (assuming it even works), as further changes to set_cr0/cr4/efer would be
needed to fully realize the optimizations, e.g. to avoid complete teardown if
the root_count hits zero.

I'll put this on my todo list, I've been looking for an excuse to update the
cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
minor, if it works...

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8cdfd8d45c4..700664fe163e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
        role = vcpu->arch.mmu->mmu_role.base;
        role.level = level;
        role.direct = direct;
-       if (role.direct)
+       if (role.direct) {
                role.gpte_is_8_bytes = true;
+
+               /*
+                * Guest PTE permissions do not impact SPTE permissions for
+                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
+                * guest PTE permissions are enforced by the CPU (TDP enabled).
+                */
+               WARN_ON_ONCE(access != ACC_ALL);
+               role.efer_nx = 0;
+               role.cr0_wp = 0;
+               role.smep_andnot_wp = 0;
+               role.smap_andnot_wp = 0;
+       }
        role.access = access;
        if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
                quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  3:00             ` Yu Zhang
@ 2021-07-29  2:58               ` Yan Zhao
  2021-07-29  5:17                 ` Yu Zhang
                                   ` (2 more replies)
  2021-07-29  9:19               ` Paolo Bonzini
  1 sibling, 3 replies; 22+ messages in thread
From: Yan Zhao @ 2021-07-29  2:58 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Ben Gardon, Sean Christopherson, Paolo Bonzini, kvm

On Thu, Jul 29, 2021 at 11:00:56AM +0800, Yu Zhang wrote:
> > 
> > Ooof that's a lot of resets, though if there are only a handful of
> > pages mapped, it might not be a noticeable performance impact. I think
> > it'd be worth collecting some performance data to quantify the impact.
> 
> Yes. Too many reset will definitely hurt the performance, though I did not see
> obvious delay.
>

if I add below limits before unloading mmu, and with
enable_unrestricted_guest=0, the boot time can be reduced to 31 secs
from more than 5 minutes. 

 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
-       kvm_mmu_unload(vcpu);
-       kvm_init_mmu(vcpu, true);
+       union kvm_mmu_role new_role =
+               kvm_calc_tdp_mmu_root_page_role(vcpu, false);
+       struct kvm_mmu *context = &vcpu->arch.root_mmu;
+       bool reset = false;
+
+       if (new_role.as_u64 != context->mmu_role.as_u64) {
+               kvm_mmu_unload(vcpu);
+               reset = true;
+       }
+       kvm_init_mmu(vcpu, reset);

But with enable_unrestricted_guest=0, if I further modify the limits to
"if (new_role.base.word != context->mmu_role.base.word)", the VM would
fail to boot.
so, with mmu extended role changes, unload the mmu is necessary in some
situation, or at least we need to zap related sptes.

Thanks
Yan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28 17:55           ` Ben Gardon
@ 2021-07-29  3:00             ` Yu Zhang
  2021-07-29  2:58               ` Yan Zhao
  2021-07-29  9:19               ` Paolo Bonzini
  0 siblings, 2 replies; 22+ messages in thread
From: Yu Zhang @ 2021-07-29  3:00 UTC (permalink / raw)
  To: Ben Gardon; +Cc: Yan Zhao, Sean Christopherson, Paolo Bonzini, kvm

On Wed, Jul 28, 2021 at 10:55:01AM -0700, Ben Gardon wrote:
> On Wed, Jul 28, 2021 at 10:23 AM Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> >
> > On Wed, Jul 28, 2021 at 09:23:53AM -0700, Ben Gardon wrote:
> > > On Wed, Jul 28, 2021 at 12:40 AM Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >
> > > > On Wed, Jul 28, 2021 at 02:56:05PM +0800, Yu Zhang wrote:
> > > > > Thanks a lot for your reply, Sean.
> > > > >
> > > > > On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > > > > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > >   I'd like to ask a question about kvm_reset_context(): is there any
> > > > > > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> > >
> > > I just realized I sent my response to Yu yesterday without reply-all.
> > > Sending it again here for posterity. I'll add comments on the
> > > discussion inline below too.
> >
> > Thanks Ben, I copied my reply here. With some gramar fixes. :)
> >
> > >
> > > Hi Yu,
> > >
> > > I think the short answer here is no, there's no reason we can't keep
> > > the root around for later.
> > >
> > > When developing the TDP MMU, we were primarily concerned about
> > > performance post-boot, especially during migration or when
> > > re-populating memory for demand paging. In these scenarios the guest
> > > role doesn't really change and so the TDP MMU's shadow page tables
> > > aren't torn down. In my initial testing, I thought I only ever
> > > observed two TDP MMU roots be allocated over the life of the VM, but I
> > > could be wrong.
> >
> > Well I observed more, may be because I am using OVMF? Note, the vCPU
> > number is just one in my test.
> 
> Having just one vCPU is likely to lead to more thrash, but I'm
> guessing a lot of these transitions happen before the guest starts
> using multiple vCPUs anyway.
> 
> >
> > >
> > > For the TDP MMU root to be torn down, there also has to be no vCPU
> > > using it. This probably happens in transitions to SMM and guest root
> > > level changes, but I suspected that there would usually be at least
> > > one vCPU in some "normal" mode, post boot. That may have been an
> > > incorrect assumption.
> > >
> > > I think the easiest solution to this would be to just have the TDP MMU
> > > roots track the life of the VM by adding an extra increment to their
> > > reference count on allocation and an extra decrement when the VM is
> > > torn down. However this introduces a problem because it increases the
> > > amount of memory the TDP MMU is likely to be using for its page
> > > tables. (It could use the memory either way but it would require some
> > > surprising guest behavior.)
> >
> > So your suggestion is, once allocated, do not free the root page until
> > the VM is destroyed?
> 
> Yeah, this wouldn't be a great solution for a production setting but
> if you just want to quantify the impact of teardown it's an easy hack.
> 
> >
> > >
> > > I have a few questions about these unnecessary tear-downs during boot:
> > > 1. How many teardowns did you observe, and how many different roles
> > > did they represent? Just thrashing between two roles, or 12 different
> > > roles?
> >
> > I saw 106 reloadings of the root TDP. Among them, 14 are caused by memslot
> > changes. Remaining ones are caused by the context reset from CR0/CR4/EFER
> > changes(85 for CR0 changes). And I believe most are using the same roles,
> > because in legacy TDP, only 4 different TDP roots are allocated due to the
> > context reset(and several more are caused by memslot updating). But in TDP
> > MMU, that means 106 times of TDP root being torn down and reallocated.
> >
> > > 2. When the TDP MMU's page tables got torn down, how much memory did
> > > they map / how big were they?
> >
> > I did not collect this in TDP MMU, but I once tried with legacy TDP. IIRC,
> > there are only several SPs allocated in one TDP table when the context resets.
> 
> Ooof that's a lot of resets, though if there are only a handful of
> pages mapped, it might not be a noticeable performance impact. I think
> it'd be worth collecting some performance data to quantify the impact.

Yes. Too many reset will definitely hurt the performance, though I did not see
obvious delay.

> 
> >
> > > 3. If you hacked in the extra refcount increment I suggested above,
> > > how much of a difference in boot time does it make?
> >
> > I have not tried this, but I think that proposal is let TDP MMU try to
> > reuse previous root page with same mmu role with current context, just
> > like the legacy TDP does?
> 
> Yeah, exactly. The TDP MMU is actually already designed to do this,
> but it depends on another vCPU keeping the root's refcount elevated to
> keep the root around.
> 
> >
> > Actually I am curious, why would the root needs to be unloaded at all(even
> > in the legacy TDP code)? Sean's reply mentioned that change of the mmu role
> > is the reason, but I do not understand yet.
> 
> Will follow up on this below.
> 
> >
> > >
> > > For 2 and 3 I ask because if the guest hasn't accessed much of it's
> > > memory early in boot, the paging structure won't be very large and
> > > tearing it down / rebuilding it is pretty cheap.
> >
> > Agree. But I am a bit surprised to see so many CR0 changes in the boot time.
> >
> > >
> > > We may find that we need some kind of page quota for the TDP MMU after
> > > all, if we want to have a bunch of roots at the same time. If that's
> > > the case, perhaps we should spawn another email thread to discuss how
> > > that should work.
> >
> > Could we find a way to obviate the requirement of unloading(if unnecessary)?
> 
> Could you be more specific about what you mean by unloading? Do you
> mean just not using the current paging structure for a bit or tearing
> down the whole paging structure?

I meant just not using the current paging structure. Tearing down the whole TDP
can be avoided, by adding some account for the root SP(which may need some quota).
But if we can avoid the unnecessary unloading for both legacy TDP and TDP MMU, we
can solve this once and for all. :)

B.R.
Yu


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-28 18:37     ` Sean Christopherson
@ 2021-07-29  3:22       ` Yu Zhang
  2021-07-29 21:04         ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2021-07-29  3:22 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Wed, Jul 28, 2021 at 06:37:38PM +0000, Sean Christopherson wrote:
> On Wed, Jul 28, 2021, Yu Zhang wrote:
> > Thanks a lot for your reply, Sean.
> > 
> > On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > > Hi all,
> > > > 
> > > >   I'd like to ask a question about kvm_reset_context(): is there any
> > > >   reason that we must alway unload TDP root in kvm_mmu_reset_context()?
> > > 
> > > The short answer is that mmu_role is changing, thus a new root shadow page is
> > > needed.
> > 
> > I saw the mmu_role is recalculated, but I have not figured out how this
> > change would affect TDP. May I ask a favor to give an example? Thanks!
> > 
> > I realized that if we only recalculate the mmu role, but do not unload
> > the TDP root(e.g., when guest efer.nx flips), base role of the SPs will
> > be inconsistent with the mmu context. But I do not understand why this
> > shall affect TDP. 
> 
> The SPTEs themselves are not affected if the base mmu_role doesn't change; note,
> this holds true for shadow paging, too.  What changes is all of the kvm_mmu
> knowledge about how to walk the guest PTEs, e.g. if a guest toggles CR4.SMAP,
> then KVM needs to recalculate the #PF permissions for guest accesses so that
> emulating instructions at CPL=0 does the right thing.
> 
> As for EFER.NX and CR0.WP, they are in the base page role because they need to
> be there for shadow paging, e.g. if the guest toggles EFER.NX, then the reserved
> bit and executable permissions change, and reusing shadow paging for the old
> EFER.NX could result in missed reserved #PF and/or incorrect executable #PF
> behavior.
> 
> For simplicitly, it's far, far eaiser to reuse the same page role struct for
> TDP paging (both legacy and TDP MMUs) and shadow paging.
> 
> However, I think we can safely ignore NX, WP, SMEP, and SMAP in direct shadow
> pages, which would allow reusing a TDP root across changes.  This is only a baby
> step (assuming it even works), as further changes to set_cr0/cr4/efer would be
> needed to fully realize the optimizations, e.g. to avoid complete teardown if
> the root_count hits zero.

Thanks for your explaination, Sean. And I fully agree!

As you can see in my first mail, I kept reinitiate the mmu role in kvm_reset_context(),
so that guest paging mode change will be handled correctly, for guest page table walker.
As to shadow, the unload is always needed, because NX and WP of existing SPs matters.

+void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
{
-       kvm_mmu_unload(vcpu);
+       if (!tdp_enabled || force_tdp_unload)
+               kvm_mmu_unload(vcpu);
+
        kvm_init_mmu(vcpu);
}

In the caller, force_tdp_unload was set to false for CR0/CR4/EFER changes. For SMM and
cpuid updates, it is set to true.

With this change, I can successfully boot a VM(and of course, number of unloadings is
greatly reduced). But access test case in kvm-unit-test hangs, after CR4.SMEP is flipped.
I'm trying to figure out why...

> 
> I'll put this on my todo list, I've been looking for an excuse to update the
> cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
> minor, if it works...
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a8cdfd8d45c4..700664fe163e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>         role = vcpu->arch.mmu->mmu_role.base;
>         role.level = level;
>         role.direct = direct;
> -       if (role.direct)
> +       if (role.direct) {
>                 role.gpte_is_8_bytes = true;
> +
> +               /*
> +                * Guest PTE permissions do not impact SPTE permissions for
> +                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
> +                * guest PTE permissions are enforced by the CPU (TDP enabled).
> +                */
> +               WARN_ON_ONCE(access != ACC_ALL);
> +               role.efer_nx = 0;
> +               role.cr0_wp = 0;
> +               role.smep_andnot_wp = 0;
> +               role.smap_andnot_wp = 0;
> +       }

How about we do this in kvm_calc_mmu_role_common()? :-)

Thanks
Yu

>         role.access = access;
>         if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
>                 quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  2:58               ` Yan Zhao
@ 2021-07-29  5:17                 ` Yu Zhang
  2021-07-29  5:17                   ` Yan Zhao
  2021-07-29  8:48                 ` Yan Zhao
  2021-07-29 20:40                 ` Sean Christopherson
  2 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2021-07-29  5:17 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Ben Gardon, Sean Christopherson, Paolo Bonzini, kvm

On Thu, Jul 29, 2021 at 10:58:15AM +0800, Yan Zhao wrote:
> On Thu, Jul 29, 2021 at 11:00:56AM +0800, Yu Zhang wrote:
> > > 
> > > Ooof that's a lot of resets, though if there are only a handful of
> > > pages mapped, it might not be a noticeable performance impact. I think
> > > it'd be worth collecting some performance data to quantify the impact.
> > 
> > Yes. Too many reset will definitely hurt the performance, though I did not see
> > obvious delay.
> >
> 
> if I add below limits before unloading mmu, and with
> enable_unrestricted_guest=0, the boot time can be reduced to 31 secs
> from more than 5 minutes. 

Sorry? Do you mean your VM needs 5 minute to boot? What is your configuration?

VMX unrestricted guest has been supported on all Intel platforms since years 
ago. I do not see any reason to disable it.

B.R.
Yu

> 
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
> -       kvm_mmu_unload(vcpu);
> -       kvm_init_mmu(vcpu, true);
> +       union kvm_mmu_role new_role =
> +               kvm_calc_tdp_mmu_root_page_role(vcpu, false);
> +       struct kvm_mmu *context = &vcpu->arch.root_mmu;
> +       bool reset = false;
> +
> +       if (new_role.as_u64 != context->mmu_role.as_u64) {
> +               kvm_mmu_unload(vcpu);
> +               reset = true;
> +       }
> +       kvm_init_mmu(vcpu, reset);
> 
> But with enable_unrestricted_guest=0, if I further modify the limits to
> "if (new_role.base.word != context->mmu_role.base.word)", the VM would
> fail to boot.
> so, with mmu extended role changes, unload the mmu is necessary in some
> situation, or at least we need to zap related sptes.
> 
> Thanks
> Yan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  5:17                 ` Yu Zhang
@ 2021-07-29  5:17                   ` Yan Zhao
  2021-07-29  6:34                     ` Yan Zhao
  0 siblings, 1 reply; 22+ messages in thread
From: Yan Zhao @ 2021-07-29  5:17 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Ben Gardon, Sean Christopherson, Paolo Bonzini, kvm

On Thu, Jul 29, 2021 at 01:17:43PM +0800, Yu Zhang wrote:
> On Thu, Jul 29, 2021 at 10:58:15AM +0800, Yan Zhao wrote:
> > On Thu, Jul 29, 2021 at 11:00:56AM +0800, Yu Zhang wrote:
> > > > 
> > > > Ooof that's a lot of resets, though if there are only a handful of
> > > > pages mapped, it might not be a noticeable performance impact. I think
> > > > it'd be worth collecting some performance data to quantify the impact.
> > > 
> > > Yes. Too many reset will definitely hurt the performance, though I did not see
> > > obvious delay.
> > >
> > 
> > if I add below limits before unloading mmu, and with
> > enable_unrestricted_guest=0, the boot time can be reduced to 31 secs
> > from more than 5 minutes. 
> 
> Sorry? Do you mean your VM needs 5 minute to boot? What is your configuration?
>
yes. the VM needs 5 minutes to boot when I forced enable_unrestricted_guest=0 in kvm.

> VMX unrestricted guest has been supported on all Intel platforms since years 
> ago. I do not see any reason to disable it.
>
yes. just for test purpose.
To study the impact to the mode enable_unrestricted_guest=0,
since in this mode, cr0, cr4 causes lots of vmexit.

> 
> > 
> >  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> >  {
> > -       kvm_mmu_unload(vcpu);
> > -       kvm_init_mmu(vcpu, true);
> > +       union kvm_mmu_role new_role =
> > +               kvm_calc_tdp_mmu_root_page_role(vcpu, false);
> > +       struct kvm_mmu *context = &vcpu->arch.root_mmu;
> > +       bool reset = false;
> > +
> > +       if (new_role.as_u64 != context->mmu_role.as_u64) {
> > +               kvm_mmu_unload(vcpu);
> > +               reset = true;
> > +       }
> > +       kvm_init_mmu(vcpu, reset);
> > 
> > But with enable_unrestricted_guest=0, if I further modify the limits to
> > "if (new_role.base.word != context->mmu_role.base.word)", the VM would
> > fail to boot.
> > so, with mmu extended role changes, unload the mmu is necessary in some
> > situation, or at least we need to zap related sptes.
> >

BTW, update some performance data when enable_unrestricted_guest=0.

1. without the restricts in above, i.e. always call kvm_mmu_unload:
 
   VM boot time: around 5 minutes.
   kvm_mmu_unload times during VM boot: 3696


2. with the above restricts, i.e. only call kvm_mmu_unload when
kvm_mmu_role changes.
  VM boot time: around 30 secs.
  kvm_mmu_unload times during VM boot: 18

3. with above restricts + Sean's suggestion in another mail.

@@ -4567,6 +4567,11 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
        role.base.direct = true;
        role.base.gpte_is_8_bytes = true;

+       role.base.nxe = 0;
+       role.base.cr0_wp = 0;
+       role.base.smep_andnot_wp = 0;
+       role.base.smap_andnot_wp = 0;
+
        return role;
 }

 VM boot time: around 30 secs.
 kvm_mmu_unload times during VM boot: 15.


sorry, though I'm not testing on latest code base (I'm testing on 5.10.0), I guess the
general idea is the same.

Thanks
Yan



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  5:17                   ` Yan Zhao
@ 2021-07-29  6:34                     ` Yan Zhao
  0 siblings, 0 replies; 22+ messages in thread
From: Yan Zhao @ 2021-07-29  6:34 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Ben Gardon, Sean Christopherson, Paolo Bonzini, kvm

> > Sorry? Do you mean your VM needs 5 minute to boot? What is your configuration?
> >
> yes. the VM needs 5 minutes to boot when I forced enable_unrestricted_guest=0 in kvm.
> 
> > VMX unrestricted guest has been supported on all Intel platforms since years 
> > ago. I do not see any reason to disable it.
> >
> yes. just for test purpose.
> To study the impact to the mode enable_unrestricted_guest=0,
> since in this mode, cr0, cr4 causes lots of vmexit.

one correction. actually with enable_unrestricted_guest=0, it has less
number of kvm_set_cr0(), but more kvm_mmu_reset_context() called from
kvm_set_cr0().
 ___________________________________________________________________
|                            |  #cr0  |#reset_context from cr0| #cr4|
| ---------------------------|--------|-----------------------|-----|
|enable_unrestricted_guest=0 | 627405 |      313704           | 13  | 
|----------------------------|--------|-----------------------|-----|
|enable_unrestricted_guest=1 |2092493 |       5               | 13  |
--------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  2:58               ` Yan Zhao
  2021-07-29  5:17                 ` Yu Zhang
@ 2021-07-29  8:48                 ` Yan Zhao
  2021-07-29 20:40                 ` Sean Christopherson
  2 siblings, 0 replies; 22+ messages in thread
From: Yan Zhao @ 2021-07-29  8:48 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Ben Gardon, Sean Christopherson, Paolo Bonzini, kvm

On Thu, Jul 29, 2021 at 10:58:15AM +0800, Yan Zhao wrote:
> On Thu, Jul 29, 2021 at 11:00:56AM +0800, Yu Zhang wrote:
> > > 
> > > Ooof that's a lot of resets, though if there are only a handful of
> > > pages mapped, it might not be a noticeable performance impact. I think
> > > it'd be worth collecting some performance data to quantify the impact.
> > 
> > Yes. Too many reset will definitely hurt the performance, though I did not see
> > obvious delay.
> >
> 
> if I add below limits before unloading mmu, and with
> enable_unrestricted_guest=0, the boot time can be reduced to 31 secs
> from more than 5 minutes. 
> 
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
> -       kvm_mmu_unload(vcpu);
> -       kvm_init_mmu(vcpu, true);
> +       union kvm_mmu_role new_role =
> +               kvm_calc_tdp_mmu_root_page_role(vcpu, false);
> +       struct kvm_mmu *context = &vcpu->arch.root_mmu;
> +       bool reset = false;
> +
> +       if (new_role.as_u64 != context->mmu_role.as_u64) {
> +               kvm_mmu_unload(vcpu);
> +               reset = true;
> +       }
> +       kvm_init_mmu(vcpu, reset);
> 
> But with enable_unrestricted_guest=0, if I further modify the limits to
> "if (new_role.base.word != context->mmu_role.base.word)", the VM would
> fail to boot.
update to this enable_unrestricted_guest=0 scenario, the previous VM failure to
boot with restrict "if (new_role.base.word != context->mmu_role.base.word)"
is because pcid changes.

with tdp_enabled, below code with extra loading of pgd actually work without
unloading the mmu.

 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
-       kvm_mmu_unload(vcpu);
-       kvm_init_mmu(vcpu, true);
+       union kvm_mmu_role new_role =
+               kvm_calc_tdp_mmu_root_page_role(vcpu, false);
+       struct kvm_mmu *context = &vcpu->arch.root_mmu;
+       bool reset = false;
+
+       if (new_role.base.word != context->mmu_role.base.word) {
+               kvm_mmu_unload(vcpu);
+               reset = true;
+       }
+       if (new_role.ext.cr0_pg ^ context->mmu_role.ext.cr0_pg) {
+               kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_LOAD_MMU_PGD);
+               kvm_flush_remote_tlbs(vcpu->kvm);
+       }
+       kvm_init_mmu(vcpu, reset);

Thanks
Yan

> so, with mmu extended role changes, unload the mmu is necessary in some
> situation, or at least we need to zap related sptes.
> 
> Thanks
> Yan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  3:00             ` Yu Zhang
  2021-07-29  2:58               ` Yan Zhao
@ 2021-07-29  9:19               ` Paolo Bonzini
  2021-07-29 16:38                 ` Yu Zhang
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-07-29  9:19 UTC (permalink / raw)
  To: Yu Zhang, Ben Gardon; +Cc: Yan Zhao, Sean Christopherson, kvm

On 29/07/21 05:00, Yu Zhang wrote:
>> I have a few questions about these unnecessary tear-downs during boot:
>> 1. How many teardowns did you observe, and how many different roles
>> did they represent? Just thrashing between two roles, or 12 different
>> roles?
> I saw 106 reloadings of the root TDP. Among them, 14 are caused by memslot
> changes. Remaining ones are caused by the context reset from CR0/CR4/EFER
> changes(85 for CR0 changes).

Possibly because CR0/CR4/EFER are changed multiple times on SMM entry 
(to go from real mode to protected mode to 32-bit to 64-bit)?  But most 
of those page tables should be very very small; they probably have only 
one page per level.  The SMM page tables are very small too, the only 
one that is really expensive to rebuild is the main non-SMM EPT.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  9:19               ` Paolo Bonzini
@ 2021-07-29 16:38                 ` Yu Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2021-07-29 16:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ben Gardon, Yan Zhao, Sean Christopherson, kvm

On Thu, Jul 29, 2021 at 11:19:08AM +0200, Paolo Bonzini wrote:
> On 29/07/21 05:00, Yu Zhang wrote:
> > > I have a few questions about these unnecessary tear-downs during boot:
> > > 1. How many teardowns did you observe, and how many different roles
> > > did they represent? Just thrashing between two roles, or 12 different
> > > roles?
> > I saw 106 reloadings of the root TDP. Among them, 14 are caused by memslot
> > changes. Remaining ones are caused by the context reset from CR0/CR4/EFER
> > changes(85 for CR0 changes).
> 
> Possibly because CR0/CR4/EFER are changed multiple times on SMM entry (to go
> from real mode to protected mode to 32-bit to 64-bit)?  But most of those
> page tables should be very very small; they probably have only one page per
> level.  The SMM page tables are very small too, the only one that is really
> expensive to rebuild is the main non-SMM EPT.

Thanks Paolo. 

Well, I did not see any SMM entry in the whole test. And most resetings
are due to CR0 changes in OVMF(74 out of 85), the rest are from guest
kernel initialization stage.

As expected, the number of SPs used are fairly small - about 5 - 10 for
each TDP tree. For legcy TDP, since only 4 different TDP trees are built(
the ones caused by memslot zapping are not counted). The total number of
SPs are only 30+.

B.R.
Yu
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  2:58               ` Yan Zhao
  2021-07-29  5:17                 ` Yu Zhang
  2021-07-29  8:48                 ` Yan Zhao
@ 2021-07-29 20:40                 ` Sean Christopherson
  2 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2021-07-29 20:40 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Yu Zhang, Ben Gardon, Paolo Bonzini, kvm

On Thu, Jul 29, 2021, Yan Zhao wrote:
> On Thu, Jul 29, 2021 at 11:00:56AM +0800, Yu Zhang wrote:
> > > 
> > > Ooof that's a lot of resets, though if there are only a handful of
> > > pages mapped, it might not be a noticeable performance impact. I think
> > > it'd be worth collecting some performance data to quantify the impact.
> > 
> > Yes. Too many reset will definitely hurt the performance, though I did not see
> > obvious delay.
> >
> 
> if I add below limits before unloading mmu, and with
> enable_unrestricted_guest=0, the boot time can be reduced to 31 secs
> from more than 5 minutes. 
> 
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
> -       kvm_mmu_unload(vcpu);
> -       kvm_init_mmu(vcpu, true);
> +       union kvm_mmu_role new_role =
> +               kvm_calc_tdp_mmu_root_page_role(vcpu, false);
> +       struct kvm_mmu *context = &vcpu->arch.root_mmu;
> +       bool reset = false;
> +
> +       if (new_role.as_u64 != context->mmu_role.as_u64) {

Aha!  A clue!

This hack indicates that the call to kvm_mmu_reset_context() is spurious, i.e.
none of the MMU role bits in CR0/CR4/EFER are changing.  Dollars to donuts says
this is due to the long-standing hack-a-fix in enter_rmode() that unconditionally
reset the MMU when the guest entered real mode.

Prior to commit 5babbb43a58a ("KVM: VMX: Remove explicit MMU reset in enter_rmode()").
(sitting in kvm/queue), enter_rmode() to deal with unrestricted_guest=0 would
unconditionally do kvm_mmu_reset_context().  Based on the above, it sounds like
your guest is going in and out of RM/PM, i.e. toggling CR0.PE.  CR0.PE isn't a
MMU role bit, so the kvm_mmu_reset_context() is spurious unless CR0.PG is also
being changed.

TL;DR: Try the current kvm/queue, or at least after commit 5babbb43a58a
       ("KVM: VMX: Remove explicit MMU reset in enter_rmode()").

> +               kvm_mmu_unload(vcpu);
> +               reset = true;
> +       }
> +       kvm_init_mmu(vcpu, reset);
> 
> But with enable_unrestricted_guest=0, if I further modify the limits to
> "if (new_role.base.word != context->mmu_role.base.word)", the VM would
> fail to boot.
> so, with mmu extended role changes, unload the mmu is necessary in some
> situation, or at least we need to zap related sptes.
> 
> Thanks
> Yan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29  3:22       ` Yu Zhang
@ 2021-07-29 21:04         ` Sean Christopherson
  2021-07-30  2:42           ` Yu Zhang
  2021-07-30  8:22           ` Yu Zhang
  0 siblings, 2 replies; 22+ messages in thread
From: Sean Christopherson @ 2021-07-29 21:04 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Thu, Jul 29, 2021, Yu Zhang wrote:
> On Wed, Jul 28, 2021 at 06:37:38PM +0000, Sean Christopherson wrote:
> > On Wed, Jul 28, 2021, Yu Zhang wrote:
> In the caller, force_tdp_unload was set to false for CR0/CR4/EFER changes. For SMM and
> cpuid updates, it is set to true.
> 
> With this change, I can successfully boot a VM(and of course, number of unloadings is
> greatly reduced). But access test case in kvm-unit-test hangs, after CR4.SMEP is flipped.
> I'm trying to figure out why...

Hrm, I'll look into when I get around to making this into a proper patch.

Note, there's at least once bug, as is_root_usable() will compare the full role
against a root shadow page's modified role.  A common helper to derive the page
role for a direct/TDP page from an existing mmu_role is likely the way to go, as
kvm_tdp_mmu_get_vcpu_root_hpa() would want the same functionality.

> > I'll put this on my todo list, I've been looking for an excuse to update the
> > cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
> > minor, if it works...
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a8cdfd8d45c4..700664fe163e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >         role = vcpu->arch.mmu->mmu_role.base;
> >         role.level = level;
> >         role.direct = direct;
> > -       if (role.direct)
> > +       if (role.direct) {
> >                 role.gpte_is_8_bytes = true;
> > +
> > +               /*
> > +                * Guest PTE permissions do not impact SPTE permissions for
> > +                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
> > +                * guest PTE permissions are enforced by the CPU (TDP enabled).
> > +                */
> > +               WARN_ON_ONCE(access != ACC_ALL);
> > +               role.efer_nx = 0;
> > +               role.cr0_wp = 0;
> > +               role.smep_andnot_wp = 0;
> > +               role.smap_andnot_wp = 0;
> > +       }
> 
> How about we do this in kvm_calc_mmu_role_common()? :-)

No, because the role in struct kvm_mmu does need the correct bits, even for TDP,
as the role is used to detect whether or not the context needs to be re-initialized,
e.g. it would get a false negative on a cr0_wp change, not go through
update_permission_bitmask(), and use the wrong page permissions when walking the
guest page tables.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29 21:04         ` Sean Christopherson
@ 2021-07-30  2:42           ` Yu Zhang
  2021-07-30  9:42             ` Yu Zhang
  2021-07-30  8:22           ` Yu Zhang
  1 sibling, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2021-07-30  2:42 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Thu, Jul 29, 2021 at 09:04:56PM +0000, Sean Christopherson wrote:
> On Thu, Jul 29, 2021, Yu Zhang wrote:
> > On Wed, Jul 28, 2021 at 06:37:38PM +0000, Sean Christopherson wrote:
> > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > In the caller, force_tdp_unload was set to false for CR0/CR4/EFER changes. For SMM and
> > cpuid updates, it is set to true.
> > 
> > With this change, I can successfully boot a VM(and of course, number of unloadings is
> > greatly reduced). But access test case in kvm-unit-test hangs, after CR4.SMEP is flipped.
> > I'm trying to figure out why...
> 
> Hrm, I'll look into when I get around to making this into a proper patch.
> 
> Note, there's at least once bug, as is_root_usable() will compare the full role
> against a root shadow page's modified role.  A common helper to derive the page
> role for a direct/TDP page from an existing mmu_role is likely the way to go, as
> kvm_tdp_mmu_get_vcpu_root_hpa() would want the same functionality.

So, if we know there are some bits meaningless in SP, could we use a 
ignored_mask, each time we try to compare the full role.word? This may
be also needed in kvm_mmu_get_page().

> 
> > > I'll put this on my todo list, I've been looking for an excuse to update the
> > > cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
> > > minor, if it works...
> > > 
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index a8cdfd8d45c4..700664fe163e 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > >         role = vcpu->arch.mmu->mmu_role.base;
> > >         role.level = level;
> > >         role.direct = direct;
> > > -       if (role.direct)
> > > +       if (role.direct) {
> > >                 role.gpte_is_8_bytes = true;
> > > +
> > > +               /*
> > > +                * Guest PTE permissions do not impact SPTE permissions for
> > > +                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
> > > +                * guest PTE permissions are enforced by the CPU (TDP enabled).
> > > +                */
> > > +               WARN_ON_ONCE(access != ACC_ALL);
> > > +               role.efer_nx = 0;
> > > +               role.cr0_wp = 0;
> > > +               role.smep_andnot_wp = 0;
> > > +               role.smap_andnot_wp = 0;
> > > +       }
> > 
> > How about we do this in kvm_calc_mmu_role_common()? :-)
> 
> No, because the role in struct kvm_mmu does need the correct bits, even for TDP,
> as the role is used to detect whether or not the context needs to be re-initialized,
> e.g. it would get a false negative on a cr0_wp change, not go through
> update_permission_bitmask(), and use the wrong page permissions when walking the
> guest page tables.

Oh yes. Regardless of what flags really matter in a SP, all of them are useful for mmu
context. Thanks for correcting me.

B.R.
Yu


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-29 21:04         ` Sean Christopherson
  2021-07-30  2:42           ` Yu Zhang
@ 2021-07-30  8:22           ` Yu Zhang
  1 sibling, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2021-07-30  8:22 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Thu, Jul 29, 2021 at 09:04:56PM +0000, Sean Christopherson wrote:
> On Thu, Jul 29, 2021, Yu Zhang wrote:
> > On Wed, Jul 28, 2021 at 06:37:38PM +0000, Sean Christopherson wrote:
> > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > In the caller, force_tdp_unload was set to false for CR0/CR4/EFER changes. For SMM and
> > cpuid updates, it is set to true.
> > 
> > With this change, I can successfully boot a VM(and of course, number of unloadings is
> > greatly reduced). But access test case in kvm-unit-test hangs, after CR4.SMEP is flipped.
> > I'm trying to figure out why...
> 
> Hrm, I'll look into when I get around to making this into a proper patch.

Well, I think I found the reason why access test failed in access test, when guest flips
CR4.SMEP: the TLB needs to be flushed. 

Currently, this is done in kvm_mmu_reload() -> kvm_mmu_load(). Avoiding the unloading in
kvm_mmu_reset_context() will not set the root_hpa to INVALID_PAGE, thus will not trigger
the kvm_mmu_load().

After I flushed the TLB, all access test cases passed. :)

B.R.
Yu
> 
> Note, there's at least once bug, as is_root_usable() will compare the full role
> against a root shadow page's modified role.  A common helper to derive the page
> role for a direct/TDP page from an existing mmu_role is likely the way to go, as
> kvm_tdp_mmu_get_vcpu_root_hpa() would want the same functionality.
> 
> > > I'll put this on my todo list, I've been looking for an excuse to update the
> > > cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
> > > minor, if it works...
> > > 
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index a8cdfd8d45c4..700664fe163e 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > >         role = vcpu->arch.mmu->mmu_role.base;
> > >         role.level = level;
> > >         role.direct = direct;
> > > -       if (role.direct)
> > > +       if (role.direct) {
> > >                 role.gpte_is_8_bytes = true;
> > > +
> > > +               /*
> > > +                * Guest PTE permissions do not impact SPTE permissions for
> > > +                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
> > > +                * guest PTE permissions are enforced by the CPU (TDP enabled).
> > > +                */
> > > +               WARN_ON_ONCE(access != ACC_ALL);
> > > +               role.efer_nx = 0;
> > > +               role.cr0_wp = 0;
> > > +               role.smep_andnot_wp = 0;
> > > +               role.smap_andnot_wp = 0;
> > > +       }
> > 
> > How about we do this in kvm_calc_mmu_role_common()? :-)
> 
> No, because the role in struct kvm_mmu does need the correct bits, even for TDP,
> as the role is used to detect whether or not the context needs to be re-initialized,
> e.g. it would get a false negative on a cr0_wp change, not go through
> update_permission_bitmask(), and use the wrong page permissions when walking the
> guest page tables.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: A question of TDP unloading.
  2021-07-30  2:42           ` Yu Zhang
@ 2021-07-30  9:42             ` Yu Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2021-07-30  9:42 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Ben Gardon, kvm

On Fri, Jul 30, 2021 at 10:42:51AM +0800, Yu Zhang wrote:
> On Thu, Jul 29, 2021 at 09:04:56PM +0000, Sean Christopherson wrote:
> > On Thu, Jul 29, 2021, Yu Zhang wrote:
> > > On Wed, Jul 28, 2021 at 06:37:38PM +0000, Sean Christopherson wrote:
> > > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > In the caller, force_tdp_unload was set to false for CR0/CR4/EFER changes. For SMM and
> > > cpuid updates, it is set to true.
> > > 
> > > With this change, I can successfully boot a VM(and of course, number of unloadings is
> > > greatly reduced). But access test case in kvm-unit-test hangs, after CR4.SMEP is flipped.
> > > I'm trying to figure out why...
> > 
> > Hrm, I'll look into when I get around to making this into a proper patch.
> > 
> > Note, there's at least once bug, as is_root_usable() will compare the full role
> > against a root shadow page's modified role.  A common helper to derive the page
> > role for a direct/TDP page from an existing mmu_role is likely the way to go, as
> > kvm_tdp_mmu_get_vcpu_root_hpa() would want the same functionality.
> 
> So, if we know there are some bits meaningless in SP, could we use a 
> ignored_mask, each time we try to compare the full role.word? This may
> be also needed in kvm_mmu_get_page().

Oh. We do not need this, setting these flags to 0 shall be fine, because role flags
of the SP are all from kvm_mmu_get_page(). Sorry for the noise...

B.R.
Yu
> > 
> > > > I'll put this on my todo list, I've been looking for an excuse to update the
> > > > cr0/cr4/efer flows anyways :-).  If it works, the changes should be relatively
> > > > minor, if it works...
> > > > 
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index a8cdfd8d45c4..700664fe163e 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > > >         role = vcpu->arch.mmu->mmu_role.base;
> > > >         role.level = level;
> > > >         role.direct = direct;
> > > > -       if (role.direct)
> > > > +       if (role.direct) {
> > > >                 role.gpte_is_8_bytes = true;
> > > > +
> > > > +               /*
> > > > +                * Guest PTE permissions do not impact SPTE permissions for
> > > > +                * direct MMUs.  Either there are no guest PTEs (CR0.PG=0) or
> > > > +                * guest PTE permissions are enforced by the CPU (TDP enabled).
> > > > +                */
> > > > +               WARN_ON_ONCE(access != ACC_ALL);
> > > > +               role.efer_nx = 0;
> > > > +               role.cr0_wp = 0;
> > > > +               role.smep_andnot_wp = 0;
> > > > +               role.smap_andnot_wp = 0;
> > > > +       }
> > > 
> > > How about we do this in kvm_calc_mmu_role_common()? :-)
> > 
> > No, because the role in struct kvm_mmu does need the correct bits, even for TDP,
> > as the role is used to detect whether or not the context needs to be re-initialized,
> > e.g. it would get a false negative on a cr0_wp change, not go through
> > update_permission_bitmask(), and use the wrong page permissions when walking the
> > guest page tables.
> 
> Oh yes. Regardless of what flags really matter in a SP, all of them are useful for mmu
> context. Thanks for correcting me.
> 
> B.R.
> Yu
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-07-30  9:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 16:19 A question of TDP unloading Yu Zhang
2021-07-27 18:07 ` Sean Christopherson
2021-07-28  6:56   ` Yu Zhang
2021-07-28  7:25     ` Yan Zhao
2021-07-28 16:23       ` Ben Gardon
2021-07-28 17:23         ` Yu Zhang
2021-07-28 17:55           ` Ben Gardon
2021-07-29  3:00             ` Yu Zhang
2021-07-29  2:58               ` Yan Zhao
2021-07-29  5:17                 ` Yu Zhang
2021-07-29  5:17                   ` Yan Zhao
2021-07-29  6:34                     ` Yan Zhao
2021-07-29  8:48                 ` Yan Zhao
2021-07-29 20:40                 ` Sean Christopherson
2021-07-29  9:19               ` Paolo Bonzini
2021-07-29 16:38                 ` Yu Zhang
2021-07-28 18:37     ` Sean Christopherson
2021-07-29  3:22       ` Yu Zhang
2021-07-29 21:04         ` Sean Christopherson
2021-07-30  2:42           ` Yu Zhang
2021-07-30  9:42             ` Yu Zhang
2021-07-30  8:22           ` Yu Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.