From: Peter Xu <peterx@redhat.com>
To: David Matlack <dmatlack@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, Ben Gardon <bgardon@google.com>,
	Joerg Roedel <joro@8bytes.org>, Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Janis Schoetterl-Glausch <scgl@linux.vnet.ibm.com>,
	Junaid Shahid <junaids@google.com>,
	Oliver Upton <oupton@google.com>,
	Harish Barathvajasankar <hbarath@google.com>,
	Peter Shier <pshier@google.com>
Subject: Re: [RFC PATCH 13/15] KVM: x86/mmu: Split large pages during CLEAR_DIRTY_LOG
Date: Wed, 1 Dec 2021 12:03:49 +0800
Message-ID: <Yab0JRVmwyr1GL3Y@xz-m1.local>
In-Reply-To: <CALzav=ex+5y7-5a-8Vum2-eOKuKYe=RU9NvrS82H=sTwj2mqaw@mail.gmail.com>

On Tue, Nov 30, 2021 at 04:17:01PM -0800, David Matlack wrote:
> On Tue, Nov 30, 2021 at 4:16 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On Fri, Nov 26, 2021 at 4:17 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Nov 19, 2021 at 11:57:57PM +0000, David Matlack wrote:
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 6768ef9c0891..4e78ef2dd352 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -1448,6 +1448,12 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> > > >               gfn_t start = slot->base_gfn + gfn_offset + __ffs(mask);
> > > >               gfn_t end = slot->base_gfn + gfn_offset + __fls(mask);
> > > >
> > > > +             /*
> > > > +              * Try to proactively split any large pages down to 4KB so that
> > > > +              * vCPUs don't have to take write-protection faults.
> > > > +              */
> > > > +             kvm_mmu_try_split_large_pages(kvm, slot, start, end, PG_LEVEL_4K);
> > > > +
> > > >               kvm_mmu_slot_gfn_write_protect(kvm, slot, start, PG_LEVEL_2M);
> > > >
> > > >               /* Cross two large pages? */
> > >
> > > Is it intentional to try the split every time, even if we may have already
> > > split the range?  As I remember, Paolo mentioned that we can skip the split
> > > if it's not the 1st CLEAR_LOG on the same range, and IIUC that makes sense.
> > >
> > > But indeed I don't see a trivial way to know whether this is the first clear
> > > of this range.  Maybe we can somehow maintain a count of "how many huge
> > > pages are there under the current kvm_mmu_page node"?  Then if the root sp's
> > > counter == 0, we can skip the split.  Just a wild idea..
> > >
> > > Or maybe the split is intentionally attempted unconditionally for some
> > > reason?  If so, it would be great to mention that either in the commit
> > > message or in a comment.
> >
> > Thanks for calling this out. Could the same be said about the existing
> > code that unconditionally tries to write-protect 2M+ pages?

They're different because write protection is undone (the spte becomes
writable again) when vCPU threads write to the pages, so it always needs to
be redone.

A huge page split, on the other hand, is never undone once it has happened
during dirty tracking, so it's a one-time thing.

> > I aimed to keep parity with the write-protection calls (always try to split
> > before write-protecting), but I agree there may be opportunities to skip
> > the split altogether.

So IMHO it's not about parity, but about how easily it can be implemented,
and whether that added complexity will be worth it.

Besides the above per-sp accounting idea, there are other ways to do this
too, e.g., keeping a bitmap that tracks which ranges have already been split:
a 2M granule would be enough for that bitmap on x86.  We'd initialize the
bitmap to all ones when dirty logging starts for a memslot.
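
Untested, just to illustrate the idea -- the split_bitmap field is made up,
and this ignores that a clear range can cross two 2M regions:

	/* One bit per 2M region of the slot; a set bit means "this range
	 * may still contain huge pages that need splitting". */
	nbits = DIV_ROUND_UP(slot->npages, KVM_PAGES_PER_HPAGE(PG_LEVEL_2M));
	slot->split_bitmap = bitmap_alloc(nbits, GFP_KERNEL_ACCOUNT);
	bitmap_fill(slot->split_bitmap, nbits);

	/* Then in CLEAR_DIRTY_LOG, only attempt the split once per region: */
	idx = (start - slot->base_gfn) / KVM_PAGES_PER_HPAGE(PG_LEVEL_2M);
	if (test_bit(idx, slot->split_bitmap)) {
		kvm_mmu_try_split_large_pages(kvm, slot, start, end,
					      PG_LEVEL_4K);
		clear_bit(idx, slot->split_bitmap);
	}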

But again maybe it turns out we don't really want that complexity.

IMHO a good start would be the perf numbers (which I asked for in the cover
letter) comparing the overhead of the 2nd+ iterations of CLEAR_LOG with and
without eager page splitting.

> >
> > By the way, looking at this code again I think I see some potential bugs:
> >  - I don't think I ever free split_caches in the initially-all-set case.

I saw that it's freed in kvm_mmu_try_split_large_pages(), no?

> >  - What happens if splitting fails during the first CLEAR_LOG but
> > succeeds during the second CLEAR_LOG? We would end up propagating the
> > write-protection on the 2M page down to the 4K pages. This might cause
> > issues if using PML.

Hmm, that analysis looks correct.. I'm wondering what would actually happen
in that case.

Firstly, this should be rare, since the 1st split attempt should succeed in
99% of cases.

Then, if the split failed at the 1st attempt, we'd have write-protected the
sptes during the split even though PML is in use.  When they're written,
we'll go through the fast page fault path and record the writes there too, I
think, since we'll apply the dirty bit to the new spte, so it'll just skip
PML.  It looks like we'd be using a mixture of PML + write protection, but
all dirty pages would still be captured as expected?..
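
If I'm remembering the fast page fault path right, the interaction is
roughly as below -- simplified from memory, not the exact upstream code:

	/*
	 * fast_page_fault(), very simplified: the spte was left
	 * write-protected by the failed split, so the guest write faults,
	 * we make the spte writable locklessly, and the dirty page gets
	 * logged through the write-protection path instead of via PML.
	 */
	if (fault->write && spte_can_locklessly_be_made_writable(spte)) {
		new_spte = spte | PT_WRITABLE_MASK | shadow_dirty_mask;
		if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte,
					    new_spte))
			ret = RET_PF_FIXED;
	}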

There could be leftover write protection when stopping dirty logging, but
that doesn't seem directly harmful either.  It does make things a bit messy,
at least.

-- 
Peter Xu

