kvm.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware
Date: Mon, 1 Aug 2022 23:56:21 +0000	[thread overview]
Message-ID: <YuhoJUoPBOu5eMz8@google.com> (raw)
In-Reply-To: <YuhPT2drgqL+osLl@google.com>

On Mon, Aug 01, 2022, David Matlack wrote:
> On Mon, Aug 01, 2022 at 08:19:28AM -0700, Vipin Sharma wrote:
> That being said, KVM currently has a gap where a guest doing a lot of
> remote memory accesses when touching memory for the first time will
> cause KVM to allocate the TDP page tables on the arguably wrong node.

Userspace can solve this by setting the NUMA policy on a VMA or shared-object
basis.  E.g. create dedicated memslots for each NUMA node, then bind each of the
backing stores to the appropriate host node.

If there is a gap, e.g. a backing store we want to use doesn't properly support
mempolicy for shared mappings, then we should enhance the backing store.

> > We can improve TDP MMU eager page splitting by making
> > tdp_mmu_alloc_sp_for_split() NUMA-aware. Specifically, when splitting a
> > huge page, allocate the new lower level page tables on the same node as the
> > huge page.
> > 
> > __get_free_page() is replaced by alloc_pages_node(). This introduces two
> > functional changes.
> > 
> >   1. __get_free_page() drops the __GFP_HIGHMEM gfp flag via its call to
> >   __get_free_pages(). This should not be an issue, as __GFP_HIGHMEM is
> >   not passed by tdp_mmu_alloc_sp_for_split() anyway.
> > 
> >   2. __get_free_page() calls alloc_pages(), which uses the thread's
> >   mempolicy to choose the NUMA node. With this commit, the thread's
> >   mempolicy is no longer used; the first preference is to allocate on
> >   the node where the huge page resides.
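In code, the described change would look roughly like the following. This is a simplified sketch, not the verbatim diff; the huge_spte variable and the surrounding plumbing are assumed from the patch's context in tdp_mmu.c.

```c
/* Before: node picked by the calling thread's mempolicy. */
sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);

/* After (sketch): allocate on the node that holds the huge page being
 * split, derived from the existing huge SPTE's PFN. */
int nid = page_to_nid(pfn_to_page(spte_to_pfn(huge_spte)));
struct page *page = alloc_pages_node(nid, GFP_KERNEL_ACCOUNT, 0);

sp->spt = page ? page_address(page) : NULL;
```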
> 
> It would be worth noting that userspace could change the mempolicy of
> the thread doing eager splitting to prefer allocating from the target
> NUMA node, as an alternative approach.
> 
> I don't prefer the alternative though since it bleeds details from KVM
> into userspace, such as the fact that enabling dirty logging does eager
> page splitting, which allocates page tables. 

As above, if userspace cares about vNUMA, then it already needs to be aware of
some KVM/kernel details.  Separate memslots aren't strictly necessary, e.g.
userspace could stitch together contiguous VMAs to create a single mega-memslot,
but that seems like it'd be more work than just creating separate memslots.

And because eager page splitting for dirty logging runs with mmu_lock held for
read, userspace might also benefit from per-node memslots, as it can do the
splitting on multiple tasks/CPUs.

Regardless of what we do, the behavior needs to be documented, i.e. KVM details will
bleed into userspace.  E.g. if KVM is overriding the per-task NUMA policy, then
that should be documented.

> It's also unnecessary since KVM can infer an appropriate NUMA placement
> without the help of userspace, and I can't think of a reason for userspace to
> prefer a different policy.

I can't think of a reason why userspace would want to have a different policy for
the task that's enabling dirty logging, but I also can't think of a reason why
KVM should go out of its way to ignore that policy.

IMO this is a "bug" in dirty_log_perf_test, though it's probably a good idea to
document how to effectively configure vNUMA-aware memslots.


Thread overview: 11+ messages
2022-08-01 15:19 [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware Vipin Sharma
2022-08-01 22:10 ` David Matlack
2022-08-01 23:56   ` Sean Christopherson [this message]
2022-08-02 16:31     ` David Matlack
2022-08-02 17:22       ` Sean Christopherson
2022-08-02 17:42         ` Paolo Bonzini
2022-08-02 19:12           ` Sean Christopherson
2022-08-03 15:08             ` Paolo Bonzini
2022-08-05 23:30               ` Vipin Sharma
2022-08-09 16:52                 ` David Matlack
2022-08-09 23:51                   ` Sean Christopherson
