From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>,
pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware
Date: Mon, 1 Aug 2022 23:56:21 +0000 [thread overview]
Message-ID: <YuhoJUoPBOu5eMz8@google.com> (raw)
In-Reply-To: <YuhPT2drgqL+osLl@google.com>
On Mon, Aug 01, 2022, David Matlack wrote:
> On Mon, Aug 01, 2022 at 08:19:28AM -0700, Vipin Sharma wrote:
> That being said, KVM currently has a gap where a guest doing a lot of
> remote memory accesses when touching memory for the first time will
> cause KVM to allocate the TDP page tables on the arguably wrong node.
Userspace can solve this by setting the NUMA policy on a VMA or shared-object
basis. E.g. create dedicated memslots for each NUMA node, then bind each of the
backing stores to the appropriate host node.
If there is a gap, e.g. a backing store we want to use doesn't properly support
mempolicy for shared mappings, then we should enhance the backing store.
> > We can improve TDP MMU eager page splitting by making
> > tdp_mmu_alloc_sp_for_split() NUMA-aware. Specifically, when splitting a
> > huge page, allocate the new lower level page tables on the same node as the
> > huge page.
> >
> > __get_free_page() is replaced by alloc_page_nodes(). This introduces two
> > functional changes.
> >
> > 1. __get_free_page() removes gfp flag __GFP_HIGHMEM via call to
> > __get_free_pages(). This should not be an issue as __GFP_HIGHMEM flag is
> > not passed in tdp_mmu_alloc_sp_for_split() anyway.
> >
> > 2. __get_free_page() calls alloc_pages() and use thread's mempolicy for
> > the NUMA node allocation. From this commit, thread's mempolicy will not
> > be used and first preference will be to allocate on the node where huge
> > page was present.
>
> It would be worth noting that userspace could change the mempolicy of
> the thread doing eager splitting to prefer allocating from the target
> NUMA node, as an alternative approach.
>
> I don't prefer the alternative though since it bleeds details from KVM
> into userspace, such as the fact that enabling dirty logging does eager
> page splitting, which allocates page tables.
As above, if userspace is cares about vNUMA, then it already needs to be aware of
some of KVM/kernel details. Separate memslots aren't strictly necessary, e.g.
userspace could stitch together contiguous VMAs to create a single mega-memslot,
but that seems like it'd be more work than just creating separate memslots.
And because eager page splitting for dirty logging runs with mmu_lock held for,
userspace might also benefit from per-node memslots as it can do the splitting on
multiple tasks/CPUs.
Regardless of what we do, the behavior needs to be document, i.e. KVM details will
bleed into userspace. E.g. if KVM is overriding the per-task NUMA policy, then
that should be documented.
> It's also unnecessary since KVM can infer an appropriate NUMA placement
> without the help of userspace, and I can't think of a reason for userspace to
> prefer a different policy.
I can't think of a reason why userspace would want to have a different policy for
the task that's enabling dirty logging, but I also can't think of a reason why
KVM should go out of its way to ignore that policy.
IMO this is a "bug" in dirty_log_perf_test, though it's probably a good idea to
document how to effectively configure vNUMA-aware memslots.
next prev parent reply other threads:[~2022-08-01 23:56 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-01 15:19 [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware Vipin Sharma
2022-08-01 22:10 ` David Matlack
2022-08-01 23:56 ` Sean Christopherson [this message]
2022-08-02 16:31 ` David Matlack
2022-08-02 17:22 ` Sean Christopherson
2022-08-02 17:42 ` Paolo Bonzini
2022-08-02 19:12 ` Sean Christopherson
2022-08-03 15:08 ` Paolo Bonzini
2022-08-05 23:30 ` Vipin Sharma
2022-08-09 16:52 ` David Matlack
2022-08-09 23:51 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YuhoJUoPBOu5eMz8@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=vipinsh@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).