From: Paolo Bonzini <pbonzini@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: David Matlack <dmatlack@google.com>,
Vipin Sharma <vipinsh@google.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware
Date: Wed, 3 Aug 2022 17:08:25 +0200
Message-ID: <cedcced0-b92c-07bd-ef2b-272ae58fdf40@redhat.com>
In-Reply-To: <Yul3A4CmaAHMui2Z@google.com>
On 8/2/22 21:12, Sean Christopherson wrote:
> Ah crud, I misread the patch. I was thinking tdp_mmu_spte_to_nid() was getting
> the node for the existing shadow page, but it's actually getting the node for the
> underlying physical huge page.
>
> That means this patch is broken now that KVM doesn't require a "struct page" to
> create huge SPTEs (commit a8ac499bb6ab ("KVM: x86/mmu: Don't require refcounted
> "struct page" to create huge SPTEs"). I.e. this will explode as pfn_to_page()
> may return garbage.
>
> return page_to_nid(pfn_to_page(spte_to_pfn(spte)));
I was about to say that yesterday. However, my knowledge of struct page internals has proved spotty enough that I wasn't 100% sure. But yeah, with a fresh mind it's quite obvious that anything that goes through hva_to_pfn_remapped() and similar paths will fail.
> That said, I still don't like this patch, at least not as a one-off thing. I do
> like the idea of having page table allocations follow the node requested for the
> target pfn, what I don't like is having one-off behavior that silently overrides
> userspace policy.
Yes, I totally agree with making it all or nothing.
> I would much rather we solve the problem for all page table allocations, maybe
> with a KVM_CAP or module param? Unfortunately, that's a very non-trivial change
> because it will probably require having a per-node cache in each of the per-vCPU
> caches.
A module parameter is fine. If you care about performance at this level, your userspace is probably homogeneous enough.
Paolo
> Hmm, actually, a not-awful alternative would be to have the fault path fallback
> to the current task's policy if allocation fails. I.e. make it best effort.
>
> E.g. as a rough sketch...
>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 37 ++++++++++++++++++++++++++++++-------
> 1 file changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index bf2ccf9debca..e475f5b55794 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -273,10 +273,11 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>
> -static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> +static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu, kvm_pfn_t pfn)
> {
> + struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_shadow_page_cache;
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> + sp->spt = kvm_mmu_alloc_shadow_page_table(cache, GFP_NOWAIT, pfn);
>
> return sp;
> }
> @@ -1190,7 +1191,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> if (is_removed_spte(iter.old_spte))
> break;
>
> - sp = tdp_mmu_alloc_sp(vcpu);
> + sp = tdp_mmu_alloc_sp(vcpu, fault->pfn);
> tdp_mmu_init_child_sp(sp, &iter);
>
> if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) {
> @@ -1402,17 +1403,39 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
> return spte_set;
> }
>
> -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> +void *kvm_mmu_alloc_shadow_page_table(struct kvm_mmu_memory_cache *cache,
> + gfp_t gfp, kvm_pfn_t pfn)
> +{
> + struct page *page = kvm_pfn_to_refcounted_page(pfn);
> + struct page *spt_page;
> + int node;
> +
> + gfp |= __GFP_ZERO | __GFP_ACCOUNT;
> +
> + if (page) {
> + spt_page = alloc_pages_node(page_to_nid(page), gfp, 0);
> + if (spt_page)
> + return page_address(spt_page);
> + } else if (!cache) {
> + return (void *)__get_free_page(gfp);
> + }
> +
> + if (cache)
> + return kvm_mmu_memory_cache_alloc(cache);
> +
> + return NULL;
> +}
> +
> +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp,
> + kvm_pfn_t pfn)
> {
> struct kvm_mmu_page *sp;
>
> - gfp |= __GFP_ZERO;
> -
> sp = kmem_cache_alloc(mmu_page_header_cache, gfp);
> if (!sp)
> return NULL;
>
> - sp->spt = (void *)__get_free_page(gfp);
> + sp->spt = kvm_mmu_alloc_shadow_page_table(NULL, gfp, pfn);
> if (!sp->spt) {
> kmem_cache_free(mmu_page_header_cache, sp);
> return NULL;
> @@ -1436,7 +1459,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> * If this allocation fails we drop the lock and retry with reclaim
> * allowed.
> */
> - sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
> + sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT, pfn);
> if (sp)
> return sp;
>
>
> base-commit: f8990bfe1eab91c807ca8fc0d48705e8f986b951
> --
>
Thread overview: 11+ messages
2022-08-01 15:19 [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware Vipin Sharma
2022-08-01 22:10 ` David Matlack
2022-08-01 23:56 ` Sean Christopherson
2022-08-02 16:31 ` David Matlack
2022-08-02 17:22 ` Sean Christopherson
2022-08-02 17:42 ` Paolo Bonzini
2022-08-02 19:12 ` Sean Christopherson
2022-08-03 15:08 ` Paolo Bonzini [this message]
2022-08-05 23:30 ` Vipin Sharma
2022-08-09 16:52 ` David Matlack
2022-08-09 23:51 ` Sean Christopherson