From: Maxim Levitsky <mlevitsk@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com
Subject: Re: [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit
Date: Wed, 23 Feb 2022 18:01:51 +0200 [thread overview]
Message-ID: <0c3644304cbead7924f888ec16498e334c6efbfc.camel@redhat.com> (raw)
In-Reply-To: <20220217210340.312449-10-pbonzini@redhat.com>
On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Right now, PGD caching avoids placing a PAE root in the cache by using the
> old value of mmu->root_level and mmu->shadow_root_level; it does not look
> for a cached PGD if the old root is a PAE one, and then frees it using
> kvm_mmu_free_roots.
>
> Change the logic instead to free the uncacheable root early.
> This way, __kvm_new_mmu_pgd is able to look up the cache when going from
> 32-bit to 64-bit (if there is a hit, the invalid root becomes the least
> recently used). An example of this is nested virtualization with shadow
> paging, when a 64-bit L1 runs a 32-bit L2.
>
> As a side effect (which is actually the reason why this patch was
> written), PGD caching does not use the old value of mmu->root_level
> and mmu->shadow_root_level anymore.
Which is great - I hated this code!!!
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 82 ++++++++++++++++++++++++++++++------------
> 1 file changed, 59 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0f2de811e871..da324a317000 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4107,52 +4107,88 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
> union kvm_mmu_page_role role)
> {
> return (role.direct || pgd == root->pgd) &&
> - VALID_PAGE(root->hpa) && to_shadow_page(root->hpa) &&
> + VALID_PAGE(root->hpa) &&
> role.word == to_shadow_page(root->hpa)->role.word;
> }
>
> /*
> - * Find out if a previously cached root matching the new pgd/role is available.
> - * The current root is also inserted into the cache.
> - * If a matching root was found, it is assigned to kvm_mmu->root.hpa and true is
> - * returned.
> - * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa and
> - * false is returned. This root should now be freed by the caller.
> + * Find out if a previously cached root matching the new pgd/role is available,
> + * and insert the current root as the MRU in the cache.
> + * If a matching root is found, it is assigned to kvm_mmu->root and
> + * true is returned.
> + * If no match is found, kvm_mmu->root is left invalid, the LRU root is
> + * evicted to make room for the current root, and false is returned.
> */
> -static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> - union kvm_mmu_page_role new_role)
> +static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *mmu,
> + gpa_t new_pgd,
> + union kvm_mmu_page_role new_role)
> {
> uint i;
> - struct kvm_mmu *mmu = vcpu->arch.mmu;
>
> if (is_root_usable(&mmu->root, new_pgd, new_role))
> return true;
>
> for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
> + /*
> + * The swaps end up rotating the cache like this:
> + * C 0 1 2 3 (on entry to the function)
> + * 0 C 1 2 3
> + * 1 C 0 2 3
> + * 2 C 0 1 3
> + * 3 C 0 1 2 (on exit from the loop)
> + */
Thanks a million for documenting this! I remember it took
me too much time to figure out what all of these swaps do.
I do want to mention that it would be nice to mention that
the above trace is when none of the root in the cache match.
> swap(mmu->root, mmu->prev_roots[i]);
> -
> if (is_root_usable(&mmu->root, new_pgd, new_role))
> - break;
Maybe even add comment that if that break happens, the cache would look like
'2 C 0 1 3' with 2 beeing the matching root.
> + return true;
> }
>
> - return i < KVM_MMU_NUM_PREV_ROOTS;
> + kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
> + return false;
> }
>
> -static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> - union kvm_mmu_page_role new_role)
> +/*
> + * Find out if a previously cached root matching the new pgd/role is available.
> + * On entry, mmu->root is invalid.
> + * If a matching root is found, it is assigned to kvm_mmu->root, the LRU entry
> + * of the cache becomes invalid, and true is returned.
> + * If no match is found, kvm_mmu->root is left invalid and false is returned.
> + */
> +static bool cached_root_find_without_current(struct kvm *kvm, struct kvm_mmu *mmu,
> + gpa_t new_pgd,
> + union kvm_mmu_page_role new_role)
> {
> - struct kvm_mmu *mmu = vcpu->arch.mmu;
> + uint i;
> +
> + for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> + if (is_root_usable(&mmu->prev_roots[i], new_pgd, new_role))
> + goto hit;
>
> + return false;
> +
> +hit:
> + swap(mmu->root, mmu->prev_roots[i]);
> + /* Bubble up the remaining roots. */
> + for (; i < KVM_MMU_NUM_PREV_ROOTS - 1; i++)
> + mmu->prev_roots[i] = mmu->prev_roots[i + 1];
> + mmu->prev_roots[i].hpa = INVALID_PAGE;
I'll would have invalidated the 'pgd' value as well, just in case.
> + return true;
> +}
Since we just have 4 pointers in the LRU cache + root pointer, I wonder if something
dumber/slower could work for both cases.
Like something like that (not tested at all):
for (i = 0 ; i < KVM_MMU_NUM_PREV_ROOTS ; i++)
tmp_prev_roots[i] = mmu->prev_roots[i];
j = 0;
/* current mmu root becomes MRU*/
if (VALID_PAGE(mmu->root.hpa)
mmu->prev_roots[j++] = mmu->root;
for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS && j < KVM_MMU_NUM_PREV_ROOTS; i++)
if (is_root_usable(&tmp_prev_roots[i], new_pgd, new_role))
/* TODO: can also add a warning here if current root is already usable */
mmu->root = mmu->tmp_prev_roots[i];
else
mmu->prev_roots[j++] = mmu->tmp_prev_roots[i];
for (; j < KVM_MMU_NUM_PREV_ROOTS ; j++)
mmu->prev_roots[j++].hpa = INVALID_PAGE;
> +
> +static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
> + gpa_t new_pgd, union kvm_mmu_page_role new_role)
> +{
> /*
> - * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid
> + * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> * later if necessary.
> */
> - if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
> - mmu->root_level >= PT64_ROOT_4LEVEL)
> - return cached_root_available(vcpu, new_pgd, new_role);
> + if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> + kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>
> - return false;
> + if (VALID_PAGE(mmu->root.hpa))
> + return cached_root_find_and_keep_current(kvm, mmu, new_pgd, new_role);
> + else
> + return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
> }
>
> static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> @@ -4160,8 +4196,8 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> {
> struct kvm_mmu *mmu = vcpu->arch.mmu;
>
> - if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
> - kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
> + if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
> + /* kvm_mmu_ensure_valid_pgd will set up a new root. */
I also agree with Sean's comment on this.
> return;
> }
>
I don't see any bugs in the code however, thus:
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
Maxim Levitsky
next prev parent reply other threads:[~2022-02-23 16:02 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
2022-02-18 17:08 ` Sean Christopherson
2022-02-18 17:26 ` Paolo Bonzini
2022-02-23 13:40 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Paolo Bonzini
2022-02-18 17:12 ` Sean Christopherson
2022-02-23 14:07 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload Paolo Bonzini
2022-02-18 17:14 ` Sean Christopherson
2022-02-18 17:23 ` Paolo Bonzini
2022-02-23 14:11 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs Paolo Bonzini
2022-02-18 17:15 ` Sean Christopherson
2022-02-23 14:12 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root Paolo Bonzini
2022-02-23 14:39 ` Maxim Levitsky
2022-02-23 15:42 ` Sean Christopherson
2022-02-17 21:03 ` [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots Paolo Bonzini
2022-02-18 17:27 ` Sean Christopherson
2022-02-23 14:59 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit Paolo Bonzini
2022-02-18 18:37 ` Sean Christopherson
2022-02-18 18:46 ` Paolo Bonzini
2022-02-23 15:02 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions Paolo Bonzini
2022-02-18 18:39 ` Sean Christopherson
2022-02-23 15:16 ` Maxim Levitsky
2022-02-23 15:48 ` Sean Christopherson
2022-02-17 21:03 ` [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Paolo Bonzini
2022-02-18 18:08 ` Sean Christopherson
2022-02-23 16:01 ` Maxim Levitsky [this message]
2022-02-17 21:03 ` [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized Paolo Bonzini
2022-02-18 23:59 ` Sean Christopherson
2022-02-23 16:20 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD Paolo Bonzini
2022-02-18 23:59 ` Sean Christopherson
2022-02-23 16:23 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU Paolo Bonzini
2022-02-18 23:59 ` Sean Christopherson
2022-02-23 16:32 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common Paolo Bonzini
2022-02-19 0:22 ` Sean Christopherson
2022-02-23 16:48 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
2022-02-18 20:30 ` Sean Christopherson
2022-02-19 10:03 ` Paolo Bonzini
2022-02-24 11:02 ` Maxim Levitsky
2022-02-24 15:12 ` Sean Christopherson
2022-02-24 15:14 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd Paolo Bonzini
2022-02-18 9:39 ` Paolo Bonzini
2022-02-18 21:00 ` Sean Christopherson
2022-02-24 15:41 ` Maxim Levitsky
2022-02-25 17:40 ` Sean Christopherson
2022-02-17 21:03 ` [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT Paolo Bonzini
2022-02-18 21:45 ` Sean Christopherson
2022-02-19 7:54 ` Paolo Bonzini
2022-02-22 16:06 ` Sean Christopherson
2022-02-24 15:50 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset Paolo Bonzini
2022-02-18 23:57 ` Sean Christopherson
2022-02-21 15:01 ` Paolo Bonzini
2022-02-24 16:11 ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes Paolo Bonzini
2022-02-24 16:25 ` Maxim Levitsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0c3644304cbead7924f888ec16498e334c6efbfc.camel@redhat.com \
--to=mlevitsk@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).