From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Lai Jiangshan <jiangshan.ljs@antgroup.com>
Subject: Re: [PATCH 05/12] KVM: X86/MMU: Clear unsync bit directly in __mmu_unsync_walk()
Date: Tue, 19 Jul 2022 19:52:40 +0000
Message-ID: <YtcLiNskPb8z/2Qc@google.com>
In-Reply-To: <20220605064342.309219-6-jiangshanlai@gmail.com>

On Sun, Jun 05, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> mmu_unsync_walk() and __mmu_unsync_walk() requires the caller to clear
> unsync for the shadow pages in the resulted pvec by synching them or
> zapping them.
> 
> All callers does so.
> 
> Otherwise mmu_unsync_walk() and __mmu_unsync_walk() can't work because
> they always walk from the beginning.
> 
> It is possible to make mmu_unsync_walk() and __mmu_unsync_walk() lists
> unsync shadow pages in the resulted pvec without needing synching them
> or zapping them later.  It would require to change mmu_unsync_walk()
> and __mmu_unsync_walk() and make it walk from the last visited position
> derived from the resulted pvec of the previous call of mmu_unsync_walk().
> 
> It would complicate the walk and no callers require the possible new
> behavior.
> 
> It is better to keep the original behavior.
> 
> Since the shadow pages in the resulted pvec will be synced or zapped,
> and clear_unsync_child_bit() for parents will be called anyway later.
> 
> Call clear_unsync_child_bit() earlier and directly in __mmu_unsync_walk()
> to make the code more efficient (the memory of the shadow pages is hot
> in the CPU cache, and no need to visit the shadow pages again later).

The changelog and shortlog do a poor job of capturing what this patch actually
does.  This is a prime example of why I prefer that changelogs first document
what the patch is doing, and only then dive into background details and alternatives.

This changelog has 6-7 paragraphs talking about current KVM behaviors and
alternatives before talking about the patch itself, and then doesn't actually
describe the net effect of the change.

The use of "directly" in the shortlog is also confusing because __mmu_unsync_walk()
already invokes clear_unsync_child_bit(), i.e. this patch only affects
__mmu_unsync_walk().  IIUC, the change is that __mmu_unsync_walk() will clear
the unsync info when adding to @pvec instead of having to redo the walk after
zapping/synching the page.

  KVM: x86/mmu: Clear unsync child _before_ final zap/sync

  Clear the unsync child information for a shadow page when adding it to
  the array of to-be-zapped/synced shadow pages, i.e. _before_ the actual
  zap/sync that effects the "no longer unsync" state change.  Callers of
  mmu_unsync_walk() and __mmu_unsync_walk() are required to zap/sync all
  entries in @pvec before dropping mmu_lock, i.e. once a shadow page is
  added to the set of pages to zap/sync, success is guaranteed.

  Clearing the unsync info when adding to the array yields more efficient
  code as KVM will no longer need to rewalk the shadow pages to "discover"
  that the child page is no longer unsync, and as a bonus, the metadata
  for the shadow page will be hot in the CPU cache.

  Note, this obviously doesn't work if success isn't guaranteed, but
  mmu_unsync_walk() and __mmu_unsync_walk() would require significant
  changes to allow restarting a walk after failure to zap/sync.  I.e.
  this is but one of many details that would need to change.

> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f35fd5c59c38..2446ede0b7b9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1794,19 +1794,23 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>  				return -ENOSPC;
>  
>  			ret = __mmu_unsync_walk(child, pvec);
> -			if (!ret) {
> -				clear_unsync_child_bit(sp, i);
> -				continue;
> -			} else if (ret > 0) {
> -				nr_unsync_leaf += ret;
> -			} else
> +			if (ret < 0)
>  				return ret;
> -		} else if (child->unsync) {
> +			nr_unsync_leaf += ret;
> +		}
> +
> +		/*
> +		 * Clear unsync bit for @child directly if @child is fully
> +		 * walked and all the unsync shadow pages descended from
> +		 * @child (including itself) are added into @pvec, the caller
> +		 * must sync or zap all the unsync shadow pages in @pvec.
> +		 */
> +		clear_unsync_child_bit(sp, i);
> +		if (child->unsync) {
>  			nr_unsync_leaf++;
>  			if (mmu_pages_add(pvec, child, i))

This ordering is wrong, no?  If the child itself is unsync and can't be added to
@pvec, i.e. fails here, then clearing its bit in unsync_child_bitmap is wrong.
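
To illustrate the concern with a toy model: this is not KVM code, every
toy_* name is hypothetical, and it assumes an add that can fail outright
without storing the page, which may not match what mmu_pages_add() actually
does.  With a one-entry vector and two unsync children, clearing the bit
before the add leaves the parent with no record of the second child:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT KVM code: every name here is hypothetical. */
#define TOY_NR_ENTRIES 8
#define TOY_PVEC_CAP   1

struct toy_page {
	bool unsync;
};

struct toy_parent {
	unsigned int unsync_child_bitmap;	/* bit i set => child i may be unsync */
	struct toy_page *child[TOY_NR_ENTRIES];
};

struct toy_pvec {
	struct toy_page *page[TOY_PVEC_CAP];
	unsigned int nr;
};

/* Assumption: returns true if the vector is full and the page was NOT stored. */
static bool toy_pvec_add(struct toy_pvec *pvec, struct toy_page *page)
{
	assert(pvec->nr <= TOY_PVEC_CAP);
	if (pvec->nr == TOY_PVEC_CAP)
		return true;
	pvec->page[pvec->nr++] = page;
	return false;
}

/* Clear the parent's bit first, then try to add the child. */
static int toy_walk_clear_first(struct toy_parent *sp, struct toy_pvec *pvec)
{
	int nr_unsync_leaf = 0;

	for (unsigned int i = 0; i < TOY_NR_ENTRIES; i++) {
		struct toy_page *child = sp->child[i];

		if (!(sp->unsync_child_bitmap & (1u << i)))
			continue;

		sp->unsync_child_bitmap &= ~(1u << i);
		if (child && child->unsync) {
			if (toy_pvec_add(pvec, child))
				return -ENOSPC;	/* bit is already gone */
			nr_unsync_leaf++;
		}
	}
	return nr_unsync_leaf;
}

/* Add the child first, clear the parent's bit only once the add succeeded. */
static int toy_walk_add_first(struct toy_parent *sp, struct toy_pvec *pvec)
{
	int nr_unsync_leaf = 0;

	for (unsigned int i = 0; i < TOY_NR_ENTRIES; i++) {
		struct toy_page *child = sp->child[i];

		if (!(sp->unsync_child_bitmap & (1u << i)))
			continue;

		if (child && child->unsync) {
			if (toy_pvec_add(pvec, child))
				return -ENOSPC;	/* bit stays set for a later walk */
			nr_unsync_leaf++;
		}
		sp->unsync_child_bitmap &= ~(1u << i);
	}
	return nr_unsync_leaf;
}
```

In this model, toy_walk_clear_first() returns -ENOSPC with an empty bitmap
even though the second child is still unsync, whereas toy_walk_add_first()
returns -ENOSPC with the second child's bit still set.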

I also dislike that this patch obfuscates that a shadow page can't be unsync
itself _and_ have unsync children (because only PG_LEVEL_4K can be unsync).  In
other words, keep the

	if (child->unsync_children) {

	} else if (child->unsync) {

	}

And at that point, we can streamline this further:

	int i, ret, nr_unsync_leaf = 0;

	for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
		struct kvm_mmu_page *child;
		u64 ent = sp->spt[i];

		if (is_shadow_present_pte(ent) && !is_large_pte(ent)) {
			child = to_shadow_page(ent & PT64_BASE_ADDR_MASK);
			if (child->unsync_children) {
				ret = __mmu_unsync_walk_and_clear(child, pvec);
				if (ret < 0)
					return ret;
				nr_unsync_leaf += ret;
			} else if (child->unsync) {
				if (mmu_pages_add(pvec, child))
					return -ENOSPC;
				nr_unsync_leaf++;
			}
		}

		/*
		 * Clear the unsync info, the child is either already sync
		 * (bitmap is stale) or is guaranteed to be zapped/synced by
		 * the caller before mmu_lock is released.  Note, the caller is
		 * required to zap/sync all entries in @pvec even if an error
		 * is returned!
		 */
		clear_unsync_child_bit(sp, i);
	}

	return nr_unsync_leaf;
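
For anyone who wants to poke at the control flow outside the kernel, here is
a rough userspace sketch of the same loop shape.  It is a toy under stated
assumptions, not the KVM implementation: the toy_* types and helpers are
hypothetical stand-ins, the bitmap is a plain unsigned int with 4 entries
instead of 512, and toy_pvec_add() is assumed to fail without storing the
page when the vector is full.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT KVM code: every name here is hypothetical. */
#define TOY_NR_ENTRIES 4
#define TOY_PVEC_CAP   16

struct toy_sp {
	bool unsync;				/* only leaves are unsync in this model */
	unsigned int unsync_children;		/* number of bits set in the bitmap */
	unsigned int unsync_child_bitmap;
	struct toy_sp *spt[TOY_NR_ENTRIES];	/* NULL models a non-present entry */
};

struct toy_pvec {
	struct toy_sp *sp[TOY_PVEC_CAP];
	unsigned int nr;
};

/* Assumption: returns true if the vector is full and the page was NOT stored. */
static bool toy_pvec_add(struct toy_pvec *pvec, struct toy_sp *sp)
{
	if (pvec->nr == TOY_PVEC_CAP)
		return true;
	pvec->sp[pvec->nr++] = sp;
	return false;
}

static void toy_clear_unsync_child_bit(struct toy_sp *sp, unsigned int idx)
{
	assert(sp->unsync_children > 0);
	sp->unsync_children--;
	sp->unsync_child_bitmap &= ~(1u << idx);
}

static int toy_unsync_walk_and_clear(struct toy_sp *sp, struct toy_pvec *pvec)
{
	int ret, nr_unsync_leaf = 0;

	for (unsigned int i = 0; i < TOY_NR_ENTRIES; i++) {
		struct toy_sp *child = sp->spt[i];

		if (!(sp->unsync_child_bitmap & (1u << i)))
			continue;

		if (child) {
			if (child->unsync_children) {
				ret = toy_unsync_walk_and_clear(child, pvec);
				if (ret < 0)
					return ret;
				nr_unsync_leaf += ret;
			} else if (child->unsync) {
				if (toy_pvec_add(pvec, child))
					return -ENOSPC;
				nr_unsync_leaf++;
			}
		}

		/* Child is already sync (stale bit) or is now queued in pvec. */
		toy_clear_unsync_child_bit(sp, i);
	}

	return nr_unsync_leaf;
}
```

In the toy, a successful walk leaves every visited bitmap empty, so a second
walk from the same root finds nothing and adds nothing; the pages in the
vector are still marked unsync until the caller syncs or zaps them.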
