From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Ben Gardon <bgardon@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Peter Feiner <pfeiner@google.com>,
	Peter Shier <pshier@google.com>,
	Junaid Shahid <junaids@google.com>,
	Jim Mattson <jmattson@google.com>
Subject: Re: [RFC PATCH 07/28] kvm: mmu: Add functions for handling changed PTEs
Date: Wed, 27 Nov 2019 11:04:01 -0800	[thread overview]
Message-ID: <20191127190401.GG22227@linux.intel.com> (raw)
In-Reply-To: <20190926231824.149014-8-bgardon@google.com>

On Thu, Sep 26, 2019 at 04:18:03PM -0700, Ben Gardon wrote:
> The existing bookkeeping done by KVM when a PTE is changed is
> spread around several functions. This makes it difficult to remember all
> the stats, bitmaps, and other subsystems that need to be updated whenever
> a PTE is modified. When a non-leaf PTE is marked non-present or becomes
> a leaf PTE, page table memory must also be freed. Further, most of the
> bookkeeping is done before the PTE is actually set. This works well with
> a monolithic MMU lock; however, if changes use atomic compare/exchanges,
> the bookkeeping cannot be done before the change is made. In either
> case, there is a short window in which some statistics, e.g. the dirty
> bitmap, will be inconsistent, but consistency is still restored
> before the MMU lock is released. To simplify the MMU and facilitate the
> use of atomic operations on PTEs, create functions to handle some of the
> bookkeeping required as a result of the change.

This is one case where splitting into multiple patches is probably not the
best option.  It's difficult to review this patch without seeing how
disconnected PTEs are used.  And, the patch is untestable for all intents
and purposes since there is no external caller, i.e. all of the calls are
self-referential within the new code.

> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>  arch/x86/kvm/mmu.c | 145 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 145 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 0311d18d9a995..50413f17c7cd0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -143,6 +143,18 @@ module_param(dbg, bool, 0644);
>  #define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
>  #define SPTE_MMU_WRITEABLE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
>  
> +/*
> + * PTEs in a disconnected page table can be set to DISCONNECTED_PTE to indicate
> + * to other threads that the page table in which the pte resides is no longer
> + * connected to the root of a paging structure.
> + *
> + * This constant works because it is considered non-present on both AMD and
> + * Intel CPUs and does not create a L1TF vulnerability because the pfn section
> + * is zeroed out. PTE bit 57 is available to software, per vol 3, figure 28-1
> + * of the Intel SDM and vol 2, figures 5-18 to 5-21 of the AMD APM.
> + */
> +#define DISCONNECTED_PTE (1ull << 57)

Use BIT_ULL, ignore the bad examples in mmu.c :-)

> +
>  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
>  
>  /* make pte_list_desc fit well in cache line */
> @@ -555,6 +567,16 @@ static int is_shadow_present_pte(u64 pte)
>  	return (pte != 0) && !is_mmio_spte(pte);
>  }
>  
> +static inline int is_disconnected_pte(u64 pte)
> +{
> +	return pte == DISCONNECTED_PTE;
> +}

An explicit comparison scares me a bit, but that's just my off the cuff
reaction.  I'll come back to the meat of this series after turkey day.

> +
> +static int is_present_direct_pte(u64 pte)
> +{
> +	return is_shadow_present_pte(pte) && !is_disconnected_pte(pte);
> +}
> +
>  static int is_large_pte(u64 pte)
>  {
>  	return pte & PT_PAGE_SIZE_MASK;
> @@ -1659,6 +1681,129 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
>  	return flush;
>  }
>  
> +static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn,
> +			       u64 old_pte, u64 new_pte, int level);
> +
> +/**
> + * mark_pte_disconnected - Mark a PTE as part of a disconnected PT
> + * @kvm: kvm instance
> + * @as_id: the address space of the paging structure the PTE was a part of
> + * @gfn: the base GFN that was mapped by the PTE
> + * @ptep: a pointer to the PTE to be marked disconnected
> + * @level: the level of the PT this PTE was a part of, when it was part of the
> + *	paging structure
> + */
> +static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn,
> +				  u64 *ptep, int level)
> +{
> +	u64 old_pte;
> +
> +	old_pte = xchg(ptep, DISCONNECTED_PTE);
> +	BUG_ON(old_pte == DISCONNECTED_PTE);
> +
> +	handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level);
> +}
> +
> +/**
> + * handle_disconnected_pt - Mark a PT as disconnected and handle associated
> + * bookkeeping and freeing
> + * @kvm: kvm instance
> + * @as_id: the address space of the paging structure the PT was a part of
> + * @pt_base_gfn: the base GFN that was mapped by the first PTE in the PT
> + * @pfn: The physical frame number of the disconnected PT page
> + * @level: the level of the PT, when it was part of the paging structure
> + *
> + * Given a pointer to a page table that has been removed from the paging
> + * structure and its level, recursively free child page tables and mark their
> + * entries as disconnected.
> + */
> +static void handle_disconnected_pt(struct kvm *kvm, int as_id,
> +				   gfn_t pt_base_gfn, kvm_pfn_t pfn, int level)
> +{
> +	int i;
> +	gfn_t gfn = pt_base_gfn;
> +	u64 *pt = pfn_to_kaddr(pfn);
> +
> +	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> +		/*
> +		 * Mark the PTE as disconnected so that no other thread will
> +		 * try to map in an entry there or try to free any child page
> +		 * table the entry might have pointed to.
> +		 */
> +		mark_pte_disconnected(kvm, as_id, gfn, &pt[i], level);
> +
> +		gfn += KVM_PAGES_PER_HPAGE(level);
> +	}
> +
> +	free_page((unsigned long)pt);
> +}
> +
> +/**
> + * handle_changed_pte - handle bookkeeping associated with a PTE change
> + * @kvm: kvm instance
> + * @as_id: the address space of the paging structure the PTE was a part of
> + * @gfn: the base GFN that was mapped by the PTE
> + * @old_pte: The value of the PTE before the atomic compare / exchange
> + * @new_pte: The value of the PTE after the atomic compare / exchange
> + * @level: the level of the PT the PTE is part of in the paging structure
> + *
> + * Handle bookkeeping that might result from the modification of a PTE.
> + * This function should be called in the same RCU read critical section as the
> + * atomic cmpxchg on the pte. This function must be called for all direct pte
> + * modifications except those which strictly emulate hardware, for example
> + * setting the dirty bit on a pte.
> + */
> +static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn,
> +			       u64 old_pte, u64 new_pte, int level)
> +{
> +	bool was_present = is_present_direct_pte(old_pte);
> +	bool is_present = is_present_direct_pte(new_pte);
> +	bool was_leaf = was_present && is_last_spte(old_pte, level);
> +	bool pfn_changed = spte_to_pfn(old_pte) != spte_to_pfn(new_pte);
> +	int child_level;
> +
> +	BUG_ON(level > PT64_ROOT_MAX_LEVEL);
> +	BUG_ON(level < PT_PAGE_TABLE_LEVEL);
> +	BUG_ON(gfn % KVM_PAGES_PER_HPAGE(level));
> +
> +	/*
> +	 * The only times a pte should be changed from a non-present to
> +	 * non-present state is when an entry in an unlinked page table is
> +	 * marked as a disconnected PTE as part of freeing the page table,
> +	 * or an MMIO entry is installed/modified. In these cases there is
> +	 * nothing to do.
> +	 */
> +	if (!was_present && !is_present) {
> +		/*
> +		 * If this change is not on an MMIO PTE and not setting a PTE
> +		 * as disconnected, then it is unexpected. Log the change,
> +		 * though it should not impact the guest since both the former
> +		 * and current PTEs are nonpresent.
> +		 */
> +		WARN_ON((new_pte != DISCONNECTED_PTE) &&
> +			!is_mmio_spte(new_pte));
> +		return;
> +	}
> +
> +	if (was_present && !was_leaf && (pfn_changed || !is_present)) {
> +		/*
> +		 * The level of the page table being freed is one level lower
> +		 * than the level at which it is mapped.
> +		 */
> +		child_level = level - 1;
> +
> +		/*
> +		 * If there was a present non-leaf entry before, and now the
> +		 * entry points elsewhere, the lpage stats and dirty logging /
> +		 * access tracking status for all the entries the old pte
> +		 * pointed to must be updated and the page table pages it
> +		 * pointed to must be freed.
> +		 */
> +		handle_disconnected_pt(kvm, as_id, gfn, spte_to_pfn(old_pte),
> +				       child_level);
> +	}
> +}
> +
>  /**
>   * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
>   * @kvm: kvm instance
> -- 
> 2.23.0.444.g18eeb5a265-goog
> 
