From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Matlack <dmatlack@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting SPTE
Date: Tue, 9 Aug 2022 14:44:19 +0000
Message-ID: <YvJyw96QZdf6YPAX@google.com>
In-Reply-To: <331dc774-c662-9475-1175-725cb2382bb2@redhat.com>

On Tue, Aug 09, 2022, Paolo Bonzini wrote:
> On 8/9/22 05:26, Yan Zhao wrote:
> > hi Sean,
> > 
> > I understand this smp_rmb() is intended to prevent the reading of
> > p->nx_huge_page_disallowed from happening before it's set to true in
> > kvm_tdp_mmu_map(). Is this understanding right?
> > 
> > If it's true, then do we also need the smp_rmb() for read of sp->gfn in
> > handle_removed_pt()? (or maybe for other fields in sp in other places?)
> 
> No, in that case the barrier is provided by rcu_dereference().  In fact, I
> am not sure the barriers are needed in this patch either (but the comments
> are :)):

Yeah, I'm 99% certain the barriers aren't strictly required, but I didn't love the
idea of depending on other implementation details for the barriers.  Of course I
completely overlooked the fact that all other sp fields would need the same
barriers...

> - the write barrier is certainly not needed because it is implicit in
> tdp_mmu_set_spte_atomic's cmpxchg64
> 
> - the read barrier _should_ also be provided by rcu_dereference(pt), but I'm
> not 100% sure about that. The reasoning is that you have
> 
> (1)	iter->old_spte = READ_ONCE(*rcu_dereference(iter->sptep));
> 	...
> (2)	tdp_ptep_t pt = spte_to_child_pt(old_spte, level);
> (3)	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(pt));
> 	...
> (4)	if (sp->nx_huge_page_disallowed) {
> 
> and (4) is definitely ordered after (1) thanks to the READ_ONCE hidden
> within (3) and the data dependency from old_spte to sp.

Yes, I think that's correct.  Callers must verify the SPTE is present before getting
the associated child shadow page.  KVM does have instances where a shadow page is
retrieved from the SPTE _pointer_, but that's the parent shadow page, i.e. isn't
guarded by the SPTE being present.

	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
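
In other words, the safe pattern for the _child_ case condenses to something
like this (an illustrative sketch built from the snippet quoted above, not
verbatim KVM code):

	u64 old_spte = kvm_tdp_mmu_read_spte(iter->sptep);

	/*
	 * Only chase the SPTE to its child shadow page after verifying the
	 * SPTE is present and non-leaf; the address dependency from old_spte
	 * to pt to sp then orders all reads of sp's fields after the
	 * READ_ONCE() of the SPTE.
	 */
	if (is_shadow_present_pte(old_spte) && !is_last_spte(old_spte, iter->level)) {
		tdp_ptep_t pt = spte_to_child_pt(old_spte, iter->level);
		struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(pt));

		if (sp->nx_huge_page_disallowed) {
			...
		}
	}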

Something like this as a separate patch?

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..9d982ccf4567 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -13,6 +13,12 @@
  * to be zapped while holding mmu_lock for read, and to allow TLB flushes to be
  * batched without having to collect the list of zapped SPs.  Flows that can
  * remove SPs must service pending TLB flushes prior to dropping RCU protection.
+ *
+ * The READ_ONCE() ensures that, if the SPTE points at a child shadow page, all
+ * fields in struct kvm_mmu_page will be read after the caller observes the
+ * present SPTE (KVM must check that the SPTE is present before following the
+ * SPTE's pfn to its associated shadow page).  Pairs with the implicit memory
+ * barrier in tdp_mmu_set_spte_atomic().
  */
 static inline u64 kvm_tdp_mmu_read_spte(tdp_ptep_t sptep)
 {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bf2ccf9debca..ca50296e3696 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -645,6 +645,11 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
        lockdep_assert_held_read(&kvm->mmu_lock);

        /*
+        * The atomic CMPXCHG64 provides an implicit memory barrier and ensures
+        * that, if the SPTE points at a shadow page, all struct kvm_mmu_page
+        * fields are visible to readers before the SPTE is marked present.
+        * Pairs with ordering guarantees provided by kvm_tdp_mmu_read_spte().
+        *
         * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
         * does not hold the mmu_lock.
         */
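
For completeness, the writer side that the new comment pairs with boils down
to something like this (a condensed sketch of the kvm_tdp_mmu_map() flow, not
the actual code):

	/*
	 * Set nx_huge_page_disallowed (and any other kvm_mmu_page fields the
	 * reader may consume) *before* installing the SPTE.  The cmpxchg in
	 * tdp_mmu_set_spte_atomic() is fully ordered on x86, so it publishes
	 * the fields to any reader that observes the present SPTE.
	 */
	sp->nx_huge_page_disallowed = fault->huge_page_disallowed;

	if (tdp_mmu_set_spte_atomic(kvm, iter, new_spte))
		return RET_PF_RETRY;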
