All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Peter Shier <pshier@google.com>,
	Peter Feiner <pfeiner@google.com>,
	Junaid Shahid <junaids@google.com>,
	Jim Mattson <jmattson@google.com>,
	Yulei Zhang <yulei.kernel@gmail.com>,
	Wanpeng Li <kernellwp@gmail.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	Ben Gardon <bgardon@google.com>
Subject: [PATCH v2 21/28] KVM: x86/mmu: Flush TLBs after zap in TDP MMU PF handler
Date: Tue,  2 Feb 2021 10:57:27 -0800	[thread overview]
Message-ID: <20210202185734.1680553-22-bgardon@google.com> (raw)
In-Reply-To: <20210202185734.1680553-1-bgardon@google.com>

When the TDP MMU is allowed to handle page faults in parallel there is
the possiblity of a race where an SPTE is cleared and then imediately
replaced with a present SPTE pointing to a different PFN, before the
TLBs can be flushed. This race would violate architectural specs. Ensure
that the TLBs are flushed properly before other threads are allowed to
install any present value for the SPTE.

Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>

---

v1 -> v2
- Renamed "FROZEN_SPTE" to "REMOVED_SPTE" and updated derivative
  comments and code

 arch/x86/kvm/mmu/spte.h    | 21 ++++++++++++-
 arch/x86/kvm/mmu/tdp_mmu.c | 63 ++++++++++++++++++++++++++++++++------
 2 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 2b3a30bd38b0..3f974006cfb6 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -130,6 +130,25 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
 					  PT64_EPT_EXECUTABLE_MASK)
 #define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT PT64_SECOND_AVAIL_BITS_SHIFT
 
+/*
+ * If a thread running without exclusive control of the MMU lock must perform a
+ * multi-part operation on an SPTE, it can set the SPTE to REMOVED_SPTE as a
+ * non-present intermediate value. Other threads which encounter this value
+ * should not modify the SPTE.
+ *
+ * This constant works because it is considered non-present on both AMD and
+ * Intel CPUs and does not create a L1TF vulnerability because the pfn section
+ * is zeroed out.
+ *
+ * Only used by the TDP MMU.
+ */
+#define REMOVED_SPTE (1ull << 59)
+
+static inline bool is_removed_spte(u64 spte)
+{
+	return spte == REMOVED_SPTE;
+}
+
 /*
  * In some cases, we need to preserve the GFN of a non-present or reserved
  * SPTE when we usurp the upper five bits of the physical address space to
@@ -187,7 +206,7 @@ static inline bool is_access_track_spte(u64 spte)
 
 static inline int is_shadow_present_pte(u64 pte)
 {
-	return (pte != 0) && !is_mmio_spte(pte);
+	return (pte != 0) && !is_mmio_spte(pte) && !is_removed_spte(pte);
 }
 
 static inline int is_large_pte(u64 pte)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0b5a9339ac55..7a2cdfeac4d2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -427,15 +427,19 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	 */
 	if (!was_present && !is_present) {
 		/*
-		 * If this change does not involve a MMIO SPTE, it is
-		 * unexpected. Log the change, though it should not impact the
-		 * guest since both the former and current SPTEs are nonpresent.
+		 * If this change does not involve a MMIO SPTE or removed SPTE,
+		 * it is unexpected. Log the change, though it should not
+		 * impact the guest since both the former and current SPTEs
+		 * are nonpresent.
 		 */
-		if (WARN_ON(!is_mmio_spte(old_spte) && !is_mmio_spte(new_spte)))
+		if (WARN_ON(!is_mmio_spte(old_spte) &&
+			    !is_mmio_spte(new_spte) &&
+			    !is_removed_spte(new_spte)))
 			pr_err("Unexpected SPTE change! Nonpresent SPTEs\n"
 			       "should not be replaced with another,\n"
 			       "different nonpresent SPTE, unless one or both\n"
-			       "are MMIO SPTEs.\n"
+			       "are MMIO SPTEs, or the new SPTE is\n"
+			       "a temporary removed SPTE.\n"
 			       "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d",
 			       as_id, gfn, old_spte, new_spte, level);
 		return;
@@ -486,6 +490,13 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
+	/*
+	 * Do not change removed SPTEs. Only the thread that froze the SPTE
+	 * may modify it.
+	 */
+	if (iter->old_spte == REMOVED_SPTE)
+		return false;
+
 	if (cmpxchg64(rcu_dereference(iter->sptep), iter->old_spte,
 		      new_spte) != iter->old_spte)
 		return false;
@@ -496,6 +507,34 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	return true;
 }
 
+static inline bool tdp_mmu_zap_spte_atomic(struct kvm *kvm,
+					   struct tdp_iter *iter)
+{
+	/*
+	 * Freeze the SPTE by setting it to a special,
+	 * non-present value. This will stop other threads from
+	 * immediately installing a present entry in its place
+	 * before the TLBs are flushed.
+	 */
+	if (!tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE))
+		return false;
+
+	kvm_flush_remote_tlbs_with_address(kvm, iter->gfn,
+					   KVM_PAGES_PER_HPAGE(iter->level));
+
+	/*
+	 * No other thread can overwrite the removed SPTE as they
+	 * must either wait on the MMU lock or use
+	 * tdp_mmu_set_spte_atomic which will not overrite the
+	 * special removed SPTE value. No bookkeeping is needed
+	 * here since the SPTE is going from non-present
+	 * to non-present.
+	 */
+	WRITE_ONCE(*iter->sptep, 0);
+
+	return true;
+}
+
 
 /*
  * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
@@ -523,6 +562,15 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
+	/*
+	 * No thread should be using this function to set SPTEs to the
+	 * temporary removed SPTE value.
+	 * If operating under the MMU lock in read mode, tdp_mmu_set_spte_atomic
+	 * should be used. If operating under the MMU lock in write mode, the
+	 * use of the removed SPTE should not be necessary.
+	 */
+	WARN_ON(iter->old_spte == REMOVED_SPTE);
+
 	WRITE_ONCE(*rcu_dereference(iter->sptep), new_spte);
 
 	__handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte,
@@ -790,12 +838,9 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		 */
 		if (is_shadow_present_pte(iter.old_spte) &&
 		    is_large_pte(iter.old_spte)) {
-			if (!tdp_mmu_set_spte_atomic(vcpu->kvm, &iter, 0))
+			if (!tdp_mmu_zap_spte_atomic(vcpu->kvm, &iter))
 				break;
 
-			kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn,
-					KVM_PAGES_PER_HPAGE(iter.level));
-
 			/*
 			 * The iter must explicitly re-read the spte here
 			 * because the new value informs the !present
-- 
2.30.0.365.g02bc693789-goog


  parent reply	other threads:[~2021-02-02 19:10 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-02 18:57 [PATCH v2 00/28] Allow parallel MMU operations with TDP MMU Ben Gardon
2021-02-02 18:57 ` [PATCH v2 01/28] KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched Ben Gardon
2021-02-02 18:57 ` [PATCH v2 02/28] KVM: x86/mmu: Add comment on __tdp_mmu_set_spte Ben Gardon
2021-02-02 18:57 ` [PATCH v2 03/28] KVM: x86/mmu: Add lockdep when setting a TDP MMU SPTE Ben Gardon
2021-02-02 18:57 ` [PATCH v2 04/28] KVM: x86/mmu: Don't redundantly clear TDP MMU pt memory Ben Gardon
2021-02-02 18:57 ` [PATCH v2 05/28] KVM: x86/mmu: Factor out handling of removed page tables Ben Gardon
2021-02-02 18:57 ` [PATCH v2 06/28] locking/rwlocks: Add contention detection for rwlocks Ben Gardon
2021-02-02 22:06   ` kernel test robot
2021-02-09 20:39   ` Guenter Roeck
2021-02-09 21:46     ` Waiman Long
2021-02-09 22:25       ` Guenter Roeck
2021-02-10  0:27         ` Waiman Long
2021-02-10  0:41           ` Waiman Long
2021-02-10  6:04           ` Guenter Roeck
2021-02-10 14:57             ` Waiman Long
2021-02-10  3:32   ` Waiman Long
2021-02-10 15:15     ` Waiman Long
2021-02-02 18:57 ` [PATCH v2 07/28] sched: Add needbreak " Ben Gardon
2021-02-02 18:57 ` [PATCH v2 08/28] sched: Add cond_resched_rwlock Ben Gardon
2021-02-02 18:57 ` [PATCH v2 09/28] KVM: x86/mmu: Fix braces in kvm_recover_nx_lpages Ben Gardon
2021-02-02 18:57 ` [PATCH v2 10/28] KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs Ben Gardon
2021-02-03  9:43   ` Paolo Bonzini
2021-02-02 18:57 ` [PATCH v2 11/28] KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched Ben Gardon
2021-02-02 18:57 ` [PATCH v2 12/28] KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn Ben Gardon
2021-02-02 18:57 ` [PATCH v2 13/28] KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter Ben Gardon
2021-02-05 23:42   ` Sean Christopherson
2021-02-02 18:57 ` [PATCH v2 14/28] KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed Ben Gardon
2021-02-02 18:57 ` [PATCH v2 15/28] KVM: x86/mmu: Skip no-op changes in TDP MMU functions Ben Gardon
2021-02-02 18:57 ` [PATCH v2 16/28] KVM: x86/mmu: Clear dirtied pages mask bit before early break Ben Gardon
2021-02-02 18:57 ` [PATCH v2 17/28] KVM: x86/mmu: Protect TDP MMU page table memory with RCU Ben Gardon
2021-02-02 18:57 ` [PATCH v2 18/28] KVM: x86/mmu: Use an rwlock for the x86 MMU Ben Gardon
     [not found]   ` <c8aa8f9c-2305-5d58-3b48-261663524ad5@redhat.com>
     [not found]     ` <CANgfPd_RxhBwM95MQQmGOdtmeH8c6=zPqUnXXHNV5Ta0R5R=iw@mail.gmail.com>
2021-02-03 18:14       ` Paolo Bonzini
2021-02-03 23:06   ` kernel test robot
2021-02-02 18:57 ` [PATCH v2 19/28] KVM: x86/mmu: Factor out functions to add/remove TDP MMU pages Ben Gardon
2021-02-02 18:57 ` [PATCH v2 20/28] KVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map Ben Gardon
2021-02-03  2:48   ` kernel test robot
2021-02-03  2:48     ` kernel test robot
2021-02-03 11:14   ` Paolo Bonzini
2021-02-06  0:26     ` Sean Christopherson
2021-02-08 10:32       ` Paolo Bonzini
2021-04-01 10:32   ` Paolo Bonzini
2021-04-01 16:50     ` Ben Gardon
2021-04-01 17:32       ` Paolo Bonzini
2021-04-01 18:09         ` Sean Christopherson
2021-02-02 18:57 ` Ben Gardon [this message]
2021-02-06  0:29   ` [PATCH v2 21/28] KVM: x86/mmu: Flush TLBs after zap in TDP MMU PF handler Sean Christopherson
2021-02-02 18:57 ` [PATCH v2 22/28] KVM: x86/mmu: Mark SPTEs in disconnected pages as removed Ben Gardon
2021-02-03 11:17   ` Paolo Bonzini
2021-02-02 18:57 ` [PATCH v2 23/28] KVM: x86/mmu: Allow parallel page faults for the TDP MMU Ben Gardon
2021-02-03 12:39   ` Paolo Bonzini
2021-02-03 17:46     ` Ben Gardon
2021-02-03 18:30       ` Paolo Bonzini
2021-02-06  0:12         ` Sean Christopherson
2021-02-02 18:57 ` [PATCH v2 24/28] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock Ben Gardon
2021-02-03 11:25   ` Paolo Bonzini
2021-02-03 11:26   ` Paolo Bonzini
2021-02-03 18:31     ` Ben Gardon
2021-02-03 18:32       ` Paolo Bonzini
2021-02-02 18:57 ` [PATCH v2 25/28] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU " Ben Gardon
2021-02-03 11:34   ` Paolo Bonzini
2021-02-03 18:51     ` Ben Gardon
2021-02-02 18:57 ` [PATCH v2 26/28] KVM: x86/mmu: Allow enabling / disabling dirty logging under " Ben Gardon
2021-02-03 11:38   ` Paolo Bonzini
2021-02-02 18:57 ` [PATCH v2 27/28] KVM: selftests: Add backing src parameter to dirty_log_perf_test Ben Gardon
2021-02-02 18:57 ` [PATCH v2 28/28] KVM: selftests: Disable dirty logging with vCPUs running Ben Gardon
2021-02-03 10:07   ` Paolo Bonzini
2021-02-03 11:00 ` [PATCH v2 00/28] Allow parallel MMU operations with TDP MMU Paolo Bonzini
2021-02-03 17:54   ` Sean Christopherson
2021-02-03 18:13     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210202185734.1680553-22-bgardon@google.com \
    --to=bgardon@google.com \
    --cc=jmattson@google.com \
    --cc=junaids@google.com \
    --cc=kernellwp@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfeiner@google.com \
    --cc=pshier@google.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=xiaoguangrong.eric@gmail.com \
    --cc=yulei.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.