* [PATCH v2] KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
@ 2021-08-12 18:18 Sean Christopherson
  2021-08-12 18:29 ` Ben Gardon
  2021-08-13  7:36 ` Paolo Bonzini
  0 siblings, 2 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-08-12 18:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations.  But underflow is the best case scenario.  The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.
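
To see the underlying lost-update pattern in isolation, the following
standalone userspace program (illustrative only, obviously not kernel code)
loses increments on a plain counter in exactly the way concurrent, non-atomic
writes can corrupt unsync_children:

  /* Compile with: gcc -pthread race.c */
  #include <pthread.h>
  #include <stdio.h>

  static unsigned int counter;	/* stands in for sp->unsync_children */

  static void *bump(void *arg)
  {
  	int i;

  	for (i = 0; i < 1000000; i++)
  		counter++;	/* non-atomic read-modify-write */
  	return NULL;
  }

  int main(void)
  {
  	pthread_t t1, t2;

  	pthread_create(&t1, NULL, bump, NULL);
  	pthread_create(&t2, NULL, bump, NULL);
  	pthread_join(t1, NULL);
  	pthread_join(t2, NULL);

  	/*
  	 * Almost always prints less than 2000000; increments were lost.
  	 * In KVM's case, a "lost" increment means a later decrement in
  	 * clear_unsync_child_bit() can underflow the count.
  	 */
  	printf("counter = %u\n", counter);
  	return 0;
  }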

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok.  Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write.  If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled.  Holding mmu_lock for read also
prevents the indirect shadow page from being freed.  But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP.  Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).
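
For concreteness, Alternative #1 would boil down to something like the
following at the point where KVM decides whether a new writable SPTE may
unsync existing shadow pages.  This is a sketch only; the fault_is_tdp_mmu
flag and the surrounding caller are hypothetical, not code in this patch:

  	/*
  	 * Hypothetical: refuse to unsync for TDP MMU faults, forcing the
  	 * new SPTE to be write-protected instead.
  	 */
  	if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync && !fault_is_tdp_mmu)) {
  		pte_access &= ~ACC_WRITE_MASK;
  		/* ... leave the SPTE read-only ... */
  	}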

Alternative #2, the unsync logic could be made thread safe.  In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).
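
For a rough idea of the flavor of churn Alternative #2 implies (sketch only,
not proposed here; assumes the existing unsync_children/unsync_child_bitmap
fields in struct kvm_mmu_page and a helper along the lines of mark_unsync()):

  	/* struct kvm_mmu_page: the counter would become an atomic. */
  	atomic_t unsync_children;
  	DECLARE_BITMAP(unsync_child_bitmap, 512);

  	/*
  	 * Marking a child unsync would then need atomic RMW operations,
  	 * e.g. the non-atomic __test_and_set_bit() and "unsync_children++"
  	 * would become:
  	 */
  	if (test_and_set_bit(index, sp->unsync_child_bitmap))
  		return;
  	if (atomic_fetch_inc(&sp->unsync_children))
  		return;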

Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/locking.rst |  8 ++++----
 arch/x86/include/asm/kvm_host.h    |  7 +++++++
 arch/x86/kvm/mmu/mmu.c             | 28 ++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8138201efb09..5d27da356836 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -31,10 +31,10 @@ On x86:
 
 - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
 
-- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock is
-  taken inside kvm->arch.mmu_lock, and cannot be taken without already
-  holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
-  there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
+- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock and
+  kvm->arch.mmu_unsync_pages_lock are taken inside kvm->arch.mmu_lock, and
+  cannot be taken without already holding kvm->arch.mmu_lock (typically with
+  ``read_lock`` for the TDP MMU, thus the need for additional spinlocks).
 
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20daaf67a5bf..cf32b87b6bd3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1036,6 +1036,13 @@ struct kvm_arch {
 	struct list_head lpage_disallowed_mmu_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
+	/*
+	 * Protects marking pages unsync during page faults, as TDP MMU page
+	 * faults only take mmu_lock for read.  For simplicity, the unsync
+	 * pages lock is always taken when marking pages unsync regardless of
+	 * whether mmu_lock is held for read or write.
+	 */
+	spinlock_t mmu_unsync_pages_lock;
 
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..cef526dac730 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2596,6 +2596,7 @@ static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
 {
 	struct kvm_mmu_page *sp;
+	bool locked = false;
 
 	/*
 	 * Force write-protection if the page is being tracked.  Note, the page
@@ -2618,9 +2619,34 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
 		if (sp->unsync)
 			continue;
 
+		/*
+		 * TDP MMU page faults require an additional spinlock as they
+		 * run with mmu_lock held for read, not write, and the unsync
+		 * logic is not thread safe.  Take the spinlock regardless of
+		 * the MMU type to avoid extra conditionals/parameters; there's
+		 * no meaningful penalty if mmu_lock is held for write.
+		 */
+		if (!locked) {
+			locked = true;
+			spin_lock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
+
+			/*
+			 * Recheck after taking the spinlock, a different vCPU
+			 * may have since marked the page unsync.  A false
+			 * positive on the unprotected check above is not
+			 * possible as clearing sp->unsync _must_ hold mmu_lock
+			 * for write, i.e. unsync cannot transition from 0->1
+			 * while this CPU holds mmu_lock for read (or write).
+			 */
+			if (READ_ONCE(sp->unsync))
+				continue;
+		}
+
 		WARN_ON(sp->role.level != PG_LEVEL_4K);
 		kvm_unsync_page(vcpu, sp);
 	}
+	if (locked)
+		spin_unlock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
 
 	/*
 	 * We need to ensure that the marking of unsync pages is visible
@@ -5604,6 +5630,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
+	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
+
 	if (!kvm_mmu_init_tdp_mmu(kvm))
 		/*
 		 * No smp_load/store wrappers needed here as we are in
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* Re: [PATCH v2] KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
  2021-08-12 18:18 [PATCH v2] KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock Sean Christopherson
@ 2021-08-12 18:29 ` Ben Gardon
  2021-08-13  7:36 ` Paolo Bonzini
  1 sibling, 0 replies; 4+ messages in thread
From: Ben Gardon @ 2021-08-12 18:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML

On Thu, Aug 12, 2021 at 11:18 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Add yet another spinlock for the TDP MMU and take it when marking indirect
> shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
> nested TDP, KVM may encounter shadow pages for the TDP entries managed by
> L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
> is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
> misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
> which runs with mmu_lock held for read, not write.
>
> Lack of a critical section manifests most visibly as an underflow of
> unsync_children in clear_unsync_child_bit() due to unsync_children being
> corrupted when multiple CPUs write it without a critical section and
> without atomic operations.  But underflow is the best case scenario.  The
> worst case scenario is that unsync_children prematurely hits '0' and
> leads to guest memory corruption due to KVM neglecting to properly sync
> shadow pages.
>
> Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
> would functionally be ok.  Usurping the lock could degrade performance when
> building upper level page tables on different vCPUs, especially since the
> unsync flow could hold the lock for a comparatively long time depending on
> the number of indirect shadow pages and the depth of the paging tree.
>
> For simplicity, take the lock for all MMUs, even though KVM could fairly
> easily know that mmu_lock is held for write.  If mmu_lock is held for
> write, there cannot be contention for the inner spinlock, and marking
> shadow pages unsync across multiple vCPUs will be slow enough that
> bouncing the kvm_arch cacheline should be in the noise.
>
> Note, even though L2 could theoretically be given access to its own EPT
> entries, a nested MMU must hold mmu_lock for write and thus cannot race
> against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
> be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
> that is running with the TDP MMU enabled.  Holding mmu_lock for read also
> prevents the indirect shadow page from being freed.  But as above, keep
> it simple and always take the lock.
>
> Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
> effectively disable unsync behavior for nested TDP.  Write protecting leaf
> shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
> VMMs typically don't modify TDP entries, but the same may not hold true for
> non-standard use cases and/or VMMs that are migrating physical pages (from
> L1's perspective).
>
> Alternative #2, the unsync logic could be made thread safe.  In theory,
> simply converting all relevant kvm_mmu_page fields to atomics and using
> atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
> would be required, (b) the code churn would be substantial, and (c) legacy
> shadow paging would incur additional atomic operations in performance
> sensitive paths for no benefit (to legacy shadow paging).
>
> Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
> Cc: stable@vger.kernel.org
> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/locking.rst |  8 ++++----
>  arch/x86/include/asm/kvm_host.h    |  7 +++++++
>  arch/x86/kvm/mmu/mmu.c             | 28 ++++++++++++++++++++++++++++
>  3 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 8138201efb09..5d27da356836 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -31,10 +31,10 @@ On x86:
>
>  - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
>
> -- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock is
> -  taken inside kvm->arch.mmu_lock, and cannot be taken without already
> -  holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
> -  there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
> +- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock and
> +  kvm->arch.mmu_unsync_pages_lock are taken inside kvm->arch.mmu_lock, and
> +  cannot be taken without already holding kvm->arch.mmu_lock (typically with
> +  ``read_lock`` for the TDP MMU, thus the need for additional spinlocks).
>
>  Everything else is a leaf: no other lock is taken inside the critical
>  sections.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 20daaf67a5bf..cf32b87b6bd3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1036,6 +1036,13 @@ struct kvm_arch {
>         struct list_head lpage_disallowed_mmu_pages;
>         struct kvm_page_track_notifier_node mmu_sp_tracker;
>         struct kvm_page_track_notifier_head track_notifier_head;
> +       /*
> +        * Protects marking pages unsync during page faults, as TDP MMU page
> +        * faults only take mmu_lock for read.  For simplicity, the unsync
> +        * pages lock is always taken when marking pages unsync regardless of
> +        * whether mmu_lock is held for read or write.
> +        */
> +       spinlock_t mmu_unsync_pages_lock;
>
>         struct list_head assigned_dev_head;
>         struct iommu_domain *iommu_domain;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..cef526dac730 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2596,6 +2596,7 @@ static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
>  {
>         struct kvm_mmu_page *sp;
> +       bool locked = false;
>
>         /*
>          * Force write-protection if the page is being tracked.  Note, the page

It might also be worth adding a note about how it's safe to use
for_each_gfn_indirect_valid_sp under the MMU read lock because
mmu_page_hash is only modified with the lock held for write.

> @@ -2618,9 +2619,34 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
>                 if (sp->unsync)
>                         continue;
>
> +               /*
> +                * TDP MMU page faults require an additional spinlock as they
> +                * run with mmu_lock held for read, not write, and the unsync
> +                * logic is not thread safe.  Take the spinlock regardless of
> +                * the MMU type to avoid extra conditionals/parameters; there's
> +                * no meaningful penalty if mmu_lock is held for write.
> +                */
> +               if (!locked) {
> +                       locked = true;
> +                       spin_lock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
> +
> +                       /*
> +                        * Recheck after taking the spinlock, a different vCPU
> +                        * may have since marked the page unsync.  A false
> +                        * positive on the unprotected check above is not
> +                        * possible as clearing sp->unsync _must_ hold mmu_lock
> +                        * for write, i.e. unsync cannot transition from 0->1
> +                        * while this CPU holds mmu_lock for read (or write).
> +                        */
> +                       if (READ_ONCE(sp->unsync))
> +                               continue;
> +               }
> +
>                 WARN_ON(sp->role.level != PG_LEVEL_4K);
>                 kvm_unsync_page(vcpu, sp);
>         }
> +       if (locked)
> +               spin_unlock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
>
>         /*
>          * We need to ensure that the marking of unsync pages is visible
> @@ -5604,6 +5630,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
>  {
>         struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
>
> +       spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
> +
>         if (!kvm_mmu_init_tdp_mmu(kvm))
>                 /*
>                  * No smp_load/store wrappers needed here as we are in
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>


* Re: [PATCH v2] KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
  2021-08-12 18:18 [PATCH v2] KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock Sean Christopherson
  2021-08-12 18:29 ` Ben Gardon
@ 2021-08-13  7:36 ` Paolo Bonzini
  1 sibling, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2021-08-13  7:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 12/08/21 20:18, Sean Christopherson wrote:
> Add yet another spinlock for the TDP MMU and take it when marking indirect
> shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
> nested TDP, KVM may encounter shadow pages for the TDP entries managed by
> L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
> is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
> misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
> which runs with mmu_lock held for read, not write.
> 
> Lack of a critical section manifests most visibly as an underflow of
> unsync_children in clear_unsync_child_bit() due to unsync_children being
> corrupted when multiple CPUs write it without a critical section and
> without atomic operations.  But underflow is the best case scenario.  The
> worst case scenario is that unsync_children prematurely hits '0' and
> leads to guest memory corruption due to KVM neglecting to properly sync
> shadow pages.
> 
> Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
> would functionally be ok.  Usurping the lock could degrade performance when
> building upper level page tables on different vCPUs, especially since the
> unsync flow could hold the lock for a comparatively long time depending on
> the number of indirect shadow pages and the depth of the paging tree.
> 
> For simplicity, take the lock for all MMUs, even though KVM could fairly
> easily know that mmu_lock is held for write.  If mmu_lock is held for
> write, there cannot be contention for the inner spinlock, and marking
> shadow pages unsync across multiple vCPUs will be slow enough that
> bouncing the kvm_arch cacheline should be in the noise.
> 
> Note, even though L2 could theoretically be given access to its own EPT
> entries, a nested MMU must hold mmu_lock for write and thus cannot race
> against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
> be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
> that is running with the TDP MMU enabled.  Holding mmu_lock for read also
> prevents the indirect shadow page from being freed.  But as above, keep
> it simple and always take the lock.
> 
> Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
> effectively disable unsync behavior for nested TDP.  Write protecting leaf
> shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
> VMMs typically don't modify TDP entries, but the same may not hold true for
> non-standard use cases and/or VMMs that are migrating physical pages (from
> L1's perspective).
> 
> Alternative #2, the unsync logic could be made thread safe.  In theory,
> simply converting all relevant kvm_mmu_page fields to atomics and using
> atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
> would be required, (b) the code churn would be substantial, and (c) legacy
> shadow paging would incur additional atomic operations in performance
> sensitive paths for no benefit (to legacy shadow paging).
> 
> Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
> Cc: stable@vger.kernel.org
> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   Documentation/virt/kvm/locking.rst |  8 ++++----
>   arch/x86/include/asm/kvm_host.h    |  7 +++++++
>   arch/x86/kvm/mmu/mmu.c             | 28 ++++++++++++++++++++++++++++
>   3 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 8138201efb09..5d27da356836 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -31,10 +31,10 @@ On x86:
>   
>   - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
>   
> -- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock is
> -  taken inside kvm->arch.mmu_lock, and cannot be taken without already
> -  holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
> -  there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
> +- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock and
> +  kvm->arch.mmu_unsync_pages_lock are taken inside kvm->arch.mmu_lock, and
> +  cannot be taken without already holding kvm->arch.mmu_lock (typically with
> +  ``read_lock`` for the TDP MMU, thus the need for additional spinlocks).
>   
>   Everything else is a leaf: no other lock is taken inside the critical
>   sections.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 20daaf67a5bf..cf32b87b6bd3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1036,6 +1036,13 @@ struct kvm_arch {
>   	struct list_head lpage_disallowed_mmu_pages;
>   	struct kvm_page_track_notifier_node mmu_sp_tracker;
>   	struct kvm_page_track_notifier_head track_notifier_head;
> +	/*
> +	 * Protects marking pages unsync during page faults, as TDP MMU page
> +	 * faults only take mmu_lock for read.  For simplicity, the unsync
> +	 * pages lock is always taken when marking pages unsync regardless of
> +	 * whether mmu_lock is held for read or write.
> +	 */
> +	spinlock_t mmu_unsync_pages_lock;
>   
>   	struct list_head assigned_dev_head;
>   	struct iommu_domain *iommu_domain;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..cef526dac730 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2596,6 +2596,7 @@ static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>   int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
>   {
>   	struct kvm_mmu_page *sp;
> +	bool locked = false;
>   
>   	/*
>   	 * Force write-protection if the page is being tracked.  Note, the page
> @@ -2618,9 +2619,34 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
>   		if (sp->unsync)
>   			continue;
>   
> +		/*
> +		 * TDP MMU page faults require an additional spinlock as they
> +		 * run with mmu_lock held for read, not write, and the unsync
> +		 * logic is not thread safe.  Take the spinlock regardless of
> +		 * the MMU type to avoid extra conditionals/parameters; there's
> +		 * no meaningful penalty if mmu_lock is held for write.
> +		 */
> +		if (!locked) {
> +			locked = true;
> +			spin_lock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
> +
> +			/*
> +			 * Recheck after taking the spinlock, a different vCPU
> +			 * may have since marked the page unsync.  A false
> +			 * positive on the unprotected check above is not
> +			 * possible as clearing sp->unsync _must_ hold mmu_lock
> +			 * for write, i.e. unsync cannot transition from 0->1
> +			 * while this CPU holds mmu_lock for read (or write).
> +			 */
> +			if (READ_ONCE(sp->unsync))
> +				continue;
> +		}
> +
>   		WARN_ON(sp->role.level != PG_LEVEL_4K);
>   		kvm_unsync_page(vcpu, sp);
>   	}
> +	if (locked)
> +		spin_unlock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
>   
>   	/*
>   	 * We need to ensure that the marking of unsync pages is visible
> @@ -5604,6 +5630,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
>   {
>   	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
>   
> +	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
> +
>   	if (!kvm_mmu_init_tdp_mmu(kvm))
>   		/*
>   		 * No smp_load/store wrappers needed here as we are in
> 

Queued, thanks.

Paolo



* Re: [PATCH v2] KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
@ 2021-08-13  1:20 kernel test robot
  0 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2021-08-13  1:20 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 13423 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210812181815.3378104-1-seanjc@google.com>
References: <20210812181815.3378104-1-seanjc@google.com>
TO: Sean Christopherson <seanjc@google.com>
TO: Paolo Bonzini <pbonzini@redhat.com>
CC: Sean Christopherson <seanjc@google.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Wanpeng Li <wanpengli@tencent.com>
CC: Jim Mattson <jmattson@google.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: kvm(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Ben Gardon <bgardon@google.com>

Hi Sean,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on v5.14-rc5 next-20210812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sean-Christopherson/KVM-x86-mmu-Protect-marking-SPs-unsync-when-using-TDP-MMU-with-spinlock/20210813-022023
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
:::::: branch date: 7 hours ago
:::::: commit date: 7 hours ago
config: i386-randconfig-c001-20210812 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>


cocci warnings: (new ones prefixed by >>)
>> arch/x86/kvm/mmu/mmu.c:2631:3-12: second lock on line 2631

vim +2631 arch/x86/kvm/mmu/mmu.c

9cf5cf5ad43b29 arch/x86/kvm/mmu.c     Xiao Guangrong      2010-05-24  2589  
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2590  /*
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2591   * Attempt to unsync any shadow pages that can be reached by the specified gfn,
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2592   * KVM is creating a writable mapping for said gfn.  Returns 0 if all pages
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2593   * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2594   * be write-protected.
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2595   */
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2596  int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
4731d4c7a07769 arch/x86/kvm/mmu.c     Marcelo Tosatti     2008-09-23  2597  {
5c520e90af3ad5 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2598  	struct kvm_mmu_page *sp;
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2599  	bool locked = false;
9cf5cf5ad43b29 arch/x86/kvm/mmu.c     Xiao Guangrong      2010-05-24  2600  
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2601  	/*
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2602  	 * Force write-protection if the page is being tracked.  Note, the page
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2603  	 * track machinery is used to write-protect upper-level shadow pages,
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2604  	 * i.e. this guards the role.level == 4K assertion below!
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2605  	 */
3d0c27ad6ee465 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2606  	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2607  		return -EPERM;
3d0c27ad6ee465 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2608  
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2609  	/*
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2610  	 * The page is not write-tracked, mark existing shadow pages unsync
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2611  	 * unless KVM is synchronizing an unsync SP (can_unsync = false).  In
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2612  	 * that case, KVM must complete emulation of the guest TLB flush before
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2613  	 * allowing shadow pages to become unsync (writable by the guest).
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2614  	 */
5c520e90af3ad5 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2615  	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
36a2e6774bfb5f arch/x86/kvm/mmu.c     Xiao Guangrong      2010-06-30  2616  		if (!can_unsync)
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2617  			return -EPERM;
36a2e6774bfb5f arch/x86/kvm/mmu.c     Xiao Guangrong      2010-06-30  2618  
5c520e90af3ad5 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2619  		if (sp->unsync)
5c520e90af3ad5 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2620  			continue;
9cf5cf5ad43b29 arch/x86/kvm/mmu.c     Xiao Guangrong      2010-05-24  2621  
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2622  		/*
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2623  		 * TDP MMU page faults require an additional spinlock as they
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2624  		 * run with mmu_lock held for read, not write, and the unsync
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2625  		 * logic is not thread safe.  Take the spinlock regardless of
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2626  		 * the MMU type to avoid extra conditionals/parameters; there's
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2627  		 * no meaningful penalty if mmu_lock is held for write.
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2628  		 */
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2629  		if (!locked) {
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2630  			locked = true;
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12 @2631  			spin_lock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2632  
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2633  			/*
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2634  			 * Recheck after taking the spinlock, a different vCPU
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2635  			 * may have since marked the page unsync.  A false
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2636  			 * positive on the unprotected check above is not
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2637  			 * possible as clearing sp->unsync _must_ hold mmu_lock
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2638  			 * for write, i.e. unsync cannot transition from 0->1
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2639  			 * while this CPU holds mmu_lock for read (or write).
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2640  			 */
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2641  			if (READ_ONCE(sp->unsync))
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2642  				continue;
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2643  		}
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2644  
3bae0459bcd559 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2020-04-27  2645  		WARN_ON(sp->role.level != PG_LEVEL_4K);
5c520e90af3ad5 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2646  		kvm_unsync_page(vcpu, sp);
9cf5cf5ad43b29 arch/x86/kvm/mmu.c     Xiao Guangrong      2010-05-24  2647  	}
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2648  	if (locked)
985bc2aab52bde arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-08-12  2649  		spin_unlock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
3d0c27ad6ee465 arch/x86/kvm/mmu.c     Xiao Guangrong      2016-02-24  2650  
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2651  	/*
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2652  	 * We need to ensure that the marking of unsync pages is visible
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2653  	 * before the SPTE is updated to allow writes because
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2654  	 * kvm_mmu_sync_roots() checks the unsync flags without holding
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2655  	 * the MMU lock and so can race with this. If the SPTE was updated
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2656  	 * before the page had been marked as unsync-ed, something like the
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2657  	 * following could happen:
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2658  	 *
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2659  	 * CPU 1                    CPU 2
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2660  	 * ---------------------------------------------------------------------
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2661  	 * 1.2 Host updates SPTE
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2662  	 *     to be writable
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2663  	 *                      2.1 Guest writes a GPTE for GVA X.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2664  	 *                          (GPTE being in the guest page table shadowed
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2665  	 *                           by the SP from CPU 1.)
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2666  	 *                          This reads SPTE during the page table walk.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2667  	 *                          Since SPTE.W is read as 1, there is no
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2668  	 *                          fault.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2669  	 *
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2670  	 *                      2.2 Guest issues TLB flush.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2671  	 *                          That causes a VM Exit.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2672  	 *
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2673  	 *                      2.3 Walking of unsync pages sees sp->unsync is
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2674  	 *                          false and skips the page.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2675  	 *
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2676  	 *                      2.4 Guest accesses GVA X.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2677  	 *                          Since the mapping in the SP was not updated,
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2678  	 *                          so the old mapping for GVA X incorrectly
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2679  	 *                          gets used.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2680  	 * 1.1 Host marks SP
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2681  	 *     as unsync
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2682  	 *     (sp->unsync = true)
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2683  	 *
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2684  	 * The write barrier below ensures that 1.1 happens before 1.2 and thus
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2685  	 * the situation in 2.4 does not arise. The implicit barrier in 2.2
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2686  	 * pairs with this write barrier.
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2687  	 */
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2688  	smp_wmb();
578e1c4db22135 arch/x86/kvm/mmu.c     Junaid Shahid       2018-06-27  2689  
0337f585f57fc8 arch/x86/kvm/mmu/mmu.c Sean Christopherson 2021-06-22  2690  	return 0;
4731d4c7a07769 arch/x86/kvm/mmu.c     Marcelo Tosatti     2008-09-23  2691  }
4731d4c7a07769 arch/x86/kvm/mmu.c     Marcelo Tosatti     2008-09-23  2692  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 32288 bytes --]

