* [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Sean Christopherson @ 2022-02-23 16:53 UTC
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank
  Cc: David Hildenbrand, Claudio Imbrenda, kvm, linux-kernel,
	David Woodhouse, Sean Christopherson

Don't actually set a request bit in vcpu->requests when making a request
purely to force a vCPU to exit the guest.  Logging a request but not
actually consuming it would cause the vCPU to get stuck in an infinite
loop during KVM_RUN because KVM would see the pending request and bail
from VM-Enter to service the request.
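
For illustration, the failure mode would look roughly like this (a
simplified sketch of the KVM_RUN request-handling shape; KVM_REQ_FOO and
handle_foo() are placeholders, not real KVM symbols):

	for (;;) {
		if (kvm_request_pending(vcpu)) {
			if (kvm_check_request(KVM_REQ_FOO, vcpu))
				handle_foo(vcpu);
			/*
			 * A request bit that no handler consumes is never
			 * cleared, so KVM bails from VM-Enter on every
			 * iteration and the vCPU never re-enters the guest.
			 */
			continue;
		}
		/* ... VM-Enter ... */
	}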

Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
without implementing handling of the request, especially since getting
test coverage of MMU notifier interaction with specific KVM features
usually requires a directed test.

Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
mode; it's supposed to _avoid_ waking vCPUs that are blocking.

Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
what it wants to accomplish, and to genericize the name so that it can be
used for similar but unrelated scenarios, should they arise in the future.
Add a comment and documentation to explain why the "no action" request
exists.

Add compile-time assertions to help detect improper usage.  Use the inner
assertless helper in the one s390 path that makes requests without a
hardcoded request.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---

 v2:
  - Rewrite changelog and drop Fixes:, no bug currently exists. [David]
  - Rebase to kvm/queue.

 Documentation/virt/kvm/vcpu-requests.rst | 10 +++++++
 arch/s390/kvm/kvm-s390.c                 |  2 +-
 include/linux/kvm_host.h                 | 38 +++++++++++++++++++-----
 virt/kvm/kvm_main.c                      |  3 +-
 virt/kvm/pfncache.c                      | 18 ++++++-----
 5 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/vcpu-requests.rst b/Documentation/virt/kvm/vcpu-requests.rst
index ad2915ef7020..f3d38e8a1fb3 100644
--- a/Documentation/virt/kvm/vcpu-requests.rst
+++ b/Documentation/virt/kvm/vcpu-requests.rst
@@ -136,6 +136,16 @@ KVM_REQ_UNHALT
   such as a pending signal, which does not indicate the VCPU's halt
   emulation should stop, and therefore does not make the request.
 
+KVM_REQ_OUTSIDE_GUEST_MODE
+
+  This "request" ensures the target vCPU has exited guest mode prior to the
+  sender of the request continuing on.  No action needs to be taken by the target,
+  and so no request is actually logged for the target.  This request is similar
+  to a "kick", but unlike a kick it guarantees the vCPU has actually exited
+  guest mode.  A kick only guarantees the vCPU will exit at some point in the
+  future, e.g. a previous kick may have started the process, but there's no
+  guarantee the to-be-kicked vCPU has fully exited guest mode.
+
 KVM_REQUEST_MASK
 ----------------
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 577f1ead6a51..54b7e0017208 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3371,7 +3371,7 @@ void exit_sie(struct kvm_vcpu *vcpu)
 /* Kick a guest cpu out of SIE to process a request synchronously */
 void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu)
 {
-	kvm_make_request(req, vcpu);
+	__kvm_make_request(req, vcpu);
 	kvm_s390_vcpu_request(vcpu);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f11039944c08..45a78abdcd92 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -148,6 +148,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQUEST_MASK           GENMASK(7,0)
 #define KVM_REQUEST_NO_WAKEUP      BIT(8)
 #define KVM_REQUEST_WAIT           BIT(9)
+#define KVM_REQUEST_NO_ACTION      BIT(10)
 /*
  * Architecture-independent vcpu->requests bit members
  * Bits 4-7 are reserved for more arch-independent bits.
@@ -157,9 +158,18 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
 #define KVM_REQ_VM_DEAD           (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_GPC_INVALIDATE    (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQUEST_ARCH_BASE     8
 
+/*
+ * KVM_REQ_OUTSIDE_GUEST_MODE exists purely as a way to force the vCPU to
+ * OUTSIDE_GUEST_MODE.  KVM_REQ_OUTSIDE_GUEST_MODE differs from a vCPU "kick"
+ * in that it ensures the vCPU has reached OUTSIDE_GUEST_MODE before continuing
+ * on.  A kick only guarantees that the vCPU is on its way out, e.g. a previous
+ * kick may have set vcpu->mode to EXITING_GUEST_MODE, and so there's no
+ * guarantee the vCPU received an IPI and has actually exited guest mode.
+ */
+#define KVM_REQ_OUTSIDE_GUEST_MODE	(KVM_REQUEST_NO_ACTION | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+
 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
 	BUILD_BUG_ON((unsigned)(nr) >= (sizeof_field(struct kvm_vcpu, requests) * 8) - KVM_REQUEST_ARCH_BASE); \
 	(unsigned)(((nr) + KVM_REQUEST_ARCH_BASE) | (flags)); \
@@ -1223,7 +1233,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
  * @vcpu:	   vCPU to be used for marking pages dirty and to be woken on
  *		   invalidation.
  * @guest_uses_pa: indicates that the resulting host physical PFN is used while
- *		   @vcpu is IN_GUEST_MODE so invalidations should wake it.
+ *		   @vcpu is IN_GUEST_MODE; invalidations of the cache from MMU
+ *		   notifiers (but not for KVM memslot changes!) will also force
+ *		   @vcpu to exit the guest to refresh the cache.
  * @kernel_map:    requests a kernel virtual mapping (kmap / memremap).
  * @gpa:	   guest physical address to map.
  * @len:	   sanity check; the range being access must fit a single page.
@@ -1234,10 +1246,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
  *                 -EFAULT for an untranslatable guest physical address.
  *
  * This primes a gfn_to_pfn_cache and links it into the @kvm's list for
- * invalidations to be processed. Invalidation callbacks to @vcpu using
- * %KVM_REQ_GPC_INVALIDATE will occur only for MMU notifiers, not for KVM
- * memslot changes. Callers are required to use kvm_gfn_to_pfn_cache_check()
- * to ensure that the cache is valid before accessing the target page.
+ * invalidations to be processed.  Callers are required to use
+ * kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before
+ * accessing the target page.
  */
 int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			      struct kvm_vcpu *vcpu, bool guest_uses_pa,
@@ -1986,7 +1997,7 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 void kvm_arch_irq_routing_update(struct kvm *kvm);
 
-static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
+static inline void __kvm_make_request(int req, struct kvm_vcpu *vcpu)
 {
 	/*
 	 * Ensure the rest of the request is published to kvm_check_request's
@@ -1996,6 +2007,19 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
 	set_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
 }
 
+static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Requests that don't require vCPU action should never be logged in
+	 * vcpu->requests.  The vCPU won't clear the request, so it will stay
+	 * logged indefinitely and prevent the vCPU from entering the guest.
+	 */
+	BUILD_BUG_ON(!__builtin_constant_p(req) ||
+		     (req & KVM_REQUEST_NO_ACTION));
+
+	__kvm_make_request(req, vcpu);
+}
+
 static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
 {
 	return READ_ONCE(vcpu->requests);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 83c57bcc6eb6..4e19c5a44e7e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -251,7 +251,8 @@ static void kvm_make_vcpu_request(struct kvm_vcpu *vcpu, unsigned int req,
 {
 	int cpu;
 
-	kvm_make_request(req, vcpu);
+	if (likely(!(req & KVM_REQUEST_NO_ACTION)))
+		__kvm_make_request(req, vcpu);
 
 	if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
 		return;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ce878f4be4da..5c21f81e5491 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -27,7 +27,7 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 {
 	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
 	struct gfn_to_pfn_cache *gpc;
-	bool wake_vcpus = false;
+	bool evict_vcpus = false;
 
 	spin_lock(&kvm->gpc_lock);
 	list_for_each_entry(gpc, &kvm->gpc_list, list) {
@@ -40,11 +40,11 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 
 			/*
 			 * If a guest vCPU could be using the physical address,
-			 * it needs to be woken.
+			 * it needs to be forced out of guest mode.
 			 */
 			if (gpc->guest_uses_pa) {
-				if (!wake_vcpus) {
-					wake_vcpus = true;
+				if (!evict_vcpus) {
+					evict_vcpus = true;
 					bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
 				}
 				__set_bit(gpc->vcpu->vcpu_idx, vcpu_bitmap);
@@ -67,14 +67,18 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 	}
 	spin_unlock(&kvm->gpc_lock);
 
-	if (wake_vcpus) {
-		unsigned int req = KVM_REQ_GPC_INVALIDATE;
+	if (evict_vcpus) {
+		/*
+		 * KVM needs to ensure the vCPU is fully out of guest context
+		 * before allowing the invalidation to continue.
+		 */
+		unsigned int req = KVM_REQ_OUTSIDE_GUEST_MODE;
 		bool called;
 
 		/*
 		 * If the OOM reaper is active, then all vCPUs should have
 		 * been stopped already, so perform the request without
-		 * KVM_REQUEST_WAIT and be sad if any needed to be woken.
+		 * KVM_REQUEST_WAIT and be sad if any needed to be IPI'd.
 		 */
 		if (!may_block)
 			req &= ~KVM_REQUEST_WAIT;

base-commit: 2b5c12735f5123c70093f06b4c36e49ef8a4fee2
-- 
2.35.1.473.g83b2b277ed-goog



* Re: [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: David Woodhouse @ 2022-02-23 17:01 UTC
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger, Janosch Frank
  Cc: David Hildenbrand, Claudio Imbrenda, kvm, linux-kernel

On Wed, 2022-02-23 at 16:53 +0000, Sean Christopherson wrote:
> Don't actually set a request bit in vcpu->requests when making a request
> purely to force a vCPU to exit the guest.  Logging a request but not
> actually consuming it would cause the vCPU to get stuck in an infinite
> loop during KVM_RUN because KVM would see the pending request and bail
> from VM-Enter to service the request.
> 
> Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
> nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
> too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
> without implementing handling of the request, especially since getting
> test coverage of MMU notifier interaction with specific KVM features
> usually requires a directed test.
> 
> Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
> to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
> mode; it's supposed to _avoid_ waking vCPUs that are blocking.
> 
> Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
> what it wants to accomplish, and to genericize the name so that it can be
> used for similar but unrelated scenarios, should they arise in the future.
> Add a comment and documentation to explain why the "no action" request
> exists.
> 
> Add compile-time assertions to help detect improper usage.  Use the inner
> assertless helper in the one s390 path that makes requests without a
> hardcoded request.
> 
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> 
>  v2:
>   - Rewrite changelog and drop Fixes:, no bug currently exists. [David]
>   - Rebase to kvm/queue.

Acked-by: David Woodhouse <dwmw@amazon.co.uk>

But please could you rebase it on your other patch, which I have
included into my xen-evtchn-kernel branch 
https://git.infradead.org/users/dwmw2/linux.git/commitdiff/48bc5fddd6ed
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xen-evtchn-kernel

I'll be sending that out as a proper 'v1' series just as soon as I've
finished writing the unit tests; the rest I think is ready (and working
with actual guests in testing).


* Re: [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Sean Christopherson @ 2022-02-23 17:08 UTC
  To: David Woodhouse
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Claudio Imbrenda, kvm, linux-kernel

On Wed, Feb 23, 2022, David Woodhouse wrote:
> But please could you rebase it on your other patch, which I have
> included into my xen-evtchn-kernel branch 
> https://git.infradead.org/users/dwmw2/linux.git/commitdiff/48bc5fddd6ed
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xen-evtchn-kernel

Ya, will do.  I almost did that straightaway, but was like, "it's just a minor
conflict..."


* Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: David Woodhouse @ 2022-02-24 10:14 UTC
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger, Janosch Frank
  Cc: David Hildenbrand, Claudio Imbrenda, kvm, linux-kernel

On Wed, 2022-02-23 at 16:53 +0000, Sean Christopherson wrote:
> +       /*
> +        * Requests that don't require vCPU action should never be logged in
> +        * vcpu->requests.  The vCPU won't clear the request, so it will stay
> +        * logged indefinitely and prevent the vCPU from entering the guest.
> +        */
> +       BUILD_BUG_ON(!__builtin_constant_p(req) ||
> +                    (req & KVM_REQUEST_NO_ACTION));


Should KVM_REQ_UNHALT be one of these? Might have saved me a number of
hours of debugging the SCHEDOP_poll patch I'm about to repost...



* Re: [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Sean Christopherson @ 2022-02-25 16:13 UTC
  To: Woodhouse, David
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

On Fri, Feb 25, 2022, Woodhouse, David wrote:
> On Wed, 2022-02-23 at 16:53 +0000, Sean Christopherson wrote:
> > Don't actually set a request bit in vcpu->requests when making a request
> > purely to force a vCPU to exit the guest.  Logging a request but not
> > actually consuming it would cause the vCPU to get stuck in an infinite
> > loop during KVM_RUN because KVM would see the pending request and bail
> > from VM-Enter to service the request.
> 
> Hm, it might be that we *do* want to do some work.
> 
> I think there's a problem with the existing kvm_host_map that we
> haven't yet resolved with the new gfn_to_pfn_cache.
> 
> Look for the calls to 'kvm_vcpu_unmap(…, true)' in e.g. vmx/nested.c
> 
> Now, what if a vCPU is in guest mode, doesn't vmexit back to the L1,
> its userspace thread takes a signal and returns to userspace.
> 
> The pages referenced by those maps may have been written, but because
> the cache is still valid, they haven't been marked as dirty in the KVM
> dirty logs yet.
> 
> So, a traditional live migration workflow once it reaches convergence
> would pause the vCPUs, copy the final batch of dirty pages to the
> destination, then destroy the VM on the source.
> 
> And AFAICT those mapped pages don't actually get marked dirty until
> nested_vmx_free_cpu() calls vmx_leave_nested(). Which will probably
> trigger the dirty log WARN now, since there's no active vCPU context
> for logging, right?
> 
> And the latest copy of those pages never does get copied to the
> destination.
> 
> Since I didn't spot that problem until today, the pfn_to_gfn_cache
> design inherited it too. The 'dirty' flag remains set in the GPC until
> a subsequent revalidate or explicit unmap.
> 
> Since we need an active vCPU context to do dirty logging (thanks, dirty
> ring)... and since any time vcpu_run exits to userspace for any reason
> might be the last time we ever get an active vCPU context... I think
> that kind of fundamentally means that we must flush dirty state to the
> log on *every* return to userspace, doesn't it?

I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
we should have in all cases.  I see no reason to require use of kvm_get_running_vcpu().
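
For illustration, such a variant might look like the sketch below (the
name and signature are hypothetical; the body mirrors what the existing
mark_page_dirty_in_slot() does, minus kvm_get_running_vcpu()):

	static void kvm_vcpu_mark_page_dirty_in_slot(struct kvm_vcpu *vcpu,
						     const struct kvm_memory_slot *memslot,
						     gfn_t gfn)
	{
		if (!memslot || !kvm_slot_dirty_track_enabled(memslot))
			return;

		/* The dirty ring is lockless, so this relies on the caller
		 * holding vcpu->mutex, i.e. being in a vCPU-ioctl path. */
		if (vcpu->kvm->dirty_ring_size)
			kvm_dirty_ring_push(&vcpu->dirty_ring,
					    (memslot->as_id << 16) | memslot->id,
					    gfn - memslot->base_gfn);
		else
			set_bit_le(gfn - memslot->base_gfn, memslot->dirty_bitmap);
	}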


* Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: David Woodhouse @ 2022-02-25 16:59 UTC
  To: Sean Christopherson
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

On Fri, 2022-02-25 at 16:13 +0000, Sean Christopherson wrote:
> On Fri, Feb 25, 2022, Woodhouse, David wrote:
> > Since we need an active vCPU context to do dirty logging (thanks, dirty
> > ring)... and since any time vcpu_run exits to userspace for any reason
> > might be the last time we ever get an active vCPU context... I think
> > that kind of fundamentally means that we must flush dirty state to the
> > log on *every* return to userspace, doesn't it?
> 
> I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
> we should have in all cases.  I see no reason to require use of kvm_get_running_vcpu().

We already have kvm_vcpu_mark_page_dirty(), but it can't use just 'some
vcpu' because the dirty ring is lockless. So if you're ever going to
use anything other than kvm_get_running_vcpu() we need to add locks.

And while we *could* do that, I don't think it would negate the
fundamental observation that *any* time we return from vcpu_run to
userspace, that could be the last time. Userspace might read the dirty
log for the *last* time, and any internally-cached "oh, at some point
we need to mark <this> page dirty" is lost because by the time the vCPU
is finally destroyed, it's too late.

I think I'm going to rip out the 'dirty' flag from the gfn_to_pfn_cache
completely and add a function (to be called with an active vCPU
context) which marks the page dirty *now*.
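
I.e. something like this (sketch; the helper name is hypothetical, the
mark_page_dirty_in_slot() call is what the GPC already has to hand):

	/* Flush a GPC's dirty state to the log *now*, while there is
	 * still an active vCPU context to do the dirty ring push. */
	static inline void kvm_gpc_mark_dirty(struct kvm_vcpu *vcpu,
					      struct gfn_to_pfn_cache *gpc)
	{
		mark_page_dirty_in_slot(vcpu->kvm, gpc->memslot,
					gpc->gpa >> PAGE_SHIFT);
	}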

KVM_GUEST_USES_PFN users like nested VMX will be expected to do this
before returning from vcpu_run anytime it's in L2 guest mode. 


* Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Sean Christopherson @ 2022-02-25 17:27 UTC
  To: David Woodhouse
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

On Fri, Feb 25, 2022, David Woodhouse wrote:
> On Fri, 2022-02-25 at 16:13 +0000, Sean Christopherson wrote:
> > On Fri, Feb 25, 2022, Woodhouse, David wrote:
> > > Since we need an active vCPU context to do dirty logging (thanks, dirty
> > > ring)... and since any time vcpu_run exits to userspace for any reason
> > > might be the last time we ever get an active vCPU context... I think
> > > that kind of fundamentally means that we must flush dirty state to the
> > > log on *every* return to userspace, doesn't it?
> > 
> > I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
> > we should have in all cases.  I see no reason to require use of kvm_get_running_vcpu().
> 
> We already have kvm_vcpu_mark_page_dirty(), but it can't use just 'some
> vcpu' because the dirty ring is lockless. So if you're ever going to
> use anything other than kvm_get_running_vcpu() we need to add locks.

Heh, actually, scratch my previous comment.  I was going to respond that
kvm_get_running_vcpu() is mutually exclusive with all other ioctls() on the same
vCPU by virtue of vcpu->mutex, but I had forgotten that kvm_get_running_vcpu()
really should be "kvm_get_loaded_vcpu()".  I.e. as long as KVM is in a vCPU-ioctl
path, kvm_get_running_vcpu() will be non-null.

> And while we *could* do that, I don't think it would negate the
> fundamental observation that *any* time we return from vcpu_run to
> userspace, that could be the last time. Userspace might read the dirty
> log for the *last* time, and any internally-cached "oh, at some point
> we need to mark <this> page dirty" is lost because by the time the vCPU
> is finally destroyed, it's too late.

Hmm, isn't that an existing bug?  I think the correct fix would be to flush all
dirty vmcs12 pages to the memslot in vmx_get_nested_state().  Userspace _must_
invoke that if it wants to migrate a nested vCPU.

> I think I'm going to rip out the 'dirty' flag from the gfn_to_pfn_cache
> completely and add a function (to be called with an active vCPU
> context) which marks the page dirty *now*.

Hrm, something like?

  1. Drop @dirty from kvm_gfn_to_pfn_cache_init()
  2. Rename @dirty => @old_dirty in kvm_gfn_to_pfn_cache_refresh()
  3. Add an API to mark the associated slot dirty without unmapping

I think that makes sense.

> KVM_GUEST_USES_PFN users like nested VMX will be expected to do this
> before returning from vcpu_run anytime it's in L2 guest mode. 

As above, I think the correct thing to do is to enlighten the flows that retrieve
the state being cached.


* Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: David Woodhouse @ 2022-02-25 18:41 UTC
  To: Sean Christopherson
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

On Fri, 2022-02-25 at 17:27 +0000, Sean Christopherson wrote:
> On Fri, Feb 25, 2022, David Woodhouse wrote:
> > On Fri, 2022-02-25 at 16:13 +0000, Sean Christopherson wrote:
> > > On Fri, Feb 25, 2022, Woodhouse, David wrote:
> > > > Since we need an active vCPU context to do dirty logging (thanks, dirty
> > > > ring)... and since any time vcpu_run exits to userspace for any reason
> > > > might be the last time we ever get an active vCPU context... I think
> > > > that kind of fundamentally means that we must flush dirty state to the
> > > > log on *every* return to userspace, doesn't it?
> > > 
> > > I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
> > > we should have in all cases.  I see no reason to require use of kvm_get_running_vcpu().
> > 
> > We already have kvm_vcpu_mark_page_dirty(), but it can't use just 'some
> > vcpu' because the dirty ring is lockless. So if you're ever going to
> > use anything other than kvm_get_running_vcpu() we need to add locks.
> 
> Heh, actually, scratch my previous comment.  I was going to respond that
> kvm_get_running_vcpu() is mutually exclusive with all other ioctls() on the same
> vCPU by virtue of vcpu->mutex, but I had forgotten that kvm_get_running_vcpu()
> really should be "kvm_get_loaded_vcpu()".  I.e. as long as KVM is in a vCPU-ioctl
> path, kvm_get_running_vcpu() will be non-null.
> 
> > And while we *could* do that, I don't think it would negate the
> > fundamental observation that *any* time we return from vcpu_run to
> > userspace, that could be the last time. Userspace might read the dirty
> > log for the *last* time, and any internally-cached "oh, at some point
> > we need to mark <this> page dirty" is lost because by the time the vCPU
> > is finally destroyed, it's too late.
> 
> Hmm, isn't that an existing bug?  I think the correct fix would be to flush all
> dirty vmcs12 pages to the memslot in vmx_get_nested_state().  Userspace _must_
> invoke that if it wants to migrate a nested vCPU.

Yes, AFAICT it's an existing bug in the way the kvm_host_map code works
today. Your suggestion makes sense as *long* as we consider it OK to
retrospectively document that userspace must extract the nested state
*before* doing the final read of the dirty log.

I am not aware that we have a clearly documented "the dirty log may
keep changing until XXX" anyway. But you're proposing that we change
it, I think. There may well be VMMs which assume that no pages will be
dirtied unless they are actually *running* a vCPU.

Which is why I was proposing that we flush the dirty status to the log
*every* time we leave vcpu_run back to userspace. But I'll not die on
that hill, if you make a good case for your proposal being OK.

> > I think I'm going to rip out the 'dirty' flag from the gfn_to_pfn_cache
> > completely and add a function (to be called with an active vCPU
> > context) which marks the page dirty *now*.
> 
> Hrm, something like?
> 
>   1. Drop @dirty from kvm_gfn_to_pfn_cache_init()
>   2. Rename @dirty => @old_dirty in kvm_gfn_to_pfn_cache_refresh()
>   3. Add an API to mark the associated slot dirty without unmapping
> 
> I think that makes sense.

Except I'll drop 'dirty' from kvm_gfn_to_pfn_cache_refresh() too.
There's no scope for a deferred "oh, I meant to tell you that was
dirty" even in that case, is there? Use the API we add in your #3.




* Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Sean Christopherson @ 2022-02-25 18:52 UTC
  To: David Woodhouse
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

On Fri, Feb 25, 2022, David Woodhouse wrote:
> On Fri, 2022-02-25 at 17:27 +0000, Sean Christopherson wrote:
> > On Fri, Feb 25, 2022, David Woodhouse wrote:
> > > On Fri, 2022-02-25 at 16:13 +0000, Sean Christopherson wrote:
> > > > On Fri, Feb 25, 2022, Woodhouse, David wrote:
> > > > > Since we need an active vCPU context to do dirty logging (thanks, dirty
> > > > > ring)... and since any time vcpu_run exits to userspace for any reason
> > > > > might be the last time we ever get an active vCPU context... I think
> > > > > that kind of fundamentally means that we must flush dirty state to the
> > > > > log on *every* return to userspace, doesn't it?
> > > > 
> > > > I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
> > > > we should have in all cases.  I see no reason to require use of kvm_get_running_vcpu().
> > > 
> > > We already have kvm_vcpu_mark_page_dirty(), but it can't use just 'some
> > > vcpu' because the dirty ring is lockless. So if you're ever going to
> > > use anything other than kvm_get_running_vcpu() we need to add locks.
> > 
> > Heh, actually, scratch my previous comment.  I was going to respond that
> > kvm_get_running_vcpu() is mutually exclusive with all other ioctls() on the same
> > vCPU by virtue of vcpu->mutex, but I had forgotten that kvm_get_running_vcpu()
> > really should be "kvm_get_loaded_vcpu()".  I.e. as long as KVM is in a vCPU-ioctl
> > path, kvm_get_running_vcpu() will be non-null.
> > 
> > > And while we *could* do that, I don't think it would negate the
> > > fundamental observation that *any* time we return from vcpu_run to
> > > userspace, that could be the last time. Userspace might read the dirty
> > > log for the *last* time, and any internally-cached "oh, at some point
> > > we need to mark <this> page dirty" is lost because by the time the vCPU
> > > is finally destroyed, it's too late.
> > 
> > Hmm, isn't that an existing bug?  I think the correct fix would be to flush all
> > dirty vmcs12 pages to the memslot in vmx_get_nested_state().  Userspace _must_
> > invoke that if it wants to migrate a nested vCPU.
> 
> Yes, AFAICT it's an existing bug in the way the kvm_host_map code works
> today. Your suggestion makes sense as *long* as we consider it OK to
> retrospectively document that userspace must extract the nested state
> *before* doing the final read of the dirty log.
> 
> I am not aware that we have a clearly documented "the dirty log may
> keep changing until XXX" anyway. But you're proposing that we change
> it, I think. There may well be VMMs which assume that no pages will be
> dirtied unless they are actually *running* a vCPU.
> 
> Which is why I was proposing that we flush the dirty status to the log
> *every* time we leave vcpu_run back to userspace. But I'll not die on
> that hill, if you make a good case for your proposal being OK.

Drat, I didn't consider the ABI aspect.  Flushing on every exit to userspace would
indeed be more robust.

> > > I think I'm going to rip out the 'dirty' flag from the gfn_to_pfn_cache
> > > completely and add a function (to be called with an active vCPU
> > > context) which marks the page dirty *now*.
> > 
> > Hrm, something like?
> > 
> >   1. Drop @dirty from kvm_gfn_to_pfn_cache_init()
> >   2. Rename @dirty => @old_dirty in kvm_gfn_to_pfn_cache_refresh()
> >   3. Add an API to mark the associated slot dirty without unmapping
> > 
> > I think that makes sense.
> 
> Except I'll drop 'dirty' from kvm_gfn_to_pfn_cache_refresh() too.
> There's no scope for a deferred "oh, I meant to tell you that was
> dirty" even in that case, is there? Use the API we add in your #3.

But won't we end up with a bunch of call sites that attempt to determine whether or
not the dirty status needs to be flushed?  I'm specifically thinking of scenarios
where the status needs to be conditionally flushed, e.g. if the backing pfn doesn't
change, then it's ok to not mark the page dirty.  Not handling that in the refresh
helper will either lead to unnecessary dirtying or duplicate code/work.


* Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: David Woodhouse @ 2022-02-25 20:15 UTC
  To: Sean Christopherson
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

On Fri, 2022-02-25 at 18:52 +0000, Sean Christopherson wrote:
> On Fri, Feb 25, 2022, David Woodhouse wrote:
> > > > I think I'm going to rip out the 'dirty' flag from the gfn_to_pfn_cache
> > > > completely and add a function (to be called with an active vCPU
> > > > context) which marks the page dirty *now*.
> > > 
> > > Hrm, something like?
> > > 
> > >   1. Drop @dirty from kvm_gfn_to_pfn_cache_init()
> > >   2. Rename @dirty => @old_dirty in kvm_gfn_to_pfn_cache_refresh()
> > >   3. Add an API to mark the associated slot dirty without unmapping
> > > 
> > > I think that makes sense.
> > 
> > Except I'll drop 'dirty' from kvm_gfn_to_pfn_cache_refresh() too.
> > There's no scope for a deferred "oh, I meant to tell you that was
> > dirty" even in that case, is there? Use the API we add in your #3.
> 
> But won't we end up with a bunch of call sites that attempt to determine whether or
> not the dirty status needs to be flushed?  I'm specifically thinking of scenarios
> where the status needs to be conditionally flushed, e.g. if the backing pfn doesn't
> change, then it's ok to not mark the page dirty.  Not handling that in the refresh
> helper will either lead to unnecessary dirtying or duplicate code/work.

I don't know that it's ever OK to leave the dirty status cached in the
GPC to be flushed at some unspecified "later" time. Once the vCPU exits
back to userspace, that might be the *last* time it's ever run, and we
have the same ABI issue as (snipped) above, that userspace might have
done its final read of the dirty log, and might not expect anything
else to be marked dirty.

We might be able to defer it until we return to userspace, but no
further. So I don't see flushing it in *refresh* as being useful? And
we *can't* flush it in invalidate anyway, as discussed.



* [PATCH] KVM: Remove dirty handling from gfn_to_pfn_cache completely
From: David Woodhouse @ 2022-02-27 15:11 UTC
  To: Sean Christopherson
  Cc: borntraeger, pbonzini, frankja, kvm, imbrenda, david, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

It isn't OK to cache the dirty status of a page in internal structures
for an indefinite period of time.

Any time a vCPU exits the run loop to userspace might be its last; the
VMM might do its final check of the dirty log, flush the last remaining
dirty pages to the destination and complete a live migration. If we
have internal 'dirty' state which doesn't get flushed until the vCPU
is finally destroyed on the source after migration is complete, then
we have lost data because that will escape the final copy.

This problem already exists with the use of kvm_vcpu_unmap() to mark
pages dirty in e.g. VMX nesting.

Note that the actual Linux MM already considers the page to be dirty
since we have a writeable mapping of it. This is just about the KVM
dirty logging.

Make the PV clock mark the page dirty immediately (which is fine as
it's happening in vCPU context). Document the Xen shinfo/vcpu_info
case more completely as being exempt, because we might dirty those
from interrupt context as we deliver event channels.

For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
track which gfn_to_pfn_caches have been used and explicitly mark the
corresponding pages dirty before returning to userspace. But we would
have needed external tracking of that anyway, rather than walking the
full list of GPCs to find those belonging to this vCPU which are dirty.

So let's rely *solely* on that external tracking, and keep it simple
rather than laying a tempting trap for callers to fall into.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---

This sits on top of the Xen event channel series, because that adds a
few users of gfn_to_pfn_cache and I wanted to see it actually being
used this way. If we're OK with this approach, I'll refactor that
series to put this first.

I think this is OK for the KVM_GUEST_USES_PFN cases (mostly nesting)
because, as I note in the commit message, they're going to want to keep
track of which caches they have used *anyway* to avoid walking the full
GPC list on every exit to userspace.
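
The caller-side pattern would be roughly this shape (sketch only; the
nested_gpc/nested_gpc_used tracking fields are hypothetical, purely to
show what the external tracking looks like):

	/* On return to userspace: flush any cache this vCPU may have
	 * written while in guest mode, then clear the tracking flag. */
	struct gfn_to_pfn_cache *gpc = &vcpu->arch.nested_gpc;

	if (vcpu->arch.nested_gpc_used) {
		mark_page_dirty_in_slot(vcpu->kvm, gpc->memslot,
					gpc->gpa >> PAGE_SHIFT);
		vcpu->arch.nested_gpc_used = false;
	}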

 Documentation/virt/kvm/api.rst |  4 ++++
 arch/x86/kvm/x86.c             |  8 +++----
 arch/x86/kvm/xen.c             | 24 +++++++-------------
 include/linux/kvm_host.h       | 14 +++++-------
 include/linux/kvm_types.h      |  1 -
 virt/kvm/pfncache.c            | 41 +++++++---------------------------
 6 files changed, 29 insertions(+), 63 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 982fcdf8cfa8..77b631452036 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5288,6 +5288,10 @@ type values:
 
 KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
   Sets the guest physical address of the vcpu_info for a given vCPU.
+  As with the shared_info page for the VM, the corresponding page may be
+  dirtied at any time if event channel interrupt delivery is enabled, so
+  userspace should always assume that the page is dirty without relying
+  on dirty logging.
 
 KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
   Sets the guest physical address of an additional pvclock structure
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04f86cb94069..08317ad5c93b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2237,8 +2237,7 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
 	if (system_time & 1) {
 		kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.pv_time, vcpu,
 					  KVM_HOST_USES_PFN, system_time & ~1ULL,
-					  sizeof(struct pvclock_vcpu_time_info),
-					  false);
+					  sizeof(struct pvclock_vcpu_time_info));
 	} else {
 		kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time);
 	}
@@ -2958,8 +2957,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 		read_unlock_irqrestore(&gpc->lock, flags);
 
 		if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
-						 offset + sizeof(*guest_hv_clock),
-						 true))
+						 offset + sizeof(*guest_hv_clock)))
 			return;
 
 		read_lock_irqsave(&gpc->lock, flags);
@@ -2989,6 +2987,8 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	smp_wmb();
 
 	guest_hv_clock->version = ++vcpu->hv_clock.version;
+
+	mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
 	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index f59dca40d7c3..71311bb03d53 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -49,7 +49,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 
 	do {
 		ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, KVM_HOST_USES_PFN,
-						gpa, PAGE_SIZE, false);
+						gpa, PAGE_SIZE);
 		if (ret)
 			goto out;
 
@@ -245,8 +245,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
 		if (state == RUNSTATE_runnable)
 			return;
 
-		if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
-						 user_len, false))
+		if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, user_len))
 			return;
 
 		read_lock_irqsave(&gpc->lock, flags);
@@ -377,8 +376,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 		read_unlock_irqrestore(&gpc->lock, flags);
 
 		if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
-						 sizeof(struct vcpu_info),
-						 false))
+						 sizeof(struct vcpu_info)))
 			return;
 
 		read_lock_irqsave(&gpc->lock, flags);
@@ -454,8 +452,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 			return 1;
 
 		if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
-						 sizeof(struct vcpu_info),
-						 false)) {
+						 sizeof(struct vcpu_info))) {
 			/*
 			 * If this failed, userspace has screwed up the
 			 * vcpu_info mapping. No interrupts for you.
@@ -583,7 +580,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		r = kvm_gfn_to_pfn_cache_init(vcpu->kvm,
 					      &vcpu->arch.xen.vcpu_info_cache,
 					      NULL, KVM_HOST_USES_PFN, data->u.gpa,
-					      sizeof(struct vcpu_info), false);
+					      sizeof(struct vcpu_info));
 		if (!r)
 			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
@@ -600,8 +597,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		r = kvm_gfn_to_pfn_cache_init(vcpu->kvm,
 					      &vcpu->arch.xen.vcpu_time_info_cache,
 					      NULL, KVM_HOST_USES_PFN, data->u.gpa,
-					      sizeof(struct pvclock_vcpu_time_info),
-					      false);
+					      sizeof(struct pvclock_vcpu_time_info));
 		if (!r)
 			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 		break;
@@ -621,8 +617,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		r = kvm_gfn_to_pfn_cache_init(vcpu->kvm,
 					      &vcpu->arch.xen.runstate_cache,
 					      NULL, KVM_HOST_USES_PFN, data->u.gpa,
-					      sizeof(struct vcpu_runstate_info),
-					      false);
+					      sizeof(struct vcpu_runstate_info));
 		break;
 
 	case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT:
@@ -1513,8 +1508,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 			break;
 
 		idx = srcu_read_lock(&kvm->srcu);
-		rc = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa,
-						  PAGE_SIZE, false);
+		rc = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE);
 		srcu_read_unlock(&kvm->srcu, idx);
 	} while(!rc);
 
@@ -1829,10 +1823,8 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
 
 	kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
 				     &vcpu->arch.xen.runstate_cache);
-	vcpu->arch.xen.vcpu_info_cache.dirty = false;
 	kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
 				     &vcpu->arch.xen.vcpu_info_cache);
-	vcpu->arch.xen.vcpu_time_info_cache.dirty = false;
 	kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
 				     &vcpu->arch.xen.vcpu_time_info_cache);
 	del_timer_sync(&vcpu->arch.xen.poll_timer);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6e5bbb1b3e0d..aa596fd19578 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1228,7 +1228,6 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
  *		   by KVM (and thus needs a kernel virtual mapping).
  * @gpa:	   guest physical address to map.
  * @len:	   sanity check; the range being access must fit a single page.
- * @dirty:         mark the cache dirty immediately.
  *
  * @return:	   0 for success.
  *		   -EINVAL for a mapping which would cross a page boundary.
@@ -1242,7 +1241,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
  */
 int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			      struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
-			      gpa_t gpa, unsigned long len, bool dirty);
+			      gpa_t gpa, unsigned long len);
 
 /**
  * kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache.
@@ -1251,7 +1250,6 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  * @gpc:	   struct gfn_to_pfn_cache object.
  * @gpa:	   current guest physical address to map.
  * @len:	   sanity check; the range being access must fit a single page.
- * @dirty:         mark the cache dirty immediately.
  *
  * @return:	   %true if the cache is still valid and the address matches.
  *		   %false if the cache is not valid.
@@ -1273,7 +1271,6 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  * @gpc:	   struct gfn_to_pfn_cache object.
  * @gpa:	   updated guest physical address to map.
  * @len:	   sanity check; the range being access must fit a single page.
- * @dirty:         mark the cache dirty immediately.
  *
  * @return:	   0 for success.
  *		   -EINVAL for a mapping which would cross a page boundary.
@@ -1286,7 +1283,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  * with the lock still held to permit access.
  */
 int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
-				 gpa_t gpa, unsigned long len, bool dirty);
+				 gpa_t gpa, unsigned long len);
 
 /**
  * kvm_gfn_to_pfn_cache_unmap - temporarily unmap a gfn_to_pfn_cache.
@@ -1294,10 +1291,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  * @kvm:	   pointer to kvm instance.
  * @gpc:	   struct gfn_to_pfn_cache object.
  *
- * This unmaps the referenced page and marks it dirty, if appropriate. The
- * cache is left in the invalid state but at least the mapping from GPA to
- * userspace HVA will remain cached and can be reused on a subsequent
- * refresh.
+ * This unmaps the referenced page. The cache is left in the invalid state
+ * but at least the mapping from GPA to userspace HVA will remain cached
+ * and can be reused on a subsequent refresh.
  */
 void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
 
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 784f37cbf33e..d1447801fc68 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -74,7 +74,6 @@ struct gfn_to_pfn_cache {
 	enum pfn_cache_usage usage;
 	bool active;
 	bool valid;
-	bool dirty;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 9b3a192cb18c..d789f2705e5e 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -49,19 +49,6 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 				}
 				__set_bit(gpc->vcpu->vcpu_idx, vcpu_bitmap);
 			}
-
-			/*
-			 * We cannot call mark_page_dirty() from here because
-			 * this physical CPU might not have an active vCPU
-			 * with which to do the KVM dirty tracking.
-			 *
-			 * Neither is there any point in telling the kernel MM
-			 * that the underlying page is dirty. A vCPU in guest
-			 * mode might still be writing to it up to the point
-			 * where we wake them a few lines further down anyway.
-			 *
-			 * So all the dirty marking happens on the unmap.
-			 */
 		}
 		write_unlock_irq(&gpc->lock);
 	}
@@ -104,8 +91,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
 
-static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva,
-			  gpa_t gpa, bool dirty)
+static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
 {
 	/* Unmap the old page if it was mapped before, and release it */
 	if (!is_error_noslot_pfn(pfn)) {
@@ -118,9 +104,7 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva,
 #endif
 		}
 
-		kvm_release_pfn(pfn, dirty);
-		if (dirty)
-			mark_page_dirty(kvm, gpa);
+		kvm_release_pfn(pfn, false);
 	}
 }
 
@@ -152,7 +136,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
 }
 
 int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
-				 gpa_t gpa, unsigned long len, bool dirty)
+				 gpa_t gpa, unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	unsigned long page_offset = gpa & ~PAGE_MASK;
@@ -160,7 +144,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	unsigned long old_uhva;
 	gpa_t old_gpa;
 	void *old_khva;
-	bool old_valid, old_dirty;
+	bool old_valid;
 	int ret = 0;
 
 	/*
@@ -177,14 +161,12 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
 	old_uhva = gpc->uhva;
 	old_valid = gpc->valid;
-	old_dirty = gpc->dirty;
 
 	/* If the userspace HVA is invalid, refresh that first */
 	if (gpc->gpa != gpa || gpc->generation != slots->generation ||
 	    kvm_is_error_hva(gpc->uhva)) {
 		gfn_t gfn = gpa_to_gfn(gpa);
 
-		gpc->dirty = false;
 		gpc->gpa = gpa;
 		gpc->generation = slots->generation;
 		gpc->memslot = __gfn_to_memslot(slots, gfn);
@@ -255,14 +237,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	}
 
  out:
-	if (ret)
-		gpc->dirty = false;
-	else
-		gpc->dirty = dirty;
-
 	write_unlock_irq(&gpc->lock);
 
-	__release_gpc(kvm, old_pfn, old_khva, old_gpa, old_dirty);
+	__release_gpc(kvm, old_pfn, old_khva, old_gpa);
 
 	return ret;
 }
@@ -272,7 +249,6 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
 	void *old_khva;
 	kvm_pfn_t old_pfn;
-	bool old_dirty;
 	gpa_t old_gpa;
 
 	write_lock_irq(&gpc->lock);
@@ -280,7 +256,6 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 	gpc->valid = false;
 
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
-	old_dirty = gpc->dirty;
 	old_gpa = gpc->gpa;
 	old_pfn = gpc->pfn;
 
@@ -293,14 +268,14 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 
 	write_unlock_irq(&gpc->lock);
 
-	__release_gpc(kvm, old_pfn, old_khva, old_gpa, old_dirty);
+	__release_gpc(kvm, old_pfn, old_khva, old_gpa);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
 
 
 int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			      struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
-			      gpa_t gpa, unsigned long len, bool dirty)
+			      gpa_t gpa, unsigned long len)
 {
 	WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
 
@@ -319,7 +294,7 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		list_add(&gpc->list, &kvm->gpc_list);
 		spin_unlock(&kvm->gpc_lock);
 	}
-	return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len, dirty);
+	return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_init);
 
-- 
2.33.1



* Re: [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Paolo Bonzini @ 2022-03-08 15:41 UTC
  To: Sean Christopherson
  Cc: Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Claudio Imbrenda, kvm, linux-kernel, David Woodhouse

Queued, thanks.

Paolo




* Re: [PATCH v2] KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
From: Paolo Bonzini @ 2022-02-25 10:35 UTC
  To: Sean Christopherson
  Cc: linux-kernel, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Claudio Imbrenda, David Woodhouse

Queued, thanks.

Paolo


