On Tue, 2021-11-16 at 15:57 +0100, Paolo Bonzini wrote:
> On 11/16/21 15:25, David Woodhouse wrote:
> > +	/*
> > +	 * If the guest requires direct access to mapped L1 pages, check
> > +	 * the caches are valid. Will raise KVM_REQ_GET_NESTED_STATE_PAGES
> > +	 * to go and revalidate them, if necessary.
> > +	 */
> > +	if (is_guest_mode(vcpu) && kvm_x86_ops.nested_ops->check_guest_maps)
> > +		kvm_x86_ops.nested_ops->check_guest_maps();
> > +
> 
> This should not be needed, should it?  As long as the gfn-to-pfn
> cache's vcpu field is handled properly, the request will just cause
> the vCPU not to enter.

If the MMU mappings never change, the request never happens. But the
memslots *can* change, so it does need to be revalidated each time
through, I think?

> It would have to take the gpc->lock around
> changes to gpc->vcpu though (meaning: it's probably best to add a
> function gfn_to_pfn_cache_set_vcpu).

Hm, in my head that was never going to *change* for a given gpc; it
*belongs* to that vCPU for ever (and was even part of vmx->nested. for
that vCPU, to replace e.g. vmx->nested.pi_desc_map).

If I flesh out what I had in my last email a bit more, perhaps my
vision is a little bit clearer...?
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 465455334c0c..9f279d08e570 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1510,6 +1510,7 @@ struct kvm_x86_nested_ops {
 	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
 			    uint16_t *vmcs_version);
 	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
+	void (*check_guest_maps)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_init_ops {
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 280f34ea02c3..71d2d8171a1c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3242,6 +3242,31 @@ static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static void nested_vmx_check_guest_maps(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct gfn_to_pfn_cache *gpc;
+
+	bool valid;
+
+	if (nested_cpu_has_posted_intr(vmcs12)) {
+		gpc = &vmx->nested.pi_desc_cache;
+
+		read_lock(&gpc->lock);
+		valid = kvm_gfn_to_pfn_cache_check(vcpu->kvm, gpc,
+						   vmcs12->posted_intr_desc_addr,
+						   PAGE_SIZE);
+		read_unlock(&gpc->lock);
+		if (!valid) {
+			/*
+			 * XX: This isn't idempotent. Make it so, or use a
+			 * different req for the 'refresh'.
+			 */
+			kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+			return;
+		}
+	}
+}
+
 static int nested_vmx_write_pml_buffer(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vmcs12 *vmcs12;
@@ -6744,4 +6769,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
 	.write_log_dirty = nested_vmx_write_pml_buffer,
 	.enable_evmcs = nested_enable_evmcs,
 	.get_evmcs_version = nested_get_evmcs_version,
+	.check_guest_maps = nested_vmx_check_guest_maps,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a689bb62e9e..a879e4d08758 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9735,6 +9735,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+		if (kvm_check_request(KVM_REQ_GPC_INVALIDATE, vcpu))
+			; /* Nothing to do. It just wanted to wake us */
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -9781,6 +9783,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	local_irq_disable();
 	vcpu->mode = IN_GUEST_MODE;
 
+	/*
+	 * If the guest requires direct access to mapped L1 pages, check
+	 * the caches are valid. Will raise KVM_REQ_GET_NESTED_STATE_PAGES
+	 * to go and revalidate them, if necessary.
+	 */
+	if (is_guest_mode(vcpu) && kvm_x86_ops.nested_ops->check_guest_maps)
+		kvm_x86_ops.nested_ops->check_guest_maps(vcpu);
+
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 

> Doing it lockless would be harder; I cannot think of any well-known
> pattern that is good for this scenario.
> 
> > That check_guest_maps() function can validate the caches which the L2
> > guest is actually using in the VMCS02, and if they need to be refreshed
> > then raising a req will immediately break out of vcpu_enter_guest() to
> > allow that to happen.
> > 
> > I *think* we can just use KVM_REQ_GET_NESTED_STATE_PAGES for that and
> > don't need to invent a new one?
> 
> Yes, maybe even do it unconditionally?
So nested_get_vmcs12_pages() certainly isn't idempotent right now
because of all the kvm_vcpu_map() calls, which would just end up
leaking. But I suppose the point is to kill all those, and then maybe
it will be?

I quite liked the idea of *not* refreshing the caches immediately,
because we can wait until the vCPU is in L2 mode again and actually
*needs* them.
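For what it's worth, the lifecycle I have in mind can be modelled in
plain userspace C. This is only an illustrative sketch of the pattern
(cheap check on every entry, raise a request on failure, refresh
lazily before the next entry); none of the names below are the actual
KVM API, and the "generation" field stands in for the memslot
generation that forces revalidation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of a gfn-to-pfn style cache tagged with the memslot
 * generation at which it was last refreshed. All names hypothetical.
 */
struct pfn_cache_model {
	uint64_t gpa;		/* guest physical address cached */
	uint64_t generation;	/* memslot generation at refresh time */
	bool valid;
};

struct vcpu_model {
	uint64_t memslot_generation;
	bool refresh_requested;	/* stands in for the KVM request bit */
	struct pfn_cache_model cache;
};

/*
 * Fast path, called on every "guest entry". It never refreshes the
 * cache itself; on a miss it only raises the request, mimicking
 * kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu).
 */
static bool cache_check(struct vcpu_model *vcpu, uint64_t gpa)
{
	struct pfn_cache_model *c = &vcpu->cache;

	if (c->valid && c->gpa == gpa &&
	    c->generation == vcpu->memslot_generation)
		return true;

	vcpu->refresh_requested = true;	/* defer the work */
	return false;
}

/*
 * Slow path, run before re-entering the guest when the request bit
 * is set, like the GET_NESTED_STATE_PAGES handler would be.
 */
static void cache_refresh(struct vcpu_model *vcpu, uint64_t gpa)
{
	vcpu->cache.gpa = gpa;
	vcpu->cache.generation = vcpu->memslot_generation;
	vcpu->cache.valid = true;
	vcpu->refresh_requested = false;
}
```

The point of the split is exactly the deferral discussed above: a
memslot change only bumps the generation, and the cost of remapping is
paid on the next entry that actually needs the page.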