On Tue, 2021-11-16 at 15:57 +0100, Paolo Bonzini wrote:
> On 11/16/21 15:25, David Woodhouse wrote:
> > +	/*
> > +	 * If the guest requires direct access to mapped L1 pages, check
> > +	 * the caches are valid. Will raise KVM_REQ_GET_NESTED_STATE_PAGES
> > +	 * to go and revalidate them, if necessary.
> > +	 */
> > +	if (is_guest_mode(vcpu) && kvm_x86_ops.nested_ops->check_guest_maps)
> > +		kvm_x86_ops.nested_ops->check_guest_maps();
> > +
> 
> This should not be needed, should it?  As long as the gfn-to-pfn
> cache's vcpu field is handled properly, the request will just cause
> the vCPU not to enter.

If the MMU mappings never change, the request never happens. But the
memslots *can* change, so it does need to be revalidated each time
through, I think?

> It would have to take the gpc->lock around
> changes to gpc->vcpu though (meaning: it's probably best to add a
> function gfn_to_pfn_cache_set_vcpu).

Hm, in my head that was never going to *change* for a given gpc; it
*belongs* to that vCPU for ever (and was even part of vmx->nested. for
that vCPU, to replace e.g. vmx->nested.pi_desc_map).

If I flesh out what I had in my last email a bit more, perhaps my
vision is a little bit clearer...?
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 465455334c0c..9f279d08e570 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1510,6 +1510,7 @@ struct kvm_x86_nested_ops {
 	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
 			    uint16_t *vmcs_version);
 	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
+	void (*check_guest_maps)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_init_ops {
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 280f34ea02c3..71d2d8171a1c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3242,6 +3242,31 @@ static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static void nested_vmx_check_guest_maps(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct gfn_to_pfn_cache *gpc;
+
+	bool valid;
+
+	if (nested_cpu_has_posted_intr(vmcs12)) {
+		gpc = &vmx->nested.pi_desc_cache;
+
+		read_lock(&gpc->lock);
+		valid = kvm_gfn_to_pfn_cache_check(vcpu->kvm, gpc,
+						   vmcs12->posted_intr_desc_addr,
+						   PAGE_SIZE);
+		read_unlock(&gpc->lock);
+		if (!valid) {
+			/*
+			 * XX: This isn't idempotent. Make it so, or use a
+			 * different req for the 'refresh'.
+			 */
+			kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+			return;
+		}
+	}
+}
+
 static int nested_vmx_write_pml_buffer(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vmcs12 *vmcs12;
@@ -6744,4 +6769,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
 	.write_log_dirty = nested_vmx_write_pml_buffer,
 	.enable_evmcs = nested_enable_evmcs,
 	.get_evmcs_version = nested_get_evmcs_version,
+	.check_guest_maps = nested_vmx_check_guest_maps,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a689bb62e9e..a879e4d08758 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9735,6 +9735,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+		if (kvm_check_request(KVM_REQ_GPC_INVALIDATE, vcpu))
+			; /* Nothing to do. It just wanted to wake us */
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -9781,6 +9783,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	local_irq_disable();
 	vcpu->mode = IN_GUEST_MODE;
 
+	/*
+	 * If the guest requires direct access to mapped L1 pages, check
+	 * the caches are valid. Will raise KVM_REQ_GET_NESTED_STATE_PAGES
+	 * to go and revalidate them, if necessary.
+	 */
+	if (is_guest_mode(vcpu) && kvm_x86_ops.nested_ops->check_guest_maps)
+		kvm_x86_ops.nested_ops->check_guest_maps(vcpu);
+
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 

> Doing it lockless would be harder; I cannot think of any well-known
> pattern that is good for this scenario.
> 
> > That check_guest_maps() function can validate the caches which the L2
> > guest is actually using in the VMCS02, and if they need to be refreshed
> > then raising a req will immediately break out of vcpu_enter_guest() to
> > allow that to happen.
> > 
> > I *think* we can just use KVM_REQ_GET_NESTED_STATE_PAGES for that and
> > don't need to invent a new one?
> 
> Yes, maybe even do it unconditionally?
So nested_get_vmcs12_pages() certainly isn't idempotent right now
because of all the kvm_vcpu_map() calls, which would just end up
leaking. But I suppose the point is to kill all those, and then maybe
it will be?

I quite liked the idea of *not* refreshing the caches immediately,
because we can wait until the vCPU is in L2 mode again and actually
*needs* them.
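For what it's worth, the lifecycle I have in mind can be modelled in
plain userspace C. This is only an illustrative sketch of the pattern
(cheap check on every entry, raise a request on failure, refresh
lazily before the next entry); none of the names below are the actual
KVM API, and the "generation" field stands in for the memslot
generation that forces revalidation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of a gfn-to-pfn style cache tagged with the memslot
 * generation at which it was last refreshed. All names hypothetical.
 */
struct pfn_cache_model {
	uint64_t gpa;		/* guest physical address cached */
	uint64_t generation;	/* memslot generation at refresh time */
	bool valid;
};

struct vcpu_model {
	uint64_t memslot_generation;
	bool refresh_requested;	/* stands in for the KVM request bit */
	struct pfn_cache_model cache;
};

/*
 * Fast path, called on every "guest entry". It never refreshes the
 * cache itself; on a miss it only raises the request, mimicking
 * kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu).
 */
static bool cache_check(struct vcpu_model *vcpu, uint64_t gpa)
{
	struct pfn_cache_model *c = &vcpu->cache;

	if (c->valid && c->gpa == gpa &&
	    c->generation == vcpu->memslot_generation)
		return true;

	vcpu->refresh_requested = true;	/* defer the work */
	return false;
}

/*
 * Slow path, run before re-entering the guest when the request bit
 * is set, like the GET_NESTED_STATE_PAGES handler would be.
 */
static void cache_refresh(struct vcpu_model *vcpu, uint64_t gpa)
{
	vcpu->cache.gpa = gpa;
	vcpu->cache.generation = vcpu->memslot_generation;
	vcpu->cache.valid = true;
	vcpu->refresh_requested = false;
}
```

The point of the split is exactly the deferral discussed above: a
memslot change only bumps the generation, and the cost of remapping is
paid on the next entry that actually needs the page.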