On Mon, 2021-11-15 at 19:50 +0100, Paolo Bonzini wrote:
> On 11/15/21 17:47, David Woodhouse wrote:
> > So... a user of this must check the validity after setting its mode to
> > IN_GUEST_MODE, and the invalidation must make a request and wake any
> > vCPU(s) which might be using it.
>
> Yes, though the check is implicit in the existing call to
> kvm_vcpu_exit_request(vcpu).

Right, though *handling* the request isn't (and I'm still not sure
whether to use a single new KVM_REQ_GPC_INVALIDATE or let the user of
the cache specify the req to use). I don't really want generic code
refreshing these caches even when they aren't going to be used (e.g.
vmcs02 for a vCPU that isn't even in L2 guest mode right now).
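Roughly, what I have in mind for the vCPU side is something like the
below. It's only a sketch and completely untested; kvm_gpc_check(),
kvm_gpc_refresh() and the vcpu->arch.gpc field are made-up names rather
than a real API, and the memory barriers are elided:

/*
 * Sketch only: kvm_gpc_check()/kvm_gpc_refresh() and vcpu->arch.gpc
 * are placeholders, and KVM_REQ_GPC_INVALIDATE may end up being a
 * per-user request instead of a single generic one.
 */
static int gpc_enter_guest(struct kvm_vcpu *vcpu)
{
	/* Refresh the cached mapping while sleeping is still allowed. */
	if (!kvm_gpc_check(vcpu->kvm, &vcpu->arch.gpc))
		kvm_gpc_refresh(vcpu->kvm, &vcpu->arch.gpc);

	vcpu->mode = IN_GUEST_MODE;
	/* (barriers between the mode write and the request check elided) */

	/*
	 * An invalidation which raced with the refresh above will have
	 * set the request and kicked this vCPU, so the existing
	 * kvm_vcpu_exit_request() check catches it here.
	 */
	if (kvm_vcpu_exit_request(vcpu)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		return -EAGAIN;		/* go round again and refresh */
	}

	/* ... actually enter the guest ... */
	return 0;
}

The invalidation side then only has to make the request and kick any
vCPU which has marked the cache as in use.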
> > I moved the invalidation to the invalidate_range MMU notifier, as
> > discussed. But that's where the plan falls down a little bit because
> > IIUC, that one can't sleep at all.
>
> Which is a problem in the existing code, too. It hasn't broken yet
> because invalidate_range() is _usually_ called with no spinlocks taken
> (the only caller that does call with a spinlock taken seems to be
> hugetlb_cow).
>
> Once the dust settles, we need to add non_block_start/end around calls
> to ops->invalidate_range.
>
> > I need to move it *back* to
> > invalidate_range_start() where I had it before, if I want to let it
> > wait for vCPUs to exit. Which means... that the cache 'refresh' call
> > must wait until the mmu_notifier_count reaches zero? Am I allowed to
> > do that, and make the "There can be only one waiter" comment in
> > kvm_mmu_notifier_invalidate_range_end() no longer true?
>
> You can also update the cache while taking the mmu_lock for read, and
> retry if mmu_notifier_retry_hva tells you to do so. Looking at the
> scenario from commit e649b3f0188 you would have:
>
>    (Invalidator) kvm_mmu_notifier_invalidate_range_start()
>    (Invalidator) write_lock(mmu_lock)
>    (Invalidator) increment mmu_notifier_count
>    (Invalidator) write_unlock(mmu_lock)
>    (Invalidator) request KVM_REQ_APIC_PAGE_RELOAD
>    (KVM VCPU) vcpu_enter_guest()
>    (KVM VCPU) kvm_vcpu_reload_apic_access_page()
> +  (KVM VCPU) read_lock(mmu_lock)
> +  (KVM VCPU) mmu_notifier_retry_hva()
> +  (KVM VCPU) read_unlock(mmu_lock)
> +  (KVM VCPU) retry! (mmu_notifier_count>1)

But unless we do start using a waitq, it can just spin and spin and
spin here can't it?

 +  (KVM VCPU) read_lock(mmu_lock)
 +  (KVM VCPU) mmu_notifier_retry_hva()
 +  (KVM VCPU) read_unlock(mmu_lock)
 +  (KVM VCPU) retry! (mmu_notifier_count>1)

 +  (KVM VCPU) read_lock(mmu_lock)
 +  (KVM VCPU) mmu_notifier_retry_hva()
 +  (KVM VCPU) read_unlock(mmu_lock)
 +  (KVM VCPU) retry! (mmu_notifier_count>1)

 +  (KVM VCPU) read_lock(mmu_lock)
 +  (KVM VCPU) mmu_notifier_retry_hva()
 +  (KVM VCPU) read_unlock(mmu_lock)
 +  (KVM VCPU) retry! (mmu_notifier_count>1)

>    (Invalidator) actually unmap page

 +  (KVM VCPU) read_lock(mmu_lock)
 +  (KVM VCPU) mmu_notifier_retry_hva()
 +  (KVM VCPU) read_unlock(mmu_lock)
 +  (KVM VCPU) retry! (mmu_notifier_count>1)

 +  (KVM VCPU) read_lock(mmu_lock)
 +  (KVM VCPU) mmu_notifier_retry_hva()
 +  (KVM VCPU) read_unlock(mmu_lock)
 +  (KVM VCPU) retry! (mmu_notifier_count>1)

> +  (Invalidator) kvm_mmu_notifier_invalidate_range_end()
> +  (Invalidator) write_lock(mmu_lock)
> +  (Invalidator) decrement mmu_notifier_count
> +  (Invalidator) write_unlock(mmu_lock)
> +  (KVM VCPU) vcpu_enter_guest()
> +  (KVM VCPU) kvm_vcpu_reload_apic_access_page()
> +  (KVM VCPU) mmu_notifier_retry_hva()
>
> Changing mn_memslots_update_rcuwait to a waitq (and renaming it to
> mn_invalidate_waitq) is of course also a possibility.

I suspect that's the answer.

I think the actual *invalidation* of the cache still lives in the
invalidate_range() callback where I have it at the moment.

But making the req to the affected vCPUs can live in
invalidate_range_start(). And then the code which *handles* that req
can wait for the mmu_notifier_count to reach zero before it proceeds.
Atomic users of the cache (like the Xen event channel code) don't have
to get involved with that.

> Also, for the small requests: since you are at it, can you add the code
> in a new file under virt/kvm/?

Hm... only if I can make hva_to_pfn() and probably a handful of other
things non-static?
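Coming back to the invalidate_range_start() idea above, the request
handler might end up looking vaguely like the below, assuming
mn_memslots_update_rcuwait really does become a waitq
(mn_invalidate_waitq). Again this is just an untested sketch, and
kvm_gpc_handle_invalidate_req() / kvm_gpc_refresh() / vcpu->arch.gpc
are made-up names:

static void kvm_gpc_handle_invalidate_req(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;

	/*
	 * invalidate_range_start() made the request after bumping
	 * mmu_notifier_count; sleep here until invalidate_range_end()
	 * has dropped it back to zero and woken us, instead of
	 * spinning on mmu_notifier_retry_hva().
	 */
	wait_event(kvm->mn_invalidate_waitq,
		   READ_ONCE(kvm->mmu_notifier_count) == 0);

	/* Now it's safe to remap and revalidate the cache. */
	kvm_gpc_refresh(kvm, &vcpu->arch.gpc);
}

The invalidate_range_end() side would then do a wake_up() after it
decrements mmu_notifier_count, and the atomic users of the cache (the
Xen event channel code) never need to touch any of this.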