On Wed, Jan 27, 2021, David Stevens wrote:
> From: David Stevens
>
> Track the range being invalidated by mmu_notifier and skip page fault
> retries if the fault address is not affected by the in-progress
> invalidation. Handle concurrent invalidations by finding the minimal
> range which includes all ranges being invalidated. Although the combined
> range may include unrelated addresses and cannot be shrunk as individual
> invalidation operations complete, it is unlikely the marginal gains of
> proper range tracking are worth the additional complexity.
>
> The primary benefit of this change is the reduction in the likelihood of
> extreme latency when handling a page fault due to another thread having
> been preempted while modifying host virtual addresses.
>
> Signed-off-by: David Stevens
> ---
> v1 -> v2:
>  - improve handling of concurrent invalidation requests by unioning
>    ranges, instead of just giving up and using [0, ULONG_MAX).

Ooh, even better.

>  - add lockdep check
>  - code comments and formatting
>
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
>  arch/x86/kvm/mmu/mmu.c                 | 16 ++++++++------
>  arch/x86/kvm/mmu/paging_tmpl.h         |  7 ++++---
>  include/linux/kvm_host.h               | 27 +++++++++++++++++++++++-
>  virt/kvm/kvm_main.c                    | 29 ++++++++++++++++++++++----
>  6 files changed, 67 insertions(+), 16 deletions(-)

...

> @@ -3717,7 +3720,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
>
> -	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
> +	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
> +			 write, &map_writable))
>  		return RET_PF_RETRY;
>
>  	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
> @@ -3725,7 +3729,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>
>  	r = RET_PF_RETRY;
>  	spin_lock(&vcpu->kvm->mmu_lock);
> -	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
> +	if (mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))

'hva' will be uninitialized at this point if the gfn did not resolve to a
memslot, i.e. when handling an MMIO page fault.  On the plus side, that's an
opportunity for another optimization as there is no need to retry MMIO page
faults on mmu_notifier invalidations.  Including the attached patch as a
prereq to this will avoid consuming an uninitialized 'hva'.

>  		goto out_unlock;
>  	r = make_mmu_pages_available(vcpu);
>  	if (r)

...

>  void kvm_release_pfn_clean(kvm_pfn_t pfn);
>  void kvm_release_pfn_dirty(kvm_pfn_t pfn);
> @@ -1203,6 +1206,28 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
>  		return 1;
>  	return 0;
>  }
> +
> +static inline int mmu_notifier_retry_hva(struct kvm *kvm,
> +					 unsigned long mmu_seq,
> +					 unsigned long hva)
> +{
> +#ifdef CONFIG_LOCKDEP
> +	lockdep_is_held(&kvm->mmu_lock);

No need to manually do the #ifdef, just use lockdep_assert_held instead of
lockdep_is_held.
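I.e. something like this (untested, and only showing the part being commented
on; the rest of the helper stays as-is):

	static inline int mmu_notifier_retry_hva(struct kvm *kvm,
						 unsigned long mmu_seq,
						 unsigned long hva)
	{
		/*
		 * Asserts when lockdep is enabled, compiles down to nothing
		 * when it isn't, so no manual #ifdef is needed.
		 */
		lockdep_assert_held(&kvm->mmu_lock);

		/* ... rest of the helper unchanged ... */
	}

That also makes it a real assertion; a bare lockdep_is_held() call only
queries the lock state and discards the result.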
> +#endif
> +	/*
> +	 * If mmu_notifier_count is non-zero, then the range maintained by
> +	 * kvm_mmu_notifier_invalidate_range_start contains all addresses that
> +	 * might be being invalidated. Note that it may include some false
> +	 * positives, due to shortcuts when handling concurrent invalidations.
> +	 */
> +	if (unlikely(kvm->mmu_notifier_count) &&
> +	    kvm->mmu_notifier_range_start <= hva &&
> +	    hva < kvm->mmu_notifier_range_end)

Uber nit: I find this easier to read if 'hva' is on the left-hand side for
both checks, i.e.

	if (unlikely(kvm->mmu_notifier_count) &&
	    hva >= kvm->mmu_notifier_range_start &&
	    hva < kvm->mmu_notifier_range_end)

> +		return 1;
> +	if (kvm->mmu_notifier_seq != mmu_seq)
> +		return 1;
> +	return 0;
> +}
>  #endif
>
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
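One last thought, mostly to check my own understanding since the kvm_main.c
hunk isn't quoted above: I'm assuming the "union the ranges" handling described
in the changelog amounts to something like the below in
kvm_mmu_notifier_invalidate_range_start() (a rough sketch using the field names
from the kvm_host.h hunk, not necessarily the exact code in the patch):

	/*
	 * Widen the tracked [start, end) window so it covers every in-flight
	 * invalidation.  The window is re-initialized when the first
	 * invalidation of a batch starts and only grows after that; it is
	 * never shrunk as individual invalidations complete.
	 */
	kvm->mmu_notifier_count++;
	if (likely(kvm->mmu_notifier_count == 1)) {
		kvm->mmu_notifier_range_start = range->start;
		kvm->mmu_notifier_range_end = range->end;
	} else {
		kvm->mmu_notifier_range_start =
			min(kvm->mmu_notifier_range_start, range->start);
		kvm->mmu_notifier_range_end =
			max(kvm->mmu_notifier_range_end, range->end);
	}

If that's roughly what the patch does, I agree the occasional false positive
from the combined range is a fair trade for not tracking each range
individually.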