Date: Fri, 11 Nov 2022 16:27:39 +0800
From: Chao Peng <chao.p.peng@linux.intel.com>
To: Sean Christopherson
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	qemu-devel@nongnu.org, Paolo Bonzini, Jonathan Corbet,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
	"H. Peter Anvin", Hugh Dickins, Jeff Layton, "J. Bruce Fields",
	Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
	"Maciej S. Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
	"Kirill A. Shutemov", luto@kernel.org, jun.nakajima@intel.com,
	dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
	aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com,
	Muchun Song, wei.w.wang@intel.com
Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , tabba@google.com, Michael Roth , mhocko@suse.com, Muchun Song , wei.w.wang@intel.com Subject: Re: [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Message-ID: <20221111082739.GA203765@chaop.bj.intel.com> Reply-To: Chao Peng References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com> <20221025151344.3784230-5-chao.p.peng@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1668155540; a=rsa-sha256; cv=none; b=C4qC2NIw120j/A5Mv7i8FYXXp9JN79hrV1QtnAMqY8I7T7sJVF/iGYswfnNI/DE8sK+KNA htTFeHSUl7prbj9ltG8vus2mDjGfTGy1g5wS9JtryJn0hBcAMAI1TeGHuWNdWUjScV+MtD 80WCjA6cUosuOdqrsqHrnXJrB7WNlP0= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=BhRP9L6t; spf=none (imf04.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=fail reason="No valid SPF" header.from=intel.com (policy=none) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1668155540; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tk5I3JsWzaTOkP1+qghWNZxxJCbLmOgWt2D4A7gnnEo=; b=kjvh7cXiIMYXnXImdThhRrerxUUhULYtWzyokCDe3TkpNjFlJ4asMX5k+NzpUzN5yPmxuV QpHos2dRkY6Hd2/DZf84A1UqWbBv8RZFIQ7d8ziHc1iaJaoKEMD627e2ax0byG13GlE8QH SOBfxH4Mm31n/TjAwxXgN6tFlMr4/SA= X-Stat-Signature: 55prtippti8km9k39czmu3wu3f4m9up4 X-Rspamd-Queue-Id: 5A3FB40006 Authentication-Results: imf04.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=BhRP9L6t; spf=none (imf04.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=fail reason="No valid SPF" header.from=intel.com (policy=none) X-Rspam-User: X-Rspamd-Server: rspam12 X-HE-Tag: 1668155538-906254 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Nov 10, 2022 at 08:06:33PM +0000, Sean Christopherson wrote: > On Tue, Oct 25, 2022, Chao Peng wrote: > > @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > > kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); > > } > > > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > > - unsigned long end) > > +static inline > > Don't tag static functions with "inline" unless they're in headers, in which case > the inline is effectively required. In pretty much every scenario, the compiler > can do a better job of optimizing inline vs. non-inline, i.e. odds are very good > the compiler would inline this helper anyways, and if not, there would likely be > a good reason not to inline it. Yep, I know the rationale behind, I made a mistake. > > It'll be a moot point in this case (more below), but this would also reduce the > line length and avoid the wrap. 

> It'll be a moot point in this case (more below), but this would also reduce the
> line length and avoid the wrap.
> 
> > void update_invalidate_range(struct kvm *kvm, gfn_t start,
> > +						gfn_t end)
> 
> I appreciate the effort to make this easier to read, but making such a big divergence
> from the kernel's preferred formatting is often counter-productive, e.g. I blinked a
> few times when first reading this code.
> 
> Again, moot point this time (still below ;-) ), but for future reference, better
> options are to either let the line poke out or simply wrap early to get the
> bundling of parameters that you want, e.g.
> 
> 	static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, gfn_t end)
> 
> or
> 
> 	static inline void update_invalidate_range(struct kvm *kvm,
> 						   gfn_t start, gfn_t end)

Fully agreed.

> > {
> > -	/*
> > -	 * The count increase must become visible at unlock time as no
> > -	 * spte can be established without taking the mmu_lock and
> > -	 * count is also read inside the mmu_lock critical section.
> > -	 */
> > -	kvm->mmu_invalidate_in_progress++;
> >  	if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> >  		kvm->mmu_invalidate_range_start = start;
> >  		kvm->mmu_invalidate_range_end = end;
> > @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> >  	}
> >  }
> >  
> > +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end)
> 
> Splitting the helpers this way yields a weird API overall, e.g. it's possible
> (common, actually) to have an "end" without a "begin".
> 
> Taking the range in the "end" is also dangerous/misleading/imbalanced, because _if_
> there are multiple ranges in a batch, each range would need to be unwound
> independently, e.g. the invocation of the "end" helper in
> kvm_mmu_notifier_invalidate_range_end() is flat out wrong, it just doesn't cause
> problems because KVM doesn't (currently) try to unwind regions (and probably never
> will, but that's beside the point).

I actually wasn't happy with the existing code either (taking the range in both
the "start" and the "end"), but I didn't go further and look for a better
solution.

> Rather than shunt what is effectively the "begin" into a separate helper, provide
> three separate APIs, e.g. begin, range_add, end.  That way, begin+end don't take a
> range and thus are symmetrical, always paired, and can't screw up unwinding since
> they don't have a range to unwind.

This looks much better to me.

> It'll require three calls in every case, but that's not the end of the world since
> none of these flows are super hot paths.
> 
> > +{
> > +	/*
> > +	 * The count increase must become visible at unlock time as no
> > +	 * spte can be established without taking the mmu_lock and
> > +	 * count is also read inside the mmu_lock critical section.
> > +	 */
> > +	kvm->mmu_invalidate_in_progress++;
> 
> This should invalidate (ha!) mmu_invalidate_range_{start,end}, and then WARN in
> mmu_invalidate_retry() if the range isn't valid.  And the "add" helper should
> WARN if mmu_invalidate_in_progress == 0.
> 
> > +}
> > +
> > +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> 
> "handle" is waaaay too generic.  Just match kvm_unmap_gfn_range() and call it
> kvm_mmu_unmap_gfn_range().  This is a local function so it's unlikely to collide
> with arch code, now or in the future.

Agreed.

> 
> > +{
> > +	update_invalidate_range(kvm, range->start, range->end);
> > +	return kvm_unmap_gfn_range(kvm, range);
> > +}
> 
> Overall, this?  Compile tested only...

Thanks!
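
Just to double check my reading of the new API: every invalidation site then
follows the same three-call pattern under mmu_lock. A sketch based on the
kvm_zap_gfn_range() change in your diff below (not new code):

  write_lock(&kvm->mmu_lock);

  /* Bump mmu_invalidate_in_progress and reset the range to INVALID_GPA. */
  kvm_mmu_invalidate_begin(kvm);

  /* Record the gfn range; can be called for each range in a batch. */
  kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);

  /* ... zap/unmap the range and flush TLBs as needed ... */

  /* Bump mmu_invalidate_seq and decrement mmu_invalidate_in_progress. */
  kvm_mmu_invalidate_end(kvm);

  write_unlock(&kvm->mmu_lock);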

Chao

> ---
>  arch/x86/kvm/mmu/mmu.c   |  8 +++++---
>  include/linux/kvm_host.h | 33 +++++++++++++++++++++------------
>  virt/kvm/kvm_main.c      | 30 +++++++++++++++++++++---------
>  3 files changed, 47 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 93c389eaf471..d4b373e3e524 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4259,7 +4259,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>  		return true;
>  
>  	return fault->slot &&
> -	       mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
> +	       mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn);
>  }
>  
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> @@ -6098,7 +6098,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>  
>  	write_lock(&kvm->mmu_lock);
>  
> -	kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end);
> +	kvm_mmu_invalidate_begin(kvm);
> +
> +	kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
>  
>  	flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
>  
> @@ -6112,7 +6114,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>  		kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
>  						   gfn_end - gfn_start);
>  
> -	kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
> +	kvm_mmu_invalidate_end(kvm);
>  
>  	write_unlock(&kvm->mmu_lock);
>  }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..29aa6d6827cc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -778,8 +778,8 @@ struct kvm {
>  	struct mmu_notifier mmu_notifier;
>  	unsigned long mmu_invalidate_seq;
>  	long mmu_invalidate_in_progress;
> -	unsigned long mmu_invalidate_range_start;
> -	unsigned long mmu_invalidate_range_end;
> +	gfn_t mmu_invalidate_range_start;
> +	gfn_t mmu_invalidate_range_end;
>  #endif
>  	struct list_head devices;
>  	u64 manual_dirty_log_protect;
> @@ -1378,10 +1378,9 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
>  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
>  #endif
>  
> -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> -			      unsigned long end);
> -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
> -			    unsigned long end);
> +void kvm_mmu_invalidate_begin(struct kvm *kvm);
> +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> +void kvm_mmu_invalidate_end(struct kvm *kvm);
>  
>  long kvm_arch_dev_ioctl(struct file *filp,
>  			unsigned int ioctl, unsigned long arg);
> @@ -1952,9 +1951,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
>  	return 0;
>  }
>  
> -static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
> +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
>  					   unsigned long mmu_seq,
> -					   unsigned long hva)
> +					   gfn_t gfn)
>  {
>  	lockdep_assert_held(&kvm->mmu_lock);
>  	/*
> @@ -1963,10 +1962,20 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
>  	 * that might be being invalidated. Note that it may include some false
>  	 * positives, due to shortcuts when handing concurrent invalidations.
>  	 */
> -	if (unlikely(kvm->mmu_invalidate_in_progress) &&
> -	    hva >= kvm->mmu_invalidate_range_start &&
> -	    hva < kvm->mmu_invalidate_range_end)
> -		return 1;
> +	if (unlikely(kvm->mmu_invalidate_in_progress)) {
> +		/*
> +		 * Dropping mmu_lock after bumping mmu_invalidate_in_progress
> +		 * but before updating the range is a KVM bug.
> +		 */
> +		if (WARN_ON_ONCE(kvm->mmu_invalidate_range_start == INVALID_GPA ||
> +				 kvm->mmu_invalidate_range_end == INVALID_GPA))
> +			return 1;
> +
> +		if (gfn >= kvm->mmu_invalidate_range_start &&
> +		    gfn < kvm->mmu_invalidate_range_end)
> +			return 1;
> +	}
> +
>  	if (kvm->mmu_invalidate_seq != mmu_seq)
>  		return 1;
>  	return 0;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..e9e03b979f77 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -540,9 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
>  
>  typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
>  
> -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
> -			     unsigned long end);
> -
> +typedef void (*on_lock_fn_t)(struct kvm *kvm);
>  typedef void (*on_unlock_fn_t)(struct kvm *kvm);
>  
>  struct kvm_hva_range {
> @@ -628,7 +626,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  			locked = true;
>  			KVM_MMU_LOCK(kvm);
>  			if (!IS_KVM_NULL_FN(range->on_lock))
> -				range->on_lock(kvm, range->start, range->end);
> +				range->on_lock(kvm);
> +
>  			if (IS_KVM_NULL_FN(range->handler))
>  				break;
>  		}
> @@ -715,8 +714,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
>  	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
>  }
>  
> -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> -			      unsigned long end)
> +void kvm_mmu_invalidate_begin(struct kvm *kvm)
>  {
>  	/*
>  	 * The count increase must become visible at unlock time as no
> @@ -724,6 +722,15 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
>  	 * count is also read inside the mmu_lock critical section.
>  	 */
>  	kvm->mmu_invalidate_in_progress++;
> +
> +	kvm->mmu_invalidate_range_start = INVALID_GPA;
> +	kvm->mmu_invalidate_range_end = INVALID_GPA;
> +}
> +
> +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress);
> +
>  	if (likely(kvm->mmu_invalidate_in_progress == 1)) {
>  		kvm->mmu_invalidate_range_start = start;
>  		kvm->mmu_invalidate_range_end = end;
> @@ -744,6 +751,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
>  	}
>  }
>  
> +static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> +{
> +	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
> +	return kvm_unmap_gfn_range(kvm, range);
> +}
> +
>  static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  						    const struct mmu_notifier_range *range)
>  {
> @@ -752,7 +765,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  		.start		= range->start,
>  		.end		= range->end,
>  		.pte		= __pte(0),
> -		.handler	= kvm_unmap_gfn_range,
> +		.handler	= kvm_mmu_unmap_gfn_range,
>  		.on_lock	= kvm_mmu_invalidate_begin,
>  		.on_unlock	= kvm_arch_guest_memory_reclaimed,
>  		.flush_on_ret	= true,
> @@ -791,8 +804,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  	return 0;
>  }
>  
> -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
> -			    unsigned long end)
> +void kvm_mmu_invalidate_end(struct kvm *kvm)
>  {
>  	/*
>  	 * This sequence increase will notify the kvm page fault that
> 
> base-commit: d663b8a285986072428a6a145e5994bc275df994
> -- 
> 
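
P.S. for other reviewers following along: the INVALID_GPA WARN above guards the
page-fault path, which consumes mmu_invalidate_retry_gfn() roughly like this
(a simplified sketch of the existing retry pattern, not code from this patch;
"kvm" here stands for vcpu->kvm in the fault handler):

  /*
   * Snapshot the sequence count before resolving the pfn, which happens
   * outside mmu_lock and may sleep.
   */
  mmu_seq = kvm->mmu_invalidate_seq;
  smp_rmb();

  /* ... resolve fault->pfn for fault->gfn ... */

  write_lock(&kvm->mmu_lock);
  if (mmu_invalidate_retry_gfn(kvm, mmu_seq, fault->gfn)) {
  	/* An in-progress invalidation covers this gfn; replay the fault. */
  	write_unlock(&kvm->mmu_lock);
  	return RET_PF_RETRY;
  }
  /* ... install the mapping, then write_unlock() ... */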