All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Will Deacon <will@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	Tianrui Zhao <zhaotianrui@loongson.cn>,
	Bibo Mao <maobibo@loongson.cn>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Nicholas Piggin <npiggin@gmail.com>,
	Anup Patel <anup@brainfault.org>,
	Atish Patra <atishp@atishpatra.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Sat, 13 Apr 2024 10:56:25 +0100	[thread overview]
Message-ID: <86h6g5si0m.wl-maz@kernel.org> (raw)
In-Reply-To: <ZhlLHtfeSHk9gRRO@google.com>

On Fri, 12 Apr 2024 15:54:22 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon <will@kernel.org> wrote:
> > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > Also, if you're in the business of hacking the MMU notifier code, it
> > > would be really great to change the .clear_flush_young() callback so
> > > that the architecture could handle the TLB invalidation. At the moment,
> > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > lighter-weight and targetted TLBI in the architecture page-table code
> > > when we actually update the ptes for small ranges.
> > 
> > Indeed, and I was looking at this earlier this week as it has a pretty
> > devastating effect with NV (it blows the shadow S2 for that VMID, with
> > costly consequences).
> > 
> > In general, it feels like the TLB invalidation should stay with the
> > code that deals with the page tables, as it has a pretty good idea of
> > what needs to be invalidated and how -- specially on architectures
> > that have a HW-broadcast facility like arm64.
> 
> Would this be roughly on par with an in-line flush on arm64?  The simpler, more
> straightforward solution would be to let architectures override flush_on_ret,
> but I would prefer something like the below as x86 can also utilize a range-based
> flush when running as a nested hypervisor.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b65116294efe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>         struct kvm_gfn_range gfn_range;
>         struct kvm_memory_slot *slot;
>         struct kvm_memslots *slots;
> +       bool need_flush = false;
>         int i, idx;
>  
>         if (WARN_ON_ONCE(range->end <= range->start))
> @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>                                         break;
>                         }
>                         r.ret |= range->handler(kvm, &gfn_range);
> +
> +                       /*
> +                        * Use a precise gfn-based TLB flush when possible, as
> +                        * most mmu_notifier events affect a small-ish range.
> +                        * Fall back to a full TLB flush if the gfn-based flush
> +                        * fails, and don't bother trying the gfn-based flush
> +                        * if a full flush is already pending.
> +                        */
> +                       if (range->flush_on_ret && !need_flush && r.ret &&
> +                           kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start
> +                                                            gfn_range.end - gfn_range.start + 1))
> +                               need_flush = true;
>                 }
>         }
>  
> -       if (range->flush_on_ret && r.ret)
> +       if (need_flush)
>                 kvm_flush_remote_tlbs(kvm);
>  
>         if (r.found_memslot)

I think this works for us on HW that has range invalidation, which
would already be a positive move.

For the lesser HW that isn't range capable, it also gives the
opportunity to perform the iteration ourselves or go for the nuclear
option if the range is larger than some arbitrary constant (though
this is additional work).

But this still considers the whole range as being affected by
range->handler(). It'd be interesting to try and see whether more
precise tracking is (or isn't) generally beneficial.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, David Hildenbrand <david@redhat.com>,
	linux-mips@vger.kernel.org, linux-mm@kvack.org,
	Will Deacon <will@kernel.org>, Anup Patel <anup@brainfault.org>,
	linux-trace-kernel@vger.kernel.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Bibo Mao <maobibo@loongson.cn>,
	loongarch@lists.linux.dev, Atish Patra <atishp@atishpatra.org>,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	linux-kernel@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	linux-perf-users@vger.kernel.org, kvm-riscv@lists.infradead.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tianrui Zhao <zhaotianrui@loongson.cn>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Sat, 13 Apr 2024 10:56:25 +0100	[thread overview]
Message-ID: <86h6g5si0m.wl-maz@kernel.org> (raw)
In-Reply-To: <ZhlLHtfeSHk9gRRO@google.com>

On Fri, 12 Apr 2024 15:54:22 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon <will@kernel.org> wrote:
> > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > Also, if you're in the business of hacking the MMU notifier code, it
> > > would be really great to change the .clear_flush_young() callback so
> > > that the architecture could handle the TLB invalidation. At the moment,
> > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > lighter-weight and targetted TLBI in the architecture page-table code
> > > when we actually update the ptes for small ranges.
> > 
> > Indeed, and I was looking at this earlier this week as it has a pretty
> > devastating effect with NV (it blows the shadow S2 for that VMID, with
> > costly consequences).
> > 
> > In general, it feels like the TLB invalidation should stay with the
> > code that deals with the page tables, as it has a pretty good idea of
> > what needs to be invalidated and how -- specially on architectures
> > that have a HW-broadcast facility like arm64.
> 
> Would this be roughly on par with an in-line flush on arm64?  The simpler, more
> straightforward solution would be to let architectures override flush_on_ret,
> but I would prefer something like the below as x86 can also utilize a range-based
> flush when running as a nested hypervisor.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b65116294efe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>         struct kvm_gfn_range gfn_range;
>         struct kvm_memory_slot *slot;
>         struct kvm_memslots *slots;
> +       bool need_flush = false;
>         int i, idx;
>  
>         if (WARN_ON_ONCE(range->end <= range->start))
> @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>                                         break;
>                         }
>                         r.ret |= range->handler(kvm, &gfn_range);
> +
> +                       /*
> +                        * Use a precise gfn-based TLB flush when possible, as
> +                        * most mmu_notifier events affect a small-ish range.
> +                        * Fall back to a full TLB flush if the gfn-based flush
> +                        * fails, and don't bother trying the gfn-based flush
> +                        * if a full flush is already pending.
> +                        */
> +                       if (range->flush_on_ret && !need_flush && r.ret &&
> +                           kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start
> +                                                            gfn_range.end - gfn_range.start + 1))
> +                               need_flush = true;
>                 }
>         }
>  
> -       if (range->flush_on_ret && r.ret)
> +       if (need_flush)
>                 kvm_flush_remote_tlbs(kvm);
>  
>         if (r.found_memslot)

I think this works for us on HW that has range invalidation, which
would already be a positive move.

For the lesser HW that isn't range capable, it also gives the
opportunity to perform the iteration ourselves or go for the nuclear
option if the range is larger than some arbitrary constant (though
this is additional work).

But this still considers the whole range as being affected by
range->handler(). It'd be interesting to try and see whether more
precise tracking is (or isn't) generally beneficial.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Will Deacon <will@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	Tianrui Zhao <zhaotianrui@loongson.cn>,
	Bibo Mao <maobibo@loongson.cn>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Nicholas Piggin <npiggin@gmail.com>,
	Anup Patel <anup@brainfault.org>,
	Atish Patra <atishp@atishpatra.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Sat, 13 Apr 2024 10:56:25 +0100	[thread overview]
Message-ID: <86h6g5si0m.wl-maz@kernel.org> (raw)
In-Reply-To: <ZhlLHtfeSHk9gRRO@google.com>

On Fri, 12 Apr 2024 15:54:22 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon <will@kernel.org> wrote:
> > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > Also, if you're in the business of hacking the MMU notifier code, it
> > > would be really great to change the .clear_flush_young() callback so
> > > that the architecture could handle the TLB invalidation. At the moment,
> > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > lighter-weight and targetted TLBI in the architecture page-table code
> > > when we actually update the ptes for small ranges.
> > 
> > Indeed, and I was looking at this earlier this week as it has a pretty
> > devastating effect with NV (it blows the shadow S2 for that VMID, with
> > costly consequences).
> > 
> > In general, it feels like the TLB invalidation should stay with the
> > code that deals with the page tables, as it has a pretty good idea of
> > what needs to be invalidated and how -- specially on architectures
> > that have a HW-broadcast facility like arm64.
> 
> Would this be roughly on par with an in-line flush on arm64?  The simpler, more
> straightforward solution would be to let architectures override flush_on_ret,
> but I would prefer something like the below as x86 can also utilize a range-based
> flush when running as a nested hypervisor.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b65116294efe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>         struct kvm_gfn_range gfn_range;
>         struct kvm_memory_slot *slot;
>         struct kvm_memslots *slots;
> +       bool need_flush = false;
>         int i, idx;
>  
>         if (WARN_ON_ONCE(range->end <= range->start))
> @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>                                         break;
>                         }
>                         r.ret |= range->handler(kvm, &gfn_range);
> +
> +                       /*
> +                        * Use a precise gfn-based TLB flush when possible, as
> +                        * most mmu_notifier events affect a small-ish range.
> +                        * Fall back to a full TLB flush if the gfn-based flush
> +                        * fails, and don't bother trying the gfn-based flush
> +                        * if a full flush is already pending.
> +                        */
> +                       if (range->flush_on_ret && !need_flush && r.ret &&
> +                           kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start
> +                                                            gfn_range.end - gfn_range.start + 1))
> +                               need_flush = true;
>                 }
>         }
>  
> -       if (range->flush_on_ret && r.ret)
> +       if (need_flush)
>                 kvm_flush_remote_tlbs(kvm);
>  
>         if (r.found_memslot)

I think this works for us on HW that has range invalidation, which
would already be a positive move.

For the lesser HW that isn't range capable, it also gives the
opportunity to perform the iteration ourselves or go for the nuclear
option if the range is larger than some arbitrary constant (though
this is additional work).

But this still considers the whole range as being affected by
range->handler(). It'd be interesting to try and see whether more
precise tracking is (or isn't) generally beneficial.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-04-13  9:56 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 11:58 [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() Paolo Bonzini
2024-04-05 11:58 ` Paolo Bonzini
2024-04-05 11:58 ` Paolo Bonzini
2024-04-05 11:58 ` [PATCH 1/4] KVM: delete .change_pte MMU notifier callback Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-07  4:50   ` Anup Patel
2024-04-07  4:50     ` Anup Patel
2024-04-07  4:50     ` Anup Patel
2024-04-08  7:23   ` maobibo
2024-04-08  7:23     ` maobibo
2024-04-08  7:23     ` maobibo
2024-04-08 11:45   ` Michael Ellerman
2024-04-08 11:45     ` Michael Ellerman
2024-04-08 11:45     ` Michael Ellerman
2024-04-08 13:56   ` Peter Xu
2024-04-08 13:56     ` Peter Xu
2024-04-08 13:56     ` Peter Xu
2024-04-11 16:55     ` Paolo Bonzini
2024-04-11 16:55       ` Paolo Bonzini
2024-04-11 16:55       ` Paolo Bonzini
2024-04-11 18:47       ` Peter Xu
2024-04-11 18:47         ` Peter Xu
2024-04-11 18:47         ` Peter Xu
2024-04-12 20:01       ` David Hildenbrand
2024-04-12 20:01         ` David Hildenbrand
2024-04-12 20:01         ` David Hildenbrand
2024-04-12 10:44   ` Will Deacon
2024-04-12 10:44     ` Will Deacon
2024-04-12 10:44     ` Will Deacon
2024-04-12 13:15     ` Marc Zyngier
2024-04-12 13:15       ` Marc Zyngier
2024-04-12 13:15       ` Marc Zyngier
2024-04-12 14:54       ` Sean Christopherson
2024-04-12 14:54         ` Sean Christopherson
2024-04-12 14:54         ` Sean Christopherson
2024-04-13  9:56         ` Marc Zyngier [this message]
2024-04-13  9:56           ` Marc Zyngier
2024-04-13  9:56           ` Marc Zyngier
2024-04-15 17:03           ` Sean Christopherson
2024-04-15 17:03             ` Sean Christopherson
2024-04-15 17:03             ` Sean Christopherson
2024-04-18 14:19             ` Will Deacon
2024-04-18 14:19               ` Will Deacon
2024-04-18 14:19               ` Will Deacon
2024-04-18 19:53               ` Sean Christopherson
2024-04-18 19:53                 ` Sean Christopherson
2024-04-18 19:53                 ` Sean Christopherson
2024-04-19 11:24                 ` Will Deacon
2024-04-19 11:24                   ` Will Deacon
2024-04-19 11:24                   ` Will Deacon
2024-04-19 13:58                   ` Sean Christopherson
2024-04-19 13:58                     ` Sean Christopherson
2024-04-19 13:58                     ` Sean Christopherson
2024-04-05 11:58 ` [PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range() Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-08  6:31   ` Philippe Mathieu-Daudé
2024-04-08  6:31     ` Philippe Mathieu-Daudé
2024-04-08  6:31     ` Philippe Mathieu-Daudé
2024-04-05 11:58 ` [PATCH 3/4] mmu_notifier: remove the .change_pte() callback Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-08  7:35   ` David Hildenbrand
2024-04-08  7:35     ` David Hildenbrand
2024-04-08  7:35     ` David Hildenbrand
2024-04-05 11:58 ` [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at() Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-08  6:28   ` Philippe Mathieu-Daudé
2024-04-08  6:28     ` Philippe Mathieu-Daudé
2024-04-08  6:28     ` Philippe Mathieu-Daudé
2024-04-08  7:36   ` David Hildenbrand
2024-04-08  7:36     ` David Hildenbrand
2024-04-08  7:36     ` David Hildenbrand
2024-04-10 21:30 ` [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() Andrew Morton
2024-04-10 21:30   ` Andrew Morton
2024-04-10 21:30   ` Andrew Morton
2024-04-11 16:57   ` Paolo Bonzini
2024-04-11 16:57     ` Paolo Bonzini
2024-04-11 16:57     ` Paolo Bonzini
2024-04-12 13:07 ` Marc Zyngier
2024-04-12 13:07   ` Marc Zyngier
2024-04-12 13:07   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86h6g5si0m.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=anup@brainfault.org \
    --cc=atishp@atishpatra.org \
    --cc=david@redhat.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=loongarch@lists.linux.dev \
    --cc=maobibo@loongson.cn \
    --cc=npiggin@gmail.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=will@kernel.org \
    --cc=zhaotianrui@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.