linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook
@ 2019-01-31 10:30 Peter Xu
  2019-01-31 17:11 ` Andrea Arcangeli
  2019-02-22  9:47 ` Michael Ellerman
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Xu @ 2019-01-31 10:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterx, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Alistair Popple, Alexey Kardashevskiy, Mark Hairgrove,
	Balbir Singh, David Gibson, Andrea Arcangeli, Jerome Glisse,
	Jason Wang, linuxppc-dev

The change_pte() notifier was designed to use as a quick path to
update secondary MMU PTEs on write permission changes or PFN changes.
For KVM, it could reduce the vm-exits when vcpu faults on the pages
that was touched up by KSM.  It's not used to do cache invalidations,
for example, if we see the notifier will be called before the real PTE
update after all (please see set_pte_at_notify that set_pte_at was
called later).

All the necessary cache invalidation should all be done in
invalidate_range() already.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Alistair Popple <alistair@popple.id.au>
CC: Alexey Kardashevskiy <aik@ozlabs.ru>
CC: Mark Hairgrove <mhairgrove@nvidia.com>
CC: Balbir Singh <bsingharora@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Jerome Glisse <jglisse@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 3f58c7dbd581..c003b29d870e 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier *mn,
 	mmio_invalidate(npu_context, 0, ~0UL);
 }
 
-static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
-				struct mm_struct *mm,
-				unsigned long address,
-				pte_t pte)
-{
-	struct npu_context *npu_context = mn_to_npu_context(mn);
-	mmio_invalidate(npu_context, address, PAGE_SIZE);
-}
-
 static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long start, unsigned long end)
@@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 
 static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
 	.release = pnv_npu2_mn_release,
-	.change_pte = pnv_npu2_mn_change_pte,
 	.invalidate_range = pnv_npu2_mn_invalidate_range,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook
  2019-01-31 10:30 [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook Peter Xu
@ 2019-01-31 17:11 ` Andrea Arcangeli
  2019-02-05  3:52   ` Alistair Popple
  2019-02-22  9:47 ` Michael Ellerman
  1 sibling, 1 reply; 5+ messages in thread
From: Andrea Arcangeli @ 2019-01-31 17:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Alistair Popple, Alexey Kardashevskiy,
	Mark Hairgrove, Balbir Singh, David Gibson, Jerome Glisse,
	Jason Wang, linuxppc-dev

On Thu, Jan 31, 2019 at 06:30:22PM +0800, Peter Xu wrote:
> The change_pte() notifier was designed to use as a quick path to
> update secondary MMU PTEs on write permission changes or PFN changes.
> For KVM, it could reduce the vm-exits when vcpu faults on the pages
> that was touched up by KSM.  It's not used to do cache invalidations,
> for example, if we see the notifier will be called before the real PTE
> update after all (please see set_pte_at_notify that set_pte_at was
> called later).
> 
> All the necessary cache invalidation should all be done in
> invalidate_range() already.
> 
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Alistair Popple <alistair@popple.id.au>
> CC: Alexey Kardashevskiy <aik@ozlabs.ru>
> CC: Mark Hairgrove <mhairgrove@nvidia.com>
> CC: Balbir Singh <bsingharora@gmail.com>
> CC: David Gibson <david@gibson.dropbear.id.au>
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: Jerome Glisse <jglisse@redhat.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: linuxppc-dev@lists.ozlabs.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 10 ----------
>  1 file changed, 10 deletions(-)

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

It doesn't make sense to implement change_pte as an invalidate,
change_pte is not compulsory to implement so if one wants to have
invalidates only, change_pte method shouldn't be implemented in the
first place and the common code will guarantee to invoke the range
invalidates instead.

Currently the whole change_pte optimization is effectively disabled as
noted in past discussions with Jerome (because of the range
invalidates that always surrounds it), so we need to revisit the whole
change_pte logic and decide it to re-enable it or to drop it as a
whole, but in the meantime it's good to cleanup spots like below that
should leave change_pte alone.

There are several examples of mmu_notifiers_ops in the kernel that
don't implement change_pte, in fact it's the majority. Of all mmu
notifier users, only nv_nmmu_notifier_ops, intel_mmuops_change and
kvm_mmu_notifier_ops implements change_pte and as Peter found out by
source review nv_nmmu_notifier_ops, intel_mmuops_change are wrong
about it and should stop implementing it as an invalidate.

In short change_pte is only implemented correctly from KVM which can
really updates the spte and flushes the TLB but the spte update
remains and could avoid a vmexit if we figure out how to re-enable the
optimization safely (the TLB fill after change_pte in KVM EPT/shadow
secondary MMU will be looked up by the CPU in hardware).

If change_pte is implemented, it should update the mapping like KVM
does and not do an invalidate.

Thanks,
Andrea

> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 3f58c7dbd581..c003b29d870e 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier *mn,
>  	mmio_invalidate(npu_context, 0, ~0UL);
>  }
>  
> -static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
> -				struct mm_struct *mm,
> -				unsigned long address,
> -				pte_t pte)
> -{
> -	struct npu_context *npu_context = mn_to_npu_context(mn);
> -	mmio_invalidate(npu_context, address, PAGE_SIZE);
> -}
> -
>  static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  					struct mm_struct *mm,
>  					unsigned long start, unsigned long end)
> @@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  
>  static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
>  	.release = pnv_npu2_mn_release,
> -	.change_pte = pnv_npu2_mn_change_pte,
>  	.invalidate_range = pnv_npu2_mn_invalidate_range,
>  };
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook
  2019-01-31 17:11 ` Andrea Arcangeli
@ 2019-02-05  3:52   ` Alistair Popple
  2019-02-06  2:55     ` Balbir Singh
  0 siblings, 1 reply; 5+ messages in thread
From: Alistair Popple @ 2019-02-05  3:52 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Xu, linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Alexey Kardashevskiy, Mark Hairgrove,
	Balbir Singh, David Gibson, Jerome Glisse, Jason Wang,
	linuxppc-dev

On Thursday, 31 January 2019 12:11:06 PM AEDT Andrea Arcangeli wrote:
> On Thu, Jan 31, 2019 at 06:30:22PM +0800, Peter Xu wrote:
> > The change_pte() notifier was designed to use as a quick path to
> > update secondary MMU PTEs on write permission changes or PFN changes.
> > For KVM, it could reduce the vm-exits when vcpu faults on the pages
> > that was touched up by KSM.  It's not used to do cache invalidations,
> > for example, if we see the notifier will be called before the real PTE
> > update after all (please see set_pte_at_notify that set_pte_at was
> > called later).

Thanks for the fixup. I didn't realise that invalidate_range() always gets 
called but I now see that is the case so this change looks good to me as well.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

> > All the necessary cache invalidation should all be done in
> > invalidate_range() already.
> > 
> > CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > CC: Paul Mackerras <paulus@samba.org>
> > CC: Michael Ellerman <mpe@ellerman.id.au>
> > CC: Alistair Popple <alistair@popple.id.au>
> > CC: Alexey Kardashevskiy <aik@ozlabs.ru>
> > CC: Mark Hairgrove <mhairgrove@nvidia.com>
> > CC: Balbir Singh <bsingharora@gmail.com>
> > CC: David Gibson <david@gibson.dropbear.id.au>
> > CC: Andrea Arcangeli <aarcange@redhat.com>
> > CC: Jerome Glisse <jglisse@redhat.com>
> > CC: Jason Wang <jasowang@redhat.com>
> > CC: linuxppc-dev@lists.ozlabs.org
> > CC: linux-kernel@vger.kernel.org
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> >  arch/powerpc/platforms/powernv/npu-dma.c | 10 ----------
> >  1 file changed, 10 deletions(-)
> 
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> 
> It doesn't make sense to implement change_pte as an invalidate,
> change_pte is not compulsory to implement so if one wants to have
> invalidates only, change_pte method shouldn't be implemented in the
> first place and the common code will guarantee to invoke the range
> invalidates instead.
> 
> Currently the whole change_pte optimization is effectively disabled as
> noted in past discussions with Jerome (because of the range
> invalidates that always surrounds it), so we need to revisit the whole
> change_pte logic and decide it to re-enable it or to drop it as a
> whole, but in the meantime it's good to cleanup spots like below that
> should leave change_pte alone.
> 
> There are several examples of mmu_notifiers_ops in the kernel that
> don't implement change_pte, in fact it's the majority. Of all mmu
> notifier users, only nv_nmmu_notifier_ops, intel_mmuops_change and
> kvm_mmu_notifier_ops implements change_pte and as Peter found out by
> source review nv_nmmu_notifier_ops, intel_mmuops_change are wrong
> about it and should stop implementing it as an invalidate.
> 
> In short change_pte is only implemented correctly from KVM which can
> really updates the spte and flushes the TLB but the spte update
> remains and could avoid a vmexit if we figure out how to re-enable the
> optimization safely (the TLB fill after change_pte in KVM EPT/shadow
> secondary MMU will be looked up by the CPU in hardware).
> 
> If change_pte is implemented, it should update the mapping like KVM
> does and not do an invalidate.
> 
> Thanks,
> Andrea
> 
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c
> > b/arch/powerpc/platforms/powernv/npu-dma.c index
> > 3f58c7dbd581..c003b29d870e 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier
> > *mn,> 
> >  	mmio_invalidate(npu_context, 0, ~0UL);
> >  
> >  }
> > 
> > -static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
> > -				struct mm_struct *mm,
> > -				unsigned long address,
> > -				pte_t pte)
> > -{
> > -	struct npu_context *npu_context = mn_to_npu_context(mn);
> > -	mmio_invalidate(npu_context, address, PAGE_SIZE);
> > -}
> > -
> > 
> >  static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
> >  
> >  					struct mm_struct *mm,
> >  					unsigned long start, unsigned long end)
> > 
> > @@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct
> > mmu_notifier *mn,> 
> >  static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
> >  
> >  	.release = pnv_npu2_mn_release,
> > 
> > -	.change_pte = pnv_npu2_mn_change_pte,
> > 
> >  	.invalidate_range = pnv_npu2_mn_invalidate_range,
> >  
> >  };



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook
  2019-02-05  3:52   ` Alistair Popple
@ 2019-02-06  2:55     ` Balbir Singh
  0 siblings, 0 replies; 5+ messages in thread
From: Balbir Singh @ 2019-02-06  2:55 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrea Arcangeli, Peter Xu, linux-kernel, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Kardashevskiy,
	Mark Hairgrove, David Gibson, Jerome Glisse, Jason Wang,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

On Tue, Feb 5, 2019 at 2:52 PM Alistair Popple <alistair@popple.id.au> wrote:
>
> On Thursday, 31 January 2019 12:11:06 PM AEDT Andrea Arcangeli wrote:
> > On Thu, Jan 31, 2019 at 06:30:22PM +0800, Peter Xu wrote:
> > > The change_pte() notifier was designed to use as a quick path to
> > > update secondary MMU PTEs on write permission changes or PFN changes.
> > > For KVM, it could reduce the vm-exits when vcpu faults on the pages
> > > that was touched up by KSM.  It's not used to do cache invalidations,
> > > for example, if we see the notifier will be called before the real PTE
> > > update after all (please see set_pte_at_notify that set_pte_at was
> > > called later).
>
> Thanks for the fixup. I didn't realise that invalidate_range() always gets
> called but I now see that is the case so this change looks good to me as well.
>
> Reviewed-by: Alistair Popple <alistair@popple.id.au>
>
I checked the three callers of set_pte_at_notify and the assumption
seems correct

Reviewed-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: powerpc/powernv/npu: Remove redundant change_pte() hook
  2019-01-31 10:30 [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook Peter Xu
  2019-01-31 17:11 ` Andrea Arcangeli
@ 2019-02-22  9:47 ` Michael Ellerman
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2019-02-22  9:47 UTC (permalink / raw)
  To: Peter Xu, linux-kernel
  Cc: Andrea Arcangeli, Alexey Kardashevskiy, Mark Hairgrove, peterx,
	Jerome Glisse, Paul Mackerras, Alistair Popple, linuxppc-dev,
	Jason Wang, David Gibson

On Thu, 2019-01-31 at 10:30:22 UTC, Peter Xu wrote:
> The change_pte() notifier was designed to use as a quick path to
> update secondary MMU PTEs on write permission changes or PFN changes.
> For KVM, it could reduce the vm-exits when vcpu faults on the pages
> that was touched up by KSM.  It's not used to do cache invalidations,
> for example, if we see the notifier will be called before the real PTE
> update after all (please see set_pte_at_notify that set_pte_at was
> called later).
> 
> All the necessary cache invalidation should all be done in
> invalidate_range() already.
> 
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Alistair Popple <alistair@popple.id.au>
> CC: Alexey Kardashevskiy <aik@ozlabs.ru>
> CC: Mark Hairgrove <mhairgrove@nvidia.com>
> CC: Balbir Singh <bsingharora@gmail.com>
> CC: David Gibson <david@gibson.dropbear.id.au>
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: Jerome Glisse <jglisse@redhat.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: linuxppc-dev@lists.ozlabs.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> Reviewed-by: Alistair Popple <alistair@popple.id.au>
> Reviewed-by: Balbir Singh <bsingharora@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1b58a975be36994a572ae3b3fb3e0232

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-02-22  9:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-31 10:30 [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook Peter Xu
2019-01-31 17:11 ` Andrea Arcangeli
2019-02-05  3:52   ` Alistair Popple
2019-02-06  2:55     ` Balbir Singh
2019-02-22  9:47 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).