All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.
@ 2018-08-22 17:16 Aneesh Kumar K.V
  2018-08-22 17:16 ` [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition Aneesh Kumar K.V
  2018-08-23 14:18 ` [1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid Michael Ellerman
  0 siblings, 2 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2018-08-22 17:16 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

When splitting a huge pmd pte, we need to mark the pmd entry invalid. We
can do that by clearing _PAGE_PRESENT bit. But then that will be taken as a
swap pte. In order to differentiate between the two use a software pte bit
when invalidating.

For regular pte, due to bd5050e38aec ("powerpc/mm/radix: Change pte relax
sequence to handle nest MMU hang") we need to mark the pte entry invalid when
relaxing access permission. Instead of marking pte_none which can result in
different page table walk routines possibly skipping this pte entry, invalidate
it but still keep it marked present.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 676118743a06..13a688fc8cd0 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -44,6 +44,16 @@
 
 #define _PAGE_PTE		0x4000000000000000UL	/* distinguishes PTEs from pointers */
 #define _PAGE_PRESENT		0x8000000000000000UL	/* pte contains a translation */
+/*
+ * We need to mark a pmd pte invalid while splitting. We can do that by clearing
+ * the _PAGE_PRESENT bit. But then that will be taken as a swap pte. In order to
+ * differentiate between two use a SW field when invalidating.
+ *
+ * We do that temporary invalidate for regular pte entry in ptep_set_access_flags
+ *
+ * This is used only when _PAGE_PRESENT is cleared.
+ */
+#define _PAGE_INVALID		_RPAGE_SW0
 
 /*
  * Top and bottom bits of RPN which can be used by hash
@@ -568,7 +578,13 @@ static inline pte_t pte_clear_savedwrite(pte_t pte)
 
 static inline int pte_present(pte_t pte)
 {
-	return !!(pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT));
+	/*
+	 * A pte is considerent present if _PAGE_PRESENT is set.
+	 * We also need to consider the pte present which is marked
+	 * invalid during ptep_set_access_flags. Hence we look for _PAGE_INVALID
+	 * if we find _PAGE_PRESENT cleared.
+	 */
+	return !!(pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID));
 }
 
 #ifdef CONFIG_PPC_MEM_KEYS
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition
  2018-08-22 17:16 [PATCH 1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid Aneesh Kumar K.V
@ 2018-08-22 17:16 ` Aneesh Kumar K.V
  2018-08-22 17:18   ` Aneesh Kumar K.V
  2018-08-23  9:23   ` Nicholas Piggin
  2018-08-23 14:18 ` [1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid Michael Ellerman
  1 sibling, 2 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2018-08-22 17:16 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

The Nest MMU workaround is only needed for RW upgrades. Avoid doing that
for other pte updates.

We also avoid clearing the pte while marking it invalid. This is because other
page table walk will find this pte none and can result in unexpected behaviour
due to that. Instead we clear _PAGE_PRESENT and set the software pte bit
_PAGE_INVALID. pte_present is already updated to check for bot the bits. This
make sure page table walkers will find the pte present and things like
pte_pfn(pte) returns the right value.

Based on the original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 7be99fd9af15..c879979faa73 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -1045,20 +1045,22 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED |
 					      _PAGE_RW | _PAGE_EXEC);
+
+	unsigned long change = pte_val(entry) ^ pte_val(*ptep);
 	/*
 	 * To avoid NMMU hang while relaxing access, we need mark
 	 * the pte invalid in between.
 	 */
-	if (atomic_read(&mm->context.copros) > 0) {
+	if ((change & _PAGE_RW) && atomic_read(&mm->context.copros) > 0) {
 		unsigned long old_pte, new_pte;
 
-		old_pte = __radix_pte_update(ptep, ~0, 0);
+		old_pte = __radix_pte_update(ptep, _PAGE_PRESENT, _PAGE_INVALID);
 		/*
 		 * new value of pte
 		 */
 		new_pte = old_pte | set;
 		radix__flush_tlb_page_psize(mm, address, psize);
-		__radix_pte_update(ptep, 0, new_pte);
+		__radix_pte_update(ptep, _PAGE_INVALID, new_pte);
 	} else {
 		__radix_pte_update(ptep, 0, set);
 		/*
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition
  2018-08-22 17:16 ` [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition Aneesh Kumar K.V
@ 2018-08-22 17:18   ` Aneesh Kumar K.V
  2018-08-23  9:23   ` Nicholas Piggin
  1 sibling, 0 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2018-08-22 17:18 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe; +Cc: linuxppc-dev

On 08/22/2018 10:46 PM, Aneesh Kumar K.V wrote:
> The Nest MMU workaround is only needed for RW upgrades. Avoid doing that
> for other pte updates.
> 
> We also avoid clearing the pte while marking it invalid. This is because other
> page table walk will find this pte none and can result in unexpected behaviour
> due to that. Instead we clear _PAGE_PRESENT and set the software pte bit
> _PAGE_INVALID. pte_present is already updated to check for bot the bits. This
> make sure page table walkers will find the pte present and things like
> pte_pfn(pte) returns the right value.
> 
> Based on the original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 

Fixes: bd5050e38aec3
("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/mm/pgtable-radix.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 7be99fd9af15..c879979faa73 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -1045,20 +1045,22 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
>   	struct mm_struct *mm = vma->vm_mm;
>   	unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED |
>   					      _PAGE_RW | _PAGE_EXEC);
> +
> +	unsigned long change = pte_val(entry) ^ pte_val(*ptep);
>   	/*
>   	 * To avoid NMMU hang while relaxing access, we need mark
>   	 * the pte invalid in between.
>   	 */
> -	if (atomic_read(&mm->context.copros) > 0) {
> +	if ((change & _PAGE_RW) && atomic_read(&mm->context.copros) > 0) {
>   		unsigned long old_pte, new_pte;
>   
> -		old_pte = __radix_pte_update(ptep, ~0, 0);
> +		old_pte = __radix_pte_update(ptep, _PAGE_PRESENT, _PAGE_INVALID);
>   		/*
>   		 * new value of pte
>   		 */
>   		new_pte = old_pte | set;
>   		radix__flush_tlb_page_psize(mm, address, psize);
> -		__radix_pte_update(ptep, 0, new_pte);
> +		__radix_pte_update(ptep, _PAGE_INVALID, new_pte);
>   	} else {
>   		__radix_pte_update(ptep, 0, set);
>   		/*
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition
  2018-08-22 17:16 ` [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition Aneesh Kumar K.V
  2018-08-22 17:18   ` Aneesh Kumar K.V
@ 2018-08-23  9:23   ` Nicholas Piggin
  2018-08-23 11:51     ` Benjamin Herrenschmidt
  2018-08-23 11:57     ` Michael Ellerman
  1 sibling, 2 replies; 7+ messages in thread
From: Nicholas Piggin @ 2018-08-23  9:23 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: benh, paulus, mpe, linuxppc-dev

On Wed, 22 Aug 2018 22:46:05 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> The Nest MMU workaround is only needed for RW upgrades. Avoid doing that
> for other pte updates.
> 
> We also avoid clearing the pte while marking it invalid. This is because other
> page table walk will find this pte none and can result in unexpected behaviour
> due to that. Instead we clear _PAGE_PRESENT and set the software pte bit
> _PAGE_INVALID. pte_present is already updated to check for bot the bits. This
> make sure page table walkers will find the pte present and things like
> pte_pfn(pte) returns the right value.
> 
> Based on the original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/pgtable-radix.c | 8 +++++---

This is powerpc/mm/radix, isn't it? Subject says hash.

Could we make this fix POWER9 only and use a RSV bit for it
rather than use up a SW bit? Other than that,

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 7be99fd9af15..c879979faa73 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -1045,20 +1045,22 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED |
>  					      _PAGE_RW | _PAGE_EXEC);
> +
> +	unsigned long change = pte_val(entry) ^ pte_val(*ptep);
>  	/*
>  	 * To avoid NMMU hang while relaxing access, we need mark
>  	 * the pte invalid in between.
>  	 */
> -	if (atomic_read(&mm->context.copros) > 0) {
> +	if ((change & _PAGE_RW) && atomic_read(&mm->context.copros) > 0) {
>  		unsigned long old_pte, new_pte;
>  
> -		old_pte = __radix_pte_update(ptep, ~0, 0);
> +		old_pte = __radix_pte_update(ptep, _PAGE_PRESENT, _PAGE_INVALID);
>  		/*
>  		 * new value of pte
>  		 */
>  		new_pte = old_pte | set;
>  		radix__flush_tlb_page_psize(mm, address, psize);
> -		__radix_pte_update(ptep, 0, new_pte);
> +		__radix_pte_update(ptep, _PAGE_INVALID, new_pte);
>  	} else {
>  		__radix_pte_update(ptep, 0, set);
>  		/*

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition
  2018-08-23  9:23   ` Nicholas Piggin
@ 2018-08-23 11:51     ` Benjamin Herrenschmidt
  2018-08-23 11:57     ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-23 11:51 UTC (permalink / raw)
  To: Nicholas Piggin, Aneesh Kumar K.V; +Cc: paulus, mpe, linuxppc-dev

On Thu, 2018-08-23 at 19:23 +1000, Nicholas Piggin wrote:
> On Wed, 22 Aug 2018 22:46:05 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
> 
> > The Nest MMU workaround is only needed for RW upgrades. Avoid doing that
> > for other pte updates.
> > 
> > We also avoid clearing the pte while marking it invalid. This is because other
> > page table walk will find this pte none and can result in unexpected behaviour
> > due to that. Instead we clear _PAGE_PRESENT and set the software pte bit
> > _PAGE_INVALID. pte_present is already updated to check for bot the bits. This
> > make sure page table walkers will find the pte present and things like
> > pte_pfn(pte) returns the right value.
> > 
> > Based on the original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > ---
> >  arch/powerpc/mm/pgtable-radix.c | 8 +++++---
> 
> This is powerpc/mm/radix, isn't it? Subject says hash.
> 
> Could we make this fix POWER9 only and use a RSV bit for it
> rather than use up a SW bit? Other than that,

Well, the SW bit is necessary for THP as well isn't it ?
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> > index 7be99fd9af15..c879979faa73 100644
> > --- a/arch/powerpc/mm/pgtable-radix.c
> > +++ b/arch/powerpc/mm/pgtable-radix.c
> > @@ -1045,20 +1045,22 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED |
> >  					      _PAGE_RW | _PAGE_EXEC);
> > +
> > +	unsigned long change = pte_val(entry) ^ pte_val(*ptep);
> >  	/*
> >  	 * To avoid NMMU hang while relaxing access, we need mark
> >  	 * the pte invalid in between.
> >  	 */
> > -	if (atomic_read(&mm->context.copros) > 0) {
> > +	if ((change & _PAGE_RW) && atomic_read(&mm->context.copros) > 0) {
> >  		unsigned long old_pte, new_pte;
> >  
> > -		old_pte = __radix_pte_update(ptep, ~0, 0);
> > +		old_pte = __radix_pte_update(ptep, _PAGE_PRESENT, _PAGE_INVALID);
> >  		/*
> >  		 * new value of pte
> >  		 */
> >  		new_pte = old_pte | set;
> >  		radix__flush_tlb_page_psize(mm, address, psize);
> > -		__radix_pte_update(ptep, 0, new_pte);
> > +		__radix_pte_update(ptep, _PAGE_INVALID, new_pte);
> >  	} else {
> >  		__radix_pte_update(ptep, 0, set);
> >  		/*

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition
  2018-08-23  9:23   ` Nicholas Piggin
  2018-08-23 11:51     ` Benjamin Herrenschmidt
@ 2018-08-23 11:57     ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2018-08-23 11:57 UTC (permalink / raw)
  To: Nicholas Piggin, Aneesh Kumar K.V; +Cc: benh, paulus, linuxppc-dev

Nicholas Piggin <npiggin@gmail.com> writes:
> On Wed, 22 Aug 2018 22:46:05 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
>
>> The Nest MMU workaround is only needed for RW upgrades. Avoid doing that
>> for other pte updates.
>> 
>> We also avoid clearing the pte while marking it invalid. This is because other
>> page table walk will find this pte none and can result in unexpected behaviour
>> due to that. Instead we clear _PAGE_PRESENT and set the software pte bit
>> _PAGE_INVALID. pte_present is already updated to check for bot the bits. This
>> make sure page table walkers will find the pte present and things like
>> pte_pfn(pte) returns the right value.
>> 
>> Based on the original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/mm/pgtable-radix.c | 8 +++++---
>
> This is powerpc/mm/radix, isn't it? Subject says hash.

I fixed it when applying.

> Could we make this fix POWER9 only and use a RSV bit for it
> rather than use up a SW bit? Other than that,
>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks.

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.
  2018-08-22 17:16 [PATCH 1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid Aneesh Kumar K.V
  2018-08-22 17:16 ` [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition Aneesh Kumar K.V
@ 2018-08-23 14:18 ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2018-08-23 14:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus; +Cc: Aneesh Kumar K.V, linuxppc-dev

On Wed, 2018-08-22 at 17:16:04 UTC, "Aneesh Kumar K.V" wrote:
> When splitting a huge pmd pte, we need to mark the pmd entry invalid. We
> can do that by clearing _PAGE_PRESENT bit. But then that will be taken as a
> swap pte. In order to differentiate between the two use a software pte bit
> when invalidating.
> 
> For regular pte, due to bd5050e38aec ("powerpc/mm/radix: Change pte relax
> sequence to handle nest MMU hang") we need to mark the pte entry invalid when
> relaxing access permission. Instead of marking pte_none which can result in
> different page table walk routines possibly skipping this pte entry, invalidate
> it but still keep it marked present.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd0dbb73e01306a1060e56f81e5fe2

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-08-23 14:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-22 17:16 [PATCH 1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid Aneesh Kumar K.V
2018-08-22 17:16 ` [PATCH 2/2] powerpc/mm/hash: Only need the Nest MMU workaround for R -> RW transition Aneesh Kumar K.V
2018-08-22 17:18   ` Aneesh Kumar K.V
2018-08-23  9:23   ` Nicholas Piggin
2018-08-23 11:51     ` Benjamin Herrenschmidt
2018-08-23 11:57     ` Michael Ellerman
2018-08-23 14:18 ` [1/2] powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.