linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
@ 2019-06-07  3:56 Nicholas Piggin
  2019-06-07  3:56 ` [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate Nicholas Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Nicholas Piggin @ 2019-06-07  3:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
in pte helpers") changed the actual bitwise tests in pte_access_permitted
by using pte_write() and pte_present() helpers rather than raw bitwise
testing _PAGE_WRITE and _PAGE_PRESENT bits.

The pte_present change now returns true for ptes which are !_PAGE_PRESENT
and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
synchronize access from lock-free lookups. pte_access_permitted is used by
pmd_access_permitted, so allowing GUP lock free access to proceed with
such PTEs breaks this synchronisation.

This bug has been observed on HPT host, with random crashes and corruption
in guests, usually together with bad PMD messages in the host.

Fix this by adding an explicit check in pmd_access_permitted, and
documenting the condition explicitly.

The pte_write() change should be okay, and would prevent GUP from falling
back to the slow path when encountering savedwrite ptes, which matches
what x86 (that does not implement savedwrite) does.

Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

I accounted for Aneesh's and Christophe's feedback, except that I couldn't
find a good way to replace the ifdef with IS_ENABLED because of
_PAGE_INVALID etc., but I at least cleaned that up a bit.

Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
They should probably both be merged in stable kernels after upstream.

 arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
 arch/powerpc/mm/book3s64/pgtable.c           |  3 ++
 2 files changed, 33 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..ccf00a8b98c6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
 	return false;
 }
 
+static inline int pmd_is_serializing(pmd_t pmd)
+{
+	/*
+	 * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
+	 * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
+	 *
+	 * This condition may also occur when flushing a pmd while flushing
+	 * it (see ptep_modify_prot_start), so callers must ensure this
+	 * case is fine as well.
+	 */
+	if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
+						cpu_to_be64(_PAGE_INVALID))
+		return true;
+
+	return false;
+}
+
 static inline int pmd_bad(pmd_t pmd)
 {
 	if (radix_enabled())
@@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_access_permitted pmd_access_permitted
 static inline bool pmd_access_permitted(pmd_t pmd, bool write)
 {
+	/*
+	 * pmdp_invalidate sets this combination (which is not caught by
+	 * !pte_present() check in pte_access_permitted), to prevent
+	 * lock-free lookups, as part of the serialize_against_pte_lookup()
+	 * synchronisation.
+	 *
+	 * This also catches the case where the PTE's hardware PRESENT bit is
+	 * cleared while TLB is flushed, which is suboptimal but should not
+	 * be frequent.
+	 */
+	if (pmd_is_serializing(pmd))
+		return false;
+
 	return pte_access_permitted(pmd_pte(pmd), write);
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..ff98b663c83e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 	/*
 	 * This ensures that generic code that rely on IRQ disabling
 	 * to prevent a parallel THP split work as expected.
+	 *
+	 * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
+	 * a special case check in pmd_access_permitted.
 	 */
 	serialize_against_pte_lookup(vma->vm_mm);
 	return __pmd(old_pmd);
-- 
2.20.1
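
For readers following the bit logic, the mask-and-compare added in
pmd_is_serializing() can be sketched in user space. This is only an
illustration under stated assumptions: DEMO_PAGE_PRESENT and
DEMO_PAGE_INVALID are placeholder bit values chosen for the demo (not the
kernel's definitions), and demo_cpu_to_be64() is a hypothetical portable
stand-in for cpu_to_be64(). The point it shows is that converting the
constants, rather than the stored entry, still gives the right answer,
because a byte swap commutes with bitwise AND.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit values for the demo; NOT the kernel's definitions. */
#define DEMO_PAGE_PRESENT 0x8000000000000000ULL
#define DEMO_PAGE_INVALID 0x0000000000000002ULL

/*
 * Hypothetical stand-in for cpu_to_be64(): returns a value whose in-memory
 * byte layout is big-endian, whatever the host byte order is.
 */
static uint64_t demo_cpu_to_be64(uint64_t v)
{
	unsigned char bytes[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		bytes[i] = (unsigned char)(v >> (56 - 8 * i));
	memcpy(&out, bytes, sizeof(out));
	return out;
}

/*
 * raw_be is the entry as stored (big-endian). True only when INVALID is
 * set and PRESENT is clear; all other bits are ignored.
 */
static bool demo_pmd_is_serializing(uint64_t raw_be)
{
	uint64_t mask = demo_cpu_to_be64(DEMO_PAGE_PRESENT | DEMO_PAGE_INVALID);

	return (raw_be & mask) == demo_cpu_to_be64(DEMO_PAGE_INVALID);
}
```

Of the four PRESENT/INVALID combinations, only "INVALID set, PRESENT clear"
is reported as serializing; an all-zero (none) entry is not.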



* [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
  2019-06-07  3:56 [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Nicholas Piggin
@ 2019-06-07  3:56 ` Nicholas Piggin
  2019-06-07  5:35   ` Christophe Leroy
                     ` (2 more replies)
  2019-06-07  5:34 ` [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Christophe Leroy
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 10+ messages in thread
From: Nicholas Piggin @ 2019-06-07  3:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
the synchronisation against lock-free lookups: __find_linux_pte's
pmd_none check no longer returns true for such cases.

Fix this by adding a check for this condition as well.

Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..533fc6fa6726 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 	pdshift = PMD_SHIFT;
 	pmdp = pmd_offset(&pud, ea);
 	pmd  = READ_ONCE(*pmdp);
+
 	/*
-	 * A hugepage collapse is captured by pmd_none, because
-	 * it mark the pmd none and do a hpte invalidate.
+	 * A hugepage collapse is captured by this condition, see
+	 * pmdp_collapse_flush.
 	 */
 	if (pmd_none(pmd))
 		return NULL;
 
+#ifdef CONFIG_PPC_BOOK3S_64
+	/*
+	 * A hugepage split is captured by this condition, see
+	 * pmdp_invalidate.
+	 *
+	 * Huge page modification can be caught here too.
+	 */
+	if (pmd_is_serializing(pmd))
+		return NULL;
+#endif
+
 	if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
 		if (is_thp)
 			*is_thp = true;
-- 
2.20.1
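
The effect of the extra check on a lock-free walker can be sketched as
follows. This is a simplified user-space model, not the kernel code: the
struct demo_pmd type, the DEMO_* bit values and demo_lockless_lookup() are
all hypothetical names invented for the illustration, and a plain struct
copy stands in for READ_ONCE().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for the demo; NOT the kernel's definitions. */
#define DEMO_PRESENT 0x1ULL
#define DEMO_INVALID 0x2ULL

struct demo_pmd {
	uint64_t val;
};

static bool demo_pmd_none(struct demo_pmd pmd)
{
	return pmd.val == 0;
}

static bool demo_pmd_is_serializing(struct demo_pmd pmd)
{
	return (pmd.val & (DEMO_PRESENT | DEMO_INVALID)) == DEMO_INVALID;
}

/*
 * Returns the slot if a lock-free walker may use it, or NULL if the
 * caller must give up (entry is empty or is being invalidated).
 */
static const struct demo_pmd *demo_lockless_lookup(const struct demo_pmd *slot)
{
	struct demo_pmd snap = *slot;	/* one snapshot; kernel uses READ_ONCE() */

	if (demo_pmd_none(snap))	/* collapse case: entry was cleared */
		return NULL;
	if (demo_pmd_is_serializing(snap))	/* split case: INVALID, !PRESENT */
		return NULL;
	return slot;
}
```

Without the second test, an entry marked INVALID and not PRESENT is
non-zero, so it passes the none check and the walker would carry on using
it.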



* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
  2019-06-07  3:56 [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Nicholas Piggin
  2019-06-07  3:56 ` [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate Nicholas Piggin
@ 2019-06-07  5:34 ` Christophe Leroy
  2019-06-07  6:40   ` Nicholas Piggin
  2019-06-07  5:35 ` Aneesh Kumar K.V
  2019-06-12  4:59 ` Michael Ellerman
  3 siblings, 1 reply; 10+ messages in thread
From: Christophe Leroy @ 2019-06-07  5:34 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



On 07/06/2019 at 05:56, Nicholas Piggin wrote:
> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
> 
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
> 
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
> 
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
> 
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
> 
> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
> I accounted for Aneesh's and Christophe's feedback, except I couldn't
> find a good way to replace the ifdef with IS_ENABLED because of
> _PAGE_INVALID etc., but at least cleaned that up a bit nicer.

I guess the standard way is to add a pmd_is_serializing() which
always returns false in book3s/32/pgtable.h and in nohash/pgtable.h

> 
> Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
> Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
> They should probably both be merged in stable kernels after upstream.
> 
>   arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
>   arch/powerpc/mm/book3s64/pgtable.c           |  3 ++
>   2 files changed, 33 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 7dede2e34b70..ccf00a8b98c6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
>   	return false;
>   }
>   
> +static inline int pmd_is_serializing(pmd_t pmd)

Should this be static inline bool instead of int?

Christophe

> +{
> +	/*
> +	 * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
> +	 * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
> +	 *
> +	 * This condition may also occur when flushing a pmd while flushing
> +	 * it (see ptep_modify_prot_start), so callers must ensure this
> +	 * case is fine as well.
> +	 */
> +	if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
> +						cpu_to_be64(_PAGE_INVALID))
> +		return true;
> +
> +	return false;
> +}
> +
>   static inline int pmd_bad(pmd_t pmd)
>   {
>   	if (radix_enabled())
> @@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
>   #define pmd_access_permitted pmd_access_permitted
>   static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>   {
> +	/*
> +	 * pmdp_invalidate sets this combination (which is not caught by
> +	 * !pte_present() check in pte_access_permitted), to prevent
> +	 * lock-free lookups, as part of the serialize_against_pte_lookup()
> +	 * synchronisation.
> +	 *
> +	 * This also catches the case where the PTE's hardware PRESENT bit is
> +	 * cleared while TLB is flushed, which is suboptimal but should not
> +	 * be frequent.
> +	 */
> +	if (pmd_is_serializing(pmd))
> +		return false;
> +
>   	return pte_access_permitted(pmd_pte(pmd), write);
>   }
>   
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 16bda049187a..ff98b663c83e 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>   	/*
>   	 * This ensures that generic code that rely on IRQ disabling
>   	 * to prevent a parallel THP split work as expected.
> +	 *
> +	 * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
> +	 * a special case check in pmd_access_permitted.
>   	 */
>   	serialize_against_pte_lookup(vma->vm_mm);
>   	return __pmd(old_pmd);
> 


* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
  2019-06-07  3:56 [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Nicholas Piggin
  2019-06-07  3:56 ` [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate Nicholas Piggin
  2019-06-07  5:34 ` [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Christophe Leroy
@ 2019-06-07  5:35 ` Aneesh Kumar K.V
  2019-06-12  4:59 ` Michael Ellerman
  3 siblings, 0 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2019-06-07  5:35 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
>
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
>
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
>
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>
> I accounted for Aneesh's and Christophe's feedback, except I couldn't
> find a good way to replace the ifdef with IS_ENABLED because of
> _PAGE_INVALID etc., but at least cleaned that up a bit nicer.
>
> Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
> Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
> They should probably both be merged in stable kernels after upstream.
>
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
>  arch/powerpc/mm/book3s64/pgtable.c           |  3 ++
>  2 files changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 7dede2e34b70..ccf00a8b98c6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
>  	return false;
>  }
>  
> +static inline int pmd_is_serializing(pmd_t pmd)
> +{
> +	/*
> +	 * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
> +	 * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
> +	 *
> +	 * This condition may also occur when flushing a pmd while flushing
> +	 * it (see ptep_modify_prot_start), so callers must ensure this
> +	 * case is fine as well.
> +	 */
> +	if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
> +						cpu_to_be64(_PAGE_INVALID))
> +		return true;
> +
> +	return false;
> +}
> +
>  static inline int pmd_bad(pmd_t pmd)
>  {
>  	if (radix_enabled())
> @@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
>  #define pmd_access_permitted pmd_access_permitted
>  static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>  {
> +	/*
> +	 * pmdp_invalidate sets this combination (which is not caught by
> +	 * !pte_present() check in pte_access_permitted), to prevent
> +	 * lock-free lookups, as part of the serialize_against_pte_lookup()
> +	 * synchronisation.
> +	 *
> +	 * This also catches the case where the PTE's hardware PRESENT bit is
> +	 * cleared while TLB is flushed, which is suboptimal but should not
> +	 * be frequent.
> +	 */
> +	if (pmd_is_serializing(pmd))
> +		return false;
> +
>  	return pte_access_permitted(pmd_pte(pmd), write);
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 16bda049187a..ff98b663c83e 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  	/*
>  	 * This ensures that generic code that rely on IRQ disabling
>  	 * to prevent a parallel THP split work as expected.
> +	 *
> +	 * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
> +	 * a special case check in pmd_access_permitted.
>  	 */
>  	serialize_against_pte_lookup(vma->vm_mm);
>  	return __pmd(old_pmd);
> -- 
> 2.20.1



* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
  2019-06-07  3:56 ` [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate Nicholas Piggin
@ 2019-06-07  5:35   ` Christophe Leroy
  2019-06-07  6:31     ` Nicholas Piggin
  2019-06-07  5:35   ` Aneesh Kumar K.V
  2019-06-12  4:59   ` Michael Ellerman
  2 siblings, 1 reply; 10+ messages in thread
From: Christophe Leroy @ 2019-06-07  5:35 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



On 07/06/2019 at 05:56, Nicholas Piggin wrote:
> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock free lookups, __find_linux_pte's
> pmd_none check no longer returns true for such cases.
> 
> Fix this by adding a check for this condition as well.
> 
> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index db4a6253df92..533fc6fa6726 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
>   	pdshift = PMD_SHIFT;
>   	pmdp = pmd_offset(&pud, ea);
>   	pmd  = READ_ONCE(*pmdp);
> +
>   	/*
> -	 * A hugepage collapse is captured by pmd_none, because
> -	 * it mark the pmd none and do a hpte invalidate.
> +	 * A hugepage collapse is captured by this condition, see
> +	 * pmdp_collapse_flush.
>   	 */
>   	if (pmd_none(pmd))
>   		return NULL;
>   
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	/*
> +	 * A hugepage split is captured by this condition, see
> +	 * pmdp_invalidate.
> +	 *
> +	 * Huge page modification can be caught here too.
> +	 */
> +	if (pmd_is_serializing(pmd))
> +		return NULL;
> +#endif
> +

You could get rid of that #ifdef by adding the following to the book3s32
and nohash pgtable.h:

static inline bool pmd_is_serializing(pmd_t pmd) { return false; }

Christophe

>   	if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
>   		if (is_thp)
>   			*is_thp = true;
> 
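
The stub approach suggested above can be sketched in isolation. This is a
hedged illustration rather than the proposed kernel change: DEMO_HAS_PMD_SERIALIZING
is a hypothetical configuration macro standing in for CONFIG_PPC_BOOK3S_64,
and demo_pmd_t, the DEMO_* bit values and demo_lookup_allowed() are made-up
names. Each platform header supplies its own pmd_is_serializing()-style
helper, so the common caller needs no #ifdef.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
	uint64_t val;
} demo_pmd_t;

#ifdef DEMO_HAS_PMD_SERIALIZING
/* Platform with the INVALID/!PRESENT marker: real check. */
#define DEMO_PRESENT 0x1ULL
#define DEMO_INVALID 0x2ULL
static inline bool demo_pmd_is_serializing(demo_pmd_t pmd)
{
	return (pmd.val & (DEMO_PRESENT | DEMO_INVALID)) == DEMO_INVALID;
}
#else
/* Platforms without the marker: a stub that always returns false. */
static inline bool demo_pmd_is_serializing(demo_pmd_t pmd)
{
	(void)pmd;
	return false;
}
#endif

/* Common code: the call site is identical on every platform. */
static bool demo_lookup_allowed(demo_pmd_t pmd)
{
	if (pmd.val == 0)
		return false;
	if (demo_pmd_is_serializing(pmd))
		return false;
	return true;
}
```

With the macro undefined the compiler can fold the stub away, so the
common code pays nothing for the extra call; the trade-off is a small
amount of boilerplate in each platform header.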


* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
  2019-06-07  3:56 ` [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate Nicholas Piggin
  2019-06-07  5:35   ` Christophe Leroy
@ 2019-06-07  5:35   ` Aneesh Kumar K.V
  2019-06-12  4:59   ` Michael Ellerman
  2 siblings, 0 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2019-06-07  5:35 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock free lookups, __find_linux_pte's
> pmd_none check no longer returns true for such cases.
>
> Fix this by adding a check for this condition as well.
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index db4a6253df92..533fc6fa6726 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
>  	pdshift = PMD_SHIFT;
>  	pmdp = pmd_offset(&pud, ea);
>  	pmd  = READ_ONCE(*pmdp);
> +
>  	/*
> -	 * A hugepage collapse is captured by pmd_none, because
> -	 * it mark the pmd none and do a hpte invalidate.
> +	 * A hugepage collapse is captured by this condition, see
> +	 * pmdp_collapse_flush.
>  	 */
>  	if (pmd_none(pmd))
>  		return NULL;
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	/*
> +	 * A hugepage split is captured by this condition, see
> +	 * pmdp_invalidate.
> +	 *
> +	 * Huge page modification can be caught here too.
> +	 */
> +	if (pmd_is_serializing(pmd))
> +		return NULL;
> +#endif
> +
>  	if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
>  		if (is_thp)
>  			*is_thp = true;
> -- 
> 2.20.1



* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
  2019-06-07  5:35   ` Christophe Leroy
@ 2019-06-07  6:31     ` Nicholas Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2019-06-07  6:31 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Aneesh Kumar K . V

Christophe Leroy wrote on June 7, 2019 3:35 pm:
> 
> 
> On 07/06/2019 at 05:56, Nicholas Piggin wrote:
>> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
>> the synchronisation against lock free lookups, __find_linux_pte's
>> pmd_none check no longer returns true for such cases.
>> 
>> Fix this by adding a check for this condition as well.
>> 
>> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
>> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index db4a6253df92..533fc6fa6726 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
>>   	pdshift = PMD_SHIFT;
>>   	pmdp = pmd_offset(&pud, ea);
>>   	pmd  = READ_ONCE(*pmdp);
>> +
>>   	/*
>> -	 * A hugepage collapse is captured by pmd_none, because
>> -	 * it mark the pmd none and do a hpte invalidate.
>> +	 * A hugepage collapse is captured by this condition, see
>> +	 * pmdp_collapse_flush.
>>   	 */
>>   	if (pmd_none(pmd))
>>   		return NULL;
>>   
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +	/*
>> +	 * A hugepage split is captured by this condition, see
>> +	 * pmdp_invalidate.
>> +	 *
>> +	 * Huge page modification can be caught here too.
>> +	 */
>> +	if (pmd_is_serializing(pmd))
>> +		return NULL;
>> +#endif
>> +
> 
> Could get rid of that #ifdef by adding the following in book3s32 and 
> nohash pgtable.h:
> 
> static inline bool pmd_is_serializing(pmd_t pmd) { return false; }

I don't mind either way. If it's an isolated case like this, sometimes
I'm against polluting the sub-arch code with it.

It's up to you; I can change that if you prefer.

Thanks,
Nick



* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
  2019-06-07  5:34 ` [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Christophe Leroy
@ 2019-06-07  6:40   ` Nicholas Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2019-06-07  6:40 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Aneesh Kumar K . V

Christophe Leroy wrote on June 7, 2019 3:34 pm:
> 
> 
> On 07/06/2019 at 05:56, Nicholas Piggin wrote:
>> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
>> in pte helpers") changed the actual bitwise tests in pte_access_permitted
>> by using pte_write() and pte_present() helpers rather than raw bitwise
>> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>> 
>> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
>> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
>> synchronize access from lock-free lookups. pte_access_permitted is used by
>> pmd_access_permitted, so allowing GUP lock free access to proceed with
>> such PTEs breaks this synchronisation.
>> 
>> This bug has been observed on HPT host, with random crashes and corruption
>> in guests, usually together with bad PMD messages in the host.
>> 
>> Fix this by adding an explicit check in pmd_access_permitted, and
>> documenting the condition explicitly.
>> 
>> The pte_write() change should be okay, and would prevent GUP from falling
>> back to the slow path when encountering savedwrite ptes, which matches
>> what x86 (that does not implement savedwrite) does.
>> 
>> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
>> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> 
>> I accounted for Aneesh's and Christophe's feedback, except I couldn't
>> find a good way to replace the ifdef with IS_ENABLED because of
>> _PAGE_INVALID etc., but at least cleaned that up a bit nicer.
> 
> I guess the standard way is to add a pmd_is_serializing() which return 
> always false in book3s/32/pgtable.h and in nohash/pgtable.h


> 
>> 
>> Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
>> Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
>> They should probably both be merged in stable kernels after upstream.
>> 
>>   arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
>>   arch/powerpc/mm/book3s64/pgtable.c           |  3 ++
>>   2 files changed, 33 insertions(+)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 7dede2e34b70..ccf00a8b98c6 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
>>   	return false;
>>   }
>>   
>> +static inline int pmd_is_serializing(pmd_t pmd)
> 
> should be static inline bool instead of int ?

I think just about all the p?d_blah boolean functions in the tree are
int at the moment, so I followed that pattern.

It may be a good tree-wide change to make at some point.

Thanks,
Nick




* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
  2019-06-07  3:56 [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-06-07  5:35 ` Aneesh Kumar K.V
@ 2019-06-12  4:59 ` Michael Ellerman
  3 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2019-06-12  4:59 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

On Fri, 2019-06-07 at 03:56:35 UTC, Nicholas Piggin wrote:
> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
> 
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
> 
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
> 
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
> 
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
> 
> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/33258a1db165cf43a9e6382587ad06e9

cheers


* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
  2019-06-07  3:56 ` [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate Nicholas Piggin
  2019-06-07  5:35   ` Christophe Leroy
  2019-06-07  5:35   ` Aneesh Kumar K.V
@ 2019-06-12  4:59   ` Michael Ellerman
  2 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2019-06-12  4:59 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

On Fri, 2019-06-07 at 03:56:36 UTC, Nicholas Piggin wrote:
> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock free lookups, __find_linux_pte's
> pmd_none check no longer returns true for such cases.
> 
> Fix this by adding a check for this condition as well.
> 
> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/a00196a272161338d4b1d66ec69e3d57

cheers


