linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/mm: fix function name typo in pmd_read_atomic() comment
@ 2019-09-25  1:44 Wei Yang
  2019-09-25  1:59 ` Wei Yang
  2019-09-27  8:10 ` [tip: x86/mm] x86/mm: Fix function name typo in pmd_read_atomic() comment tip-bot2 for Wei Yang
  0 siblings, 2 replies; 5+ messages in thread
From: Wei Yang @ 2019-09-25  1:44 UTC (permalink / raw)
  To: tglx, mingo, bp; +Cc: x86, linux-kernel, Wei Yang

The function referenced should be pte_offset_map_lock(); no function
named pmd_offset_map_lock() has ever been defined.

Fixes: 26c191788f18 ("mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---

Hope my understanding is correct.

---
 arch/x86/include/asm/pgtable-3level.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index e3633795fb22..45e6099fe6b7 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -44,10 +44,10 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
  * pmd_populate rightfully does a set_64bit, but if we're reading the
  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
  * because gcc will not read the 64bit of the pmd atomically. To fix
- * this all places running pmd_offset_map_lock() while holding the
+ * this all places running pte_offset_map_lock() while holding the
  * mmap_sem in read mode, shall read the pmdp pointer using this
  * function to know if the pmd is null nor not, and in turn to know if
- * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
+ * they can run pte_offset_map_lock or pmd_trans_huge or other pmd
  * operations.
  *
  * Without THP if the mmap_sem is hold for reading, the pmd can only
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/mm: fix function name typo in pmd_read_atomic() comment
  2019-09-25  1:44 [PATCH] x86/mm: fix function name typo in pmd_read_atomic() comment Wei Yang
@ 2019-09-25  1:59 ` Wei Yang
  2019-09-25  6:55   ` [PATCH] x86/mm: Clean up the pmd_read_atomic() comments Ingo Molnar
  2019-09-27  8:10 ` [tip: x86/mm] x86/mm: Fix function name typo in pmd_read_atomic() comment tip-bot2 for Wei Yang
  1 sibling, 1 reply; 5+ messages in thread
From: Wei Yang @ 2019-09-25  1:59 UTC (permalink / raw)
  To: Wei Yang; +Cc: tglx, mingo, bp, x86, linux-kernel

To be honest, I have a question on how this works.

As the comment says, we need to call pmd_read_atomic before using
pte_offset_map_lock to avoid data corruption.

For example, in function swapin_walk_pmd_entry:

    pmd_none_or_trans_huge_or_clear_bad(pmd)
        pmd_read_atomic(pmd)                      ---   1
    pte_offset_map_lock(mm, pmd, ...)             ---   2

At point 1, we are assured the content is intact. But at point 2, we read
the pmd again to calculate the pte address. How do we ensure the content is
intact this time? Is it because pmd_none_or_trans_huge_or_clear_bad() ensures
the pmd is stable, so that the content won't be changed?

Thanks for your time in advance.

-- 
Wei Yang
Help you, Help me


* [PATCH] x86/mm: Clean up the pmd_read_atomic() comments
  2019-09-25  1:59 ` Wei Yang
@ 2019-09-25  6:55   ` Ingo Molnar
  2019-09-25  7:28     ` Wei Yang
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2019-09-25  6:55 UTC (permalink / raw)
  To: Wei Yang
  Cc: tglx, mingo, bp, x86, linux-kernel, Peter Zijlstra,
	Andy Lutomirski, Linus Torvalds, Rik van Riel, Dave Hansen


* Wei Yang <richardw.yang@linux.intel.com> wrote:

> To be honest, I have a question on how this works.
> 
> As the comment says, we need to call pmd_read_atomic before using
> pte_offset_map_lock to avoid data corruption.

There's only a risk of data corruption if mmap_sem is held for reading. 
If it's held for writing then the pagetable contents should be stable.

> For example, in function swapin_walk_pmd_entry:
> 
>     pmd_none_or_trans_huge_or_clear_bad(pmd)
>         pmd_read_atomic(pmd)                      ---   1
>     pte_offset_map_lock(mm, pmd, ...)             ---   2
> 
> At point 1, we are assured the content is intact. But at point 2, we read
> the pmd again to calculate the pte address. How do we ensure the content
> is intact this time? Is it because pmd_none_or_trans_huge_or_clear_bad()
> ensures the pmd is stable, so that the content won't be changed?

Indeed pte_offset_map_lock() will take a non-atomic *pmd value in 
pte_lockptr() before taking the pte spinlock.

I believe the rule here is that if pmd_none_or_trans_huge_or_clear_bad() 
finds the pmd 'stable' while holding the mmap_sem read-locked, then the 
pmd cannot change while we are continuously holding the mmap_sem.

Hence the followup pte_offset_map_lock() and other iterators can look at 
the value of the pmd without locking. (Individual pte entries still need 
the pte-lock, because they might be faulted-in in parallel.)

So the pmd use pattern in swapin_walk_pmd_entry() should be safe.

I'm not 100% sure though - so I've added a few more Cc:s ...

I've also cleaned up the pmd_read_atomic() comments some more to make 
them more readable - see the patch below.

Thanks,

	Ingo

==================>
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 25 Sep 2019 08:38:57 +0200
Subject: [PATCH] x86/mm: Clean up the pmd_read_atomic() comments

Fix spelling, consistent parenthesis and grammar - and also clarify
the language where needed.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Link: https://lkml.kernel.org/r/20190925014453.20236-1-richardw.yang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/pgtable-3level.h | 44 ++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 1796462ff143..5afb5e0fe903 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -36,39 +36,41 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
 
 #define pmd_read_atomic pmd_read_atomic
 /*
- * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with
- * a "*pmdp" dereference done by gcc. Problem is, in certain places
- * where pte_offset_map_lock is called, concurrent page faults are
+ * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
+ * a "*pmdp" dereference done by GCC. Problem is, in certain places
+ * where pte_offset_map_lock() is called, concurrent page faults are
  * allowed, if the mmap_sem is hold for reading. An example is mincore
  * vs page faults vs MADV_DONTNEED. On the page fault side
- * pmd_populate rightfully does a set_64bit, but if we're reading the
+ * pmd_populate() rightfully does a set_64bit(), but if we're reading the
  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
- * because gcc will not read the 64bit of the pmd atomically. To fix
- * this all places running pte_offset_map_lock() while holding the
+ * because GCC will not read the 64-bit value of the pmd atomically.
+ *
+ * To fix this all places running pte_offset_map_lock() while holding the
  * mmap_sem in read mode, shall read the pmdp pointer using this
- * function to know if the pmd is null nor not, and in turn to know if
+ * function to know if the pmd is null or not, and in turn to know if
  * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
  * operations.
  *
- * Without THP if the mmap_sem is hold for reading, the pmd can only
- * transition from null to not null while pmd_read_atomic runs. So
+ * Without THP if the mmap_sem is held for reading, the pmd can only
+ * transition from null to not null while pmd_read_atomic() runs. So
  * we can always return atomic pmd values with this function.
  *
- * With THP if the mmap_sem is hold for reading, the pmd can become
+ * With THP if the mmap_sem is held for reading, the pmd can become
  * trans_huge or none or point to a pte (and in turn become "stable")
- * at any time under pmd_read_atomic. We could read it really
- * atomically here with a atomic64_read for the THP enabled case (and
+ * at any time under pmd_read_atomic(). We could read it truly
+ * atomically here with an atomic64_read() for the THP enabled case (and
  * it would be a whole lot simpler), but to avoid using cmpxchg8b we
  * only return an atomic pmdval if the low part of the pmdval is later
- * found stable (i.e. pointing to a pte). And we're returning a none
- * pmdval if the low part of the pmd is none. In some cases the high
- * and low part of the pmdval returned may not be consistent if THP is
- * enabled (the low part may point to previously mapped hugepage,
- * while the high part may point to a more recently mapped hugepage),
- * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
- * of the pmd to be read atomically to decide if the pmd is unstable
- * or not, with the only exception of when the low part of the pmd is
- * zero in which case we return a none pmd.
+ * found to be stable (i.e. pointing to a pte). We are also returning a
+ * 'none' (zero) pmdval if the low part of the pmd is zero.
+ *
+ * In some cases the high and low part of the pmdval returned may not be
+ * consistent if THP is enabled (the low part may point to previously
+ * mapped hugepage, while the high part may point to a more recently
+ * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only
+ * needs the low part of the pmd to be read atomically to decide if the
+ * pmd is unstable or not, with the only exception when the low part
+ * of the pmd is zero, in which case we return a 'none' pmd.
  */
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {


* Re: [PATCH] x86/mm: Clean up the pmd_read_atomic() comments
  2019-09-25  6:55   ` [PATCH] x86/mm: Clean up the pmd_read_atomic() comments Ingo Molnar
@ 2019-09-25  7:28     ` Wei Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Wei Yang @ 2019-09-25  7:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Wei Yang, tglx, mingo, bp, x86, linux-kernel, Peter Zijlstra,
	Andy Lutomirski, Linus Torvalds, Rik van Riel, Dave Hansen

On Wed, Sep 25, 2019 at 08:55:14AM +0200, Ingo Molnar wrote:
>
>* Wei Yang <richardw.yang@linux.intel.com> wrote:
>
>> To be honest, I have a question on how this works.
>> 
>> As the comment says, we need to call pmd_read_atomic before using
>> pte_offset_map_lock to avoid data corruption.
>
>There's only a risk of data corruption if mmap_sem is held for reading. 
>If it's held for writing then the pagetable contents should be stable.
>
>> For example, in function swapin_walk_pmd_entry:
>> 
>>     pmd_none_or_trans_huge_or_clear_bad(pmd)
>>         pmd_read_atomic(pmd)                      ---   1
>>     pte_offset_map_lock(mm, pmd, ...)             ---   2
>> 
>> At point 1, we are assured the content is intact. But at point 2, we read
>> the pmd again to calculate the pte address. How do we ensure the content
>> is intact this time? Is it because pmd_none_or_trans_huge_or_clear_bad()
>> ensures the pmd is stable, so that the content won't be changed?
>
>Indeed pte_offset_map_lock() will take a non-atomic *pmd value in 
>pte_lockptr() before taking the pte spinlock.
>
>I believe the rule here is that if pmd_none_or_trans_huge_or_clear_bad() 
>finds the pmd 'stable' while holding the mmap_sem read-locked, then the 
>pmd cannot change while we are continuously holding the mmap_sem.
>

I have the same gut feeling as you.

Then I have another question: why don't the page fault routines use
pmd_read_atomic() to retrieve the pmd value? Since we support concurrent
page faults, and we only grab mmap_sem read-locked during a page fault,
the value could be corrupted there too.

For example in __handle_mm_fault()

	pmd_t orig_pmd = *vmf.pmd;
	barrier();

Is this why the barrier() is here?

>Hence the followup pte_offset_map_lock() and other iterators can look at 
>the value of the pmd without locking. (Individual pte entries still need 
>the pte-lock, because they might be faulted-in in parallel.)
>
>So the pmd use pattern in swapin_walk_pmd_entry() should be safe.
>
>I'm not 100% sure though - so I've added a few more Cc:s ...
>
>I've also cleaned up the pmd_read_atomic() comments some more to make 
>them more readable - see the patch below.
>
>Thanks,
>
>	Ingo
>
>==================>
>From: Ingo Molnar <mingo@kernel.org>
>Date: Wed, 25 Sep 2019 08:38:57 +0200
>Subject: [PATCH] x86/mm: Clean up the pmd_read_atomic() comments
>
>Fix spelling, consistent parenthesis and grammar - and also clarify
>the language where needed.
>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: H. Peter Anvin <hpa@zytor.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Rik van Riel <riel@surriel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Wei Yang <richardw.yang@linux.intel.com>
>Link: https://lkml.kernel.org/r/20190925014453.20236-1-richardw.yang@linux.intel.com
>Signed-off-by: Ingo Molnar <mingo@kernel.org>

The following change looks good to me.

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>

>---
> arch/x86/include/asm/pgtable-3level.h | 44 ++++++++++++++++++-----------------
> 1 file changed, 23 insertions(+), 21 deletions(-)
>
>diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
>index 1796462ff143..5afb5e0fe903 100644
>--- a/arch/x86/include/asm/pgtable-3level.h
>+++ b/arch/x86/include/asm/pgtable-3level.h
>@@ -36,39 +36,41 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
> 
> #define pmd_read_atomic pmd_read_atomic
> /*
>- * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with
>- * a "*pmdp" dereference done by gcc. Problem is, in certain places
>- * where pte_offset_map_lock is called, concurrent page faults are
>+ * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
>+ * a "*pmdp" dereference done by GCC. Problem is, in certain places
>+ * where pte_offset_map_lock() is called, concurrent page faults are
>  * allowed, if the mmap_sem is hold for reading. An example is mincore
>  * vs page faults vs MADV_DONTNEED. On the page fault side
>- * pmd_populate rightfully does a set_64bit, but if we're reading the
>+ * pmd_populate() rightfully does a set_64bit(), but if we're reading the
>  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
>- * because gcc will not read the 64bit of the pmd atomically. To fix
>- * this all places running pte_offset_map_lock() while holding the
>+ * because GCC will not read the 64-bit value of the pmd atomically.
>+ *
>+ * To fix this all places running pte_offset_map_lock() while holding the
>  * mmap_sem in read mode, shall read the pmdp pointer using this
>- * function to know if the pmd is null nor not, and in turn to know if
>+ * function to know if the pmd is null or not, and in turn to know if
>  * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
>  * operations.
>  *
>- * Without THP if the mmap_sem is hold for reading, the pmd can only
>- * transition from null to not null while pmd_read_atomic runs. So
>+ * Without THP if the mmap_sem is held for reading, the pmd can only
>+ * transition from null to not null while pmd_read_atomic() runs. So
>  * we can always return atomic pmd values with this function.
>  *
>- * With THP if the mmap_sem is hold for reading, the pmd can become
>+ * With THP if the mmap_sem is held for reading, the pmd can become
>  * trans_huge or none or point to a pte (and in turn become "stable")
>- * at any time under pmd_read_atomic. We could read it really
>- * atomically here with a atomic64_read for the THP enabled case (and
>+ * at any time under pmd_read_atomic(). We could read it truly
>+ * atomically here with an atomic64_read() for the THP enabled case (and
>  * it would be a whole lot simpler), but to avoid using cmpxchg8b we
>  * only return an atomic pmdval if the low part of the pmdval is later
>- * found stable (i.e. pointing to a pte). And we're returning a none
>- * pmdval if the low part of the pmd is none. In some cases the high
>- * and low part of the pmdval returned may not be consistent if THP is
>- * enabled (the low part may point to previously mapped hugepage,
>- * while the high part may point to a more recently mapped hugepage),
>- * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
>- * of the pmd to be read atomically to decide if the pmd is unstable
>- * or not, with the only exception of when the low part of the pmd is
>- * zero in which case we return a none pmd.
>+ * found to be stable (i.e. pointing to a pte). We are also returning a
>+ * 'none' (zero) pmdval if the low part of the pmd is zero.
>+ *
>+ * In some cases the high and low part of the pmdval returned may not be
>+ * consistent if THP is enabled (the low part may point to previously
>+ * mapped hugepage, while the high part may point to a more recently
>+ * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only
>+ * needs the low part of the pmd to be read atomically to decide if the
>+ * pmd is unstable or not, with the only exception when the low part
>+ * of the pmd is zero, in which case we return a 'none' pmd.
>  */
> static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> {

-- 
Wei Yang
Help you, Help me


* [tip: x86/mm] x86/mm: Fix function name typo in pmd_read_atomic() comment
  2019-09-25  1:44 [PATCH] x86/mm: fix function name typo in pmd_read_atomic() comment Wei Yang
  2019-09-25  1:59 ` Wei Yang
@ 2019-09-27  8:10 ` tip-bot2 for Wei Yang
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Wei Yang @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Wei Yang, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Rik van Riel,
	Thomas Gleixner, Ingo Molnar, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     a2f7a0bfcaaa3928e4876d15edd4dfdc09e139b6
Gitweb:        https://git.kernel.org/tip/a2f7a0bfcaaa3928e4876d15edd4dfdc09e139b6
Author:        Wei Yang <richardw.yang@linux.intel.com>
AuthorDate:    Wed, 25 Sep 2019 09:44:53 +08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 08:40:19 +02:00

x86/mm: Fix function name typo in pmd_read_atomic() comment

The function referenced should be pte_offset_map_lock(); no function
named pmd_offset_map_lock() has ever been defined.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190925014453.20236-1-richardw.yang@linux.intel.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/pgtable-3level.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index e363379..1796462 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -44,10 +44,10 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
  * pmd_populate rightfully does a set_64bit, but if we're reading the
  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
  * because gcc will not read the 64bit of the pmd atomically. To fix
- * this all places running pmd_offset_map_lock() while holding the
+ * this all places running pte_offset_map_lock() while holding the
  * mmap_sem in read mode, shall read the pmdp pointer using this
  * function to know if the pmd is null nor not, and in turn to know if
- * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
+ * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
  * operations.
  *
  * Without THP if the mmap_sem is hold for reading, the pmd can only

