All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: x86@kernel.org, willy@infradead.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	linux-mm@kvack.org, aarcange@redhat.com,
	kirill.shutemov@linux.intel.com, jroedel@suse.de,
	ubizjak@gmail.com
Subject: [PATCH 04/13] mm: Fix pmd_read_atomic()
Date: Sat, 22 Oct 2022 13:14:07 +0200	[thread overview]
Message-ID: <20221022114424.711181252@infradead.org> (raw)
In-Reply-To: 20221022111403.531902164@infradead.org

AFAICT there's no reason to do anything different than what we do for
PTEs. Make it so (also affects SH).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/pgtable-3level.h |   56 ----------------------------------
 include/linux/pgtable.h               |   49 +++++++++++++++++++++++------
 2 files changed, 39 insertions(+), 66 deletions(-)

--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -34,62 +34,6 @@ static inline void native_set_pte(pte_t
 	ptep->pte_low = pte.pte_low;
 }
 
-#define pmd_read_atomic pmd_read_atomic
-/*
- * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
- * a "*pmdp" dereference done by GCC. Problem is, in certain places
- * where pte_offset_map_lock() is called, concurrent page faults are
- * allowed, if the mmap_lock is hold for reading. An example is mincore
- * vs page faults vs MADV_DONTNEED. On the page fault side
- * pmd_populate() rightfully does a set_64bit(), but if we're reading the
- * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
- * because GCC will not read the 64-bit value of the pmd atomically.
- *
- * To fix this all places running pte_offset_map_lock() while holding the
- * mmap_lock in read mode, shall read the pmdp pointer using this
- * function to know if the pmd is null or not, and in turn to know if
- * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
- * operations.
- *
- * Without THP if the mmap_lock is held for reading, the pmd can only
- * transition from null to not null while pmd_read_atomic() runs. So
- * we can always return atomic pmd values with this function.
- *
- * With THP if the mmap_lock is held for reading, the pmd can become
- * trans_huge or none or point to a pte (and in turn become "stable")
- * at any time under pmd_read_atomic(). We could read it truly
- * atomically here with an atomic64_read() for the THP enabled case (and
- * it would be a whole lot simpler), but to avoid using cmpxchg8b we
- * only return an atomic pmdval if the low part of the pmdval is later
- * found to be stable (i.e. pointing to a pte). We are also returning a
- * 'none' (zero) pmdval if the low part of the pmd is zero.
- *
- * In some cases the high and low part of the pmdval returned may not be
- * consistent if THP is enabled (the low part may point to previously
- * mapped hugepage, while the high part may point to a more recently
- * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only
- * needs the low part of the pmd to be read atomically to decide if the
- * pmd is unstable or not, with the only exception when the low part
- * of the pmd is zero, in which case we return a 'none' pmd.
- */
-static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
-{
-	pmdval_t ret;
-	u32 *tmp = (u32 *)pmdp;
-
-	ret = (pmdval_t) (*tmp);
-	if (ret) {
-		/*
-		 * If the low part is null, we must not read the high part
-		 * or we can end up with a partial pmd.
-		 */
-		smp_rmb();
-		ret |= ((pmdval_t)*(tmp + 1)) << 32;
-	}
-
-	return (pmd_t) { .pmd = ret };
-}
-
 static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
 {
 	set_64bit((unsigned long long *)(ptep), native_pte_val(pte));
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -258,6 +258,13 @@ static inline pte_t ptep_get(pte_t *ptep
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_GET
+static inline pmd_t pmdp_get(pmd_t *pmdp)
+{
+	return READ_ONCE(*pmdp);
+}
+#endif
+
 #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
 /*
  * For walking the pagetables without holding any locks.  Some architectures
@@ -302,15 +309,42 @@ static inline pte_t ptep_get_lockless(pt
 
 	return pte;
 }
-#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+#define ptep_get_lockless ptep_get_lockless
+
+#if CONFIG_PGTABLE_LEVELS > 2
+static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
+{
+	pmd_t pmd;
+
+	do {
+		pmd.pmd_low = pmdp->pmd_low;
+		smp_rmb();
+		pmd.pmd_high = pmdp->pmd_high;
+		smp_rmb();
+	} while (unlikely(pmd.pmd_low != pmdp->pmd_low));
+
+	return pmd;
+}
+#define pmdp_get_lockless pmdp_get_lockless
+#endif /* CONFIG_PGTABLE_LEVELS > 2 */
+#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+
 /*
  * We require that the PTE can be read atomically.
  */
+#ifndef ptep_get_lockless
 static inline pte_t ptep_get_lockless(pte_t *ptep)
 {
 	return ptep_get(ptep);
 }
-#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+#endif
+
+#ifndef pmdp_get_lockless
+static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
+{
+	return pmdp_get(pmdp);
+}
+#endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
@@ -1211,17 +1247,10 @@ static inline int pud_trans_unstable(pud
 #endif
 }
 
-#ifndef pmd_read_atomic
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
-	/*
-	 * Depend on compiler for an atomic pmd read. NOTE: this is
-	 * only going to work, if the pmdval_t isn't larger than
-	 * an unsigned long.
-	 */
-	return *pmdp;
+	return pmdp_get_lockless(pmdp);
 }
-#endif
 
 #ifndef arch_needs_pgtable_deposit
 #define arch_needs_pgtable_deposit() (false)



  parent reply	other threads:[~2022-10-22 11:49 UTC|newest]

Thread overview: 148+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-22 11:14 [PATCH 00/13] Clean up pmd_get_atomic() and i386-PAE Peter Zijlstra
2022-10-22 11:14 ` [PATCH 01/13] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2022-10-24  5:42   ` John Hubbard
2022-10-24  8:00     ` Peter Zijlstra
2022-10-24 19:58       ` Jann Horn
2022-10-24 20:19         ` Linus Torvalds
2022-10-24 20:23           ` Jann Horn
2022-10-24 20:36             ` Linus Torvalds
2022-10-25  3:21             ` Matthew Wilcox
2022-10-25  7:54               ` Alistair Popple
2022-10-25 13:33                 ` Peter Zijlstra
2022-10-25 13:44                 ` Jann Horn
2022-10-26  0:45                   ` Alistair Popple
2022-10-25 14:02         ` Peter Zijlstra
2022-10-25 14:18           ` Jann Horn
2022-10-25 15:06             ` Peter Zijlstra
2022-10-26 16:45               ` Jann Horn
2022-10-27  7:08                 ` Peter Zijlstra
2022-10-27 18:13                   ` Linus Torvalds
2022-10-27 19:35                     ` Peter Zijlstra
2022-10-27 19:43                       ` Linus Torvalds
2022-10-27 20:15                     ` Nadav Amit
2022-10-27 20:31                       ` Linus Torvalds
2022-10-27 21:44                         ` Nadav Amit
2022-10-28 23:57                           ` Nadav Amit
2022-10-29  0:42                             ` Linus Torvalds
2022-10-29 18:05                               ` Nadav Amit
2022-10-29 18:36                                 ` Linus Torvalds
2022-10-29 18:58                                   ` Linus Torvalds
2022-10-29 19:14                                     ` Linus Torvalds
2022-10-29 19:28                                       ` Nadav Amit
2022-10-30  0:18                                       ` Nadav Amit
2022-10-30  2:17                                     ` Nadav Amit
2022-10-30 18:19                                       ` Linus Torvalds
2022-10-30 18:51                                         ` Linus Torvalds
2022-10-30 22:47                                           ` Linus Torvalds
2022-10-31  1:47                                             ` Linus Torvalds
2022-10-31  4:09                                               ` Nadav Amit
2022-10-31  4:55                                                 ` Nadav Amit
2022-10-31  5:00                                                 ` Linus Torvalds
2022-10-31 15:43                                                   ` Nadav Amit
2022-10-31 17:32                                                     ` Linus Torvalds
2022-10-31  9:36                                               ` Peter Zijlstra
2022-10-31 17:28                                                 ` Linus Torvalds
2022-10-31 18:43                                                   ` mm: delay rmap removal until after TLB flush Linus Torvalds
2022-11-02  9:14                                                     ` Christian Borntraeger
2022-11-02  9:23                                                       ` Christian Borntraeger
2022-11-02 17:55                                                       ` Linus Torvalds
2022-11-02 18:28                                                         ` Linus Torvalds
2022-11-02 22:29                                                         ` Gerald Schaefer
2022-11-02 12:45                                                     ` Peter Zijlstra
2022-11-02 22:31                                                     ` Gerald Schaefer
2022-11-02 23:13                                                       ` Linus Torvalds
2022-11-03  9:52                                                     ` David Hildenbrand
2022-11-03 16:54                                                       ` Linus Torvalds
2022-11-03 17:09                                                         ` Linus Torvalds
2022-11-03 17:36                                                           ` David Hildenbrand
2022-11-04  6:33                                                     ` Alexander Gordeev
2022-11-04 17:35                                                       ` Linus Torvalds
2022-11-06 21:06                                                         ` Hugh Dickins
2022-11-06 22:34                                                           ` Linus Torvalds
2022-11-06 23:14                                                             ` Andrew Morton
2022-11-07  0:06                                                               ` Stephen Rothwell
2022-11-07 16:19                                                               ` Linus Torvalds
2022-11-07 23:02                                                                 ` Andrew Morton
2022-11-07 23:44                                                                   ` Stephen Rothwell
2022-11-07  9:12                                                           ` Peter Zijlstra
2022-11-07 20:07                                                           ` Johannes Weiner
2022-11-07 20:29                                                             ` Linus Torvalds
2022-11-07 23:47                                                               ` Linus Torvalds
2022-11-08  4:28                                                                 ` Linus Torvalds
2022-11-08 19:56                                                                   ` Linus Torvalds
2022-11-08 20:03                                                                     ` Konstantin Ryabitsev
2022-11-08 20:18                                                                       ` Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 1/4] mm: introduce 'encoded' page pointers with embedded extra bits Linus Torvalds
2022-11-08 20:37                                                                   ` Nadav Amit
2022-11-08 20:46                                                                     ` Linus Torvalds
2022-11-09  6:36                                                                   ` Alexander Gordeev
2022-11-09 18:00                                                                     ` Linus Torvalds
2022-11-09 20:02                                                                       ` Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 2/4] mm: teach release_pages() to take an array of encoded page pointers too Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 3/4] mm: mmu_gather: prepare to gather encoded page pointers with flags Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 4/4] mm: delay page_remove_rmap() until after the TLB has been flushed Linus Torvalds
2022-11-08 20:48                                                                   ` [lkp] [+115 bytes kernel size regression] [i386-tinyconfig] [0309f16088] " kernel test robot
2022-11-08 21:01                                                                     ` Linus Torvalds
2022-11-08 21:05                                                                   ` [PATCH 4/4] " Nadav Amit
2022-11-09 15:53                                                                   ` Johannes Weiner
2022-11-09 19:31                                                                     ` Hugh Dickins
2022-10-31  9:39                                               ` [PATCH 01/13] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2022-10-31 17:22                                                 ` Linus Torvalds
2022-10-31  9:46                                               ` Peter Zijlstra
2022-10-31  9:28                                             ` Peter Zijlstra
2022-10-31 17:19                                               ` Linus Torvalds
2022-10-30 19:34                                         ` Nadav Amit
2022-10-29 19:39                                   ` John Hubbard
2022-10-29 20:15                                     ` Linus Torvalds
2022-10-29 20:30                                       ` Linus Torvalds
2022-10-29 20:42                                         ` John Hubbard
2022-10-29 20:56                                       ` Nadav Amit
2022-10-29 21:03                                         ` Nadav Amit
2022-10-29 21:12                                         ` Linus Torvalds
2022-10-29 20:59                                       ` Theodore Ts'o
2022-10-26 19:43               ` Nadav Amit
2022-10-27  7:27                 ` Peter Zijlstra
2022-10-27 17:30                   ` Nadav Amit
2022-10-22 11:14 ` [PATCH 02/13] x86/mm/pae: Make pmd_t similar to pte_t Peter Zijlstra
2022-10-22 11:14 ` [PATCH 03/13] sh/mm: " Peter Zijlstra
2022-12-21 13:54   ` Guenter Roeck
2022-10-22 11:14 ` Peter Zijlstra [this message]
2022-10-22 17:30   ` [PATCH 04/13] mm: Fix pmd_read_atomic() Linus Torvalds
2022-10-24  8:09     ` Peter Zijlstra
2022-11-01 12:41     ` Peter Zijlstra
2022-11-01 17:42       ` Linus Torvalds
2022-11-02  9:12       ` [tip: x86/mm] mm: Convert __HAVE_ARCH_P..P_GET to the new style tip-bot2 for Peter Zijlstra
2022-11-03 21:15       ` tip-bot2 for Peter Zijlstra
2022-12-17 18:55       ` tip-bot2 for Peter Zijlstra
2022-10-22 11:14 ` [PATCH 05/13] mm: Rename GUP_GET_PTE_LOW_HIGH Peter Zijlstra
2022-10-22 11:14 ` [PATCH 06/13] mm: Rename pmd_read_atomic() Peter Zijlstra
2022-10-22 11:14 ` [PATCH 07/13] mm/gup: Fix the lockless PMD access Peter Zijlstra
2022-10-23  0:42   ` Hugh Dickins
2022-10-24  7:42     ` Peter Zijlstra
2022-10-25  3:58       ` Hugh Dickins
2022-10-22 11:14 ` [PATCH 08/13] x86/mm/pae: Dont (ab)use atomic64 Peter Zijlstra
2022-10-22 11:14 ` [PATCH 09/13] x86/mm/pae: Use WRITE_ONCE() Peter Zijlstra
2022-10-22 17:42   ` Linus Torvalds
2022-10-24 10:21     ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 10/13] x86/mm/pae: Be consistent with pXXp_get_and_clear() Peter Zijlstra
2022-10-22 17:53   ` Linus Torvalds
2022-10-24 11:13     ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 11/13] x86_64: Remove pointless set_64bit() usage Peter Zijlstra
2022-10-22 17:55   ` Linus Torvalds
2022-11-03 19:09   ` Nathan Chancellor
2022-11-03 19:23     ` Uros Bizjak
2022-11-03 19:35       ` Nathan Chancellor
2022-11-03 20:39         ` Linus Torvalds
2022-11-03 21:06           ` Peter Zijlstra
2022-11-04 16:01           ` Peter Zijlstra
2022-11-04 17:15             ` Linus Torvalds
2022-11-05 13:29               ` Jason A. Donenfeld
2022-11-05 15:14                 ` Peter Zijlstra
2022-11-05 20:54                   ` Jason A. Donenfeld
2022-11-07  9:14                   ` David Laight
2022-12-19 15:44               ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 12/13] x86/mm/pae: Get rid of set_64bit() Peter Zijlstra
2022-10-22 11:14 ` [PATCH 13/13] mm: Remove pointless barrier() after pmdp_get_lockless() Peter Zijlstra
2022-10-22 19:59   ` Yu Zhao
2022-10-22 17:57 ` [PATCH 00/13] Clean up pmd_get_atomic() and i386-PAE Linus Torvalds
2022-10-29 12:21 ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221022114424.711181252@infradead.org \
    --to=peterz@infradead.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=jroedel@suse.de \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=ubizjak@gmail.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.