linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
@ 2018-12-04 21:37 Dan Williams
  2018-12-04 21:37 ` [PATCH v3 1/5] generic/pgtable: Make {pmd, pud}_same() unconditionally available Dan Williams
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Dan Williams @ 2018-12-04 21:37 UTC (permalink / raw)
  To: tglx
  Cc: Sebastian Andrzej Siewior, Peter Zijlstra (Intel),
	Peter Zijlstra, Borislav Petkov, Kirill A. Shutemov,
	Andy Lutomirski, Dave Hansen, Dave Hansen, stable, x86,
	linux-kernel

Changes since v2 [1]:
* Fix links in the changelogs to reference lkml.kernel.org instead of
  lore.kernel.org (Peter)
* Collect Acked-by's from Kirill and Peter.
* Add more Cc's for patches 1 and 2.
* Strengthen the lead-in comment for the set_p*_safe() helpers (Dave)

[1]: https://lkml.org/lkml/2018/12/1/358

---

>From patch 5:

    Commit f77084d96355 "x86/mm/pat: Disable preemption around
    __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
    without preemption being disabled. It also left a warning to catch other
    cases where preemption is not disabled. That warning triggers for the
    memory hotplug path which is also used for persistent memory enabling:
    
     WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
     RIP: 0010:__flush_tlb_all+0x1b/0x3a
     [..]
     Call Trace:
      phys_pud_init+0x29c/0x2bb
      kernel_physical_mapping_init+0xfc/0x219
      init_memory_mapping+0x1a5/0x3b0
      arch_add_memory+0x2c/0x50
      devm_memremap_pages+0x3aa/0x610
      pmem_attach_disk+0x585/0x700 [nd_pmem]
    
    Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
    and Dave confirmed the expectation for TLB flush is for modifying /
    invalidating existing pte entries, but not initial population [2]. Drop
    the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
    expectation that this path is only ever populating empty entries for the
    linear map. Note, at linear map teardown time there is a call to the
    all-cpu flush_tlb_all() to invalidate the removed mappings.

Additionally, Dave wanted some runtime assurances that
kernel_physical_mapping_init() is only populating and not changing
existing page table entries. Patches 1 - 4 are implementing a new
set_pte() family of helpers for that purpose.

Patch 5 is tagged for -stable because the false positive warning is now
showing up on 4.19-stable kernels. Patches 1 - 4 are not tagged for
-stable, but if the sanity checking is needed please mark them as such.

The hang that was observed while developing the sanity checking
implementation was resolved by Peter's suggestion to not trigger when
the same pte value is being rewritten.

---

Dan Williams (5):
      generic/pgtable: Make {pmd,pud}_same() unconditionally available
      generic/pgtable: Introduce {p4d,pgd}_same()
      generic/pgtable: Introduce set_pte_safe()
      x86/mm: Validate kernel_physical_mapping_init() pte population
      x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()


 arch/x86/include/asm/pgalloc.h           |   27 ++++++++++++++
 arch/x86/mm/init_64.c                    |   30 ++++++----------
 include/asm-generic/5level-fixup.h       |    1 +
 include/asm-generic/pgtable-nop4d-hack.h |    1 +
 include/asm-generic/pgtable-nop4d.h      |    1 +
 include/asm-generic/pgtable-nopud.h      |    1 +
 include/asm-generic/pgtable.h            |   56 +++++++++++++++++++++++++-----
 7 files changed, 90 insertions(+), 27 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/5] generic/pgtable: Make {pmd, pud}_same() unconditionally available
  2018-12-04 21:37 [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
@ 2018-12-04 21:37 ` Dan Williams
  2018-12-05 18:05   ` [tip:x86/mm] " tip-bot for Dan Williams
  2018-12-04 21:37 ` [PATCH v3 2/5] generic/pgtable: Introduce {p4d,pgd}_same() Dan Williams
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2018-12-04 21:37 UTC (permalink / raw)
  To: tglx
  Cc: Sebastian Andrzej Siewior, Borislav Petkov, Peter Zijlstra,
	Dave Hansen, Kirill A. Shutemov, dave.hansen, x86, linux-kernel

In preparation for {pmd,pud}_same() to be used outside of transparent
huge page code paths, make them unconditionally available. This enables
them to be used in the definition of a new family of
set_{pte,pmd,pud,p4d,pgd}_safe() helpers.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/asm-generic/pgtable.h |   14 --------------
 1 file changed, 14 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..eea50ef8b8cd 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -375,7 +375,6 @@ static inline int pte_unused(pte_t pte)
 #endif
 
 #ifndef __HAVE_ARCH_PMD_SAME
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
 	return pmd_val(pmd_a) == pmd_val(pmd_b);
@@ -385,19 +384,6 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 {
 	return pud_val(pud_a) == pud_val(pud_b);
 }
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
-{
-	BUILD_BUG();
-	return 0;
-}
-
-static inline int pud_same(pud_t pud_a, pud_t pud_b)
-{
-	BUILD_BUG();
-	return 0;
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/5] generic/pgtable: Introduce {p4d,pgd}_same()
  2018-12-04 21:37 [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
  2018-12-04 21:37 ` [PATCH v3 1/5] generic/pgtable: Make {pmd, pud}_same() unconditionally available Dan Williams
@ 2018-12-04 21:37 ` Dan Williams
  2018-12-05 18:05   ` [tip:x86/mm] " tip-bot for Dan Williams
  2018-12-04 21:37 ` [PATCH v3 3/5] generic/pgtable: Introduce set_pte_safe() Dan Williams
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2018-12-04 21:37 UTC (permalink / raw)
  To: tglx
  Cc: Sebastian Andrzej Siewior, Borislav Petkov, Peter Zijlstra,
	Dave Hansen, Kirill A. Shutemov, dave.hansen, x86, linux-kernel

In preparation for introducing '_safe' versions of page table entry 'set'
helpers, introduce generic versions of p4d_same() and pgd_same().

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/asm-generic/pgtable.h |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index eea50ef8b8cd..dae7f98babed 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -386,6 +386,20 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 }
 #endif
 
+#ifndef __HAVE_ARCH_P4D_SAME
+static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b)
+{
+	return p4d_val(p4d_a) == p4d_val(p4d_b);
+}
+#endif
+
+#ifndef __HAVE_ARCH_PGD_SAME
+static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
+{
+	return pgd_val(pgd_a) == pgd_val(pgd_b);
+}
+#endif
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 3/5] generic/pgtable: Introduce set_pte_safe()
  2018-12-04 21:37 [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
  2018-12-04 21:37 ` [PATCH v3 1/5] generic/pgtable: Make {pmd, pud}_same() unconditionally available Dan Williams
  2018-12-04 21:37 ` [PATCH v3 2/5] generic/pgtable: Introduce {p4d,pgd}_same() Dan Williams
@ 2018-12-04 21:37 ` Dan Williams
  2018-12-04 21:37 ` [PATCH v3 4/5] x86/mm: Validate kernel_physical_mapping_init() pte population Dan Williams
  2018-12-04 21:37 ` [PATCH v3 5/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
  4 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2018-12-04 21:37 UTC (permalink / raw)
  To: tglx
  Cc: Sebastian Andrzej Siewior, Borislav Petkov, Peter Zijlstra,
	Dave Hansen, Peter Zijlstra (Intel),
	Kirill A. Shutemov, dave.hansen, x86, linux-kernel

Commit f77084d96355 "x86/mm/pat: Disable preemption around
__flush_tlb_all()" introduced a warning to capture cases
__flush_tlb_all() is called without pre-emption disabled. It triggers a
false positive warning in the memory hotplug path. On investigation it
was found that the __flush_tlb_all() calls are not necessary. However,
they are only "not necessary" in practice provided the ptes are being
initially populated from the !present state. Introduce set_pte_safe() as
a sanity check that the pte is being updated in a way that does not
require a tlb flush.

Forgive the macro, the availability of the various of set_pte() levels
is hit and miss across architectures.

Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c6c9@intel.com
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/asm-generic/pgtable.h |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index dae7f98babed..a9cac82e9a7a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -400,6 +400,44 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
 }
 #endif
 
+/*
+ * Use set_p*_safe(), and elide TLB flushing, when confident that *no*
+ * TLB flush will be required as a result of the "set". For example, use
+ * in scenarios where it is known ahead of time that the routine is
+ * setting non-present entries, or re-setting an existing entry to the
+ * same value. Otherwise, use the typical "set" helpers and flush the
+ * TLB.
+ */
+#define set_pte_safe(ptep, pte) \
+({ \
+	WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
+	set_pte(ptep, pte); \
+})
+
+#define set_pmd_safe(pmdp, pmd) \
+({ \
+	WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \
+	set_pmd(pmdp, pmd); \
+})
+
+#define set_pud_safe(pudp, pud) \
+({ \
+	WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \
+	set_pud(pudp, pud); \
+})
+
+#define set_p4d_safe(p4dp, p4d) \
+({ \
+	WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \
+	set_p4d(p4dp, p4d); \
+})
+
+#define set_pgd_safe(pgdp, pgd) \
+({ \
+	WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \
+	set_pgd(pgdp, pgd); \
+})
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 4/5] x86/mm: Validate kernel_physical_mapping_init() pte population
  2018-12-04 21:37 [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
                   ` (2 preceding siblings ...)
  2018-12-04 21:37 ` [PATCH v3 3/5] generic/pgtable: Introduce set_pte_safe() Dan Williams
@ 2018-12-04 21:37 ` Dan Williams
  2018-12-05 18:06   ` [tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population tip-bot for Dan Williams
  2018-12-04 21:37 ` [PATCH v3 5/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
  4 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2018-12-04 21:37 UTC (permalink / raw)
  To: tglx
  Cc: Sebastian Andrzej Siewior, Borislav Petkov, Peter Zijlstra,
	Dave Hansen, Kirill A. Shutemov, Peter Zijlstra (Intel),
	dave.hansen, x86, linux-kernel

The usage of __flush_tlb_all() in the kernel_physical_mapping_init()
path is not necessary. In general flushing the tlb is not required when
updating an entry from the !present state. However, to give confidence
in the future removal of tlb flushing in this path, use the new
set_pte_safe() family of helpers to assert that the !present assumption
is true in this path.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/pgalloc.h           |   27 +++++++++++++++++++++++++++
 arch/x86/mm/init_64.c                    |   24 ++++++++++++------------
 include/asm-generic/5level-fixup.h       |    1 +
 include/asm-generic/pgtable-nop4d-hack.h |    1 +
 include/asm-generic/pgtable-nop4d.h      |    1 +
 include/asm-generic/pgtable-nopud.h      |    1 +
 6 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index ec7f43327033..1ea41aaef68b 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -80,6 +80,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm,
 	set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
 }
 
+static inline void pmd_populate_kernel_safe(struct mm_struct *mm,
+				       pmd_t *pmd, pte_t *pte)
+{
+	paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
+	set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+}
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 				struct page *pte)
 {
@@ -132,6 +139,12 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
 	set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
 }
+
+static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+{
+	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
+	set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd)));
+}
 #endif	/* CONFIG_X86_PAE */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -141,6 +154,12 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 	set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
 }
 
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
+{
+	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
+	set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+}
+
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
@@ -173,6 +192,14 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
 	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
 }
 
+static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
+{
+	if (!pgtable_l5_enabled())
+		return;
+	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
+	set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+}
+
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..3e25ac2793ef 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -432,7 +432,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & PAGE_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_pte(pte, __pte(0));
+				set_pte_safe(pte, __pte(0));
 			continue;
 		}
 
@@ -452,7 +452,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 			pr_info("   pte=%p addr=%lx pte=%016lx\n", pte, paddr,
 				pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte);
 		pages++;
-		set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
+		set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
 		paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE;
 	}
 
@@ -487,7 +487,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & PMD_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_pmd(pmd, __pmd(0));
+				set_pmd_safe(pmd, __pmd(0));
 			continue;
 		}
 
@@ -524,7 +524,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 		if (page_size_mask & (1<<PG_LEVEL_2M)) {
 			pages++;
 			spin_lock(&init_mm.page_table_lock);
-			set_pte((pte_t *)pmd,
+			set_pte_safe((pte_t *)pmd,
 				pfn_pte((paddr & PMD_MASK) >> PAGE_SHIFT,
 					__pgprot(pgprot_val(prot) | _PAGE_PSE)));
 			spin_unlock(&init_mm.page_table_lock);
@@ -536,7 +536,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 		paddr_last = phys_pte_init(pte, paddr, paddr_end, new_prot);
 
 		spin_lock(&init_mm.page_table_lock);
-		pmd_populate_kernel(&init_mm, pmd, pte);
+		pmd_populate_kernel_safe(&init_mm, pmd, pte);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	update_page_count(PG_LEVEL_2M, pages);
@@ -573,7 +573,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & PUD_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_pud(pud, __pud(0));
+				set_pud_safe(pud, __pud(0));
 			continue;
 		}
 
@@ -611,7 +611,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 		if (page_size_mask & (1<<PG_LEVEL_1G)) {
 			pages++;
 			spin_lock(&init_mm.page_table_lock);
-			set_pte((pte_t *)pud,
+			set_pte_safe((pte_t *)pud,
 				pfn_pte((paddr & PUD_MASK) >> PAGE_SHIFT,
 					PAGE_KERNEL_LARGE));
 			spin_unlock(&init_mm.page_table_lock);
@@ -624,7 +624,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 					   page_size_mask, prot);
 
 		spin_lock(&init_mm.page_table_lock);
-		pud_populate(&init_mm, pud, pmd);
+		pud_populate_safe(&init_mm, pud, pmd);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	__flush_tlb_all();
@@ -659,7 +659,7 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & P4D_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_p4d(p4d, __p4d(0));
+				set_p4d_safe(p4d, __p4d(0));
 			continue;
 		}
 
@@ -677,7 +677,7 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 					   page_size_mask);
 
 		spin_lock(&init_mm.page_table_lock);
-		p4d_populate(&init_mm, p4d, pud);
+		p4d_populate_safe(&init_mm, p4d, pud);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	__flush_tlb_all();
@@ -723,9 +723,9 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 
 		spin_lock(&init_mm.page_table_lock);
 		if (pgtable_l5_enabled())
-			pgd_populate(&init_mm, pgd, p4d);
+			pgd_populate_safe(&init_mm, pgd, p4d);
 		else
-			p4d_populate(&init_mm, p4d_offset(pgd, vaddr), (pud_t *) p4d);
+			p4d_populate_safe(&init_mm, p4d_offset(pgd, vaddr), (pud_t *) p4d);
 		spin_unlock(&init_mm.page_table_lock);
 		pgd_changed = true;
 	}
diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
index 73474bb52344..bb6cb347018c 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -26,6 +26,7 @@
 #define p4d_clear(p4d)			pgd_clear(p4d)
 #define p4d_val(p4d)			pgd_val(p4d)
 #define p4d_populate(mm, p4d, pud)	pgd_populate(mm, p4d, pud)
+#define p4d_populate_safe(mm, p4d, pud)	pgd_populate(mm, p4d, pud)
 #define p4d_page(p4d)			pgd_page(p4d)
 #define p4d_page_vaddr(p4d)		pgd_page_vaddr(p4d)
 
diff --git a/include/asm-generic/pgtable-nop4d-hack.h b/include/asm-generic/pgtable-nop4d-hack.h
index 1d6dd38c0e5e..829bdb0d6327 100644
--- a/include/asm-generic/pgtable-nop4d-hack.h
+++ b/include/asm-generic/pgtable-nop4d-hack.h
@@ -31,6 +31,7 @@ static inline void pgd_clear(pgd_t *pgd)	{ }
 #define pud_ERROR(pud)				(pgd_ERROR((pud).pgd))
 
 #define pgd_populate(mm, pgd, pud)		do { } while (0)
+#define pgd_populate_safe(mm, pgd, pud)		do { } while (0)
 /*
  * (puds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/include/asm-generic/pgtable-nop4d.h b/include/asm-generic/pgtable-nop4d.h
index 04cb913797bc..aebab905e6cd 100644
--- a/include/asm-generic/pgtable-nop4d.h
+++ b/include/asm-generic/pgtable-nop4d.h
@@ -26,6 +26,7 @@ static inline void pgd_clear(pgd_t *pgd)	{ }
 #define p4d_ERROR(p4d)				(pgd_ERROR((p4d).pgd))
 
 #define pgd_populate(mm, pgd, p4d)		do { } while (0)
+#define pgd_populate_safe(mm, pgd, p4d)		do { } while (0)
 /*
  * (p4ds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/include/asm-generic/pgtable-nopud.h b/include/asm-generic/pgtable-nopud.h
index 9bef475db6fe..c77a1d301155 100644
--- a/include/asm-generic/pgtable-nopud.h
+++ b/include/asm-generic/pgtable-nopud.h
@@ -35,6 +35,7 @@ static inline void p4d_clear(p4d_t *p4d)	{ }
 #define pud_ERROR(pud)				(p4d_ERROR((pud).p4d))
 
 #define p4d_populate(mm, p4d, pud)		do { } while (0)
+#define p4d_populate_safe(mm, p4d, pud)		do { } while (0)
 /*
  * (puds are folded into p4ds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 5/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
  2018-12-04 21:37 [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
                   ` (3 preceding siblings ...)
  2018-12-04 21:37 ` [PATCH v3 4/5] x86/mm: Validate kernel_physical_mapping_init() pte population Dan Williams
@ 2018-12-04 21:37 ` Dan Williams
  2018-12-05 18:07   ` [tip:x86/mm] " tip-bot for Dan Williams
  4 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2018-12-04 21:37 UTC (permalink / raw)
  To: tglx
  Cc: Sebastian Andrzej Siewior, Borislav Petkov, stable,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra (Intel),
	Kirill A. Shutemov, dave.hansen, x86, linux-kernel

Commit f77084d96355 "x86/mm/pat: Disable preemption around
__flush_tlb_all()" addressed a case where __flush_tlb_all() is called
without preemption being disabled. It also left a warning to catch other
cases where preemption is not disabled. That warning triggers for the
memory hotplug path which is also used for persistent memory enabling:

 WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
 RIP: 0010:__flush_tlb_all+0x1b/0x3a
 [..]
 Call Trace:
  phys_pud_init+0x29c/0x2bb
  kernel_physical_mapping_init+0xfc/0x219
  init_memory_mapping+0x1a5/0x3b0
  arch_add_memory+0x2c/0x50
  devm_memremap_pages+0x3aa/0x610
  pmem_attach_disk+0x585/0x700 [nd_pmem]

Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
and Dave confirmed the expectation for TLB flush is for modifying /
invalidating existing pte entries, but not initial population [2]. Drop
the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
expectation that this path is only ever populating empty entries for the
linear map. Note, at linear map teardown time there is a call to the
all-cpu flush_tlb_all() to invalidate the removed mappings.

[1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net
[2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com

Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Reported-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/mm/init_64.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3e25ac2793ef..484c1b92f078 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 							   paddr_end,
 							   page_size_mask,
 							   prot);
-				__flush_tlb_all();
 				continue;
 			}
 			/*
@@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 		pud_populate_safe(&init_mm, pud, pmd);
 		spin_unlock(&init_mm.page_table_lock);
 	}
-	__flush_tlb_all();
 
 	update_page_count(PG_LEVEL_1G, pages);
 
@@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 			paddr_last = phys_pud_init(pud, paddr,
 					paddr_end,
 					page_size_mask);
-			__flush_tlb_all();
 			continue;
 		}
 
@@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 		p4d_populate_safe(&init_mm, p4d, pud);
 		spin_unlock(&init_mm.page_table_lock);
 	}
-	__flush_tlb_all();
 
 	return paddr_last;
 }
@@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 	if (pgd_changed)
 		sync_global_pgds(vaddr_start, vaddr_end - 1);
 
-	__flush_tlb_all();
-
 	return paddr_last;
 }
 


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/mm] generic/pgtable: Make {pmd, pud}_same() unconditionally available
  2018-12-04 21:37 ` [PATCH v3 1/5] generic/pgtable: Make {pmd, pud}_same() unconditionally available Dan Williams
@ 2018-12-05 18:05   ` tip-bot for Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Dan Williams @ 2018-12-05 18:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dan.j.williams, bp, mingo, tglx, riel, linux-kernel, dave.hansen,
	torvalds, bigeasy, luto, hpa, kirill.shutemov, peterz

Commit-ID:  c683c37cd13246941924c48f6c6a9863425e0cec
Gitweb:     https://git.kernel.org/tip/c683c37cd13246941924c48f6c6a9863425e0cec
Author:     Dan Williams <dan.j.williams@intel.com>
AuthorDate: Tue, 4 Dec 2018 13:37:06 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 5 Dec 2018 09:03:05 +0100

generic/pgtable: Make {pmd, pud}_same() unconditionally available

In preparation for {pmd,pud}_same() to be used outside of transparent
huge page code paths, make them unconditionally available. This enables
them to be used in the definition of a new family of
set_{pte,pmd,pud,p4d,pgd}_safe() helpers.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/asm-generic/pgtable.h | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..eea50ef8b8cd 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -375,7 +375,6 @@ static inline int pte_unused(pte_t pte)
 #endif
 
 #ifndef __HAVE_ARCH_PMD_SAME
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
 	return pmd_val(pmd_a) == pmd_val(pmd_b);
@@ -385,19 +384,6 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 {
 	return pud_val(pud_a) == pud_val(pud_b);
 }
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
-{
-	BUILD_BUG();
-	return 0;
-}
-
-static inline int pud_same(pud_t pud_a, pud_t pud_b)
-{
-	BUILD_BUG();
-	return 0;
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/mm] generic/pgtable: Introduce {p4d,pgd}_same()
  2018-12-04 21:37 ` [PATCH v3 2/5] generic/pgtable: Introduce {p4d,pgd}_same() Dan Williams
@ 2018-12-05 18:05   ` tip-bot for Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Dan Williams @ 2018-12-05 18:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, bigeasy, kirill.shutemov, mingo, tglx, luto,
	torvalds, dan.j.williams, hpa, peterz, linux-kernel, riel, bp

Commit-ID:  0cebbb60f759a709dabb3c87b9704f9844878850
Gitweb:     https://git.kernel.org/tip/0cebbb60f759a709dabb3c87b9704f9844878850
Author:     Dan Williams <dan.j.williams@intel.com>
AuthorDate: Tue, 4 Dec 2018 13:37:11 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

generic/pgtable: Introduce {p4d,pgd}_same()

In preparation for introducing '_safe' versions of page table entry 'set'
helpers, introduce generic versions of p4d_same() and pgd_same().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/asm-generic/pgtable.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index eea50ef8b8cd..dae7f98babed 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -386,6 +386,20 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 }
 #endif
 
+#ifndef __HAVE_ARCH_P4D_SAME
+static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b)
+{
+	return p4d_val(p4d_a) == p4d_val(p4d_b);
+}
+#endif
+
+#ifndef __HAVE_ARCH_PGD_SAME
+static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
+{
+	return pgd_val(pgd_a) == pgd_val(pgd_b);
+}
+#endif
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population
  2018-12-04 21:37 ` [PATCH v3 4/5] x86/mm: Validate kernel_physical_mapping_init() pte population Dan Williams
@ 2018-12-05 18:06   ` tip-bot for Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Dan Williams @ 2018-12-05 18:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, torvalds, luto, hpa, riel, dave.hansen, linux-kernel,
	bigeasy, kirill.shutemov, dave.hansen, dan.j.williams, tglx,
	peterz, bp

Commit-ID:  0a9fe8ca844d43f3f547f0e166122b6048121c8f
Gitweb:     https://git.kernel.org/tip/0a9fe8ca844d43f3f547f0e166122b6048121c8f
Author:     Dan Williams <dan.j.williams@intel.com>
AuthorDate: Tue, 4 Dec 2018 13:37:21 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

x86/mm: Validate kernel_physical_mapping_init() PTE population

The usage of __flush_tlb_all() in the kernel_physical_mapping_init()
path is not necessary. In general flushing the TLB is not required when
updating an entry from the !present state. However, to give confidence
in the future removal of TLB flushing in this path, use the new
set_pte_safe() family of helpers to assert that the !present assumption
is true in this path.

[ mingo: Minor readability edits. ]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/pgalloc.h           | 27 +++++++++++++++++++++++++++
 arch/x86/mm/init_64.c                    | 24 ++++++++++++------------
 include/asm-generic/5level-fixup.h       |  1 +
 include/asm-generic/pgtable-nop4d-hack.h |  1 +
 include/asm-generic/pgtable-nop4d.h      |  1 +
 include/asm-generic/pgtable-nopud.h      |  1 +
 6 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index ec7f43327033..1ea41aaef68b 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -80,6 +80,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm,
 	set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
 }
 
+static inline void pmd_populate_kernel_safe(struct mm_struct *mm,
+				       pmd_t *pmd, pte_t *pte)
+{
+	paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
+	set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+}
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 				struct page *pte)
 {
@@ -132,6 +139,12 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
 	set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
 }
+
+static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+{
+	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
+	set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd)));
+}
 #endif	/* CONFIG_X86_PAE */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -141,6 +154,12 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 	set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
 }
 
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
+{
+	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
+	set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+}
+
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
@@ -173,6 +192,14 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
 	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
 }
 
+static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
+{
+	if (!pgtable_l5_enabled())
+		return;
+	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
+	set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+}
+
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..3e25ac2793ef 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -432,7 +432,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & PAGE_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_pte(pte, __pte(0));
+				set_pte_safe(pte, __pte(0));
 			continue;
 		}
 
@@ -452,7 +452,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 			pr_info("   pte=%p addr=%lx pte=%016lx\n", pte, paddr,
 				pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte);
 		pages++;
-		set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
+		set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
 		paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE;
 	}
 
@@ -487,7 +487,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & PMD_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_pmd(pmd, __pmd(0));
+				set_pmd_safe(pmd, __pmd(0));
 			continue;
 		}
 
@@ -524,7 +524,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 		if (page_size_mask & (1<<PG_LEVEL_2M)) {
 			pages++;
 			spin_lock(&init_mm.page_table_lock);
-			set_pte((pte_t *)pmd,
+			set_pte_safe((pte_t *)pmd,
 				pfn_pte((paddr & PMD_MASK) >> PAGE_SHIFT,
 					__pgprot(pgprot_val(prot) | _PAGE_PSE)));
 			spin_unlock(&init_mm.page_table_lock);
@@ -536,7 +536,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 		paddr_last = phys_pte_init(pte, paddr, paddr_end, new_prot);
 
 		spin_lock(&init_mm.page_table_lock);
-		pmd_populate_kernel(&init_mm, pmd, pte);
+		pmd_populate_kernel_safe(&init_mm, pmd, pte);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	update_page_count(PG_LEVEL_2M, pages);
@@ -573,7 +573,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & PUD_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_pud(pud, __pud(0));
+				set_pud_safe(pud, __pud(0));
 			continue;
 		}
 
@@ -611,7 +611,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 		if (page_size_mask & (1<<PG_LEVEL_1G)) {
 			pages++;
 			spin_lock(&init_mm.page_table_lock);
-			set_pte((pte_t *)pud,
+			set_pte_safe((pte_t *)pud,
 				pfn_pte((paddr & PUD_MASK) >> PAGE_SHIFT,
 					PAGE_KERNEL_LARGE));
 			spin_unlock(&init_mm.page_table_lock);
@@ -624,7 +624,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 					   page_size_mask, prot);
 
 		spin_lock(&init_mm.page_table_lock);
-		pud_populate(&init_mm, pud, pmd);
+		pud_populate_safe(&init_mm, pud, pmd);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	__flush_tlb_all();
@@ -659,7 +659,7 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 					     E820_TYPE_RAM) &&
 			    !e820__mapped_any(paddr & P4D_MASK, paddr_next,
 					     E820_TYPE_RESERVED_KERN))
-				set_p4d(p4d, __p4d(0));
+				set_p4d_safe(p4d, __p4d(0));
 			continue;
 		}
 
@@ -677,7 +677,7 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 					   page_size_mask);
 
 		spin_lock(&init_mm.page_table_lock);
-		p4d_populate(&init_mm, p4d, pud);
+		p4d_populate_safe(&init_mm, p4d, pud);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	__flush_tlb_all();
@@ -723,9 +723,9 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 
 		spin_lock(&init_mm.page_table_lock);
 		if (pgtable_l5_enabled())
-			pgd_populate(&init_mm, pgd, p4d);
+			pgd_populate_safe(&init_mm, pgd, p4d);
 		else
-			p4d_populate(&init_mm, p4d_offset(pgd, vaddr), (pud_t *) p4d);
+			p4d_populate_safe(&init_mm, p4d_offset(pgd, vaddr), (pud_t *) p4d);
 		spin_unlock(&init_mm.page_table_lock);
 		pgd_changed = true;
 	}
diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
index 73474bb52344..bb6cb347018c 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -26,6 +26,7 @@
 #define p4d_clear(p4d)			pgd_clear(p4d)
 #define p4d_val(p4d)			pgd_val(p4d)
 #define p4d_populate(mm, p4d, pud)	pgd_populate(mm, p4d, pud)
+#define p4d_populate_safe(mm, p4d, pud)	pgd_populate(mm, p4d, pud)
 #define p4d_page(p4d)			pgd_page(p4d)
 #define p4d_page_vaddr(p4d)		pgd_page_vaddr(p4d)
 
diff --git a/include/asm-generic/pgtable-nop4d-hack.h b/include/asm-generic/pgtable-nop4d-hack.h
index 1d6dd38c0e5e..829bdb0d6327 100644
--- a/include/asm-generic/pgtable-nop4d-hack.h
+++ b/include/asm-generic/pgtable-nop4d-hack.h
@@ -31,6 +31,7 @@ static inline void pgd_clear(pgd_t *pgd)	{ }
 #define pud_ERROR(pud)				(pgd_ERROR((pud).pgd))
 
 #define pgd_populate(mm, pgd, pud)		do { } while (0)
+#define pgd_populate_safe(mm, pgd, pud)		do { } while (0)
 /*
  * (puds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/include/asm-generic/pgtable-nop4d.h b/include/asm-generic/pgtable-nop4d.h
index 04cb913797bc..aebab905e6cd 100644
--- a/include/asm-generic/pgtable-nop4d.h
+++ b/include/asm-generic/pgtable-nop4d.h
@@ -26,6 +26,7 @@ static inline void pgd_clear(pgd_t *pgd)	{ }
 #define p4d_ERROR(p4d)				(pgd_ERROR((p4d).pgd))
 
 #define pgd_populate(mm, pgd, p4d)		do { } while (0)
+#define pgd_populate_safe(mm, pgd, p4d)		do { } while (0)
 /*
  * (p4ds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/include/asm-generic/pgtable-nopud.h b/include/asm-generic/pgtable-nopud.h
index 9bef475db6fe..c77a1d301155 100644
--- a/include/asm-generic/pgtable-nopud.h
+++ b/include/asm-generic/pgtable-nopud.h
@@ -35,6 +35,7 @@ static inline void p4d_clear(p4d_t *p4d)	{ }
 #define pud_ERROR(pud)				(p4d_ERROR((pud).p4d))
 
 #define p4d_populate(mm, p4d, pud)		do { } while (0)
+#define p4d_populate_safe(mm, p4d, pud)		do { } while (0)
 /*
  * (puds are folded into p4ds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
  2018-12-04 21:37 ` [PATCH v3 5/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
@ 2018-12-05 18:07   ` tip-bot for Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Dan Williams @ 2018-12-05 18:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, linux-kernel, dan.j.williams, torvalds, stable,
	mingo, bp, luto, bigeasy, peterz, riel, tglx, kirill.shutemov,
	hpa

Commit-ID:  ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
Gitweb:     https://git.kernel.org/tip/ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
Author:     Dan Williams <dan.j.williams@intel.com>
AuthorDate: Tue, 4 Dec 2018 13:37:27 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 5 Dec 2018 09:03:07 +0100

x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

Commit:

  f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"

addressed a case where __flush_tlb_all() is called without preemption
being disabled. It also left a warning to catch other cases where
preemption is not disabled.

That warning triggers for the memory hotplug path which is also used for
persistent memory enabling:

 WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
 RIP: 0010:__flush_tlb_all+0x1b/0x3a
 [..]
 Call Trace:
  phys_pud_init+0x29c/0x2bb
  kernel_physical_mapping_init+0xfc/0x219
  init_memory_mapping+0x1a5/0x3b0
  arch_add_memory+0x2c/0x50
  devm_memremap_pages+0x3aa/0x610
  pmem_attach_disk+0x585/0x700 [nd_pmem]

Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
and Dave confirmed the expectation for TLB flush is for modifying /
invalidating existing PTE entries, but not initial population [2]. Drop
the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
expectation that this path is only ever populating empty entries for the
linear map. Note, at linear map teardown time there is a call to the
all-cpu flush_tlb_all() to invalidate the removed mappings.

[1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net
[2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com

[ mingo: Minor readability edits. ]

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/init_64.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3e25ac2793ef..484c1b92f078 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 							   paddr_end,
 							   page_size_mask,
 							   prot);
-				__flush_tlb_all();
 				continue;
 			}
 			/*
@@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 		pud_populate_safe(&init_mm, pud, pmd);
 		spin_unlock(&init_mm.page_table_lock);
 	}
-	__flush_tlb_all();
 
 	update_page_count(PG_LEVEL_1G, pages);
 
@@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 			paddr_last = phys_pud_init(pud, paddr,
 					paddr_end,
 					page_size_mask);
-			__flush_tlb_all();
 			continue;
 		}
 
@@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 		p4d_populate_safe(&init_mm, p4d, pud);
 		spin_unlock(&init_mm.page_table_lock);
 	}
-	__flush_tlb_all();
 
 	return paddr_last;
 }
@@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 	if (pgd_changed)
 		sync_global_pgds(vaddr_start, vaddr_end - 1);
 
-	__flush_tlb_all();
-
 	return paddr_last;
 }
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-12-05 18:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-04 21:37 [PATCH v3 0/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
2018-12-04 21:37 ` [PATCH v3 1/5] generic/pgtable: Make {pmd, pud}_same() unconditionally available Dan Williams
2018-12-05 18:05   ` [tip:x86/mm] " tip-bot for Dan Williams
2018-12-04 21:37 ` [PATCH v3 2/5] generic/pgtable: Introduce {p4d,pgd}_same() Dan Williams
2018-12-05 18:05   ` [tip:x86/mm] " tip-bot for Dan Williams
2018-12-04 21:37 ` [PATCH v3 3/5] generic/pgtable: Introduce set_pte_safe() Dan Williams
2018-12-04 21:37 ` [PATCH v3 4/5] x86/mm: Validate kernel_physical_mapping_init() pte population Dan Williams
2018-12-05 18:06   ` [tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population tip-bot for Dan Williams
2018-12-04 21:37 ` [PATCH v3 5/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Dan Williams
2018-12-05 18:07   ` [tip:x86/mm] " tip-bot for Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).