All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] x86/mm: Fix some issues with using trampoline_pgd
@ 2021-12-02 15:32 Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 1/4] x86/realmode: Add comment for Global bit usage in trampline_pgd Joerg Roedel
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Joerg Roedel @ 2021-12-02 15:32 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Mike Rapoport, Andrew Morton,
	Brijesh Singh, linux-kernel, Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Hi

here are a couple of fixes and documentation improvements for the use of
the trampoline_pgd in the kernel. Most importantly it fixes the issue
that switching to the trampoline_pgd will unmap the kernel stack and
real_mode_header, making crashes likely before the code can actually
jump to real mode.

The first patch adds a comment to document that the trampoline_pgd
aliases kernel page-tables in the user address range, establishing
global TLB entries for these addresses. The next two patches add
global TLB flushes when switching to and from the trampoline_pgd.

The last patch extends the trampoline_pgd to cover the whole kernel
address range. This is needed to make sure the stack and the
real_mode_header are still mapped after the switch and that the code
flow can safely reach real-mode.

Please review.

Thanks,

	Joerg

Changes v3->v4:

	- Rebased to latest tip/master
	- Addressed Boris' review comments

Link to v3: https://lore.kernel.org/all/20211001154817.29225-1-joro@8bytes.org/

Joerg Roedel (4):
  x86/realmode: Add comment for Global bit usage in trampline_pgd
  x86/mm/64: Flush global TLB on boot and AP bringup
  x86/mm: Flush global TLB when switching to trampoline page-table
  x86/64/mm: Map all kernel memory into trampoline_pgd

 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/include/asm/tlbflush.h |  5 +++++
 arch/x86/kernel/head64.c        |  2 ++
 arch/x86/kernel/head_64.S       | 19 ++++++++++++++++-
 arch/x86/kernel/reboot.c        | 12 ++---------
 arch/x86/mm/init.c              |  5 +++++
 arch/x86/mm/tlb.c               |  8 ++-----
 arch/x86/realmode/init.c        | 38 ++++++++++++++++++++++++++++++++-
 8 files changed, 72 insertions(+), 18 deletions(-)


base-commit: b6c28e3cc445bf451a516ac075ec27b4619e4f5f
-- 
2.34.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/4] x86/realmode: Add comment for Global bit usage in trampline_pgd
  2021-12-02 15:32 [PATCH v4 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
@ 2021-12-02 15:32 ` Joerg Roedel
  2021-12-06 21:57   ` [tip: x86/mm] x86/realmode: Add comment for Global bit usage in trampoline_pgd tip-bot2 for Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Joerg Roedel @ 2021-12-02 15:32 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Mike Rapoport, Andrew Morton,
	Brijesh Singh, linux-kernel, Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Document the fact that using the trampoline_pgd will result in the
creation of global TLB entries in the user range of the address
space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/init.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 1895986842b9..4ba024d5b63a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -714,6 +714,11 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 static void __init init_trampoline(void)
 {
 #ifdef CONFIG_X86_64
+	/*
+	 * The code below will alias kernel page-tables in the user-range of the
+	 * address space, including the Global bit. So global TLB entries will
+	 * be created when using the trampoline page-table.
+	 */
 	if (!kaslr_memory_enabled())
 		trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
 	else
-- 
2.34.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-12-02 15:32 [PATCH v4 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 1/4] x86/realmode: Add comment for Global bit usage in trampline_pgd Joerg Roedel
@ 2021-12-02 15:32 ` Joerg Roedel
  2021-12-06 21:57   ` [tip: x86/mm] " tip-bot2 for Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd Joerg Roedel
  3 siblings, 1 reply; 9+ messages in thread
From: Joerg Roedel @ 2021-12-02 15:32 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Mike Rapoport, Andrew Morton,
	Brijesh Singh, linux-kernel, Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The AP bringup code uses the trampoline_pgd page-table, which
establishes global mappings in the user range of the address space.
Flush the global TLB entries after the indentity mappings are removed
so no stale entries remain in the TLB.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/tlbflush.h |  5 +++++
 arch/x86/kernel/head64.c        |  2 ++
 arch/x86/kernel/head_64.S       | 19 ++++++++++++++++++-
 arch/x86/mm/tlb.c               |  8 ++------
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index b587a9ee9cb2..98fa0a114074 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -261,4 +261,9 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
 #endif /* !MODULE */
 
+static inline void __native_tlb_flush_global(unsigned long cr4)
+{
+	native_write_cr4(cr4 ^ X86_CR4_PGE);
+	native_write_cr4(cr4);
+}
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3be9dd213dad..3890fe64ffff 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -485,6 +485,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	/* Kill off the identity-map trampoline */
 	reset_early_page_tables();
 
+	__native_tlb_flush_global(native_read_cr4());
+
 	clear_bss();
 
 	clear_page(init_top_pgt);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd2bb85..9c63fc5988cd 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -166,9 +166,26 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	call	sev_verify_cbit
 	popq	%rsi
 
-	/* Switch to new page-table */
+	/*
+	 * Switch to new page-table
+	 *
+	 * For the boot CPU this switches to early_top_pgt which still has the
+	 * indentity mappings present. The secondary CPUs will switch to the
+	 * init_top_pgt here, away from the trampoline_pgd and unmap the
+	 * indentity mapped ranges.
+	 */
 	movq	%rax, %cr3
 
+	/*
+	 * Do a global TLB flush after the CR3 switch to make sure the TLB
+	 * entries from the identity mapping are flushed.
+	 */
+	movq	%cr4, %rcx
+	movq	%rcx, %rax
+	xorq	$X86_CR4_PGE, %rcx
+	movq	%rcx, %cr4
+	movq	%rax, %cr4
+
 	/* Ensure I am executing from virtual addresses */
 	movq	$1f, %rax
 	ANNOTATE_RETPOLINE_SAFE
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 92bb03b9ceb5..a6cf56a14939 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1148,7 +1148,7 @@ void flush_tlb_one_user(unsigned long addr)
  */
 STATIC_NOPV void native_flush_tlb_global(void)
 {
-	unsigned long cr4, flags;
+	unsigned long flags;
 
 	if (static_cpu_has(X86_FEATURE_INVPCID)) {
 		/*
@@ -1168,11 +1168,7 @@ STATIC_NOPV void native_flush_tlb_global(void)
 	 */
 	raw_local_irq_save(flags);
 
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	/* toggle PGE */
-	native_write_cr4(cr4 ^ X86_CR4_PGE);
-	/* write old PGE again and flush TLBs */
-	native_write_cr4(cr4);
+	__native_tlb_flush_global(this_cpu_read(cpu_tlbstate.cr4));
 
 	raw_local_irq_restore(flags);
 }
-- 
2.34.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-12-02 15:32 [PATCH v4 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 1/4] x86/realmode: Add comment for Global bit usage in trampline_pgd Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
@ 2021-12-02 15:32 ` Joerg Roedel
  2021-12-06 21:57   ` [tip: x86/mm] " tip-bot2 for Joerg Roedel
  2021-12-02 15:32 ` [PATCH v4 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd Joerg Roedel
  3 siblings, 1 reply; 9+ messages in thread
From: Joerg Roedel @ 2021-12-02 15:32 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Mike Rapoport, Andrew Morton,
	Brijesh Singh, linux-kernel, Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Move the switching code into a function so that it can be re-used and
add a global TLB flush. This makes sure that usage of memory which is
not mapped in the trampoline page-table is reliably caught.

Also move the clearing of CR4.PCIDE before the CR3 switch because the
cr4_clear_bits() function will access data not mapped into the
trampoline page-table.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/kernel/reboot.c        | 12 ++----------
 arch/x86/realmode/init.c        | 26 ++++++++++++++++++++++++++
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d083c873..331474b150f1 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -89,6 +89,7 @@ static inline void set_real_mode_mem(phys_addr_t mem)
 }
 
 void reserve_real_mode(void);
+void load_trampoline_pgtable(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 0a40df66a40d..fa700b46588e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -113,17 +113,9 @@ void __noreturn machine_real_restart(unsigned int type)
 	spin_unlock(&rtc_lock);
 
 	/*
-	 * Switch back to the initial page table.
+	 * Switch to the trampoline page table.
 	 */
-#ifdef CONFIG_X86_32
-	load_cr3(initial_page_table);
-#else
-	write_cr3(real_mode_header->trampoline_pgd);
-
-	/* Exiting long mode will fail if CR4.PCIDE is set. */
-	if (boot_cpu_has(X86_FEATURE_PCID))
-		cr4_clear_bits(X86_CR4_PCIDE);
-#endif
+	load_trampoline_pgtable();
 
 	/* Jump to the identity-mapped low memory code */
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 4a3da7592b99..6d98609387ba 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -17,6 +17,32 @@ u32 *trampoline_cr4_features;
 /* Hold the pgd entry used on booting additional CPUs */
 pgd_t trampoline_pgd_entry;
 
+void load_trampoline_pgtable(void)
+{
+#ifdef CONFIG_X86_32
+	load_cr3(initial_page_table);
+#else
+	/*
+	 * This function is called before exiting to real-mode and that will
+	 * fail with CR4.PCIDE still set.
+	 */
+	if (boot_cpu_has(X86_FEATURE_PCID))
+		cr4_clear_bits(X86_CR4_PCIDE);
+
+	write_cr3(real_mode_header->trampoline_pgd);
+#endif
+
+	/*
+	 * The CR3 write above will not flush global TLB entries.
+	 * Stale, global entries from previous page tables may still be
+	 * present.  Flush those stale entries.
+	 *
+	 * This ensures that memory accessed while running with
+	 * trampoline_pgd is *actually* mapped into trampoline_pgd.
+	 */
+	__flush_tlb_all();
+}
+
 void __init reserve_real_mode(void)
 {
 	phys_addr_t mem;
-- 
2.34.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd
  2021-12-02 15:32 [PATCH v4 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
                   ` (2 preceding siblings ...)
  2021-12-02 15:32 ` [PATCH v4 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
@ 2021-12-02 15:32 ` Joerg Roedel
  2021-12-03 10:04   ` [tip: x86/urgent] " tip-bot2 for Joerg Roedel
  3 siblings, 1 reply; 9+ messages in thread
From: Joerg Roedel @ 2021-12-02 15:32 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Mike Rapoport, Andrew Morton,
	Brijesh Singh, linux-kernel, Joerg Roedel, Joerg Roedel, stable

From: Joerg Roedel <jroedel@suse.de>

The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff
range of kernel memory (with 4-level paging). This range contains the
kernels text+data+bss mappings and the module mapping space, but not the
direct mapping and the vmalloc area.

This is enough to get an application processors out of real-mode, but
for code that switches back to real-mode the trampoline_pgd is missing
important parts of the address space. For example, consider this code
from arch/x86/kernel/reboot.c, function machine_real_restart() for a
64-bit kernel:

	#ifdef CONFIG_X86_32
		load_cr3(initial_page_table);
	#else
		write_cr3(real_mode_header->trampoline_pgd);

		/* Exiting long mode will fail if CR4.PCIDE is set. */
		if (boot_cpu_has(X86_FEATURE_PCID))
			cr4_clear_bits(X86_CR4_PCIDE);
	#endif

		/* Jump to the identity-mapped low memory code */
	#ifdef CONFIG_X86_32
		asm volatile("jmpl *%0" : :
			     "rm" (real_mode_header->machine_real_restart_asm),
			     "a" (type));
	#else
		asm volatile("ljmpl *%0" : :
			     "m" (real_mode_header->machine_real_restart_asm),
			     "D" (type));
	#endif

The code switches to the trampoline_pgd, which unmaps the direct mapping
and also the kernel stack. The call to cr4_clear_bits() will find no
stack and crash the machine. The real_mode_header pointer below points
into the direct mapping, and dereferencing it also causes a crash.

The reason this does not crash always is only that kernel mappings are
global and the CR3 switch does not flush those mappings. But if theses
mappings are not in the TLB already, the above code will crash before it
can jump to the real-mode stub.

Extend the trampoline_pgd to contain all kernel mappings to prevent
these crashes and to make code which runs on this page-table more
robust.

Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/realmode/init.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 6d98609387ba..c5e29db02a46 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -98,6 +98,7 @@ static void __init setup_real_mode(void)
 #ifdef CONFIG_X86_64
 	u64 *trampoline_pgd;
 	u64 efer;
+	int i;
 #endif
 
 	base = (unsigned char *)real_mode_header;
@@ -154,8 +155,17 @@ static void __init setup_real_mode(void)
 	trampoline_header->flags = 0;
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+
+	/* Map the real mode stub as virtual == physical */
 	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
-	trampoline_pgd[511] = init_top_pgt[511].pgd;
+
+	/*
+	 * Include the entirety of the kernel mapping into the trampoline
+	 * PGD.  This way, all mappings present in the normal kernel page
+	 * tables are usable while running on trampoline_pgd.
+	 */
+	for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++)
+		trampoline_pgd[i] = init_top_pgt[i].pgd;
 #endif
 
 	sme_sev_setup_real_mode(trampoline_header);
-- 
2.34.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: x86/urgent] x86/64/mm: Map all kernel memory into trampoline_pgd
  2021-12-02 15:32 ` [PATCH v4 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd Joerg Roedel
@ 2021-12-03 10:04   ` tip-bot2 for Joerg Roedel
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Joerg Roedel @ 2021-12-03 10:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Joerg Roedel, Borislav Petkov, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     51523ed1c26758de1af7e58730a656875f72f783
Gitweb:        https://git.kernel.org/tip/51523ed1c26758de1af7e58730a656875f72f783
Author:        Joerg Roedel <jroedel@suse.de>
AuthorDate:    Thu, 02 Dec 2021 16:32:26 +01:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 03 Dec 2021 09:11:43 +01:00

x86/64/mm: Map all kernel memory into trampoline_pgd

The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff
range of kernel memory (with 4-level paging). This range contains the
kernel's text+data+bss mappings and the module mapping space but not the
direct mapping and the vmalloc area.

This is enough to get the application processors out of real-mode, but
for code that switches back to real-mode the trampoline_pgd is missing
important parts of the address space. For example, consider this code
from arch/x86/kernel/reboot.c, function machine_real_restart() for a
64-bit kernel:

  #ifdef CONFIG_X86_32
  	load_cr3(initial_page_table);
  #else
  	write_cr3(real_mode_header->trampoline_pgd);

  	/* Exiting long mode will fail if CR4.PCIDE is set. */
  	if (boot_cpu_has(X86_FEATURE_PCID))
  		cr4_clear_bits(X86_CR4_PCIDE);
  #endif

  	/* Jump to the identity-mapped low memory code */
  #ifdef CONFIG_X86_32
  	asm volatile("jmpl *%0" : :
  		     "rm" (real_mode_header->machine_real_restart_asm),
  		     "a" (type));
  #else
  	asm volatile("ljmpl *%0" : :
  		     "m" (real_mode_header->machine_real_restart_asm),
  		     "D" (type));
  #endif

The code switches to the trampoline_pgd, which unmaps the direct mapping
and also the kernel stack. The call to cr4_clear_bits() will find no
stack and crash the machine. The real_mode_header pointer below points
into the direct mapping, and dereferencing it also causes a crash.

The reason this does not crash always is only that kernel mappings are
global and the CR3 switch does not flush those mappings. But if theses
mappings are not in the TLB already, the above code will crash before it
can jump to the real-mode stub.

Extend the trampoline_pgd to contain all kernel mappings to prevent
these crashes and to make code which runs on this page-table more
robust.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20211202153226.22946-5-joro@8bytes.org
---
 arch/x86/realmode/init.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 4a3da75..38d24d2 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -72,6 +72,7 @@ static void __init setup_real_mode(void)
 #ifdef CONFIG_X86_64
 	u64 *trampoline_pgd;
 	u64 efer;
+	int i;
 #endif
 
 	base = (unsigned char *)real_mode_header;
@@ -128,8 +129,17 @@ static void __init setup_real_mode(void)
 	trampoline_header->flags = 0;
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+
+	/* Map the real mode stub as virtual == physical */
 	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
-	trampoline_pgd[511] = init_top_pgt[511].pgd;
+
+	/*
+	 * Include the entirety of the kernel mapping into the trampoline
+	 * PGD.  This way, all mappings present in the normal kernel page
+	 * tables are usable while running on trampoline_pgd.
+	 */
+	for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++)
+		trampoline_pgd[i] = init_top_pgt[i].pgd;
 #endif
 
 	sme_sev_setup_real_mode(trampoline_header);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: x86/mm] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-12-02 15:32 ` [PATCH v4 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
@ 2021-12-06 21:57   ` tip-bot2 for Joerg Roedel
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Joerg Roedel @ 2021-12-06 21:57 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Joerg Roedel, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     71d5049b053876afbde6c3273250b76935494ab2
Gitweb:        https://git.kernel.org/tip/71d5049b053876afbde6c3273250b76935494ab2
Author:        Joerg Roedel <jroedel@suse.de>
AuthorDate:    Thu, 02 Dec 2021 16:32:25 +01:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Mon, 06 Dec 2021 09:54:10 +01:00

x86/mm: Flush global TLB when switching to trampoline page-table

Move the switching code into a function so that it can be re-used and
add a global TLB flush. This makes sure that usage of memory which is
not mapped in the trampoline page-table is reliably caught.

Also move the clearing of CR4.PCIDE before the CR3 switch because the
cr4_clear_bits() function will access data not mapped into the
trampoline page-table.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211202153226.22946-4-joro@8bytes.org
---
 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/kernel/reboot.c        | 12 ++----------
 arch/x86/realmode/init.c        | 26 ++++++++++++++++++++++++++
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d08..331474b 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -89,6 +89,7 @@ static inline void set_real_mode_mem(phys_addr_t mem)
 }
 
 void reserve_real_mode(void);
+void load_trampoline_pgtable(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 0a40df6..fa700b4 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -113,17 +113,9 @@ void __noreturn machine_real_restart(unsigned int type)
 	spin_unlock(&rtc_lock);
 
 	/*
-	 * Switch back to the initial page table.
+	 * Switch to the trampoline page table.
 	 */
-#ifdef CONFIG_X86_32
-	load_cr3(initial_page_table);
-#else
-	write_cr3(real_mode_header->trampoline_pgd);
-
-	/* Exiting long mode will fail if CR4.PCIDE is set. */
-	if (boot_cpu_has(X86_FEATURE_PCID))
-		cr4_clear_bits(X86_CR4_PCIDE);
-#endif
+	load_trampoline_pgtable();
 
 	/* Jump to the identity-mapped low memory code */
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 4a3da75..6d98609 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -17,6 +17,32 @@ u32 *trampoline_cr4_features;
 /* Hold the pgd entry used on booting additional CPUs */
 pgd_t trampoline_pgd_entry;
 
+void load_trampoline_pgtable(void)
+{
+#ifdef CONFIG_X86_32
+	load_cr3(initial_page_table);
+#else
+	/*
+	 * This function is called before exiting to real-mode and that will
+	 * fail with CR4.PCIDE still set.
+	 */
+	if (boot_cpu_has(X86_FEATURE_PCID))
+		cr4_clear_bits(X86_CR4_PCIDE);
+
+	write_cr3(real_mode_header->trampoline_pgd);
+#endif
+
+	/*
+	 * The CR3 write above will not flush global TLB entries.
+	 * Stale, global entries from previous page tables may still be
+	 * present.  Flush those stale entries.
+	 *
+	 * This ensures that memory accessed while running with
+	 * trampoline_pgd is *actually* mapped into trampoline_pgd.
+	 */
+	__flush_tlb_all();
+}
+
 void __init reserve_real_mode(void)
 {
 	phys_addr_t mem;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: x86/mm] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-12-02 15:32 ` [PATCH v4 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
@ 2021-12-06 21:57   ` tip-bot2 for Joerg Roedel
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Joerg Roedel @ 2021-12-06 21:57 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Joerg Roedel, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     f154f290855b070cc94dd44ad253c0ef8a9337bb
Gitweb:        https://git.kernel.org/tip/f154f290855b070cc94dd44ad253c0ef8a9337bb
Author:        Joerg Roedel <jroedel@suse.de>
AuthorDate:    Thu, 02 Dec 2021 16:32:24 +01:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Mon, 06 Dec 2021 09:38:48 +01:00

x86/mm/64: Flush global TLB on boot and AP bringup

The AP bringup code uses the trampoline_pgd page-table which
establishes global mappings in the user range of the address space.
Flush the global TLB entries after the indentity mappings are removed so
no stale entries remain in the TLB.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211202153226.22946-3-joro@8bytes.org
---
 arch/x86/include/asm/tlbflush.h |  5 +++++
 arch/x86/kernel/head64.c        |  2 ++
 arch/x86/kernel/head_64.S       | 19 ++++++++++++++++++-
 arch/x86/mm/tlb.c               |  8 ++------
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index b587a9e..98fa0a1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -261,4 +261,9 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
 #endif /* !MODULE */
 
+static inline void __native_tlb_flush_global(unsigned long cr4)
+{
+	native_write_cr4(cr4 ^ X86_CR4_PGE);
+	native_write_cr4(cr4);
+}
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fc5371a..75acb60 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -483,6 +483,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	/* Kill off the identity-map trampoline */
 	reset_early_page_tables();
 
+	__native_tlb_flush_global(native_read_cr4());
+
 	clear_bss();
 
 	clear_page(init_top_pgt);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd..9c63fc5 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -166,9 +166,26 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	call	sev_verify_cbit
 	popq	%rsi
 
-	/* Switch to new page-table */
+	/*
+	 * Switch to new page-table
+	 *
+	 * For the boot CPU this switches to early_top_pgt which still has the
+	 * indentity mappings present. The secondary CPUs will switch to the
+	 * init_top_pgt here, away from the trampoline_pgd and unmap the
+	 * indentity mapped ranges.
+	 */
 	movq	%rax, %cr3
 
+	/*
+	 * Do a global TLB flush after the CR3 switch to make sure the TLB
+	 * entries from the identity mapping are flushed.
+	 */
+	movq	%cr4, %rcx
+	movq	%rcx, %rax
+	xorq	$X86_CR4_PGE, %rcx
+	movq	%rcx, %cr4
+	movq	%rax, %cr4
+
 	/* Ensure I am executing from virtual addresses */
 	movq	$1f, %rax
 	ANNOTATE_RETPOLINE_SAFE
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 59ba296..1e6513f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1148,7 +1148,7 @@ void flush_tlb_one_user(unsigned long addr)
  */
 STATIC_NOPV void native_flush_tlb_global(void)
 {
-	unsigned long cr4, flags;
+	unsigned long flags;
 
 	if (static_cpu_has(X86_FEATURE_INVPCID)) {
 		/*
@@ -1168,11 +1168,7 @@ STATIC_NOPV void native_flush_tlb_global(void)
 	 */
 	raw_local_irq_save(flags);
 
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	/* toggle PGE */
-	native_write_cr4(cr4 ^ X86_CR4_PGE);
-	/* write old PGE again and flush TLBs */
-	native_write_cr4(cr4);
+	__native_tlb_flush_global(this_cpu_read(cpu_tlbstate.cr4));
 
 	raw_local_irq_restore(flags);
 }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: x86/mm] x86/realmode: Add comment for Global bit usage in trampoline_pgd
  2021-12-02 15:32 ` [PATCH v4 1/4] x86/realmode: Add comment for Global bit usage in trampline_pgd Joerg Roedel
@ 2021-12-06 21:57   ` tip-bot2 for Joerg Roedel
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Joerg Roedel @ 2021-12-06 21:57 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Joerg Roedel, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     9de4999050b5f2e847c84372c6a1aa1fe32bb269
Gitweb:        https://git.kernel.org/tip/9de4999050b5f2e847c84372c6a1aa1fe32bb269
Author:        Joerg Roedel <jroedel@suse.de>
AuthorDate:    Thu, 02 Dec 2021 16:32:23 +01:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Sat, 04 Dec 2021 13:50:08 +01:00

x86/realmode: Add comment for Global bit usage in trampoline_pgd

Document the fact that using the trampoline_pgd will result in the
creation of global TLB entries in the user range of the address
space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211202153226.22946-2-joro@8bytes.org
---
 arch/x86/mm/init.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 1895986..4ba024d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -714,6 +714,11 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 static void __init init_trampoline(void)
 {
 #ifdef CONFIG_X86_64
+	/*
+	 * The code below will alias kernel page-tables in the user-range of the
+	 * address space, including the Global bit. So global TLB entries will
+	 * be created when using the trampoline page-table.
+	 */
 	if (!kaslr_memory_enabled())
 		trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
 	else

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-12-06 21:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-02 15:32 [PATCH v4 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
2021-12-02 15:32 ` [PATCH v4 1/4] x86/realmode: Add comment for Global bit usage in trampline_pgd Joerg Roedel
2021-12-06 21:57   ` [tip: x86/mm] x86/realmode: Add comment for Global bit usage in trampoline_pgd tip-bot2 for Joerg Roedel
2021-12-02 15:32 ` [PATCH v4 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
2021-12-06 21:57   ` [tip: x86/mm] " tip-bot2 for Joerg Roedel
2021-12-02 15:32 ` [PATCH v4 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
2021-12-06 21:57   ` [tip: x86/mm] " tip-bot2 for Joerg Roedel
2021-12-02 15:32 ` [PATCH v4 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd Joerg Roedel
2021-12-03 10:04   ` [tip: x86/urgent] " tip-bot2 for Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.