linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/4] x86/mm: Fix some issues with using trampoline_pgd
@ 2021-10-01 15:48 Joerg Roedel
  2021-10-01 15:48 ` [PATCH v3 1/4] x86/realmode: Add comment for Global bit usage in trampoline_pgd Joerg Roedel
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Joerg Roedel @ 2021-10-01 15:48 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

From: Joerg Roedel <jroedel@suse.de>

Hi,

here are a couple of fixes and documentation improvements for the use
of the trampoline_pgd in the kernel. Most importantly, it fixes the
issue that switching to the trampoline_pgd unmaps the kernel stack
and real_mode_header, making crashes likely before the code can
actually jump to real mode.

The first patch adds a comment to document that the trampoline_pgd
aliases kernel page-tables in the user address range, establishing
global TLB entries for these addresses. The next two patches add
global TLB flushes when switching to and from the trampoline_pgd.

The last patch extends the trampoline_pgd to cover the whole kernel
address range. This is needed to make sure the stack and the
real_mode_header are still mapped after the switch and that the code
flow can safely reach real-mode.

Please review.

Thanks,

	Joerg

Changes v2->v3:

	- Addressed review comments from Dave Hansen

Link to v2: https://lore.kernel.org/lkml/20210929145501.4612-1-joro@8bytes.org/

Joerg Roedel (4):
  x86/realmode: Add comment for Global bit usage in trampoline_pgd
  x86/mm/64: Flush global TLB on boot and AP bringup
  x86/mm: Flush global TLB when switching to trampoline page-table
  x86/64/mm: Map all kernel memory into trampoline_pgd

 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/kernel/head64.c        | 15 ++++++++++++++
 arch/x86/kernel/head_64.S       | 19 +++++++++++++++++-
 arch/x86/kernel/reboot.c        | 12 ++---------
 arch/x86/mm/init.c              |  5 +++++
 arch/x86/realmode/init.c        | 35 ++++++++++++++++++++++++++++++++-
 6 files changed, 75 insertions(+), 12 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
-- 
2.33.0



* [PATCH v3 1/4] x86/realmode: Add comment for Global bit usage in trampoline_pgd
  2021-10-01 15:48 [PATCH v3 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
@ 2021-10-01 15:48 ` Joerg Roedel
  2021-10-01 15:48 ` [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Joerg Roedel @ 2021-10-01 15:48 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

From: Joerg Roedel <jroedel@suse.de>

Document the fact that using the trampoline_pgd will result in the
creation of global TLB entries in the user range of the address
space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/init.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 23a14d82e783..accd702d4253 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -714,6 +714,11 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 static void __init init_trampoline(void)
 {
 #ifdef CONFIG_X86_64
+	/*
+	 * The code below will alias kernel page-tables in the user-range of the
+	 * address space, including the Global bit. So global TLB entries will
+	 * be created when using the trampoline page-table.
+	 */
 	if (!kaslr_memory_enabled())
 		trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
 	else
-- 
2.33.0



* [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-10-01 15:48 [PATCH v3 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
  2021-10-01 15:48 ` [PATCH v3 1/4] x86/realmode: Add comment for Global bit usage in trampoline_pgd Joerg Roedel
@ 2021-10-01 15:48 ` Joerg Roedel
  2021-10-26  9:55   ` Borislav Petkov
  2021-10-01 15:48 ` [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
  2021-10-01 15:48 ` [PATCH v3 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd Joerg Roedel
  3 siblings, 1 reply; 15+ messages in thread
From: Joerg Roedel @ 2021-10-01 15:48 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

From: Joerg Roedel <jroedel@suse.de>

The AP bringup code uses the trampoline_pgd page-table, which
establishes global mappings in the user range of the address space.
Flush the global TLB entries after the identity mappings are removed
so no stale entries remain in the TLB.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/head64.c  | 15 +++++++++++++++
 arch/x86/kernel/head_64.S | 19 ++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..cae21afe0922 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -457,6 +457,19 @@ static void __init copy_bootdata(char *real_mode_data)
 	sme_unmap_bootdata(real_mode_data);
 }
 
+/*
+ * The __flush_tlb_all() function uses all kinds of state which is not
+ * initialized that early and can not be used here. So the helper below is used
+ * to flush global TLB entries.
+ */
+static void __init early_flush_tlb_global(void)
+{
+	unsigned long cr4 = native_read_cr4();
+
+	native_write_cr4(cr4 ^ X86_CR4_PGE);
+	native_write_cr4(cr4);
+}
+
 asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 {
 	/*
@@ -478,6 +491,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	/* Kill off the identity-map trampoline */
 	reset_early_page_tables();
 
+	early_flush_tlb_global();
+
 	clear_bss();
 
 	clear_page(init_top_pgt);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd2bb85..bd4b6ebafdc3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -166,9 +166,26 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	call	sev_verify_cbit
 	popq	%rsi
 
-	/* Switch to new page-table */
+	/*
+	 * Switch to new page-table
+	 *
+	 * For the boot CPU this switches to early_top_pgt which still has the
+	 * identity mappings present. The secondary CPUs will switch to the
+	 * init_top_pgt here, away from the trampoline_pgd and unmapping the
+	 * identity mapped ranges.
+	 *
+	 * Do a global TLB flush after the CR3 switch to make sure the TLB
+	 * entries from the identity mapping are flushed.
+	 */
 	movq	%rax, %cr3
 
+	/* Flush global TLB entries - only needed for secondary CPUs */
+	movq	%cr4, %rcx
+	movq	%rcx, %rax
+	xorq	$X86_CR4_PGE, %rcx
+	movq	%rcx, %cr4
+	movq	%rax, %cr4
+
 	/* Ensure I am executing from virtual addresses */
 	movq	$1f, %rax
 	ANNOTATE_RETPOLINE_SAFE
-- 
2.33.0



* [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-10-01 15:48 [PATCH v3 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
  2021-10-01 15:48 ` [PATCH v3 1/4] x86/realmode: Add comment for Global bit usage in trampoline_pgd Joerg Roedel
  2021-10-01 15:48 ` [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
@ 2021-10-01 15:48 ` Joerg Roedel
  2021-10-01 16:13   ` Dave Hansen
  2021-10-27  9:58   ` Borislav Petkov
  2021-10-01 15:48 ` [PATCH v3 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd Joerg Roedel
  3 siblings, 2 replies; 15+ messages in thread
From: Joerg Roedel @ 2021-10-01 15:48 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

From: Joerg Roedel <jroedel@suse.de>

Move the switching code into a function so that it can be re-used and
add a global TLB flush. This makes sure that usage of memory which is
not mapped in the trampoline page-table is reliably caught.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/kernel/reboot.c        | 12 ++----------
 arch/x86/realmode/init.c        | 23 +++++++++++++++++++++++
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d083c873..331474b150f1 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -89,6 +89,7 @@ static inline void set_real_mode_mem(phys_addr_t mem)
 }
 
 void reserve_real_mode(void);
+void load_trampoline_pgtable(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 0a40df66a40d..fa700b46588e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -113,17 +113,9 @@ void __noreturn machine_real_restart(unsigned int type)
 	spin_unlock(&rtc_lock);
 
 	/*
-	 * Switch back to the initial page table.
+	 * Switch to the trampoline page table.
 	 */
-#ifdef CONFIG_X86_32
-	load_cr3(initial_page_table);
-#else
-	write_cr3(real_mode_header->trampoline_pgd);
-
-	/* Exiting long mode will fail if CR4.PCIDE is set. */
-	if (boot_cpu_has(X86_FEATURE_PCID))
-		cr4_clear_bits(X86_CR4_PCIDE);
-#endif
+	load_trampoline_pgtable();
 
 	/* Jump to the identity-mapped low memory code */
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 31b5856010cb..b9802b18f504 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -17,6 +17,29 @@ u32 *trampoline_cr4_features;
 /* Hold the pgd entry used on booting additional CPUs */
 pgd_t trampoline_pgd_entry;
 
+void load_trampoline_pgtable(void)
+{
+#ifdef CONFIG_X86_32
+	load_cr3(initial_page_table);
+#else
+	/* Exiting long mode will fail if CR4.PCIDE is set. */
+	if (boot_cpu_has(X86_FEATURE_PCID))
+		cr4_clear_bits(X86_CR4_PCIDE);
+
+	write_cr3(real_mode_header->trampoline_pgd);
+#endif
+
+	/*
+	 * The CR3 write above will not flush global TLB entries.
+	 * Stale, global entries from previous sets of page tables may
+	 * still be present.  Flush those stale entries.
+	 *
+	 * This ensures that memory accessed while running with
+	 * trampoline_pgd is *actually* mapped into trampoline_pgd.
+	 */
+	__flush_tlb_all();
+}
+
 void __init reserve_real_mode(void)
 {
 	phys_addr_t mem;
-- 
2.33.0



* [PATCH v3 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd
  2021-10-01 15:48 [PATCH v3 0/4] x86/mm: Fix some issues with using trampoline_pgd Joerg Roedel
                   ` (2 preceding siblings ...)
  2021-10-01 15:48 ` [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
@ 2021-10-01 15:48 ` Joerg Roedel
  3 siblings, 0 replies; 15+ messages in thread
From: Joerg Roedel @ 2021-10-01 15:48 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel, stable

From: Joerg Roedel <jroedel@suse.de>

The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff
range of kernel memory (with 4-level paging). This range contains the
kernel's text+data+bss mappings and the module mapping space, but not the
direct mapping and the vmalloc area.

This is enough to get application processors out of real-mode, but
for code that switches back to real-mode the trampoline_pgd is missing
important parts of the address space. For example, consider this code
from arch/x86/kernel/reboot.c, function machine_real_restart() for a
64-bit kernel:

	#ifdef CONFIG_X86_32
		load_cr3(initial_page_table);
	#else
		write_cr3(real_mode_header->trampoline_pgd);

		/* Exiting long mode will fail if CR4.PCIDE is set. */
		if (boot_cpu_has(X86_FEATURE_PCID))
			cr4_clear_bits(X86_CR4_PCIDE);
	#endif

		/* Jump to the identity-mapped low memory code */
	#ifdef CONFIG_X86_32
		asm volatile("jmpl *%0" : :
			     "rm" (real_mode_header->machine_real_restart_asm),
			     "a" (type));
	#else
		asm volatile("ljmpl *%0" : :
			     "m" (real_mode_header->machine_real_restart_asm),
			     "D" (type));
	#endif

The code switches to the trampoline_pgd, which unmaps the direct mapping
and also the kernel stack. The call to cr4_clear_bits() will find no
stack and crash the machine. The real_mode_header pointer below points
into the direct mapping, and dereferencing it also causes a crash.

The only reason this does not always crash is that the kernel mappings
are global, so the CR3 switch does not flush them. But if these mappings
are not in the TLB already, the above code will crash before it can jump
to the real-mode stub.

Extend the trampoline_pgd to contain all kernel mappings to prevent
these crashes and to make code which runs on this page-table more
robust.

Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/realmode/init.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index b9802b18f504..77617cd624fe 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -95,6 +95,7 @@ static void __init setup_real_mode(void)
 #ifdef CONFIG_X86_64
 	u64 *trampoline_pgd;
 	u64 efer;
+	int i;
 #endif
 
 	base = (unsigned char *)real_mode_header;
@@ -151,8 +152,17 @@ static void __init setup_real_mode(void)
 	trampoline_header->flags = 0;
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+
+	/* Map the real mode stub as virtual == physical */
 	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
-	trampoline_pgd[511] = init_top_pgt[511].pgd;
+
+	/*
+	 * Include the entirety of the kernel mapping into the trampoline
+	 * PGD.  This way, all mappings present in the normal kernel page
+	 * tables are usable while running on trampoline_pgd.
+	 */
+	for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++)
+		trampoline_pgd[i] = init_top_pgt[i].pgd;
 #endif
 
 	sme_sev_setup_real_mode(trampoline_header);
-- 
2.33.0



* Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-10-01 15:48 ` [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
@ 2021-10-01 16:13   ` Dave Hansen
  2021-10-01 17:57     ` Jörg Rödel
  2021-10-27  9:58   ` Borislav Petkov
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2021-10-01 16:13 UTC (permalink / raw)
  To: Joerg Roedel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On 10/1/21 8:48 AM, Joerg Roedel wrote:
> Move the switching code into a function so that it can be re-used and
> add a global TLB flush. This makes sure that usage of memory which is
> not mapped in the trampoline page-table is reliably caught.

This looks fine.  But, just to be clear, nothing in this series reuses
the code, right?


* Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-10-01 16:13   ` Dave Hansen
@ 2021-10-01 17:57     ` Jörg Rödel
  0 siblings, 0 replies; 15+ messages in thread
From: Jörg Rödel @ 2021-10-01 17:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel



> Am 01.10.2021 um 18:13 schrieb Dave Hansen <dave.hansen@intel.com>:
> 
> On 10/1/21 8:48 AM, Joerg Roedel wrote:
>> Move the switching code into a function so that it can be re-used and
>> add a global TLB flush. This makes sure that usage of memory which is
>> not mapped in the trampoline page-table is reliably caught.
> 
> This looks fine.  But, just to be clear, nothing in this series reuses
> the code, right?

Correct, but my SEV-ES kexec series will re-use it, as it also uses the trampoline_pgd and switches to real-mode to park the APs.

Regards,

Jörg


* Re: [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-10-01 15:48 ` [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup Joerg Roedel
@ 2021-10-26  9:55   ` Borislav Petkov
  2021-10-26 12:58     ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2021-10-26  9:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Thomas Gleixner, Ingo Molnar, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Fri, Oct 01, 2021 at 05:48:15PM +0200, Joerg Roedel wrote:
> +/*
> + * The __flush_tlb_all() function uses all kinds of state which is not
> + * initialized that early and can not be used here. So the helper below is used
> + * to flush global TLB entries.
> + */
> +static void __init early_flush_tlb_global(void)
> +{
> +	unsigned long cr4 = native_read_cr4();
> +
> +	native_write_cr4(cr4 ^ X86_CR4_PGE);
> +	native_write_cr4(cr4);
> +}

Please make sure now and in the future to avoid such duplication - diff
on top, for a possible way to do this, at the end.

>  asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
>  {
>  	/*
> @@ -478,6 +491,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
>  	/* Kill off the identity-map trampoline */
>  	reset_early_page_tables();
>  
> +	early_flush_tlb_global();
> +
>  	clear_bss();
>  
>  	clear_page(init_top_pgt);
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d8b3ebd2bb85..bd4b6ebafdc3 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -166,9 +166,26 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>  	call	sev_verify_cbit
>  	popq	%rsi
>  
> -	/* Switch to new page-table */
> +	/*
> +	 * Switch to new page-table
> +	 *
> +	 * For the boot CPU this switches to early_top_pgt which still has the
> +	 * indentity mappings present. The secondary CPUs will switch to the
> +	 * init_top_pgt here, away from the trampoline_pgd and unmapping the

"... unmap the... "

> +	 * indentity mapped ranges.
> +	 *
> +	 * Do a global TLB flush after the CR3 switch to make sure the TLB
> +	 * entries from the identity mapping are flushed.

Put this comment...

> +	 */
>  	movq	%rax, %cr3
>  
> +	/* Flush global TLB entries - only needed for secondary CPUs */

... here instead of this one.

> +	movq	%cr4, %rcx
> +	movq	%rcx, %rax
> +	xorq	$X86_CR4_PGE, %rcx
> +	movq	%rcx, %cr4
> +	movq	%rax, %cr4
> +
>  	/* Ensure I am executing from virtual addresses */
>  	movq	$1f, %rax
>  	ANNOTATE_RETPOLINE_SAFE
> -- 

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index b587a9ee9cb2..98fa0a114074 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -261,4 +261,9 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
 #endif /* !MODULE */
 
+static inline void __native_tlb_flush_global(unsigned long cr4)
+{
+	native_write_cr4(cr4 ^ X86_CR4_PGE);
+	native_write_cr4(cr4);
+}
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 0a93b24d7604..75acb6027a87 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -462,19 +462,6 @@ static void __init copy_bootdata(char *real_mode_data)
 	sme_unmap_bootdata(real_mode_data);
 }
 
-/*
- * The __flush_tlb_all() function uses all kinds of state which is not
- * initialized that early and can not be used here. So the helper below is used
- * to flush global TLB entries.
- */
-static void __init early_flush_tlb_global(void)
-{
-	unsigned long cr4 = native_read_cr4();
-
-	native_write_cr4(cr4 ^ X86_CR4_PGE);
-	native_write_cr4(cr4);
-}
-
 asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 {
 	/*
@@ -496,7 +483,7 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	/* Kill off the identity-map trampoline */
 	reset_early_page_tables();
 
-	early_flush_tlb_global();
+	__native_tlb_flush_global(native_read_cr4());
 
 	clear_bss();
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 59ba2968af1b..1e6513f95133 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1148,7 +1148,7 @@ void flush_tlb_one_user(unsigned long addr)
  */
 STATIC_NOPV void native_flush_tlb_global(void)
 {
-	unsigned long cr4, flags;
+	unsigned long flags;
 
 	if (static_cpu_has(X86_FEATURE_INVPCID)) {
 		/*
@@ -1168,11 +1168,7 @@ STATIC_NOPV void native_flush_tlb_global(void)
 	 */
 	raw_local_irq_save(flags);
 
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	/* toggle PGE */
-	native_write_cr4(cr4 ^ X86_CR4_PGE);
-	/* write old PGE again and flush TLBs */
-	native_write_cr4(cr4);
+	__native_tlb_flush_global(this_cpu_read(cpu_tlbstate.cr4));
 
 	raw_local_irq_restore(flags);
 }

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-10-26  9:55   ` Borislav Petkov
@ 2021-10-26 12:58     ` Borislav Petkov
  2021-12-02 12:50       ` Joerg Roedel
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2021-10-26 12:58 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Thomas Gleixner, Ingo Molnar, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Tue, Oct 26, 2021 at 11:55:44AM +0200, Borislav Petkov wrote:
> > +	movq	%cr4, %rcx
> > +	movq	%rcx, %rax
> > +	xorq	$X86_CR4_PGE, %rcx
> > +	movq	%rcx, %cr4
> > +	movq	%rax, %cr4

Also, I'm wondering if you could compact this even more by defining a
function toggling the PGE bit and calling it from everywhere, even from
asm.

-- 
Regards/Gruss,
    Boris.



* Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-10-01 15:48 ` [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table Joerg Roedel
  2021-10-01 16:13   ` Dave Hansen
@ 2021-10-27  9:58   ` Borislav Petkov
  2021-12-02 12:58     ` Joerg Roedel
  1 sibling, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2021-10-27  9:58 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Thomas Gleixner, Ingo Molnar, hpa, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Joerg Roedel, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Fri, Oct 01, 2021 at 05:48:16PM +0200, Joerg Roedel wrote:
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 31b5856010cb..b9802b18f504 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -17,6 +17,29 @@ u32 *trampoline_cr4_features;
>  /* Hold the pgd entry used on booting additional CPUs */
>  pgd_t trampoline_pgd_entry;
>  
> +void load_trampoline_pgtable(void)
> +{
> +#ifdef CONFIG_X86_32
> +	load_cr3(initial_page_table);
> +#else
> +	/* Exiting long mode will fail if CR4.PCIDE is set. */

So this comment is not valid anymore if this is a separate function - it
is valid only when that function is called in reboot.c so I guess you
should leave that comment there.

> +	if (boot_cpu_has(X86_FEATURE_PCID))
> +		cr4_clear_bits(X86_CR4_PCIDE);
> +
> +	write_cr3(real_mode_header->trampoline_pgd);

Is there any significance to the reordering of those calls here? The
commit message doesn't say...

-- 
Regards/Gruss,
    Boris.



* Re: [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-10-26 12:58     ` Borislav Petkov
@ 2021-12-02 12:50       ` Joerg Roedel
  2021-12-02 18:19         ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Joerg Roedel @ 2021-12-02 12:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, x86, Thomas Gleixner, Ingo Molnar, hpa,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Tue, Oct 26, 2021 at 02:58:44PM +0200, Borislav Petkov wrote:
> On Tue, Oct 26, 2021 at 11:55:44AM +0200, Borislav Petkov wrote:
> > > +	movq	%cr4, %rcx
> > > +	movq	%rcx, %rax
> > > +	xorq	$X86_CR4_PGE, %rcx
> > > +	movq	%rcx, %cr4
> > > +	movq	%rax, %cr4
> 
> Also, I'm wondering if you could compact this even more by defining a
> function toggling the PGE bit and calling it from everywhere, even from
> asm.

Yeah, that would make sense, but is probably worth its own patch-set.
Unifying this across arch/x86/ needs to touch a couple more places and
needs special care so that the function is safe to call from early asm.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev



* Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-10-27  9:58   ` Borislav Petkov
@ 2021-12-02 12:58     ` Joerg Roedel
  2021-12-02 18:26       ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Joerg Roedel @ 2021-12-02 12:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, x86, Thomas Gleixner, Ingo Molnar, hpa,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Wed, Oct 27, 2021 at 11:58:45AM +0200, Borislav Petkov wrote:
> On Fri, Oct 01, 2021 at 05:48:16PM +0200, Joerg Roedel wrote:
> > +void load_trampoline_pgtable(void)
> > +{
> > +#ifdef CONFIG_X86_32
> > +	load_cr3(initial_page_table);
> > +#else
> > +	/* Exiting long mode will fail if CR4.PCIDE is set. */
> 
> So this comment is not valid anymore if this is a separate function - it
> is valid only when that function is called in reboot.c so I guess you
> should leave that comment there.

Okay, but from the caller it is not visible that CR4.PCIDE is disabled in
this function. I'd rather update the comment to say that the function
is called before transitioning to real mode?

> 
> > +	if (boot_cpu_has(X86_FEATURE_PCID))
> > +		cr4_clear_bits(X86_CR4_PCIDE);
> > +
> > +	write_cr3(real_mode_header->trampoline_pgd);
> 
> Is there any significance to the reordering of those calls here? The
> commit message doesn't say...

Yes, the call to cr4_clear_bits() is not safe anymore on the trampoline
page-table, because the per-cpu areas are not fully mapped anymore.

This changes with the next patch, but it's nevertheless more robust to
minimize the code running on the trampoline page-table.

I will add that to the commit message.

Regards,

-- 
Jörg Rödel
jroedel@suse.de




* Re: [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-12-02 12:50       ` Joerg Roedel
@ 2021-12-02 18:19         ` Borislav Petkov
  2021-12-02 21:17           ` Joerg Roedel
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2021-12-02 18:19 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, x86, Thomas Gleixner, Ingo Molnar, hpa,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Thu, Dec 02, 2021 at 01:50:05PM +0100, Joerg Roedel wrote:
> Yeah, that would make sense, but is probably worth its own patch-set.
> Unifying this across arch/x86/ needs to touch a couple more places and
> needs special care so that the function is safe to call from early asm.

I'd gladly review a preparatory patchset doing that. The usual strategy
is, cleanup and refactoring first, new features later.

Thx.

-- 
Regards/Gruss,
    Boris.



* Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table
  2021-12-02 12:58     ` Joerg Roedel
@ 2021-12-02 18:26       ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2021-12-02 18:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, x86, Thomas Gleixner, Ingo Molnar, hpa,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Thu, Dec 02, 2021 at 01:58:51PM +0100, Joerg Roedel wrote:
> Okay, but in the caller it is not visible the CR4.PCID is disabled in
> this function. I'd rather update the comment to tell that the function
> is called before transitioning to real mode?

Well, if something calls load_trampoline_pgtable(), it kinda assumes
that that function will do all the necessary steps to load it,
including clearing PCIDE.

Why does the caller even need to know that that function clears
CR4.PCIDE?

-- 
Regards/Gruss,
    Boris.



* Re: [PATCH v3 2/4] x86/mm/64: Flush global TLB on boot and AP bringup
  2021-12-02 18:19         ` Borislav Petkov
@ 2021-12-02 21:17           ` Joerg Roedel
  0 siblings, 0 replies; 15+ messages in thread
From: Joerg Roedel @ 2021-12-02 21:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, x86, Thomas Gleixner, Ingo Molnar, hpa,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
	Andrew Morton, Brijesh Singh, linux-kernel

On Thu, Dec 02, 2021 at 07:19:07PM +0100, Borislav Petkov wrote:
> I'd gladly review a preparatory patchset doing that. The usual strategy
> is, cleanup and refactoring first, new features later.

Is that also true for fixes? Because this patch-set actually tries to
fix an issue present in the current code rather than adding any new feature.

Thanks,

-- 
Jörg Rödel
jroedel@suse.de



