All of lore.kernel.org
 help / color / mirror / Atom feed
* Erratum 383 fix for 32 bit x86 kernels
@ 2010-09-24 11:52 Joerg Roedel
  2010-09-24 11:58 ` Joerg Roedel
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Joerg Roedel @ 2010-09-24 11:52 UTC (permalink / raw)
  To: Greg KH; +Cc: H. Peter Anvin, Borislav Petkov, linux-kernel

Hi Greg,

here is a patch which I want to ask you to include into the current
-stable kernels. This patch fixes the occurence of AMD Erratum 383 on
32 bit x86 kernels when using CPU hotplug. This patch is a combined
patch created by cherry-picking (and fixing a small compile error)
upstream commits:

	fd89a137924e0710078c3ae855e7cec1c43cb845
	b7d460897739e02f186425b7276e3fdb1595cea7

The second commit fixes a problem in the first one. The patch I attach
here is against 2.6.32.22 but should also apply against 2.6.35 (At least
cherry-picking from 2.6.36-rc gave no conflicts).
Please consider to include this patch into the -stable kernel series.

Regards,

	Joerg

>From d12a669119908f1c24a3b5037445c124c312eea5 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <joerg.roedel@amd.com>
Date: Thu, 22 Jul 2010 11:32:00 +0200
Subject: [PATCH] x86-32: Fix crashes with CPU hotplug on AMD machines

This patch fixes machine crashes which occured when heavily
testing cpu hotplug code on a 32 bit kernel. The kernel
crashed because these tests triggered AMD erratum 383 which
resulted in a machine check exception.
The erratum triggered in the test cases because of the
following steps:

	1. On 32 bit the swapper_pg_dir page table is used
	   as the initial page table for booting a secondary
	   cpu.

	2. To make this work swapper_pg_dir needs a direct
	   mapping of physical memory in it (the low
	   mappings)

	3. Other cpus may use swapper_pg_dir while the low
	   mappings are present (when leave_mm is called).

	4. This can result in TLB entries for addresses
	   below __PAGE_OFFSET to be established due to
	   speculative TLB loads. These TLB entries are
	   marked global and possibly large.

	5. When the CPU which has this entry loaded switches
	   to another page table this global TLB entry is not
	   flushed.

	6. The process accesses an address covered by this
	   TLB entry but there is a permission mismatch
	   (present TLB entry covers a large global page not
	    accessible for userspace).

	7. Due to the permission mismatch the hardware
	   walks the new page table and establishes a new
	   4kb TLB entry. Due to AMD erratum 383 there might
	   be a small window of time now where both TLB
	   entries are present.

	8. After the page walk the hardware does a new TLB
	   lookup which may result in two matches. This
	   results in a machine check exception which
	   signals a TLB multimatch error. The machine
	   crashes.

There are two ways to fix this issue:

	1. Always do a global TLB flush when a new cr3 is
	   loaded and the old page table was swapper_pg_dir.
	   I consider this a hack hard to understand.

	2. Do not use swapper_pg_dir to boot secondary cpus.

This patch implements solution 2. It introduces a
trampoline_pg_dir which has the same layout as
swapper_pg_dir with low_mappings. This page table is used as
the initial page table of the booting cpu. Later in the
bringup process it switches to swapper_pg_dir and does a
global tlb flush. This fixes the crashes in our test cases.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    2 ++
 arch/x86/kernel/smpboot.c         |   21 ++++++---------------
 arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
 6 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 2984a25..f686f49 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,6 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index cb507bb..8f78fdf 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 37c3d4b..75e3981 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -608,6 +608,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4ae4ac..6600cfd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4f33b2..7b05eb1 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -73,7 +73,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -300,8 +299,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	}
 
 #ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
+	/* switch away from the trampoline page-table */
+	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
 #endif
 
@@ -765,6 +764,10 @@ do_rest:
 	initial_code = (unsigned long)start_secondary;
 	stack_start.sp = (void *) c_idle.idle->thread.sp;
 
+#ifdef CONFIG_X86_32
+	initial_page_table = __pa(&trampoline_pg_dir);
+#endif
+
 	/* start_ip had better be page-aligned! */
 	start_ip = setup_trampoline();
 
@@ -894,20 +897,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..a874495 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}
-- 
1.7.0.4

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 11:52 Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
@ 2010-09-24 11:58 ` Joerg Roedel
  2010-09-24 13:47 ` Greg KH
  2010-09-24 16:02 ` Greg KH
  2 siblings, 0 replies; 32+ messages in thread
From: Joerg Roedel @ 2010-09-24 11:58 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, Borislav Petkov, H. Peter Anvin

On Fri, Sep 24, 2010 at 07:52:40AM -0400, Joerg Roedel wrote:
> here is a patch which I want to ask you to include into the current
> -stable kernels. This patch fixes the occurence of AMD Erratum 383 on
> 32 bit x86 kernels when using CPU hotplug. This patch is a combined
> patch created by cherry-picking (and fixing a small compile error)
> upstream commits:
> 
> 	fd89a137924e0710078c3ae855e7cec1c43cb845
> 	b7d460897739e02f186425b7276e3fdb1595cea7
> 
> The second commit fixes a problem in the first one. The patch I attach
> here is against 2.6.32.22 but should also apply against 2.6.35 (At least
> cherry-picking from 2.6.36-rc gave no conflicts).
> Please consider to include this patch into the -stable kernel series.

Ups, I attached the original patch, not the backported one, sorry. Here
is the right one for -stable.


>From 46f0a1afac5471a8ba9d716c76b0c561f6b02139 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <joerg.roedel@amd.com>
Date: Mon, 16 Aug 2010 14:38:33 +0200
Subject: [PATCH] x86-32: Separate 1:1 pagetables from swapper_pg_dir

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    3 +++
 arch/x86/kernel/smpboot.c         |   32 +++++++++++++-------------------
 arch/x86/kernel/trampoline.c      |   17 +++++++++++++++++
 6 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 01fd946..750f1bf 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -27,6 +27,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 90f06c2..2ab4ccd 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,15 +13,18 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 #define TRAMPOLINE_BASE 0x6000
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 050c278..34c3308 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -324,7 +324,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -604,6 +604,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -619,6 +621,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7a0888..5449a26 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -110,6 +110,7 @@
 #include <asm/numa_64.h>
 #endif
 #include <asm/mce.h>
+#include <asm/trampoline.h>
 
 /*
  * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
@@ -998,6 +999,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 29ec560..7e8e905 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -70,7 +70,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -292,6 +291,18 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * fragile that we want to limit the things done here to the
 	 * most necessary things.
 	 */
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Switch away from the trampoline page-table
+	 *
+	 * Do this before cpu_init() because it needs to access per-cpu
+	 * data which may not be mapped in the trampoline page-table.
+	 */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
@@ -310,12 +321,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 		enable_8259A_irq(0);
 	}
 
-#ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
-	__flush_tlb_all();
-#endif
-
 	/* This must be done before setting cpu_online_mask */
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
@@ -741,6 +746,7 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
+	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -885,20 +891,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index cd02212..0ac23a7 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -39,3 +40,19 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			KERNEL_PGD_PTRS);
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}
-- 
1.7.0.4


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 11:52 Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
  2010-09-24 11:58 ` Joerg Roedel
@ 2010-09-24 13:47 ` Greg KH
  2010-09-24 13:53   ` Roedel, Joerg
  2010-09-24 16:02 ` Greg KH
  2 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-09-24 13:47 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: H. Peter Anvin, Borislav Petkov, linux-kernel

On Fri, Sep 24, 2010 at 01:52:40PM +0200, Joerg Roedel wrote:
> Hi Greg,
> 
> here is a patch which I want to ask you to include into the current
> -stable kernels.

For any stable-related stuff, please send it to stable@kernel.org so it
doesn't get lost in my crazy inbox.

I'll take this this time, but in the future, please do this to ensure it
gets in.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 13:47 ` Greg KH
@ 2010-09-24 13:53   ` Roedel, Joerg
  0 siblings, 0 replies; 32+ messages in thread
From: Roedel, Joerg @ 2010-09-24 13:53 UTC (permalink / raw)
  To: Greg KH; +Cc: H. Peter Anvin, Borislav Petkov, linux-kernel

On Fri, Sep 24, 2010 at 09:47:09AM -0400, Greg KH wrote:
> On Fri, Sep 24, 2010 at 01:52:40PM +0200, Joerg Roedel wrote:
> > Hi Greg,
> > 
> > here is a patch which I want to ask you to include into the current
> > -stable kernels.
> 
> For any stable-related stuff, please send it to stable@kernel.org so it
> doesn't get lost in my crazy inbox.
> 
> I'll take this this time, but in the future, please do this to ensure it
> gets in.

Ah ok, thanks for the hint. I will do this in the future, just forgot
this time.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 11:52 Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
  2010-09-24 11:58 ` Joerg Roedel
  2010-09-24 13:47 ` Greg KH
@ 2010-09-24 16:02 ` Greg KH
  2010-09-24 16:24   ` Borislav Petkov
  2 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-09-24 16:02 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Greg KH, H. Peter Anvin, Borislav Petkov, linux-kernel

On Fri, Sep 24, 2010 at 01:52:40PM +0200, Joerg Roedel wrote:
> Hi Greg,
> 
> here is a patch which I want to ask you to include into the current
> -stable kernels. This patch fixes the occurence of AMD Erratum 383 on
> 32 bit x86 kernels when using CPU hotplug. This patch is a combined
> patch created by cherry-picking (and fixing a small compile error)
> upstream commits:
> 
> 	fd89a137924e0710078c3ae855e7cec1c43cb845
> 	b7d460897739e02f186425b7276e3fdb1595cea7
> 
> The second commit fixes a problem in the first one. The patch I attach
> here is against 2.6.32.22 but should also apply against 2.6.35 (At least
> cherry-picking from 2.6.36-rc gave no conflicts).
> Please consider to include this patch into the -stable kernel series.
> 
> Regards,
> 
> 	Joerg
> 
> >From d12a669119908f1c24a3b5037445c124c312eea5 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <joerg.roedel@amd.com>
> Date: Thu, 22 Jul 2010 11:32:00 +0200
> Subject: [PATCH] x86-32: Fix crashes with CPU hotplug on AMD machines
> 
> This patch fixes machine crashes which occured when heavily
> testing cpu hotplug code on a 32 bit kernel. The kernel
> crashed because these tests triggered AMD erratum 383 which
> resulted in a machine check exception.
> The erratum triggered in the test cases because of the
> following steps:
> 
> 	1. On 32 bit the swapper_pg_dir page table is used
> 	   as the initial page table for booting a secondary
> 	   cpu.
> 
> 	2. To make this work swapper_pg_dir needs a direct
> 	   mapping of physical memory in it (the low
> 	   mappings)
> 
> 	3. Other cpus may use swapper_pg_dir while the low
> 	   mappings are present (when leave_mm is called).
> 
> 	4. This can result in TLB entries for addresses
> 	   below __PAGE_OFFSET to be established due to
> 	   speculative TLB loads. These TLB entries are
> 	   marked global and possibly large.
> 
> 	5. When the CPU which has this entry loaded switches
> 	   to another page table this global TLB entry is not
> 	   flushed.
> 
> 	6. The process accesses an address covered by this
> 	   TLB entry but there is a permission mismatch
> 	   (present TLB entry covers a large global page not
> 	    accessible for userspace).
> 
> 	7. Due to the permission mismatch the hardware
> 	   walks the new page table and establishes a new
> 	   4kb TLB entry. Due to AMD erratum 383 there might
> 	   be a small window of time now where both TLB
> 	   entries are present.
> 
> 	8. After the page walk the hardware does a new TLB
> 	   lookup which may result in two matches. This
> 	   results in a machine check exception which
> 	   signals a TLB multimatch error. The machine
> 	   crashes.
> 
> There are two ways to fix this issue:
> 
> 	1. Always do a global TLB flush when a new cr3 is
> 	   loaded and the old page table was swapper_pg_dir.
> 	   I consider this a hack hard to understand.
> 
> 	2. Do not use swapper_pg_dir to boot secondary cpus.
> 
> This patch implements solution 2. It introduces a
> trampoline_pg_dir which has the same layout as
> swapper_pg_dir with low_mappings. This page table is used as
> the initial page table of the booting cpu. Later in the
> bringup process it switches to swapper_pg_dir and does a
> global tlb flush. This fixes the crashes in our test cases.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/include/asm/pgtable_32.h |    1 +
>  arch/x86/include/asm/trampoline.h |    3 +++
>  arch/x86/kernel/head_32.S         |    8 +++++++-
>  arch/x86/kernel/setup.c           |    2 ++
>  arch/x86/kernel/smpboot.c         |   21 ++++++---------------
>  arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
>  6 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
> index 2984a25..f686f49 100644
> --- a/arch/x86/include/asm/pgtable_32.h
> +++ b/arch/x86/include/asm/pgtable_32.h
> @@ -26,6 +26,7 @@ struct mm_struct;
>  struct vm_area_struct;
>  
>  extern pgd_t swapper_pg_dir[1024];
> +extern pgd_t trampoline_pg_dir[1024];
>  
>  static inline void pgtable_cache_init(void) { }
>  static inline void check_pgt_cache(void) { }
> diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
> index cb507bb..8f78fdf 100644
> --- a/arch/x86/include/asm/trampoline.h
> +++ b/arch/x86/include/asm/trampoline.h
> @@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
>  
>  extern unsigned long init_rsp;
>  extern unsigned long initial_code;
> +extern unsigned long initial_page_table;
>  extern unsigned long initial_gs;
>  
>  #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
>  
>  extern unsigned long setup_trampoline(void);
> +extern void __init setup_trampoline_page_table(void);
>  extern void __init reserve_trampoline_memory(void);
>  #else
>  static inline void reserve_trampoline_memory(void) {};
> +extern void __init setup_trampoline_page_table(void) {};
>  #endif /* CONFIG_X86_TRAMPOLINE */

I don't think that last setup_trampoline_page_table() line is correct
here.

Shouldn't it be:
	static inline void setup_trampoline_page_table(void) {};
instead?

Otherwise I get the following error building the .32 code with this
patch:
	  CC      arch/x86/kernel/setup.o
	  arch/x86/kernel/setup.c: In function ‘setup_arch’:
	  arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function ‘setup_trampoline_page_table’

Is this really how the code looks upstream?

Hm, even with changing the function prototype, I still get an error
building on the .32-stable tree on x86-64, so I'm dropping this patch
from there.

Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
by hand, no big deal.

So, why not I just take the original git commits that are in Linus's
tree?  That should work, right?  If so, do I just need to use those two
above-mentioned commits?  Or something else?  I prefer taking the
original commits as it makes spelunking over time much easier.

thanks,

greg k-h


>  
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 37c3d4b..75e3981 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
>  /*
>   * Enable paging
>   */
> -	movl $pa(swapper_pg_dir),%eax
> +	movl pa(initial_page_table), %eax
>  	movl %eax,%cr3		/* set the page table pointer.. */
>  	movl %cr0,%eax
>  	orl  $X86_CR0_PG,%eax
> @@ -608,6 +608,8 @@ ignore_int:
>  .align 4
>  ENTRY(initial_code)
>  	.long i386_start_kernel
> +ENTRY(initial_page_table)
> +	.long pa(swapper_pg_dir)
>  
>  /*
>   * BSS section
> @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
>  #endif
>  swapper_pg_fixmap:
>  	.fill 1024,4,0
> +#ifdef CONFIG_X86_TRAMPOLINE
> +ENTRY(trampoline_pg_dir)
> +	.fill 1024,4,0
> +#endif
>  ENTRY(empty_zero_page)
>  	.fill 4096,1,0
>  
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index b4ae4ac..6600cfd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p)
>  	paging_init();
>  	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
>  
> +	setup_trampoline_page_table();
> +
>  	tboot_probe();
>  
>  #ifdef CONFIG_X86_64
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index c4f33b2..7b05eb1 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -73,7 +73,6 @@
>  
>  #ifdef CONFIG_X86_32
>  u8 apicid_2_node[MAX_APICID];
> -static int low_mappings;
>  #endif
>  
>  /* State of each CPU */
> @@ -300,8 +299,8 @@ notrace static void __cpuinit start_secondary(void *unused)
>  	}
>  
>  #ifdef CONFIG_X86_32
> -	while (low_mappings)
> -		cpu_relax();
> +	/* switch away from the trampoline page-table */
> +	load_cr3(swapper_pg_dir);
>  	__flush_tlb_all();
>  #endif
>  
> @@ -765,6 +764,10 @@ do_rest:
>  	initial_code = (unsigned long)start_secondary;
>  	stack_start.sp = (void *) c_idle.idle->thread.sp;
>  
> +#ifdef CONFIG_X86_32
> +	initial_page_table = __pa(&trampoline_pg_dir);
> +#endif
> +
>  	/* start_ip had better be page-aligned! */
>  	start_ip = setup_trampoline();
>  
> @@ -894,20 +897,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
>  
>  	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
>  
> -#ifdef CONFIG_X86_32
> -	/* init low mem mapping */
> -	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> -		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
> -	flush_tlb_all();
> -	low_mappings = 1;
> -
>  	err = do_boot_cpu(apicid, cpu);
>  
> -	zap_low_mappings(false);
> -	low_mappings = 0;
> -#else
> -	err = do_boot_cpu(apicid, cpu);
> -#endif
>  	if (err) {
>  		pr_debug("do_boot_cpu failed %d\n", err);
>  		return -EIO;
> diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
> index c652ef6..a874495 100644
> --- a/arch/x86/kernel/trampoline.c
> +++ b/arch/x86/kernel/trampoline.c
> @@ -1,6 +1,7 @@
>  #include <linux/io.h>
>  
>  #include <asm/trampoline.h>
> +#include <asm/pgtable.h>
>  #include <asm/e820.h>
>  
>  #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
> @@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
>  	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
>  	return virt_to_phys(trampoline_base);
>  }
> +
> +void __init setup_trampoline_page_table(void)
> +{
> +#ifdef CONFIG_X86_32
> +	/* Copy kernel address range */
> +	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
> +			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> +			min_t(unsigned long, KERNEL_PGD_PTRS,
> +			      KERNEL_PGD_BOUNDARY));
> +
> +	/* Initialize low mappings */
> +	clone_pgd_range(trampoline_pg_dir,
> +			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> +			min_t(unsigned long, KERNEL_PGD_PTRS,
> +			      KERNEL_PGD_BOUNDARY));
> +#endif
> +}
> -- 
> 1.7.0.4
> 
> -- 
> AMD Operating System Research Center
> 
> Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
> General Managers: Alberto Bozzo, Andrew Bowd
> Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 16:02 ` Greg KH
@ 2010-09-24 16:24   ` Borislav Petkov
  2010-09-24 16:29     ` Greg KH
  2010-10-22 16:18     ` Greg KH
  0 siblings, 2 replies; 32+ messages in thread
From: Borislav Petkov @ 2010-09-24 16:24 UTC (permalink / raw)
  To: Greg KH; +Cc: Roedel, Joerg, Greg KH, H. Peter Anvin, linux-kernel

From: Greg KH <greg@kroah.com>
Date: Fri, Sep 24, 2010 at 12:02:06PM -0400

> >  extern unsigned long setup_trampoline(void);
> > +extern void __init setup_trampoline_page_table(void);
> >  extern void __init reserve_trampoline_memory(void);
> >  #else
> >  static inline void reserve_trampoline_memory(void) {};
> > +extern void __init setup_trampoline_page_table(void) {};
> >  #endif /* CONFIG_X86_TRAMPOLINE */
> 
> I don't think that last setup_trampoline_page_table() line is correct
> here.
> 
> Shouldn't it be:
> 	static inline void setup_trampoline_page_table(void) {};
> instead?
> 
> Otherwise I get the following error building the .32 code with this
> patch:
> 	  CC      arch/x86/kernel/setup.o
> 	  arch/x86/kernel/setup.c: In function ‘setup_arch’:
> 	  arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function ‘setup_trampoline_page_table’
> 
> Is this really how the code looks upstream?
> 
> Hm, even with changing the function prototype, I still get an error
> building on the .32-stable tree on x86-64, so I'm dropping this patch
> from there.

Yeah, Joerg forgot 8848a91068c018bc91f597038a0f41462a0f88a4.

> Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
> by hand, no big deal.
> 
> So, why not I just take the original git commits that are in Linus's
> tree?  That should work, right?  If so, do I just need to use those two
> above-mentioned commits?  Or something else?  I prefer taking the
> original commits as it makes spelunking over time much easier.

Sure, you need

1. fd89a137924e0710078c3ae855e7cec1c43cb845	<-- erratum fix
2. 8848a91068c018bc91f597038a0f41462a0f88a4	<-- build fix
3. b7d460897739e02f186425b7276e3fdb1595cea7	<-- VMSPLIT_* fix

in that order and they should cherry-pick fine.

Let me know if you need something tested on our end.

Thanks and sorry for the confusion.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 16:24   ` Borislav Petkov
@ 2010-09-24 16:29     ` Greg KH
  2010-10-22 16:18     ` Greg KH
  1 sibling, 0 replies; 32+ messages in thread
From: Greg KH @ 2010-09-24 16:29 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Roedel, Joerg, Greg KH, H. Peter Anvin, linux-kernel

On Fri, Sep 24, 2010 at 06:24:34PM +0200, Borislav Petkov wrote:
> From: Greg KH <greg@kroah.com>
> Date: Fri, Sep 24, 2010 at 12:02:06PM -0400
> 
> > >  extern unsigned long setup_trampoline(void);
> > > +extern void __init setup_trampoline_page_table(void);
> > >  extern void __init reserve_trampoline_memory(void);
> > >  #else
> > >  static inline void reserve_trampoline_memory(void) {};
> > > +extern void __init setup_trampoline_page_table(void) {};
> > >  #endif /* CONFIG_X86_TRAMPOLINE */
> > 
> > I don't think that last setup_trampoline_page_table() line is correct
> > here.
> > 
> > Shouldn't it be:
> > 	static inline void setup_trampoline_page_table(void) {};
> > instead?
> > 
> > Otherwise I get the following error building the .32 code with this
> > patch:
> > 	  CC      arch/x86/kernel/setup.o
> > 	  arch/x86/kernel/setup.c: In function ‘setup_arch’:
> > 	  arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function ‘setup_trampoline_page_table’
> > 
> > Is this really how the code looks upstream?
> > 
> > Hm, even with changing the function prototype, I still get an error
> > building on the .32-stable tree on x86-64, so I'm dropping this patch
> > from there.
> 
> Yeah, Joerg forgot 8848a91068c018bc91f597038a0f41462a0f88a4.
> 
> > Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
> > by hand, no big deal.
> > 
> > So, why not I just take the original git commits that are in Linus's
> > tree?  That should work, right?  If so, do I just need to use those two
> > above-mentioned commits?  Or something else?  I prefer taking the
> > original commits as it makes spelunking over time much easier.
> 
> Sure, you need
> 
> 1. fd89a137924e0710078c3ae855e7cec1c43cb845	<-- erratum fix
> 2. 8848a91068c018bc91f597038a0f41462a0f88a4	<-- build fix
> 3. b7d460897739e02f186425b7276e3fdb1595cea7	<-- VMSPLIT_* fix
> 
> in that order and they should cherry-pick fine.
> 
> Let me know if you need something tested on our end.
> 
> Thanks and sorry for the confusion.

Ok, I'll do this for the next round of -stable kernel updates, this
arrived too late for the one that just went out for review right now.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-09-24 16:24   ` Borislav Petkov
  2010-09-24 16:29     ` Greg KH
@ 2010-10-22 16:18     ` Greg KH
  2010-10-22 16:20       ` Greg KH
  1 sibling, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-10-22 16:18 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Roedel, Joerg, Greg KH, H. Peter Anvin, linux-kernel

On Fri, Sep 24, 2010 at 06:24:34PM +0200, Borislav Petkov wrote:
> From: Greg KH <greg@kroah.com>
> Date: Fri, Sep 24, 2010 at 12:02:06PM -0400
> 
> > >  extern unsigned long setup_trampoline(void);
> > > +extern void __init setup_trampoline_page_table(void);
> > >  extern void __init reserve_trampoline_memory(void);
> > >  #else
> > >  static inline void reserve_trampoline_memory(void) {};
> > > +extern void __init setup_trampoline_page_table(void) {};
> > >  #endif /* CONFIG_X86_TRAMPOLINE */
> > 
> > I don't think that last setup_trampoline_page_table() line is correct
> > here.
> > 
> > Shouldn't it be:
> > 	static inline void setup_trampoline_page_table(void) {};
> > instead?
> > 
> > Otherwise I get the following error building the .32 code with this
> > patch:
> > 	  CC      arch/x86/kernel/setup.o
> > 	  arch/x86/kernel/setup.c: In function ‘setup_arch’:
> > 	  arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function ‘setup_trampoline_page_table’
> > 
> > Is this really how the code looks upstream?
> > 
> > Hm, even with changing the function prototype, I still get an error
> > building on the .32-stable tree on x86-64, so I'm dropping this patch
> > from there.
> 
> Yeah, Joerg forgot 8848a91068c018bc91f597038a0f41462a0f88a4.
> 
> > Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
> > by hand, no big deal.
> > 
> > So, why not I just take the original git commits that are in Linus's
> > tree?  That should work, right?  If so, do I just need to use those two
> > above-mentioned commits?  Or something else?  I prefer taking the
> > original commits as it makes spelunking over time much easier.
> 
> Sure, you need
> 
> 1. fd89a137924e0710078c3ae855e7cec1c43cb845	<-- erratum fix
> 2. 8848a91068c018bc91f597038a0f41462a0f88a4	<-- build fix
> 3. b7d460897739e02f186425b7276e3fdb1595cea7	<-- VMSPLIT_* fix
> 
> in that order and they should cherry-pick fine.
> 
> Let me know if you need something tested on our end.

Nope, that worked out well, thanks for letting me know exactly which
ones to apply in which order.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-10-22 16:18     ` Greg KH
@ 2010-10-22 16:20       ` Greg KH
  2010-10-23  8:26         ` Borislav Petkov
                           ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Greg KH @ 2010-10-22 16:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Roedel, Joerg, Greg KH, H. Peter Anvin, linux-kernel

On Fri, Oct 22, 2010 at 09:18:10AM -0700, Greg KH wrote:
> On Fri, Sep 24, 2010 at 06:24:34PM +0200, Borislav Petkov wrote:
> > From: Greg KH <greg@kroah.com>
> > Date: Fri, Sep 24, 2010 at 12:02:06PM -0400
> > 
> > > >  extern unsigned long setup_trampoline(void);
> > > > +extern void __init setup_trampoline_page_table(void);
> > > >  extern void __init reserve_trampoline_memory(void);
> > > >  #else
> > > >  static inline void reserve_trampoline_memory(void) {};
> > > > +extern void __init setup_trampoline_page_table(void) {};
> > > >  #endif /* CONFIG_X86_TRAMPOLINE */
> > > 
> > > I don't think that last setup_trampoline_page_table() line is correct
> > > here.
> > > 
> > > Shouldn't it be:
> > > 	static inline void setup_trampoline_page_table(void) {};
> > > instead?
> > > 
> > > Otherwise I get the following error building the .32 code with this
> > > patch:
> > > 	  CC      arch/x86/kernel/setup.o
> > > 	  arch/x86/kernel/setup.c: In function ‘setup_arch’:
> > > 	  arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function ‘setup_trampoline_page_table’
> > > 
> > > Is this really how the code looks upstream?
> > > 
> > > Hm, even with changing the function prototype, I still get an error
> > > building on the .32-stable tree on x86-64, so I'm dropping this patch
> > > from there.
> > 
> > Yeah, Joerg forgot 8848a91068c018bc91f597038a0f41462a0f88a4.
> > 
> > > Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
> > > by hand, no big deal.
> > > 
> > > So, why not I just take the original git commits that are in Linus's
> > > tree?  That should work, right?  If so, do I just need to use those two
> > > above-mentioned commits?  Or something else?  I prefer taking the
> > > original commits as it makes spelunking over time much easier.
> > 
> > Sure, you need
> > 
> > 1. fd89a137924e0710078c3ae855e7cec1c43cb845	<-- erratum fix
> > 2. 8848a91068c018bc91f597038a0f41462a0f88a4	<-- build fix
> > 3. b7d460897739e02f186425b7276e3fdb1595cea7	<-- VMSPLIT_* fix
> > 
> > in that order and they should cherry-pick fine.
> > 
> > Let me know if you need something tested on our end.
> 
> Nope, that worked out well, thanks for letting me know exactly which
> ones to apply in which order.

Oops, nope, that didn't work for the .32 kernel tree.  If you want these
patches there, please backport them and test them to verify that they
build and work properly.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Erratum 383 fix for 32 bit x86 kernels
  2010-10-22 16:20       ` Greg KH
@ 2010-10-23  8:26         ` Borislav Petkov
  2010-11-11 13:56         ` [PATCH 0/3] " Joerg Roedel
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2010-10-23  8:26 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, Roedel, Joerg, Greg KH, H. Peter Anvin, linux-kernel

On Fri, Oct 22, 2010 at 12:20:38PM -0400, Greg KH wrote:
> Oops, nope, that didn't work for the .32 kernel tree.  If you want these
> patches there, please backport them and test them to verify that they
> build and work properly.

I'll try to get to it next week, thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/3] Erratum 383 fix for 32 bit x86 kernels
  2010-10-22 16:20       ` Greg KH
  2010-10-23  8:26         ` Borislav Petkov
@ 2010-11-11 13:56         ` Joerg Roedel
  2010-11-11 13:56         ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 13:56 UTC (permalink / raw)
  To: Greg KH; +Cc: Borislav Petkov, linux-kernel, stable

Hi Greg,

ok, these are the 3 patches required for the Erratum 383 fix on 32 bit
for the 2.6.32.25 kernel. The patches apply cleanly on-top of it. I
tested the kernel with defconfig and allnoconfig and on 32 and 64 bit.
Every of the 4 combinations comiled fine and booted fine on and UMP and
SMP virtual machine until the point where they wanted to mount a
root-fs.
The problems you had with the previous patches should be solved now.
Sorry for the inconvenience.

Regards,
	Joerg



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-10-22 16:20       ` Greg KH
  2010-10-23  8:26         ` Borislav Petkov
  2010-11-11 13:56         ` [PATCH 0/3] " Joerg Roedel
@ 2010-11-11 13:56         ` Joerg Roedel
  2011-01-19  0:39           ` Konrad Rzeszutek Wilk
  2010-11-11 13:56         ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
  2010-11-11 13:56         ` Joerg Roedel
  4 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 13:56 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, Joerg Roedel,
	Borislav Petkov, H. Peter Anvin

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    3 +++
 arch/x86/kernel/smpboot.c         |   32 +++++++++++++-------------------
 arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
 6 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 01fd946..750f1bf 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -27,6 +27,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 90f06c2..2ab4ccd 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,15 +13,18 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 #define TRAMPOLINE_BASE 0x6000
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 050c278..34c3308 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -324,7 +324,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -604,6 +604,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -619,6 +621,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7a0888..5449a26 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -110,6 +110,7 @@
 #include <asm/numa_64.h>
 #endif
 #include <asm/mce.h>
+#include <asm/trampoline.h>
 
 /*
  * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
@@ -998,6 +999,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 29ec560..7e8e905 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -70,7 +70,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -292,6 +291,18 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * fragile that we want to limit the things done here to the
 	 * most necessary things.
 	 */
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Switch away from the trampoline page-table
+	 *
+	 * Do this before cpu_init() because it needs to access per-cpu
+	 * data which may not be mapped in the trampoline page-table.
+	 */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
@@ -310,12 +321,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 		enable_8259A_irq(0);
 	}
 
-#ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
-	__flush_tlb_all();
-#endif
-
 	/* This must be done before setting cpu_online_mask */
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
@@ -741,6 +746,7 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
+	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -885,20 +891,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index cd02212..4e816a4 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -39,3 +40,20 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  2010-10-22 16:20       ` Greg KH
                           ` (2 preceding siblings ...)
  2010-11-11 13:56         ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
@ 2010-11-11 13:56         ` Joerg Roedel
  2010-11-11 14:11           ` Greg KH
  2010-11-11 13:56         ` Joerg Roedel
  4 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 13:56 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, Hugh Dickins, H. Peter Anvin

From: Hugh Dickins <hughd@google.com>

rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
laptop: whereas cloning from kernel to low mappings pgd range does need
to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.

Signed-off-by: Hugh Dickins <hughd@google.com>
LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/trampoline.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index 4e816a4..0ac23a7 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -47,8 +47,7 @@ void __init setup_trampoline_page_table(void)
 	/* Copy kernel address range */
 	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
 			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS,
-			      KERNEL_PGD_BOUNDARY));
+			KERNEL_PGD_PTRS);
 
 	/* Initialize low mappings */
 	clone_pgd_range(trampoline_pg_dir,
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs
  2010-10-22 16:20       ` Greg KH
                           ` (3 preceding siblings ...)
  2010-11-11 13:56         ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
@ 2010-11-11 13:56         ` Joerg Roedel
  4 siblings, 0 replies; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 13:56 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov

From: H. Peter Anvin <hpa@zytor.com>

Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
---
 arch/x86/include/asm/trampoline.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 2ab4ccd..ebace68 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -23,8 +23,8 @@ extern unsigned long setup_trampoline(void);
 extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
-static inline void reserve_trampoline_memory(void) {};
-extern void __init setup_trampoline_page_table(void) {};
+static inline void setup_trampoline_page_table(void) {}
+static inline void reserve_trampoline_memory(void) {}
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  2010-11-11 13:56         ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
@ 2010-11-11 14:11           ` Greg KH
  2010-11-11 14:13             ` Greg KH
  0 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-11-11 14:11 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Borislav Petkov, linux-kernel, stable, Hugh Dickins, H. Peter Anvin

On Thu, Nov 11, 2010 at 02:56:14PM +0100, Joerg Roedel wrote:
> From: Hugh Dickins <hughd@google.com>
> 
> rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
> laptop: whereas cloning from kernel to low mappings pgd range does need
> to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
> pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> ---
>  arch/x86/kernel/trampoline.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)

Why are you sending this to me?  I'm not the x86 maintainer.

confused,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  2010-11-11 14:11           ` Greg KH
@ 2010-11-11 14:13             ` Greg KH
  2010-11-11 14:17               ` Roedel, Joerg
                                 ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Greg KH @ 2010-11-11 14:13 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Borislav Petkov, linux-kernel, stable, Hugh Dickins, H. Peter Anvin

On Thu, Nov 11, 2010 at 06:11:30AM -0800, Greg KH wrote:
> On Thu, Nov 11, 2010 at 02:56:14PM +0100, Joerg Roedel wrote:
> > From: Hugh Dickins <hughd@google.com>
> > 
> > rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
> > laptop: whereas cloning from kernel to low mappings pgd range does need
> > to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
> > pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
> > Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> > ---
> >  arch/x86/kernel/trampoline.c |    3 +--
> >  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> Why are you sending this to me?  I'm not the x86 maintainer.

Ah, now it makes more sense (seeing 0/3 after 2/3 came in.)

But, for all 3 of these patches, can you please resend them with the git
commit id of the original patch that is in Linus's tree?  I need that to
be able to apply them to the stable releases.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  2010-11-11 14:13             ` Greg KH
@ 2010-11-11 14:17               ` Roedel, Joerg
  2010-11-11 15:16               ` [PATCH 0/3] Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Roedel, Joerg @ 2010-11-11 14:17 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, Hugh Dickins, H. Peter Anvin

On Thu, Nov 11, 2010 at 09:13:58AM -0500, Greg KH wrote:
> On Thu, Nov 11, 2010 at 06:11:30AM -0800, Greg KH wrote:
> > On Thu, Nov 11, 2010 at 02:56:14PM +0100, Joerg Roedel wrote:
> > > From: Hugh Dickins <hughd@google.com>
> > > 
> > > rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
> > > laptop: whereas cloning from kernel to low mappings pgd range does need
> > > to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
> > > pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.
> > > 
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
> > > Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> > > ---
> > >  arch/x86/kernel/trampoline.c |    3 +--
> > >  1 files changed, 1 insertions(+), 2 deletions(-)
> > 
> > Why are you sending this to me?  I'm not the x86 maintainer.
> 
> Ah, now it makes more sense (seeing 0/3 after 2/3 came in.)
> 
> But, for all 3 of these patches, can you please resend them with the git
> commit id of the original patch that is in Linus's tree?  I need that to
> be able to apply them to the stable releases.

Hey, you work very early in the morning :-)

I will add the upstream-commit-ids and repost.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/3] Erratum 383 fix for 32 bit x86 kernels
  2010-11-11 14:13             ` Greg KH
  2010-11-11 14:17               ` Roedel, Joerg
@ 2010-11-11 15:16               ` Joerg Roedel
  2010-11-11 15:16               ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 15:16 UTC (permalink / raw)
  To: Greg KH; +Cc: Borislav Petkov, linux-kernel, stable

Again with upstream commit-ids



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-11-11 14:13             ` Greg KH
  2010-11-11 14:17               ` Roedel, Joerg
  2010-11-11 15:16               ` [PATCH 0/3] Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
@ 2010-11-11 15:16               ` Joerg Roedel
  2010-12-07 21:05                 ` Greg KH
  2010-11-11 15:16               ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
  2010-11-11 15:16               ` [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs Joerg Roedel
  4 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 15:16 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, Joerg Roedel,
	Borislav Petkov, H. Peter Anvin

commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    3 +++
 arch/x86/kernel/smpboot.c         |   32 +++++++++++++-------------------
 arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
 6 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 01fd946..750f1bf 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -27,6 +27,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 90f06c2..2ab4ccd 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,15 +13,18 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 #define TRAMPOLINE_BASE 0x6000
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 050c278..34c3308 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -324,7 +324,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -604,6 +604,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -619,6 +621,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7a0888..5449a26 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -110,6 +110,7 @@
 #include <asm/numa_64.h>
 #endif
 #include <asm/mce.h>
+#include <asm/trampoline.h>
 
 /*
  * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
@@ -998,6 +999,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 29ec560..7e8e905 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -70,7 +70,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -292,6 +291,18 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * fragile that we want to limit the things done here to the
 	 * most necessary things.
 	 */
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Switch away from the trampoline page-table
+	 *
+	 * Do this before cpu_init() because it needs to access per-cpu
+	 * data which may not be mapped in the trampoline page-table.
+	 */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
@@ -310,12 +321,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 		enable_8259A_irq(0);
 	}
 
-#ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
-	__flush_tlb_all();
-#endif
-
 	/* This must be done before setting cpu_online_mask */
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
@@ -741,6 +746,7 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
+	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -885,20 +891,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index cd02212..4e816a4 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -39,3 +40,20 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  2010-11-11 14:13             ` Greg KH
                                 ` (2 preceding siblings ...)
  2010-11-11 15:16               ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
@ 2010-11-11 15:16               ` Joerg Roedel
  2010-12-07 21:06                 ` [stable] " Greg KH
  2010-11-11 15:16               ` [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs Joerg Roedel
  4 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 15:16 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, Hugh Dickins, H. Peter Anvin

From: Hugh Dickins <hughd@google.com>

commit b7d460897739e02f186425b7276e3fdb1595cea7 upstream

rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
laptop: whereas cloning from kernel to low mappings pgd range does need
to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.

Signed-off-by: Hugh Dickins <hughd@google.com>
LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/trampoline.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index 4e816a4..0ac23a7 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -47,8 +47,7 @@ void __init setup_trampoline_page_table(void)
 	/* Copy kernel address range */
 	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
 			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS,
-			      KERNEL_PGD_BOUNDARY));
+			KERNEL_PGD_PTRS);
 
 	/* Initialize low mappings */
 	clone_pgd_range(trampoline_pg_dir,
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs
  2010-11-11 14:13             ` Greg KH
                                 ` (3 preceding siblings ...)
  2010-11-11 15:16               ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
@ 2010-11-11 15:16               ` Joerg Roedel
  2010-12-07 21:07                 ` Greg KH
  4 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2010-11-11 15:16 UTC (permalink / raw)
  To: Greg KH
  Cc: Borislav Petkov, linux-kernel, stable, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov

From: H. Peter Anvin <hpa@zytor.com>

commit 8848a91068c018bc91f597038a0f41462a0f88a4 upstream

Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
---
 arch/x86/include/asm/trampoline.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 2ab4ccd..ebace68 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -23,8 +23,8 @@ extern unsigned long setup_trampoline(void);
 extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
-static inline void reserve_trampoline_memory(void) {};
-extern void __init setup_trampoline_page_table(void) {};
+static inline void setup_trampoline_page_table(void) {}
+static inline void reserve_trampoline_memory(void) {}
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
-- 
1.7.0.4



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-11-11 15:16               ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
@ 2010-12-07 21:05                 ` Greg KH
  2010-12-08  3:06                   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-12-07 21:05 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Borislav Petkov, linux-kernel, stable, Borislav Petkov, H. Peter Anvin

On Thu, Nov 11, 2010 at 04:16:14PM +0100, Joerg Roedel wrote:
> commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream

applied, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [stable] [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  2010-11-11 15:16               ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
@ 2010-12-07 21:06                 ` Greg KH
  0 siblings, 0 replies; 32+ messages in thread
From: Greg KH @ 2010-12-07 21:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: H. Peter Anvin, Hugh Dickins, linux-kernel, Borislav Petkov, stable

On Thu, Nov 11, 2010 at 04:16:15PM +0100, Joerg Roedel wrote:
> From: Hugh Dickins <hughd@google.com>
> 
> commit b7d460897739e02f186425b7276e3fdb1595cea7 upstream

Applied, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs
  2010-11-11 15:16               ` [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs Joerg Roedel
@ 2010-12-07 21:07                 ` Greg KH
  0 siblings, 0 replies; 32+ messages in thread
From: Greg KH @ 2010-12-07 21:07 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Borislav Petkov, linux-kernel, stable, H. Peter Anvin, Borislav Petkov

On Thu, Nov 11, 2010 at 04:16:16PM +0100, Joerg Roedel wrote:
> From: H. Peter Anvin <hpa@zytor.com>
> 
> commit 8848a91068c018bc91f597038a0f41462a0f88a4 upstream

Applied, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-12-07 21:05                 ` Greg KH
@ 2010-12-08  3:06                   ` Jeremy Fitzhardinge
  2010-12-08  4:15                     ` Greg KH
  0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-08  3:06 UTC (permalink / raw)
  To: Greg KH
  Cc: Joerg Roedel, Borislav Petkov, linux-kernel, stable,
	Borislav Petkov, H. Peter Anvin, Ian Campbell

On 12/07/2010 01:05 PM, Greg KH wrote:
> On Thu, Nov 11, 2010 at 04:16:14PM +0100, Joerg Roedel wrote:
>> commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream
> applied, thanks.

This will need as well 5b5c1af104ab5adec1be9dcb4c787492d83d8d83 to
prevent Xen regressions.

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-12-08  3:06                   ` Jeremy Fitzhardinge
@ 2010-12-08  4:15                     ` Greg KH
  2010-12-08  9:34                       ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-12-08  4:15 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Joerg Roedel, Borislav Petkov, linux-kernel, stable,
	Borislav Petkov, H. Peter Anvin, Ian Campbell

On Tue, Dec 07, 2010 at 07:06:31PM -0800, Jeremy Fitzhardinge wrote:
> On 12/07/2010 01:05 PM, Greg KH wrote:
> > On Thu, Nov 11, 2010 at 04:16:14PM +0100, Joerg Roedel wrote:
> >> commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream
> > applied, thanks.
> 
> This will need as well 5b5c1af104ab5adec1be9dcb4c787492d83d8d83 to
> prevent Xen regressions.

So should I add that to a specific release?  The next .36 stable or
something else?

confused,

greg k-h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-12-08  4:15                     ` Greg KH
@ 2010-12-08  9:34                       ` Ian Campbell
  2010-12-08 11:58                         ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2010-12-08  9:34 UTC (permalink / raw)
  To: Greg KH
  Cc: Jeremy Fitzhardinge, Joerg Roedel, Borislav Petkov, linux-kernel,
	stable, Borislav Petkov, H. Peter Anvin

On Wed, 2010-12-08 at 04:15 +0000, Greg KH wrote:
> On Tue, Dec 07, 2010 at 07:06:31PM -0800, Jeremy Fitzhardinge wrote:
> > On 12/07/2010 01:05 PM, Greg KH wrote:
> > > On Thu, Nov 11, 2010 at 04:16:14PM +0100, Joerg Roedel wrote:
> > >> commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream
> > > applied, thanks.
> > 
> > This will need as well 5b5c1af104ab5adec1be9dcb4c787492d83d8d83 to
> > prevent Xen regressions.
> 
> So should I add that to a specific release? The next .36 stable or something else?

5b5c1af104ab5 fixed an issue exposed by b40827fa7268. I'm not sure if
fd89a13792 also exposes a similar issue, but I think it does not:

It looks to me like fd89a13792 only reads from swapper_pg_dir and writes
to trampoline_pg_dir which is unpinned (and unused) under Xen and hence
there is no problem and 5b5c1af104ab5 is not needed.

b40827fa7268 is in 2.6.37-rc and doesn't seem to be targeted for a
stable backport AFAICT.

Ian.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-12-08  9:34                       ` Ian Campbell
@ 2010-12-08 11:58                         ` Borislav Petkov
  2010-12-08 15:21                           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2010-12-08 11:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Greg KH, Jeremy Fitzhardinge, Roedel, Joerg, Borislav Petkov,
	linux-kernel, stable, H. Peter Anvin

On Wed, Dec 08, 2010 at 04:34:15AM -0500, Ian Campbell wrote:
> On Wed, 2010-12-08 at 04:15 +0000, Greg KH wrote:
> > On Tue, Dec 07, 2010 at 07:06:31PM -0800, Jeremy Fitzhardinge wrote:
> > > On 12/07/2010 01:05 PM, Greg KH wrote:
> > > > On Thu, Nov 11, 2010 at 04:16:14PM +0100, Joerg Roedel wrote:
> > > >> commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream
> > > > applied, thanks.
> > > 
> > > This will need as well 5b5c1af104ab5adec1be9dcb4c787492d83d8d83 to
> > > prevent Xen regressions.
> >
> > So should I add that to a specific release? The next .36 stable or something else?
> 
> 5b5c1af104ab5 fixed an issue exposed by b40827fa7268. I'm not sure if
> fd89a13792 also exposes a similar issue, but I think it does not:
> 
> It looks to me like fd89a13792 only reads from swapper_pg_dir and writes
> to trampoline_pg_dir which is unpinned (and unused) under Xen and hence
> there is no problem and 5b5c1af104ab5 is not needed.

I tend to agree with Ian's assessment here.

Let me put it this way: have you guys seen any Xen-related regressions
with fd89a13792? I mean, this patch went in after 2.6.36-rc1 and if
it were breaking Xen, we (or maybe you :)) would've caught it by now,
right?

> b40827fa7268 is in 2.6.37-rc and doesn't seem to be targeted for a
> stable backport AFAICT.

Nope, since it's not fixing a bug or fits any other of the stable rules.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-12-08 11:58                         ` Borislav Petkov
@ 2010-12-08 15:21                           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-08 15:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ian Campbell, Greg KH, Roedel, Joerg, linux-kernel, stable,
	H. Peter Anvin

On 12/08/2010 03:58 AM, Borislav Petkov wrote:
> On Wed, Dec 08, 2010 at 04:34:15AM -0500, Ian Campbell wrote:
>> On Wed, 2010-12-08 at 04:15 +0000, Greg KH wrote:
>>> On Tue, Dec 07, 2010 at 07:06:31PM -0800, Jeremy Fitzhardinge wrote:
>>>> On 12/07/2010 01:05 PM, Greg KH wrote:
>>>>> On Thu, Nov 11, 2010 at 04:16:14PM +0100, Joerg Roedel wrote:
>>>>>> commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream
>>>>> applied, thanks.
>>>> This will need as well 5b5c1af104ab5adec1be9dcb4c787492d83d8d83 to
>>>> prevent Xen regressions.
>>> So should I add that to a specific release? The next .36 stable or something else?
>> 5b5c1af104ab5 fixed an issue exposed by b40827fa7268. I'm not sure if
>> fd89a13792 also exposes a similar issue, but I think it does not:
>>
>> It looks to me like fd89a13792 only reads from swapper_pg_dir and writes
>> to trampoline_pg_dir which is unpinned (and unused) under Xen and hence
>> there is no problem and 5b5c1af104ab5 is not needed.
> I tend to agree with Ian's assessment here.
>
> Let me put it this way: have you guys seen any Xen-related regressions
> with fd89a13792? I mean, this patch went in after 2.6.36-rc1 and if
> it were breaking Xen, we (or maybe you :)) would've caught it by now,
> right?

No,  I got confused.  Sorry for spreading the confusion around...

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-11-11 13:56         ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
@ 2011-01-19  0:39           ` Konrad Rzeszutek Wilk
  2011-01-19  7:19             ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-19  0:39 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Greg KH, Borislav Petkov, linux-kernel, stable, Borislav Petkov,
	H. Peter Anvin

On Thu, Nov 11, 2010 at 02:56:13PM +0100, Joerg Roedel wrote:
> This patch fixes machine crashes which occur when heavily exercising the
> CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
> AMD Erratum 383 and result in a fatal machine check exception. Here's
> the scenario:
> 
> 1. On 32-bit, the swapper_pg_dir page table is used as the initial page
> table for booting a secondary CPU.
> 
> 2. To make this work, swapper_pg_dir needs a direct mapping of physical
> memory in it (the low mappings). By adding those low, large page (2M)
> mappings (PAE kernel), we create the necessary conditions for Erratum
> 383 to occur.
> 
> 3. Other CPUs which do not participate in the off- and onlining game may
> use swapper_pg_dir while the low mappings are present (when leave_mm is
> called). For all steps below, the CPU referred to is a CPU that is using
> swapper_pg_dir, and not the CPU which is being onlined.
> 
> 4. The presence of the low mappings in swapper_pg_dir can result
> in TLB entries for addresses below __PAGE_OFFSET to be established
> speculatively. These TLB entries are marked global and large.
> 
> 5. When the CPU with such TLB entry switches to another page table, this
> TLB entry remains because it is global.
> 
> 6. The process then generates an access to an address covered by the
> above TLB entry but there is a permission mismatch - the TLB entry
> covers a large global page not accessible to userspace.
> 
> 7. Due to this permission mismatch a new 4kb, user TLB entry gets
> established. Further, Erratum 383 provides for a small window of time
> where both TLB entries are present. This results in an uncorrectable
> machine check exception signalling a TLB multimatch which panics the
> machine.
> 
> There are two ways to fix this issue:
> 
>         1. Always do a global TLB flush when a new cr3 is loaded and the
>         old page table was swapper_pg_dir. I consider this a hack hard
>         to understand and with performance implications
> 
>         2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
>         does.
> 
> This patch implements solution 2. It introduces a trampoline_pg_dir
> which has the same layout as swapper_pg_dir with low_mappings. This page
> table is used as the initial page table of the booting CPU. Later in the
> bringup process, it switches to swapper_pg_dir and does a global TLB
> flush. This fixes the crashes in our test cases.
> 
> -v2: switch to swapper_pg_dir right after entering start_secondary() so
> that we are able to access percpu data which might not be mapped in the
> trampoline page table.

You also might want to look at the regression this patch caused when it
was introduced. Mainly this fix:
805e3f495057aa5307ad4e3d6dc7073d4733c691

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2011-01-19  0:39           ` Konrad Rzeszutek Wilk
@ 2011-01-19  7:19             ` Borislav Petkov
  2011-01-19 15:52               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2011-01-19  7:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Joerg Roedel, Greg KH, Borislav Petkov, linux-kernel, stable,
	Borislav Petkov, H. Peter Anvin

On Tue, Jan 18, 2011 at 07:39:08PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 11, 2010 at 02:56:13PM +0100, Joerg Roedel wrote:

[..]

> > -v2: switch to swapper_pg_dir right after entering start_secondary() so
> > that we are able to access percpu data which might not be mapped in the
> > trampoline page table.
> 
> You also might want to look at the regression this patch caused when it
> was introduced. Mainly this fix:
> 805e3f495057aa5307ad4e3d6dc7073d4733c691

I think you mean the regression _this_
b40827fa7268fda8a62490728a61c2856f33830b patch caused, which is not in
-stable anyway.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2011-01-19  7:19             ` Borislav Petkov
@ 2011-01-19 15:52               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-19 15:52 UTC (permalink / raw)
  To: Borislav Petkov, Joerg Roedel, Greg KH, Borislav Petkov,
	linux-kernel, stable, Borislav Petkov, H. Peter Anvin

On Wed, Jan 19, 2011 at 08:19:44AM +0100, Borislav Petkov wrote:
> On Tue, Jan 18, 2011 at 07:39:08PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 11, 2010 at 02:56:13PM +0100, Joerg Roedel wrote:
> 
> [..]
> 
> > > -v2: switch to swapper_pg_dir right after entering start_secondary() so
> > > that we are able to access percpu data which might not be mapped in the
> > > trampoline page table.
> > 
> > You also might want to look at the regression this patch caused when it
> > was introduced. Mainly this fix:
> > 805e3f495057aa5307ad4e3d6dc7073d4733c691
> 
> I think you mean the regression _this_
> b40827fa7268fda8a62490728a61c2856f33830b patch caused, which is not in
> -stable anyway.

Duh. Saw 'initial_page_table' and immediately jumped to conclusions.
Thank you for clarifying.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2011-01-19 15:55 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-24 11:52 Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
2010-09-24 11:58 ` Joerg Roedel
2010-09-24 13:47 ` Greg KH
2010-09-24 13:53   ` Roedel, Joerg
2010-09-24 16:02 ` Greg KH
2010-09-24 16:24   ` Borislav Petkov
2010-09-24 16:29     ` Greg KH
2010-10-22 16:18     ` Greg KH
2010-10-22 16:20       ` Greg KH
2010-10-23  8:26         ` Borislav Petkov
2010-11-11 13:56         ` [PATCH 0/3] " Joerg Roedel
2010-11-11 13:56         ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
2011-01-19  0:39           ` Konrad Rzeszutek Wilk
2011-01-19  7:19             ` Borislav Petkov
2011-01-19 15:52               ` Konrad Rzeszutek Wilk
2010-11-11 13:56         ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
2010-11-11 14:11           ` Greg KH
2010-11-11 14:13             ` Greg KH
2010-11-11 14:17               ` Roedel, Joerg
2010-11-11 15:16               ` [PATCH 0/3] Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
2010-11-11 15:16               ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
2010-12-07 21:05                 ` Greg KH
2010-12-08  3:06                   ` Jeremy Fitzhardinge
2010-12-08  4:15                     ` Greg KH
2010-12-08  9:34                       ` Ian Campbell
2010-12-08 11:58                         ` Borislav Petkov
2010-12-08 15:21                           ` Jeremy Fitzhardinge
2010-11-11 15:16               ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
2010-12-07 21:06                 ` [stable] " Greg KH
2010-11-11 15:16               ` [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs Joerg Roedel
2010-12-07 21:07                 ` Greg KH
2010-11-11 13:56         ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.