All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD
@ 2010-08-04 16:45 Borislav Petkov
  2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov
  2010-08-04 16:45 ` [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
  0 siblings, 2 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-04 16:45 UTC (permalink / raw)
  To: hpa, mingo, tglx
  Cc: andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy,
	greg, x86, linux-kernel

From: Borislav Petkov <borislav.petkov@amd.com>

Hi,

the following two patches fix an issue which one of our tests triggers
here on a 32-bit kernel while exercising the CPU hotplug path. More
details are to be found in the respective commit messages.

Now, considering the seriousness of the issue, I'd very much like
to backport the fixes to .32 and .34. And while the fixes are not
oneliners, they're straightforward enough to not cause any problems
IMHO. But I may be biased from staring at them for a while now so what
do you guys, think, any objections to backporting those?

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-04 16:45 [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD Borislav Petkov
@ 2010-08-04 16:45 ` Borislav Petkov
  2010-08-04 23:05   ` H. Peter Anvin
  2010-08-04 16:45 ` [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
  1 sibling, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-04 16:45 UTC (permalink / raw)
  To: hpa, mingo, tglx
  Cc: andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy,
	greg, x86, linux-kernel

From: Joerg Roedel <joerg.roedel@amd.com>

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    2 ++
 arch/x86/kernel/smpboot.c         |   18 +++---------------
 arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
 6 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 2984a25..f686f49 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,6 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index cb507bb..8f78fdf 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 37c3d4b..75e3981 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -608,6 +608,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4ae4ac..6600cfd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4f33b2..6859a42 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -73,7 +73,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -300,8 +299,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	}
 
 #ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
+	/* switch away from the trampoline page-table */
+	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
 #endif
 
@@ -754,6 +753,7 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
+	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -894,20 +894,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..a874495 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}
-- 
1.6.4.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue
  2010-08-04 16:45 [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD Borislav Petkov
  2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov
@ 2010-08-04 16:45 ` Borislav Petkov
  1 sibling, 0 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-04 16:45 UTC (permalink / raw)
  To: hpa, mingo, tglx
  Cc: andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy,
	greg, x86, linux-kernel

From: Borislav Petkov <borislav.petkov@amd.com>

When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/Kconfig          |    6 ++++++
 arch/x86/kernel/smpboot.c |   19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dcb0593..1598663 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -58,6 +58,7 @@ config X86
 	select ANON_INODES
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_USER_RETURN_NOTIFIER
+	select ARCH_CPU_PROBE_RELEASE
 
 config INSTRUCTION_DECODER
 	def_bool (KPROBES || PERF_EVENTS)
@@ -247,6 +248,11 @@ config ARCH_HWEIGHT_CFLAGS
 
 config KTIME_SCALAR
 	def_bool X86_32
+
+config ARCH_CPU_PROBE_RELEASE
+	def_bool y
+	depends on HOTPLUG_CPU
+
 source "init/Kconfig"
 source "kernel/Kconfig.freezer"
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6859a42..9e8e61b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
 static DEFINE_PER_CPU(struct task_struct *, idle_thread_array);
 #define get_idle_for_cpu(x)      (per_cpu(idle_thread_array, x))
 #define set_idle_for_cpu(x, p)   (per_cpu(idle_thread_array, x) = (p))
+
+/*
+ * We need this for trampoline_base protection from concurrent accesses when
+ * off- and onlining cores wildly.
+ */
+static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+        mutex_lock(&x86_cpu_hotplug_driver_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+        mutex_unlock(&x86_cpu_hotplug_driver_mutex);
+}
+
+ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
+ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
-- 
1.6.4.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov
@ 2010-08-04 23:05   ` H. Peter Anvin
  2010-08-05  4:48     ` Borislav Petkov
  2010-08-05  7:45     ` Roedel, Joerg
  0 siblings, 2 replies; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-04 23:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mingo, tglx, andreas.herrmann3, conny.seidel, joerg.roedel,
	Bhavna.Sarathy, greg, x86, linux-kernel

On 08/04/2010 09:45 AM, Borislav Petkov wrote:
> 
>         2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
>         does.
> 
> This patch implements solution 2. It introduces a trampoline_pg_dir
> which has the same layout as swapper_pg_dir with low_mappings. This page
> table is used as the initial page table of the booting CPU. Later in the
> bringup process, it switches to swapper_pg_dir and does a global TLB
> flush. This fixes the crashes in our test cases.
> 

I would like to keep around a page directory with the low mappings
around -- and not use it for kernel threads -- at all times *anyway*.
This means we can remove any current hacks that we have to do around S3
entry and exit, for example.

--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -608,6 +608,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)

 /*
  * BSS section
@@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif

I don't really see why this makes sense, though.  It would make more
sense that the initial page table we set up becomes trampoline_pg_dir;
we can then set up and change to swapper_pg_dir almost immediately.

I realize this isn't how the 64-bit code works at the moment, but in a
lot of ways I think it would be better if it did.

	-hpa

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-04 23:05   ` H. Peter Anvin
@ 2010-08-05  4:48     ` Borislav Petkov
  2010-08-05  7:45     ` Roedel, Joerg
  1 sibling, 0 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-05  4:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Roedel, Joerg,
	Sarathy, Bhavna, greg, x86, linux-kernel

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Wed, Aug 04, 2010 at 07:05:47PM -0400

> On 08/04/2010 09:45 AM, Borislav Petkov wrote:
> > 
> >         2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
> >         does.
> > 
> > This patch implements solution 2. It introduces a trampoline_pg_dir
> > which has the same layout as swapper_pg_dir with low_mappings. This page
> > table is used as the initial page table of the booting CPU. Later in the
> > bringup process, it switches to swapper_pg_dir and does a global TLB
> > flush. This fixes the crashes in our test cases.
> > 
> 
> I would like to keep around a page directory with the low mappings
> around -- and not use it for kernel threads -- at all times *anyway*.
> This means we can remove any current hacks that we have to do around S3
> entry and exit, for example.
> 
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
>  /*
>   * Enable paging
>   */
> -	movl $pa(swapper_pg_dir),%eax
> +	movl pa(initial_page_table), %eax
>  	movl %eax,%cr3		/* set the page table pointer.. */
>  	movl %cr0,%eax
>  	orl  $X86_CR0_PG,%eax
> @@ -608,6 +608,8 @@ ignore_int:
>  .align 4
>  ENTRY(initial_code)
>  	.long i386_start_kernel
> +ENTRY(initial_page_table)
> +	.long pa(swapper_pg_dir)
> 
>  /*
>   * BSS section
> @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
>  #endif
>  swapper_pg_fixmap:
>  	.fill 1024,4,0
> +#ifdef CONFIG_X86_TRAMPOLINE
> +ENTRY(trampoline_pg_dir)
> +	.fill 1024,4,0
> +#endif
> 
> I don't really see why this makes sense, though.  It would make more
> sense that the initial page table we set up becomes trampoline_pg_dir;
> we can then set up and change to swapper_pg_dir almost immediately.

Yeah, now we use swapper_pg_dir at all times and zap the low mappings.
However, this is not perfectly clean, as this case in point shows how
unrelated CPUs might establish TLB entries speculatively. Now imagine
if they don't mcheck about it but silently and merrily continue on.
In this particular case, there were no improper accesses due to the
user/kernel permissions mismatch but imagine if suddenly kernel code
started accessing userspace and this not through copy_to_user() et al.

So it really does make sense to have an initial page table and copy
swapper_pg_dir from it. Which would be a perfect exercise for someone
who would like to play with the boot code a bit more, ^hint hint^, if
Joerg doesn't beat me to it.

But I'd suggest we get those fixes in now if there are no objections and
later adjustments should come ontop after excessive testing. And what
about backporting those fixes to .32 and .34, would you be ok with that?
Greg, what about you?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-04 23:05   ` H. Peter Anvin
  2010-08-05  4:48     ` Borislav Petkov
@ 2010-08-05  7:45     ` Roedel, Joerg
  2010-08-05 14:14       ` H. Peter Anvin
  1 sibling, 1 reply; 31+ messages in thread
From: Roedel, Joerg @ 2010-08-05  7:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny,
	Sarathy, Bhavna, greg, x86, linux-kernel

On Wed, Aug 04, 2010 at 07:05:47PM -0400, H. Peter Anvin wrote:
> On 08/04/2010 09:45 AM, Borislav Petkov wrote:
> > 
> >         2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
> >         does.
> > 
> > This patch implements solution 2. It introduces a trampoline_pg_dir
> > which has the same layout as swapper_pg_dir with low_mappings. This page
> > table is used as the initial page table of the booting CPU. Later in the
> > bringup process, it switches to swapper_pg_dir and does a global TLB
> > flush. This fixes the crashes in our test cases.
> > 
> 
> I would like to keep around a page directory with the low mappings
> around -- and not use it for kernel threads -- at all times *anyway*.
> This means we can remove any current hacks that we have to do around S3
> entry and exit, for example.

Yeah, the page table with the low mappings is trampoline_pg_dir
introduced in this patch :-)

> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
>  /*
>   * Enable paging
>   */
> -	movl $pa(swapper_pg_dir),%eax
> +	movl pa(initial_page_table), %eax
>  	movl %eax,%cr3		/* set the page table pointer.. */
>  	movl %cr0,%eax
>  	orl  $X86_CR0_PG,%eax
> @@ -608,6 +608,8 @@ ignore_int:
>  .align 4
>  ENTRY(initial_code)
>  	.long i386_start_kernel
> +ENTRY(initial_page_table)
> +	.long pa(swapper_pg_dir)
> 
>  /*
>   * BSS section
> @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
>  #endif
>  swapper_pg_fixmap:
>  	.fill 1024,4,0
> +#ifdef CONFIG_X86_TRAMPOLINE
> +ENTRY(trampoline_pg_dir)
> +	.fill 1024,4,0
> +#endif
> 
> I don't really see why this makes sense, though.  It would make more
> sense that the initial page table we set up becomes trampoline_pg_dir;
> we can then set up and change to swapper_pg_dir almost immediately.

To make sure I understand correctly, you suggest to initialize
tramponline_pg_dir in the boot sequence of the first cpu and fork
swapper_pg_dir from it later on?

> I realize this isn't how the 64-bit code works at the moment, but in a
> lot of ways I think it would be better if it did.

Yeah, may make sense. This patch already brings the 32 bit
implementation closer to the 64 bit one. On 64 bit things are somewhat
simpler because the tramponline page table can be defined at
compile-time there (contains only 2 pgd_t entries) while on 32 bit we
have to initialize it at runtime.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-05  7:45     ` Roedel, Joerg
@ 2010-08-05 14:14       ` H. Peter Anvin
  2010-08-12 14:41         ` Joerg Roedel
  0 siblings, 1 reply; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-05 14:14 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny,
	Sarathy, Bhavna, greg, x86, linux-kernel

On 08/05/2010 12:45 AM, Roedel, Joerg wrote:
> 
> To make sure I understand correctly, you suggest to initialize
> tramponline_pg_dir in the boot sequence of the first cpu and fork
> swapper_pg_dir from it later on?
> 

Correct.

>> I realize this isn't how the 64-bit code works at the moment, but in a
>> lot of ways I think it would be better if it did.
> 
> Yeah, may make sense. This patch already brings the 32 bit
> implementation closer to the 64 bit one. On 64 bit things are somewhat
> simpler because the tramponline page table can be defined at
> compile-time there (contains only 2 pgd_t entries) while on 32 bit we
> have to initialize it at runtime.

Correct, again.  It's unclear to me if we can get away with the very
simple 64-bit approach -- in particular, not including all the 1:1
mappings in the kernel -- for all future users, though.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-05 14:14       ` H. Peter Anvin
@ 2010-08-12 14:41         ` Joerg Roedel
  2010-08-12 15:34           ` H. Peter Anvin
  0 siblings, 1 reply; 31+ messages in thread
From: Joerg Roedel @ 2010-08-12 14:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On Thu, Aug 05, 2010 at 07:14:14AM -0700, H. Peter Anvin wrote:
> On 08/05/2010 12:45 AM, Roedel, Joerg wrote:
> Correct, again.  It's unclear to me if we can get away with the very
> simple 64-bit approach -- in particular, not including all the 1:1
> mappings in the kernel -- for all future users, though.

Yeah, We could probably use the same symbol names for the trampoline and
the swapper page-table on 32 and 64 bit and merge the code that handles
it between the architectures. Shouldn't be too difficult. We cook
something up :-)
But do you mind to take this patch first? It fixes the occurence of an
erratum on AMD hardware, it is a quite simple fix compared to the rework
suggested. In fact, I would like to have a simple patch to fix the
problem first (and backport it as necessary) and then go the way to
rework the current code differences between 32 and 64 bit.

	Joerg


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 14:41         ` Joerg Roedel
@ 2010-08-12 15:34           ` H. Peter Anvin
  2010-08-12 15:47             ` Borislav Petkov
  2010-08-12 17:06             ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel
  0 siblings, 2 replies; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-12 15:34 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On 08/12/2010 07:41 AM, Joerg Roedel wrote:
> On Thu, Aug 05, 2010 at 07:14:14AM -0700, H. Peter Anvin wrote:
>> On 08/05/2010 12:45 AM, Roedel, Joerg wrote:
>> Correct, again.  It's unclear to me if we can get away with the very
>> simple 64-bit approach -- in particular, not including all the 1:1
>> mappings in the kernel -- for all future users, though.
> 
> Yeah, We could probably use the same symbol names for the trampoline and
> the swapper page-table on 32 and 64 bit and merge the code that handles
> it between the architectures. Shouldn't be too difficult. We cook
> something up :-)
> But do you mind to take this patch first? It fixes the occurence of an
> erratum on AMD hardware, it is a quite simple fix compared to the rework
> suggested. In fact, I would like to have a simple patch to fix the
> problem first (and backport it as necessary) and then go the way to
> rework the current code differences between 32 and 64 bit.
> 

Agreed.  I do have a concern about the kernel page tables not being
synchronized into trampoline_pg_dir (it's only an issue for 32-bit
!PAE), so at the very least we need to keep an eye out for it...

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 15:47             ` Borislav Petkov
@ 2010-08-12 15:47               ` H. Peter Anvin
  2010-08-12 17:38                 ` Borislav Petkov
  2010-08-24  7:33                 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov
  0 siblings, 2 replies; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-12 15:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On 08/12/2010 08:47 AM, Borislav Petkov wrote:
>>
>> Agreed.  I do have a concern about the kernel page tables not being
>> synchronized into trampoline_pg_dir (it's only an issue for 32-bit
>> !PAE), so at the very least we need to keep an eye out for it...
> 
> I've been working on what you suggested and the patch boots on a 32-bit
> PAE kernel but keeps rebooting on non-PAE after setting the stack_start
> on the AP. I could send it out later for you to look at, if you like -
> you should be able to spot what I'm missing :)...
> 

Sounds good.  This is obviously .37-or-higher material.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 15:34           ` H. Peter Anvin
@ 2010-08-12 15:47             ` Borislav Petkov
  2010-08-12 15:47               ` H. Peter Anvin
  2010-08-12 17:06             ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel
  1 sibling, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-12 15:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Thu, Aug 12, 2010 at 11:34:37AM -0400

> On 08/12/2010 07:41 AM, Joerg Roedel wrote:
> > On Thu, Aug 05, 2010 at 07:14:14AM -0700, H. Peter Anvin wrote:
> >> On 08/05/2010 12:45 AM, Roedel, Joerg wrote:
> >> Correct, again.  It's unclear to me if we can get away with the very
> >> simple 64-bit approach -- in particular, not including all the 1:1
> >> mappings in the kernel -- for all future users, though.
> > 
> > Yeah, We could probably use the same symbol names for the trampoline and
> > the swapper page-table on 32 and 64 bit and merge the code that handles
> > it between the architectures. Shouldn't be too difficult. We cook
> > something up :-)
> > But do you mind to take this patch first? It fixes the occurence of an
> > erratum on AMD hardware, it is a quite simple fix compared to the rework
> > suggested. In fact, I would like to have a simple patch to fix the
> > problem first (and backport it as necessary) and then go the way to
> > rework the current code differences between 32 and 64 bit.
> > 
> 
> Agreed.  I do have a concern about the kernel page tables not being
> synchronized into trampoline_pg_dir (it's only an issue for 32-bit
> !PAE), so at the very least we need to keep an eye out for it...

I've been working on what you suggested and the patch boots on a 32-bit
PAE kernel but keeps rebooting on non-PAE after setting the stack_start
on the AP. I could send it out later for you to look at, if you like -
you should be able to spot what I'm missing :)...

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 15:34           ` H. Peter Anvin
  2010-08-12 15:47             ` Borislav Petkov
@ 2010-08-12 17:06             ` Joerg Roedel
  2010-08-12 19:01               ` H. Peter Anvin
  1 sibling, 1 reply; 31+ messages in thread
From: Joerg Roedel @ 2010-08-12 17:06 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On Thu, Aug 12, 2010 at 08:34:37AM -0700, H. Peter Anvin wrote:

> Agreed.  I do have a concern about the kernel page tables not being
> synchronized into trampoline_pg_dir (it's only an issue for 32-bit
> !PAE), so at the very least we need to keep an eye out for it...

Great.

The synchronization problem might be real. The cpu_init function uses
per-cpu variables which might not be present in the early synced
tramponline page table (iirc). The change to swapper_pg_dir and the
global tlb flush should probably be moved to the very beginning of the
start_secondary function.

	Joerg


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 15:47               ` H. Peter Anvin
@ 2010-08-12 17:38                 ` Borislav Petkov
  2010-08-24  7:33                 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov
  1 sibling, 0 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-12 17:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Thu, Aug 12, 2010 at 11:47:55AM -0400

> On 08/12/2010 08:47 AM, Borislav Petkov wrote:
> >>
> >> Agreed.  I do have a concern about the kernel page tables not being
> >> synchronized into trampoline_pg_dir (it's only an issue for 32-bit
> >> !PAE), so at the very least we need to keep an eye out for it...
> > 
> > I've been working on what you suggested and the patch boots on a 32-bit
> > PAE kernel but keeps rebooting on non-PAE after setting the stack_start
> > on the AP. I could send it out later for you to look at, if you like -
> > you should be able to spot what I'm missing :)...
> > 
> 
> Sounds good.  This is obviously .37-or-higher material.

Definitely, this needs a lot of hammering before going upstream.

So here's what I got so far, I went and changed everything called
swapper_* when used to prep the initial page table to initial_*. Then
in setup_arch I switch to the swapper_pg_dir after copying the kernel
address mappings into it so all other places touching swapper_pg_dir can
remain unchanged. Then, initial_page_table is still used for booting the
APs.

The nice effect is that we drop all that swsusp_pg_dir for ACPI_SLEEP
and unifying the do_boot_cpu() part of start_secondary() even more.

So this boots fine on 32-bit PAE but reboots the machine on !PAE when
the AP does

	/* Set up the stack pointer */
	lss stack_start,%esp

	pushl $0
	popfl

in head_32.S. I've debugged this by adding endless loops in the manner of 

@@@ -274,6 -274,6 +274,7 @@@ ENTRY(startup_32_smp
        movl %eax,%es
        movl %eax,%fs
        movl %eax,%gs
+      movl $0xdeadbeef, %ebx
  #endif /* CONFIG_SMP */

so that this happens only on the AP and then did

+667:
+      cmpl $0xdeadbeef, %ebx
+      je 667b
+

So when the loop is before the "pushl.. popfl" sequence above, the
BSP would print "CPU stuck??" for the AP and continue booting by not
bringing the AP online. If I put the endless loop after that sequence,
the machine reboots. This is also nicely reproducible in kvm for even
faster and safer experimenting with the code.


Thanks.

--
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 2984a25..8abde9e 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,6 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t initial_page_table[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 7f3eba0..169be89 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 	flush_tlb_all();
 }
 
-extern void zap_low_mappings(bool early);
-
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index fcc3c61..48d89d6 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -12,6 +12,11 @@
 #include <asm/segment.h>
 #include <asm/desc.h>
 
+#ifdef CONFIG_X86_32
+#include <asm/pgtable.h>
+#include <asm/pgtable_32.h>
+#endif
+
 #include "realmode/wakeup.h"
 #include "sleep.h"
 
@@ -90,7 +95,7 @@ int acpi_save_state_mem(void)
 
 #ifndef CONFIG_64BIT
 	header->pmode_entry = (u32)&wakeup_pmode_return;
-	header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET);
+	header->pmode_cr3 = (u32)(initial_page_table - __PAGE_OFFSET);
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 	header->trampoline_segment = setup_trampoline() >> 4;
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 37c3d4b..80b0961 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -177,7 +177,7 @@ default_entry:
 #ifdef CONFIG_X86_PAE
 
 	/*
-	 * In PAE mode swapper_pg_dir is statically defined to contain enough
+	 * In PAE mode initial_page_table is statically defined to contain enough
 	 * entries to cover the VMSPLIT option (that is the top 1, 2 or 3
 	 * entries). The identity mapping is handled by pointing two PGD
 	 * entries to the first kernel PMD.
@@ -191,7 +191,7 @@ default_entry:
 	xorl %ebx,%ebx				/* %ebx is kept at zero */
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_pmd), %edx
+	movl $pa(initial_pg_pmd), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PMD entry */
@@ -220,14 +220,14 @@ default_entry:
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else	/* Not PAE */
 
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_dir), %edx
+	movl $pa(initial_page_table), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PDE entry */
@@ -251,8 +251,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_dir+0xffc)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_page_table+0xffc)
 #endif
 	jmp 3f
 /*
@@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl $pa(initial_page_table),%eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -615,16 +615,18 @@ ENTRY(initial_code)
 __PAGE_ALIGNED_BSS
 	.align PAGE_SIZE_asm
 #ifdef CONFIG_X86_PAE
-swapper_pg_pmd:
+initial_pg_pmd:
 	.fill 1024*KPMDS,4,0
 #else
-ENTRY(swapper_pg_dir)
+ENTRY(initial_page_table)
 	.fill 1024,4,0
 #endif
-swapper_pg_fixmap:
+initial_pg_fixmap:
 	.fill 1024,4,0
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
+ENTRY(swapper_pg_dir)
+	.fill 1024,4,0
 
 /*
  * This starts the data section.
@@ -633,20 +635,20 @@ ENTRY(empty_zero_page)
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
 	.align PAGE_SIZE_asm
-ENTRY(swapper_pg_dir)
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
+ENTRY(initial_page_table)
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0
 # elif KPMDS == 2
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
 # elif KPMDS == 1
 	.long	0,0
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
 # else
 #  error "Kernel PMDs should be 1, 2 or 3"
 # endif
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4ae4ac..ba19c9e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -727,6 +727,15 @@ void __init setup_arch(char **cmdline_p)
 	int k8 = 0;
 
 #ifdef CONFIG_X86_32
+       /* Copy kernel address range */
+       clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+                       initial_page_table + KERNEL_PGD_BOUNDARY,
+                       min_t(unsigned long, KERNEL_PGD_PTRS,
+                             KERNEL_PGD_BOUNDARY));
+
+       load_cr3(swapper_pg_dir);
+       __flush_tlb_all();
+
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
 #else
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4f33b2..d1bd555 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -73,7 +73,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -300,8 +299,7 @@ notrace static void __cpuinit start_secondary(void *unused)
 	}
 
 #ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
+	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
 #endif
 
@@ -894,20 +892,7 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
-
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bca7909..adad48b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -548,48 +548,6 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-#ifdef CONFIG_ACPI_SLEEP
-/*
- * ACPI suspend needs this for resume, because things like the intel-agp
- * driver might have split up a kernel 4MB mapping.
- */
-char swsusp_pg_dir[PAGE_SIZE]
-	__attribute__ ((aligned(PAGE_SIZE)));
-
-static inline void save_pg_dir(void)
-{
-	memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);
-}
-#else /* !CONFIG_ACPI_SLEEP */
-static inline void save_pg_dir(void)
-{
-}
-#endif /* !CONFIG_ACPI_SLEEP */
-
-void zap_low_mappings(bool early)
-{
-	int i;
-
-	/*
-	 * Zap initial low-memory mappings.
-	 *
-	 * Note that "pgd_clear()" doesn't do it for
-	 * us, because pgd_clear() is a no-op on i386.
-	 */
-	for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) {
-#ifdef CONFIG_X86_PAE
-		set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page)));
-#else
-		set_pgd(swapper_pg_dir+i, __pgd(0));
-#endif
-	}
-
-	if (early)
-		__flush_tlb();
-	else
-		flush_tlb_all();
-}
-
 pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP);
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
 
@@ -958,9 +916,6 @@ void __init mem_init(void)
 
 	if (boot_cpu_data.wp_works_ok < 0)
 		test_wp_bit();
-
-	save_pg_dir();
-	zap_low_mappings(true);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG



-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 17:06             ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel
@ 2010-08-12 19:01               ` H. Peter Anvin
  2010-08-12 19:04                 ` Joerg Roedel
  0 siblings, 1 reply; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-12 19:01 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On 08/12/2010 10:06 AM, Joerg Roedel wrote:
> On Thu, Aug 12, 2010 at 08:34:37AM -0700, H. Peter Anvin wrote:
> 
>> Agreed.  I do have a concern about the kernel page tables not being
>> synchronized into trampoline_pg_dir (it's only an issue for 32-bit
>> !PAE), so at the very least we need to keep an eye out for it...
> 
> Great.
> 
> The synchronization problem might be real. The cpu_init function uses
> per-cpu variables which might not be present in the early synced
> tramponline page table (iirc). The change to swapper_pg_dir and the
> global tlb flush should probably be moved to the very beginning of the
> start_secondary function.
> 

Are you going to submit an updated patch, then?

	-hpa


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 19:01               ` H. Peter Anvin
@ 2010-08-12 19:04                 ` Joerg Roedel
  2010-08-13 12:35                   ` Borislav Petkov
  0 siblings, 1 reply; 31+ messages in thread
From: Joerg Roedel @ 2010-08-12 19:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On Thu, Aug 12, 2010 at 12:01:27PM -0700, H. Peter Anvin wrote:
> On 08/12/2010 10:06 AM, Joerg Roedel wrote:

> > The synchronization problem might be real. The cpu_init function uses
> > per-cpu variables which might not be present in the early synced
> > tramponline page table (iirc). The change to swapper_pg_dir and the
> > global tlb flush should probably be moved to the very beginning of the
> > start_secondary function.
> > 
> 
> Are you going to submit an updated patch, then?

Yes, I will send it next week when I am back in office and gave it some
testing.

	Joerg


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-12 19:04                 ` Joerg Roedel
@ 2010-08-13 12:35                   ` Borislav Petkov
  2010-08-16 12:19                     ` Borislav Petkov
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-13 12:35 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: H. Peter Anvin, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: Joerg Roedel <joro@8bytes.org>
Date: Thu, Aug 12, 2010 at 03:04:14PM -0400

> On Thu, Aug 12, 2010 at 12:01:27PM -0700, H. Peter Anvin wrote:
> > On 08/12/2010 10:06 AM, Joerg Roedel wrote:
> 
> > > The synchronization problem might be real. The cpu_init function uses
> > > per-cpu variables which might not be present in the early synced
> > > tramponline page table (iirc). The change to swapper_pg_dir and the
> > > global tlb flush should probably be moved to the very beginning of the
> > > start_secondary function.
> > > 
> > 
> > Are you going to submit an updated patch, then?
> 
> Yes, I will send it next week when I am back in office and gave it some
> testing.

currently testing...

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-13 12:35                   ` Borislav Petkov
@ 2010-08-16 12:19                     ` Borislav Petkov
  2010-08-16 12:38                       ` [PATCH -v2 " Borislav Petkov
  2010-08-16 12:39                       ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
  0 siblings, 2 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-16 12:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: Borislav Petkov <bp@amd64.org>
Date: Fri, Aug 13, 2010 at 08:35:43AM -0400

> > > > The synchronization problem might be real. The cpu_init function uses
> > > > per-cpu variables which might not be present in the early synced
> > > > tramponline page table (iirc). The change to swapper_pg_dir and the
> > > > global tlb flush should probably be moved to the very beginning of the
> > > > start_secondary function.
> > > > 
> > > 
> > > Are you going to submit an updated patch, then?
> > 
> > Yes, I will send it next week when I am back in office and gave it some
> > testing.
> 
> currently testing...

Updated versions survived almost 48h of uninterrupted hammering, I'm
sending them as a reply to this message...

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH -v2 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-16 12:19                     ` Borislav Petkov
@ 2010-08-16 12:38                       ` Borislav Petkov
  2010-08-18 18:41                         ` H. Peter Anvin
  2010-08-18 20:55                         ` [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir tip-bot for Joerg Roedel
  2010-08-16 12:39                       ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
  1 sibling, 2 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-16 12:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: Joerg Roedel <joerg.roedel@amd.com>

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    2 ++
 arch/x86/kernel/smpboot.c         |   32 +++++++++++++-------------------
 arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
 6 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 2984a25..f686f49 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,6 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index cb507bb..8f78fdf 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 37c3d4b..75e3981 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -608,6 +608,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4ae4ac..6600cfd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4f33b2..2d148a9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -73,7 +73,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -281,6 +280,18 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * fragile that we want to limit the things done here to the
 	 * most necessary things.
 	 */
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Switch away from the trampoline page-table
+	 *
+	 * Do this before cpu_init() because it needs to access per-cpu
+	 * data which may not be mapped in the trampoline page-table.
+	 */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
@@ -299,12 +310,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 		legacy_pic->chip->unmask(0);
 	}
 
-#ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
-	__flush_tlb_all();
-#endif
-
 	/* This must be done before setting cpu_online_mask */
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
@@ -754,6 +759,7 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
+	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -894,20 +900,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..a874495 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}
-- 
1.7.1


-- 
Regards/Gruss,
Boris.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue
  2010-08-16 12:19                     ` Borislav Petkov
  2010-08-16 12:38                       ` [PATCH -v2 " Borislav Petkov
@ 2010-08-16 12:39                       ` Borislav Petkov
  2010-08-18 19:03                         ` H. Peter Anvin
  1 sibling, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-16 12:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/Kconfig          |    6 ++++++
 arch/x86/kernel/smpboot.c |   19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dcb0593..1598663 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -58,6 +58,7 @@ config X86
 	select ANON_INODES
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_USER_RETURN_NOTIFIER
+	select ARCH_CPU_PROBE_RELEASE
 
 config INSTRUCTION_DECODER
 	def_bool (KPROBES || PERF_EVENTS)
@@ -247,6 +248,11 @@ config ARCH_HWEIGHT_CFLAGS
 
 config KTIME_SCALAR
 	def_bool X86_32
+
+config ARCH_CPU_PROBE_RELEASE
+	def_bool y
+	depends on HOTPLUG_CPU
+
 source "init/Kconfig"
 source "kernel/Kconfig.freezer"
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2d148a9..733ad2b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
 static DEFINE_PER_CPU(struct task_struct *, idle_thread_array);
 #define get_idle_for_cpu(x)      (per_cpu(idle_thread_array, x))
 #define set_idle_for_cpu(x, p)   (per_cpu(idle_thread_array, x) = (p))
+
+/*
+ * We need this for trampoline_base protection from concurrent accesses when
+ * off- and onlining cores wildly.
+ */
+static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+	mutex_lock(&x86_cpu_hotplug_driver_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+	mutex_unlock(&x86_cpu_hotplug_driver_mutex);
+}
+
+ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
+ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
-- 
1.7.1


-- 
Regards/Gruss,
Boris.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH -v2 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-16 12:38                       ` [PATCH -v2 " Borislav Petkov
@ 2010-08-18 18:41                         ` H. Peter Anvin
  2010-08-18 19:09                           ` Borislav Petkov
  2010-08-18 20:55                           ` [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs tip-bot for H. Peter Anvin
  2010-08-18 20:55                         ` [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir tip-bot for Joerg Roedel
  1 sibling, 2 replies; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-18 18:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On 08/16/2010 05:38 AM, Borislav Petkov wrote:
>  #else
>  static inline void reserve_trampoline_memory(void) {};
> +extern void __init setup_trampoline_page_table(void) {};
>  #endif /* CONFIG_X86_TRAMPOLINE */

This breaks compilation in the !CONFIG_X86_TRAMPOLINE case; I presume
you meant:

static inline void setup_trampoline_page_table(void) {}

... here too?

(No semicolon after a function body!)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue
  2010-08-16 12:39                       ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
@ 2010-08-18 19:03                         ` H. Peter Anvin
  2010-08-18 19:28                           ` Borislav Petkov
  0 siblings, 1 reply; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-18 19:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On 08/16/2010 05:39 AM, Borislav Petkov wrote:
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index dcb0593..1598663 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -58,6 +58,7 @@ config X86
>  	select ANON_INODES
>  	select HAVE_ARCH_KMEMCHECK
>  	select HAVE_USER_RETURN_NOTIFIER
> +	select ARCH_CPU_PROBE_RELEASE
>  

warning: (X86) selects ARCH_CPU_PROBE_RELEASE which has unmet direct
dependencies (HOTPLUG_CPU)

... and build failure when compiling without CONFIG_HOTPLUG_CPU.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v2 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines
  2010-08-18 18:41                         ` H. Peter Anvin
@ 2010-08-18 19:09                           ` Borislav Petkov
  2010-08-18 20:55                           ` [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs tip-bot for H. Peter Anvin
  1 sibling, 0 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-08-18 19:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Wed, Aug 18, 2010 at 02:41:17PM -0400

> On 08/16/2010 05:38 AM, Borislav Petkov wrote:
> >  #else
> >  static inline void reserve_trampoline_memory(void) {};
> > +extern void __init setup_trampoline_page_table(void) {};
> >  #endif /* CONFIG_X86_TRAMPOLINE */
> 
> This breaks compilation in the !CONFIG_X86_TRAMPOLINE case; I presume
> you meant:
> 
> static inline void setup_trampoline_page_table(void) {}
> 
> ... here too?
>
> (No semicolon after a function body!)

Yeah, thanks for catching that - this should be a static inline stub in
the !CONFIG_X86_TRAMPOLINE case.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue
  2010-08-18 19:03                         ` H. Peter Anvin
@ 2010-08-18 19:28                           ` Borislav Petkov
  2010-08-18 20:04                             ` H. Peter Anvin
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-18 19:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Wed, Aug 18, 2010 at 03:03:33PM -0400

> On 08/16/2010 05:39 AM, Borislav Petkov wrote:
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index dcb0593..1598663 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -58,6 +58,7 @@ config X86
> >  	select ANON_INODES
> >  	select HAVE_ARCH_KMEMCHECK
> >  	select HAVE_USER_RETURN_NOTIFIER
> > +	select ARCH_CPU_PROBE_RELEASE
> >  
> 
> warning: (X86) selects ARCH_CPU_PROBE_RELEASE which has unmet direct
> dependencies (HOTPLUG_CPU)

well I got 

+config ARCH_CPU_PROBE_RELEASE
+       def_bool y
+       depends on HOTPLUG_CPU

but I guess ARCH_CPU_PROBE_RELEASE gets selected still. Why isn't that
"depends on" forcing it, I guess select is stronger.

> ... and build failure when compiling without CONFIG_HOTPLUG_CPU.

This hunk fixes the !CONFIG_HOTPLUG_CPU build here but I should've
caught it earlier, shame on me. Sorry for the inconvenience, can you
please drop this second patch for now, I'll send you a revised version
tomorrow.

Thanks.

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 251acea..273cfb9 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -261,7 +261,7 @@ int __init cpu_dev_init(void)
 }
 
 static struct sysdev_class_attribute *cpu_sysdev_class_attrs[] = {
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+#if defined(CONFIG_ARCH_CPU_PROBE_RELEASE) && defined(CONFIG_HOTPLUG_CPU)
        &attr_probe,
        &attr_release,
 #endif


-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue
  2010-08-18 19:28                           ` Borislav Petkov
@ 2010-08-18 20:04                             ` H. Peter Anvin
  2010-08-19 18:10                               ` [PATCH -v2.1 " Borislav Petkov
  0 siblings, 1 reply; 31+ messages in thread
From: H. Peter Anvin @ 2010-08-18 20:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel

On 08/18/2010 12:28 PM, Borislav Petkov wrote:
>>
>> warning: (X86) selects ARCH_CPU_PROBE_RELEASE which has unmet direct
>> dependencies (HOTPLUG_CPU)
> 
> well I got 
> 
> +config ARCH_CPU_PROBE_RELEASE
> +       def_bool y
> +       depends on HOTPLUG_CPU
> 
> but I guess ARCH_CPU_PROBE_RELEASE gets selected still. Why isn't that
> "depends on" forcing it, I guess select is stronger.
> 
>> ... and build failure when compiling without CONFIG_HOTPLUG_CPU.
> 
> This hunk fixes the !CONFIG_HOTPLUG_CPU build here but I should've
> caught it earlier, shame on me. Sorry for the inconvenience, can you
> please drop this second patch for now, I'll send you a revised version
> tomorrow.
> 

I would prefer to not continue to have a piece of Kconfig code which
warns, so I'll hold off on this until I have your revised patch.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir
  2010-08-16 12:38                       ` [PATCH -v2 " Borislav Petkov
  2010-08-18 18:41                         ` H. Peter Anvin
@ 2010-08-18 20:55                         ` tip-bot for Joerg Roedel
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot for Joerg Roedel @ 2010-08-18 20:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, joerg.roedel, tglx, borislav.petkov

Commit-ID:  fd89a137924e0710078c3ae855e7cec1c43cb845
Gitweb:     http://git.kernel.org/tip/fd89a137924e0710078c3ae855e7cec1c43cb845
Author:     Joerg Roedel <joerg.roedel@amd.com>
AuthorDate: Mon, 16 Aug 2010 14:38:33 +0200
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Wed, 18 Aug 2010 09:17:20 -0700

x86-32: Separate 1:1 pagetables from swapper_pg_dir

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/pgtable_32.h |    1 +
 arch/x86/include/asm/trampoline.h |    3 +++
 arch/x86/kernel/head_32.S         |    8 +++++++-
 arch/x86/kernel/setup.c           |    2 ++
 arch/x86/kernel/smpboot.c         |   32 +++++++++++++-------------------
 arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
 6 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 2984a25..f686f49 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,6 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
+extern pgd_t trampoline_pg_dir[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index cb507bb..8f78fdf 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
+extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
 static inline void reserve_trampoline_memory(void) {};
+extern void __init setup_trampoline_page_table(void) {};
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index ff4c453..fa8c1b8 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -334,7 +334,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl $pa(swapper_pg_dir),%eax
+	movl pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -614,6 +614,8 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
+ENTRY(initial_page_table)
+	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -629,6 +631,10 @@ ENTRY(swapper_pg_dir)
 #endif
 swapper_pg_fixmap:
 	.fill 1024,4,0
+#ifdef CONFIG_X86_TRAMPOLINE
+ENTRY(trampoline_pg_dir)
+	.fill 1024,4,0
+#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b008e78..c3a4fbb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1014,6 +1014,8 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
+	setup_trampoline_page_table();
+
 	tboot_probe();
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a5e928b..abf4a86 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -73,7 +73,6 @@
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
-static int low_mappings;
 #endif
 
 /* State of each CPU */
@@ -281,6 +280,18 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * fragile that we want to limit the things done here to the
 	 * most necessary things.
 	 */
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Switch away from the trampoline page-table
+	 *
+	 * Do this before cpu_init() because it needs to access per-cpu
+	 * data which may not be mapped in the trampoline page-table.
+	 */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
@@ -299,12 +310,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 		legacy_pic->chip->unmask(0);
 	}
 
-#ifdef CONFIG_X86_32
-	while (low_mappings)
-		cpu_relax();
-	__flush_tlb_all();
-#endif
-
 	/* This must be done before setting cpu_online_mask */
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
@@ -750,6 +755,7 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
+	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -897,20 +903,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
-#ifdef CONFIG_X86_32
-	/* init low mem mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-	low_mappings = 1;
-
 	err = do_boot_cpu(apicid, cpu);
 
-	zap_low_mappings(false);
-	low_mappings = 0;
-#else
-	err = do_boot_cpu(apicid, cpu);
-#endif
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..a874495 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,6 +1,7 @@
 #include <linux/io.h>
 
 #include <asm/trampoline.h>
+#include <asm/pgtable.h>
 #include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
@@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
+
+void __init setup_trampoline_page_table(void)
+{
+#ifdef CONFIG_X86_32
+	/* Copy kernel address range */
+	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+
+	/* Initialize low mappings */
+	clone_pgd_range(trampoline_pg_dir,
+			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS,
+			      KERNEL_PGD_BOUNDARY));
+#endif
+}

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs
  2010-08-18 18:41                         ` H. Peter Anvin
  2010-08-18 19:09                           ` Borislav Petkov
@ 2010-08-18 20:55                           ` tip-bot for H. Peter Anvin
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot for H. Peter Anvin @ 2010-08-18 20:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, joerg.roedel, tglx, borislav.petkov

Commit-ID:  8848a91068c018bc91f597038a0f41462a0f88a4
Gitweb:     http://git.kernel.org/tip/8848a91068c018bc91f597038a0f41462a0f88a4
Author:     H. Peter Anvin <hpa@zytor.com>
AuthorDate: Wed, 18 Aug 2010 11:42:23 -0700
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Wed, 18 Aug 2010 12:42:24 -0700

x86-32: Fix dummy trampoline-related inline stubs

Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
---
 arch/x86/include/asm/trampoline.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 8f78fdf..4dde797 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -22,8 +22,8 @@ extern unsigned long setup_trampoline(void);
 extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
-static inline void reserve_trampoline_memory(void) {};
-extern void __init setup_trampoline_page_table(void) {};
+static inline void setup_trampoline_page_table(void) {}
+static inline void reserve_trampoline_memory(void) {}
 #endif /* CONFIG_X86_TRAMPOLINE */
 
 #endif /* __ASSEMBLY__ */

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH -v2.1 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue
  2010-08-18 20:04                             ` H. Peter Anvin
@ 2010-08-19 18:10                               ` Borislav Petkov
  2010-08-19 22:58                                 ` [tip:x86/urgent] x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues tip-bot for Borislav Petkov
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-19 18:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas,
	Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel


When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

[ v2.1: fix !HOTPLUG_CPU build ]

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/Kconfig          |    5 +++++
 arch/x86/kernel/smpboot.c |   19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a84fc34..ac7827f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -245,6 +245,11 @@ config ARCH_HWEIGHT_CFLAGS
 
 config KTIME_SCALAR
 	def_bool X86_32
+
+config ARCH_CPU_PROBE_RELEASE
+	def_bool y
+	depends on HOTPLUG_CPU
+
 source "init/Kconfig"
 source "kernel/Kconfig.freezer"
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index abf4a86..8b3bfc4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
 static DEFINE_PER_CPU(struct task_struct *, idle_thread_array);
 #define get_idle_for_cpu(x)      (per_cpu(idle_thread_array, x))
 #define set_idle_for_cpu(x, p)   (per_cpu(idle_thread_array, x) = (p))
+
+/*
+ * We need this for trampoline_base protection from concurrent accesses when
+ * off- and onlining cores wildly.
+ */
+static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+        mutex_lock(&x86_cpu_hotplug_driver_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+        mutex_unlock(&x86_cpu_hotplug_driver_mutex);
+}
+
+ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
+ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
-- 
1.7.1

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [tip:x86/urgent] x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
  2010-08-19 18:10                               ` [PATCH -v2.1 " Borislav Petkov
@ 2010-08-19 22:58                                 ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Borislav Petkov @ 2010-08-19 22:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, bp, stable, tglx, hpa, borislav.petkov

Commit-ID:  d7c53c9e822a4fefa13a0cae76f3190bfd0d5c11
Gitweb:     http://git.kernel.org/tip/d7c53c9e822a4fefa13a0cae76f3190bfd0d5c11
Author:     Borislav Petkov <bp@amd64.org>
AuthorDate: Thu, 19 Aug 2010 20:10:29 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 19 Aug 2010 14:47:43 -0700

x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues

When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

[ v2.1: fix !HOTPLUG_CPU build ]

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/Kconfig          |    5 +++++
 arch/x86/kernel/smpboot.c |   19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a84fc34..ac7827f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -245,6 +245,11 @@ config ARCH_HWEIGHT_CFLAGS
 
 config KTIME_SCALAR
 	def_bool X86_32
+
+config ARCH_CPU_PROBE_RELEASE
+	def_bool y
+	depends on HOTPLUG_CPU
+
 source "init/Kconfig"
 source "kernel/Kconfig.freezer"
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index abf4a86..8b3bfc4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
 static DEFINE_PER_CPU(struct task_struct *, idle_thread_array);
 #define get_idle_for_cpu(x)      (per_cpu(idle_thread_array, x))
 #define set_idle_for_cpu(x, p)   (per_cpu(idle_thread_array, x) = (p))
+
+/*
+ * We need this for trampoline_base protection from concurrent accesses when
+ * off- and onlining cores wildly.
+ */
+static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+        mutex_lock(&x86_cpu_hotplug_driver_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+        mutex_unlock(&x86_cpu_hotplug_driver_mutex);
+}
+
+ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
+ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines)
  2010-08-12 15:47               ` H. Peter Anvin
  2010-08-12 17:38                 ` Borislav Petkov
@ 2010-08-24  7:33                 ` Borislav Petkov
  2010-08-29 20:32                   ` [PATCH] x86-32, mm: Add an initial page table for core bootstrapping Borislav Petkov
  1 sibling, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-24  7:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Joerg Roedel, Roedel, Joerg, mingo, tglx,
	Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86,
	linux-kernel

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Thu, Aug 12, 2010 at 08:47:55AM -0700

> > I've been working on what you suggested and the patch boots on a 32-bit
> > PAE kernel but keeps rebooting on non-PAE after setting the stack_start
> > on the AP. I could send it out later for you to look at, if you like -
> > you should be able to spot what I'm missing :)...
> > 
> 
> Sounds good.  This is obviously .37-or-higher material.

Ok, here we go, the version below boots a !PAE and a PAE kernel fine
in kvm. The approach taken is, IMHO, the least intrusive in trying
to touch code assuming swapper_pg_dir as little as possible and use
initial_page_table for boot/acpi sleep/reboot only.

For example, paths like init_memory_mapping ->
kernel_physical_mapping_init -> ... which fill in the kernel mappings
into swapper_pg_dir are left unchanged by switching to swapper_pg_dir
at the beginning of setup_arch() BTW, I guess this could be done even
earlier since I copy the mappings from initial_page_table so there's no
difference between the two at the switch point.

Later, after paging_init(), synching back all swapper_pg_dir changes
into initial_page_table is done for the APs to boot and have a stack
mapping in the initial_page_table so that they don't pagefault into a
triple fault (this was the problem with the previous version, btw.)

Please take a look and let me know if I've forgotten something before I
start hammering on it on a real hardware.

Thanks.

---
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index f686f49..8abde9e 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,7 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
-extern pgd_t trampoline_pg_dir[1024];
+extern pgd_t initial_page_table[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 7f3eba0..169be89 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 	flush_tlb_all();
 }
 
-extern void zap_low_mappings(bool early);
-
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 4dde797..f4500fb 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,16 +13,13 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
-extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
-extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
-static inline void setup_trampoline_page_table(void) {}
 static inline void reserve_trampoline_memory(void) {}
 #endif /* CONFIG_X86_TRAMPOLINE */
 
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 33cec15..d04500e 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -12,6 +12,11 @@
 #include <asm/segment.h>
 #include <asm/desc.h>
 
+#ifdef CONFIG_X86_32
+#include <asm/pgtable.h>
+#include <asm/pgtable_32.h>
+#endif
+
 #include "realmode/wakeup.h"
 #include "sleep.h"
 
@@ -90,7 +95,7 @@ int acpi_save_state_mem(void)
 
 #ifndef CONFIG_64BIT
 	header->pmode_entry = (u32)&wakeup_pmode_return;
-	header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET);
+	header->pmode_cr3 = (u32)(initial_page_table - __PAGE_OFFSET);
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 	header->trampoline_segment = setup_trampoline() >> 4;
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 784360c..8b9c201 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -17,6 +17,7 @@
 #include <asm/apic.h>
 #include <asm/io_apic.h>
 #include <asm/bios_ebda.h>
+#include <asm/tlbflush.h>
 
 static void __init i386_default_early_setup(void)
 {
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index fa8c1b8..005646a 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -183,7 +183,7 @@ default_entry:
 #ifdef CONFIG_X86_PAE
 
 	/*
-	 * In PAE mode swapper_pg_dir is statically defined to contain enough
+	 * In PAE mode initial_page_table is statically defined to contain enough
 	 * entries to cover the VMSPLIT option (that is the top 1, 2 or 3
 	 * entries). The identity mapping is handled by pointing two PGD
 	 * entries to the first kernel PMD.
@@ -197,7 +197,7 @@ default_entry:
 	xorl %ebx,%ebx				/* %ebx is kept at zero */
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_pmd), %edx
+	movl $pa(initial_pg_pmd), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PMD entry */
@@ -226,14 +226,14 @@ default_entry:
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else	/* Not PAE */
 
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_dir), %edx
+	movl $pa(initial_page_table), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PDE entry */
@@ -257,8 +257,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_dir+0xffc)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_page_table+0xffc)
 #endif
 	jmp 3f
 /*
@@ -334,7 +334,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl pa(initial_page_table), %eax
+	movl $pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -614,8 +614,6 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
-ENTRY(initial_page_table)
-	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -623,20 +621,18 @@ ENTRY(initial_page_table)
 __PAGE_ALIGNED_BSS
 	.align PAGE_SIZE_asm
 #ifdef CONFIG_X86_PAE
-swapper_pg_pmd:
+initial_pg_pmd:
 	.fill 1024*KPMDS,4,0
 #else
-ENTRY(swapper_pg_dir)
+ENTRY(initial_page_table)
 	.fill 1024,4,0
 #endif
-swapper_pg_fixmap:
+initial_pg_fixmap:
 	.fill 1024,4,0
-#ifdef CONFIG_X86_TRAMPOLINE
-ENTRY(trampoline_pg_dir)
-	.fill 1024,4,0
-#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
+ENTRY(swapper_pg_dir)
+	.fill 1024,4,0
 
 /*
  * This starts the data section.
@@ -645,20 +641,20 @@ ENTRY(empty_zero_page)
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
 	.align PAGE_SIZE_asm
-ENTRY(swapper_pg_dir)
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
+ENTRY(initial_page_table)
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0
 # elif KPMDS == 2
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
 # elif KPMDS == 1
 	.long	0,0
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
 # else
 #  error "Kernel PMDs should be 1, 2 or 3"
 # endif
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e3af342..ce5566e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -371,16 +371,10 @@ void machine_real_restart(const unsigned char *code, int length)
 	CMOS_WRITE(0x00, 0x8f);
 	spin_unlock(&rtc_lock);
 
-	/* Remap the kernel at virtual address zero, as well as offset zero
-	   from the kernel segment.  This assumes the kernel segment starts at
-	   virtual address PAGE_OFFSET. */
-	memcpy(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		sizeof(swapper_pg_dir [0]) * KERNEL_PGD_PTRS);
-
 	/*
-	 * Use `swapper_pg_dir' as our page directory.
+	 * Switch back to the initial page table.
 	 */
-	load_cr3(swapper_pg_dir);
+	load_cr3(initial_page_table);
 
 	/* Write 0x1234 to absolute memory location 0x472.  The BIOS reads
 	   this on booting to tell it to "Bypass memory test (also warm
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c3a4fbb..32322df 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -730,6 +730,17 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
+
+	/*
+	 * copy kernel address range established so far and switch
+	 * to the proper swapper page table
+	 */
+	clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+			initial_page_table + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif
@@ -1014,7 +1025,12 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
-	setup_trampoline_page_table();
+#ifdef CONFIG_X86_32
+	/* sync back kernel address range */
+	clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+			min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
+#endif
 
 	tboot_probe();
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8b3bfc4..0d86d0b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -300,22 +300,17 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * most necessary things.
 	 */
 
-#ifdef CONFIG_X86_32
-	/*
-	 * Switch away from the trampoline page-table
-	 *
-	 * Do this before cpu_init() because it needs to access per-cpu
-	 * data which may not be mapped in the trampoline page-table.
-	 */
-	load_cr3(swapper_pg_dir);
-	__flush_tlb_all();
-#endif
-
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
 	smp_callin();
 
+#ifdef CONFIG_X86_32
+	/* switch away from the initial page table */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();
 	/*
@@ -774,7 +769,6 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
-	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -923,7 +917,6 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
 	err = do_boot_cpu(apicid, cpu);
-
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index a874495..f1488a3 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -38,20 +38,3 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
-
-void __init setup_trampoline_page_table(void)
-{
-#ifdef CONFIG_X86_32
-	/* Copy kernel address range */
-	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
-			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS,
-			      KERNEL_PGD_BOUNDARY));
-
-	/* Initialize low mappings */
-	clone_pgd_range(trampoline_pg_dir,
-			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS,
-			      KERNEL_PGD_BOUNDARY));
-#endif
-}
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bca7909..adad48b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -548,48 +548,6 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-#ifdef CONFIG_ACPI_SLEEP
-/*
- * ACPI suspend needs this for resume, because things like the intel-agp
- * driver might have split up a kernel 4MB mapping.
- */
-char swsusp_pg_dir[PAGE_SIZE]
-	__attribute__ ((aligned(PAGE_SIZE)));
-
-static inline void save_pg_dir(void)
-{
-	memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);
-}
-#else /* !CONFIG_ACPI_SLEEP */
-static inline void save_pg_dir(void)
-{
-}
-#endif /* !CONFIG_ACPI_SLEEP */
-
-void zap_low_mappings(bool early)
-{
-	int i;
-
-	/*
-	 * Zap initial low-memory mappings.
-	 *
-	 * Note that "pgd_clear()" doesn't do it for
-	 * us, because pgd_clear() is a no-op on i386.
-	 */
-	for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) {
-#ifdef CONFIG_X86_PAE
-		set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page)));
-#else
-		set_pgd(swapper_pg_dir+i, __pgd(0));
-#endif
-	}
-
-	if (early)
-		__flush_tlb();
-	else
-		flush_tlb_all();
-}
-
 pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP);
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
 
@@ -958,9 +916,6 @@ void __init mem_init(void)
 
 	if (boot_cpu_data.wp_works_ok < 0)
 		test_wp_bit();
-
-	save_pg_dir();
-	zap_low_mappings(true);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH] x86-32, mm: Add an initial page table for core bootstrapping
  2010-08-24  7:33                 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov
@ 2010-08-29 20:32                   ` Borislav Petkov
  2010-09-02  9:10                     ` [PATCH -v1.1] " Borislav Petkov
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2010-08-29 20:32 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Joerg Roedel, Roedel, Joerg, mingo, tglx,
	Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86,
	linux-kernel

This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs.

Signed-off-by: Borislav Petkov <bp@alien8.de>
---

Ok, here's the initial version. This patch has survived a whole weekend
of randconfig builds and has been boot/reboot/suspend/resume tested on
32-bit kvm guests and bare metal in all possible configurations of PAE
and !PAE kernels with all possible VMSPLIT_* config options.

Patch is against tip/x86/urgent ontop of Hugh's CONFIG_VMSPLIT_1G and
2G_OPT fix.

 arch/x86/include/asm/pgtable_32.h |    2 +-
 arch/x86/include/asm/tlbflush.h   |    2 -
 arch/x86/include/asm/trampoline.h |    3 --
 arch/x86/kernel/acpi/sleep.c      |    7 ++++-
 arch/x86/kernel/head32.c          |    1 +
 arch/x86/kernel/head_32.S         |   55 +++++++++++++++++--------------------
 arch/x86/kernel/reboot.c          |   10 +-----
 arch/x86/kernel/setup.c           |   18 +++++++++++-
 arch/x86/kernel/smpboot.c         |   19 ++++---------
 arch/x86/kernel/trampoline.c      |   16 -----------
 arch/x86/mm/init_32.c             |   45 ------------------------------
 11 files changed, 58 insertions(+), 120 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index f686f49..8abde9e 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,7 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
-extern pgd_t trampoline_pg_dir[1024];
+extern pgd_t initial_page_table[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 7f3eba0..169be89 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 	flush_tlb_all();
 }
 
-extern void zap_low_mappings(bool early);
-
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 4dde797..f4500fb 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,16 +13,13 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
-extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
-extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
-static inline void setup_trampoline_page_table(void) {}
 static inline void reserve_trampoline_memory(void) {}
 #endif /* CONFIG_X86_TRAMPOLINE */
 
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 33cec15..d04500e 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -12,6 +12,11 @@
 #include <asm/segment.h>
 #include <asm/desc.h>
 
+#ifdef CONFIG_X86_32
+#include <asm/pgtable.h>
+#include <asm/pgtable_32.h>
+#endif
+
 #include "realmode/wakeup.h"
 #include "sleep.h"
 
@@ -90,7 +95,7 @@ int acpi_save_state_mem(void)
 
 #ifndef CONFIG_64BIT
 	header->pmode_entry = (u32)&wakeup_pmode_return;
-	header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET);
+	header->pmode_cr3 = (u32)(initial_page_table - __PAGE_OFFSET);
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 	header->trampoline_segment = setup_trampoline() >> 4;
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 784360c..8b9c201 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -17,6 +17,7 @@
 #include <asm/apic.h>
 #include <asm/io_apic.h>
 #include <asm/bios_ebda.h>
+#include <asm/tlbflush.h>
 
 static void __init i386_default_early_setup(void)
 {
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index fa8c1b8..bcece91 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -183,13 +183,12 @@ default_entry:
 #ifdef CONFIG_X86_PAE
 
 	/*
-	 * In PAE mode swapper_pg_dir is statically defined to contain enough
-	 * entries to cover the VMSPLIT option (that is the top 1, 2 or 3
-	 * entries). The identity mapping is handled by pointing two PGD
-	 * entries to the first kernel PMD.
+	 * In PAE mode initial_page_table is statically defined to contain
+	 * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3
+	 * entries). The identity mapping is handled by pointing two PGD entries
+	 * to the first kernel PMD.
 	 *
-	 * Note the upper half of each PMD or PTE are always zero at
-	 * this stage.
+	 * Note the upper half of each PMD or PTE are always zero at this stage.
 	 */
 
 #define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */
@@ -197,7 +196,7 @@ default_entry:
 	xorl %ebx,%ebx				/* %ebx is kept at zero */
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_pmd), %edx
+	movl $pa(initial_pg_pmd), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PMD entry */
@@ -226,14 +225,14 @@ default_entry:
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else	/* Not PAE */
 
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_dir), %edx
+	movl $pa(initial_page_table), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PDE entry */
@@ -257,8 +256,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_dir+0xffc)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_page_table+0xffc)
 #endif
 	jmp 3f
 /*
@@ -334,7 +333,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl pa(initial_page_table), %eax
+	movl $pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -614,8 +613,6 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
-ENTRY(initial_page_table)
-	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -623,20 +620,18 @@ ENTRY(initial_page_table)
 __PAGE_ALIGNED_BSS
 	.align PAGE_SIZE_asm
 #ifdef CONFIG_X86_PAE
-swapper_pg_pmd:
+initial_pg_pmd:
 	.fill 1024*KPMDS,4,0
 #else
-ENTRY(swapper_pg_dir)
+ENTRY(initial_page_table)
 	.fill 1024,4,0
 #endif
-swapper_pg_fixmap:
+initial_pg_fixmap:
 	.fill 1024,4,0
-#ifdef CONFIG_X86_TRAMPOLINE
-ENTRY(trampoline_pg_dir)
-	.fill 1024,4,0
-#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
+ENTRY(swapper_pg_dir)
+	.fill 1024,4,0
 
 /*
  * This starts the data section.
@@ -645,20 +640,20 @@ ENTRY(empty_zero_page)
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
 	.align PAGE_SIZE_asm
-ENTRY(swapper_pg_dir)
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
+ENTRY(initial_page_table)
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0
 # elif KPMDS == 2
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
 # elif KPMDS == 1
 	.long	0,0
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
 # else
 #  error "Kernel PMDs should be 1, 2 or 3"
 # endif
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e3af342..ce5566e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -371,16 +371,10 @@ void machine_real_restart(const unsigned char *code, int length)
 	CMOS_WRITE(0x00, 0x8f);
 	spin_unlock(&rtc_lock);
 
-	/* Remap the kernel at virtual address zero, as well as offset zero
-	   from the kernel segment.  This assumes the kernel segment starts at
-	   virtual address PAGE_OFFSET. */
-	memcpy(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		sizeof(swapper_pg_dir [0]) * KERNEL_PGD_PTRS);
-
 	/*
-	 * Use `swapper_pg_dir' as our page directory.
+	 * Switch back to the initial page table.
 	 */
-	load_cr3(swapper_pg_dir);
+	load_cr3(initial_page_table);
 
 	/* Write 0x1234 to absolute memory location 0x472.  The BIOS reads
 	   this on booting to tell it to "Bypass memory test (also warm
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c3a4fbb..b315b37 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -730,6 +730,17 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
+
+	/*
+	 * copy kernel address range established so far and switch
+	 * to the proper swapper page table
+	 */
+	clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+			initial_page_table + KERNEL_PGD_BOUNDARY,
+			KERNEL_PGD_PTRS);
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif
@@ -1014,7 +1025,12 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
-	setup_trampoline_page_table();
+#ifdef CONFIG_X86_32
+	/* sync back kernel address range */
+	clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+			KERNEL_PGD_PTRS);
+#endif
 
 	tboot_probe();
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8b3bfc4..0d86d0b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -300,22 +300,17 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * most necessary things.
 	 */
 
-#ifdef CONFIG_X86_32
-	/*
-	 * Switch away from the trampoline page-table
-	 *
-	 * Do this before cpu_init() because it needs to access per-cpu
-	 * data which may not be mapped in the trampoline page-table.
-	 */
-	load_cr3(swapper_pg_dir);
-	__flush_tlb_all();
-#endif
-
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
 	smp_callin();
 
+#ifdef CONFIG_X86_32
+	/* switch away from the initial page table */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();
 	/*
@@ -774,7 +769,6 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
-	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -923,7 +917,6 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
 	err = do_boot_cpu(apicid, cpu);
-
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index e2a5952..f1488a3 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -38,19 +38,3 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
-
-void __init setup_trampoline_page_table(void)
-{
-#ifdef CONFIG_X86_32
-	/* Copy kernel address range */
-	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
-			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			KERNEL_PGD_PTRS);
-
-	/* Initialize low mappings */
-	clone_pgd_range(trampoline_pg_dir,
-			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS,
-			      KERNEL_PGD_BOUNDARY));
-#endif
-}
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bca7909..adad48b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -548,48 +548,6 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-#ifdef CONFIG_ACPI_SLEEP
-/*
- * ACPI suspend needs this for resume, because things like the intel-agp
- * driver might have split up a kernel 4MB mapping.
- */
-char swsusp_pg_dir[PAGE_SIZE]
-	__attribute__ ((aligned(PAGE_SIZE)));
-
-static inline void save_pg_dir(void)
-{
-	memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);
-}
-#else /* !CONFIG_ACPI_SLEEP */
-static inline void save_pg_dir(void)
-{
-}
-#endif /* !CONFIG_ACPI_SLEEP */
-
-void zap_low_mappings(bool early)
-{
-	int i;
-
-	/*
-	 * Zap initial low-memory mappings.
-	 *
-	 * Note that "pgd_clear()" doesn't do it for
-	 * us, because pgd_clear() is a no-op on i386.
-	 */
-	for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) {
-#ifdef CONFIG_X86_PAE
-		set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page)));
-#else
-		set_pgd(swapper_pg_dir+i, __pgd(0));
-#endif
-	}
-
-	if (early)
-		__flush_tlb();
-	else
-		flush_tlb_all();
-}
-
 pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP);
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
 
@@ -958,9 +916,6 @@ void __init mem_init(void)
 
 	if (boot_cpu_data.wp_works_ok < 0)
 		test_wp_bit();
-
-	save_pg_dir();
-	zap_low_mappings(true);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
1.7.1

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH -v1.1] x86-32, mm: Add an initial page table for core bootstrapping
  2010-08-29 20:32                   ` [PATCH] x86-32, mm: Add an initial page table for core bootstrapping Borislav Petkov
@ 2010-09-02  9:10                     ` Borislav Petkov
  0 siblings, 0 replies; 31+ messages in thread
From: Borislav Petkov @ 2010-09-02  9:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Joerg Roedel, Roedel, Joerg, mingo, tglx,
	Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86,
	linux-kernel

From: Borislav Petkov <bp@alien8.de>
Date: Sun, Aug 29, 2010 at 10:32:05PM +0200

> This patch adds an initial page table with low mappings used exclusively
> for booting APs/resuming after ACPI suspend/machine restart. After this,
> there's no need to add low mappings to swapper_pg_dir and zap them later
> or create own swsusp PGD page solely for ACPI sleep needs.
> 
> Signed-off-by: Borislav Petkov <bp@alien8.de>
> ---
> 
> Ok, here's the initial version. This patch has survived a whole weekend
> of randconfig builds and has been boot/reboot/suspend/resume tested on
> 32-bit kvm guests and bare metal in all possible configurations of PAE
> and !PAE kernels with all possible VMSPLIT_* config options.
> 
> Patch is against tip/x86/urgent ontop of Hugh's CONFIG_VMSPLIT_1G and
> 2G_OPT fix.

Ok this one broke suspend-to-ram due to wrongly _not_ passing the
_pointer_ to the initial_page_table as the pmode_cr3 pointer when
resuming from ACPI sleep. Here's a better version.

--
From: Borislav Petkov <bp@alien8.de>
Date: Sat, 28 Aug 2010 15:58:33 +0200
Subject: [PATCH] x86-32, mm: Add an initial page table for core bootstrapping

This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs.

[ v1.1 Fix suspend-to-ram ]

Signed-off-by: Borislav Petkov <bp@alien8.de>
---
 arch/x86/include/asm/pgtable_32.h |    2 +-
 arch/x86/include/asm/tlbflush.h   |    2 -
 arch/x86/include/asm/trampoline.h |    3 --
 arch/x86/kernel/acpi/sleep.c      |    7 ++++-
 arch/x86/kernel/head32.c          |    1 +
 arch/x86/kernel/head_32.S         |   55 +++++++++++++++++--------------------
 arch/x86/kernel/reboot.c          |   10 +-----
 arch/x86/kernel/setup.c           |   18 +++++++++++-
 arch/x86/kernel/smpboot.c         |   19 ++++---------
 arch/x86/kernel/trampoline.c      |   16 -----------
 arch/x86/mm/init_32.c             |   45 ------------------------------
 11 files changed, 58 insertions(+), 120 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index f686f49..8abde9e 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -26,7 +26,7 @@ struct mm_struct;
 struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
-extern pgd_t trampoline_pg_dir[1024];
+extern pgd_t initial_page_table[1024];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 7f3eba0..169be89 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 	flush_tlb_all();
 }
 
-extern void zap_low_mappings(bool early);
-
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 4dde797..f4500fb 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,16 +13,13 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
-extern unsigned long initial_page_table;
 extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 
 extern unsigned long setup_trampoline(void);
-extern void __init setup_trampoline_page_table(void);
 extern void __init reserve_trampoline_memory(void);
 #else
-static inline void setup_trampoline_page_table(void) {}
 static inline void reserve_trampoline_memory(void) {}
 #endif /* CONFIG_X86_TRAMPOLINE */
 
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 33cec15..b35e1ab 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -12,6 +12,11 @@
 #include <asm/segment.h>
 #include <asm/desc.h>
 
+#ifdef CONFIG_X86_32
+#include <asm/pgtable.h>
+#include <asm/pgtable_32.h>
+#endif
+
 #include "realmode/wakeup.h"
 #include "sleep.h"
 
@@ -90,7 +95,7 @@ int acpi_save_state_mem(void)
 
 #ifndef CONFIG_64BIT
 	header->pmode_entry = (u32)&wakeup_pmode_return;
-	header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET);
+	header->pmode_cr3 = (u32)__pa(&initial_page_table);
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 	header->trampoline_segment = setup_trampoline() >> 4;
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 784360c..8b9c201 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -17,6 +17,7 @@
 #include <asm/apic.h>
 #include <asm/io_apic.h>
 #include <asm/bios_ebda.h>
+#include <asm/tlbflush.h>
 
 static void __init i386_default_early_setup(void)
 {
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index fa8c1b8..bcece91 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -183,13 +183,12 @@ default_entry:
 #ifdef CONFIG_X86_PAE
 
 	/*
-	 * In PAE mode swapper_pg_dir is statically defined to contain enough
-	 * entries to cover the VMSPLIT option (that is the top 1, 2 or 3
-	 * entries). The identity mapping is handled by pointing two PGD
-	 * entries to the first kernel PMD.
+	 * In PAE mode initial_page_table is statically defined to contain
+	 * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3
+	 * entries). The identity mapping is handled by pointing two PGD entries
+	 * to the first kernel PMD.
 	 *
-	 * Note the upper half of each PMD or PTE are always zero at
-	 * this stage.
+	 * Note the upper half of each PMD or PTE are always zero at this stage.
 	 */
 
 #define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */
@@ -197,7 +196,7 @@ default_entry:
 	xorl %ebx,%ebx				/* %ebx is kept at zero */
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_pmd), %edx
+	movl $pa(initial_pg_pmd), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PMD entry */
@@ -226,14 +225,14 @@ default_entry:
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else	/* Not PAE */
 
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
 	movl $pa(__brk_base), %edi
-	movl $pa(swapper_pg_dir), %edx
+	movl $pa(initial_page_table), %edx
 	movl $PTE_IDENT_ATTR, %eax
 10:
 	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PDE entry */
@@ -257,8 +256,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 	movl %eax, pa(max_pfn_mapped)
 
 	/* Do early initialization of the fixmap area */
-	movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
-	movl %eax,pa(swapper_pg_dir+0xffc)
+	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+	movl %eax,pa(initial_page_table+0xffc)
 #endif
 	jmp 3f
 /*
@@ -334,7 +333,7 @@ ENTRY(startup_32_smp)
 /*
  * Enable paging
  */
-	movl pa(initial_page_table), %eax
+	movl $pa(initial_page_table), %eax
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl  $X86_CR0_PG,%eax
@@ -614,8 +613,6 @@ ignore_int:
 .align 4
 ENTRY(initial_code)
 	.long i386_start_kernel
-ENTRY(initial_page_table)
-	.long pa(swapper_pg_dir)
 
 /*
  * BSS section
@@ -623,20 +620,18 @@ ENTRY(initial_page_table)
 __PAGE_ALIGNED_BSS
 	.align PAGE_SIZE_asm
 #ifdef CONFIG_X86_PAE
-swapper_pg_pmd:
+initial_pg_pmd:
 	.fill 1024*KPMDS,4,0
 #else
-ENTRY(swapper_pg_dir)
+ENTRY(initial_page_table)
 	.fill 1024,4,0
 #endif
-swapper_pg_fixmap:
+initial_pg_fixmap:
 	.fill 1024,4,0
-#ifdef CONFIG_X86_TRAMPOLINE
-ENTRY(trampoline_pg_dir)
-	.fill 1024,4,0
-#endif
 ENTRY(empty_zero_page)
 	.fill 4096,1,0
+ENTRY(swapper_pg_dir)
+	.fill 1024,4,0
 
 /*
  * This starts the data section.
@@ -645,20 +640,20 @@ ENTRY(empty_zero_page)
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
 	.align PAGE_SIZE_asm
-ENTRY(swapper_pg_dir)
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
+ENTRY(initial_page_table)
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0
 # elif KPMDS == 2
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
 # elif KPMDS == 1
 	.long	0,0
 	.long	0,0
-	.long	pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
+	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
 # else
 #  error "Kernel PMDs should be 1, 2 or 3"
 # endif
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e3af342..ce5566e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -371,16 +371,10 @@ void machine_real_restart(const unsigned char *code, int length)
 	CMOS_WRITE(0x00, 0x8f);
 	spin_unlock(&rtc_lock);
 
-	/* Remap the kernel at virtual address zero, as well as offset zero
-	   from the kernel segment.  This assumes the kernel segment starts at
-	   virtual address PAGE_OFFSET. */
-	memcpy(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-		sizeof(swapper_pg_dir [0]) * KERNEL_PGD_PTRS);
-
 	/*
-	 * Use `swapper_pg_dir' as our page directory.
+	 * Switch back to the initial page table.
 	 */
-	load_cr3(swapper_pg_dir);
+	load_cr3(initial_page_table);
 
 	/* Write 0x1234 to absolute memory location 0x472.  The BIOS reads
 	   this on booting to tell it to "Bypass memory test (also warm
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c3a4fbb..b315b37 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -730,6 +730,17 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
+
+	/*
+	 * copy kernel address range established so far and switch
+	 * to the proper swapper page table
+	 */
+	clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+			initial_page_table + KERNEL_PGD_BOUNDARY,
+			KERNEL_PGD_PTRS);
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif
@@ -1014,7 +1025,12 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
 
-	setup_trampoline_page_table();
+#ifdef CONFIG_X86_32
+	/* sync back kernel address range */
+	clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+			swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+			KERNEL_PGD_PTRS);
+#endif
 
 	tboot_probe();
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8b3bfc4..0d86d0b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -300,22 +300,17 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * most necessary things.
 	 */
 
-#ifdef CONFIG_X86_32
-	/*
-	 * Switch away from the trampoline page-table
-	 *
-	 * Do this before cpu_init() because it needs to access per-cpu
-	 * data which may not be mapped in the trampoline page-table.
-	 */
-	load_cr3(swapper_pg_dir);
-	__flush_tlb_all();
-#endif
-
 	vmi_bringup();
 	cpu_init();
 	preempt_disable();
 	smp_callin();
 
+#ifdef CONFIG_X86_32
+	/* switch away from the initial page table */
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+#endif
+
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();
 	/*
@@ -774,7 +769,6 @@ do_rest:
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
-	initial_page_table = __pa(&trampoline_pg_dir);
 #else
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
@@ -923,7 +917,6 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
 	err = do_boot_cpu(apicid, cpu);
-
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);
 		return -EIO;
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index e2a5952..f1488a3 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -38,19 +38,3 @@ unsigned long __trampinit setup_trampoline(void)
 	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
 	return virt_to_phys(trampoline_base);
 }
-
-void __init setup_trampoline_page_table(void)
-{
-#ifdef CONFIG_X86_32
-	/* Copy kernel address range */
-	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
-			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			KERNEL_PGD_PTRS);
-
-	/* Initialize low mappings */
-	clone_pgd_range(trampoline_pg_dir,
-			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS,
-			      KERNEL_PGD_BOUNDARY));
-#endif
-}
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bca7909..adad48b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -548,48 +548,6 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-#ifdef CONFIG_ACPI_SLEEP
-/*
- * ACPI suspend needs this for resume, because things like the intel-agp
- * driver might have split up a kernel 4MB mapping.
- */
-char swsusp_pg_dir[PAGE_SIZE]
-	__attribute__ ((aligned(PAGE_SIZE)));
-
-static inline void save_pg_dir(void)
-{
-	memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);
-}
-#else /* !CONFIG_ACPI_SLEEP */
-static inline void save_pg_dir(void)
-{
-}
-#endif /* !CONFIG_ACPI_SLEEP */
-
-void zap_low_mappings(bool early)
-{
-	int i;
-
-	/*
-	 * Zap initial low-memory mappings.
-	 *
-	 * Note that "pgd_clear()" doesn't do it for
-	 * us, because pgd_clear() is a no-op on i386.
-	 */
-	for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) {
-#ifdef CONFIG_X86_PAE
-		set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page)));
-#else
-		set_pgd(swapper_pg_dir+i, __pgd(0));
-#endif
-	}
-
-	if (early)
-		__flush_tlb();
-	else
-		flush_tlb_all();
-}
-
 pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP);
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
 
@@ -958,9 +916,6 @@ void __init mem_init(void)
 
 	if (boot_cpu_data.wp_works_ok < 0)
 		test_wp_bit();
-
-	save_pg_dir();
-	zap_low_mappings(true);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
1.7.1

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2010-09-02  9:10 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-04 16:45 [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD Borislav Petkov
2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov
2010-08-04 23:05   ` H. Peter Anvin
2010-08-05  4:48     ` Borislav Petkov
2010-08-05  7:45     ` Roedel, Joerg
2010-08-05 14:14       ` H. Peter Anvin
2010-08-12 14:41         ` Joerg Roedel
2010-08-12 15:34           ` H. Peter Anvin
2010-08-12 15:47             ` Borislav Petkov
2010-08-12 15:47               ` H. Peter Anvin
2010-08-12 17:38                 ` Borislav Petkov
2010-08-24  7:33                 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov
2010-08-29 20:32                   ` [PATCH] x86-32, mm: Add an initial page table for core bootstrapping Borislav Petkov
2010-09-02  9:10                     ` [PATCH -v1.1] " Borislav Petkov
2010-08-12 17:06             ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel
2010-08-12 19:01               ` H. Peter Anvin
2010-08-12 19:04                 ` Joerg Roedel
2010-08-13 12:35                   ` Borislav Petkov
2010-08-16 12:19                     ` Borislav Petkov
2010-08-16 12:38                       ` [PATCH -v2 " Borislav Petkov
2010-08-18 18:41                         ` H. Peter Anvin
2010-08-18 19:09                           ` Borislav Petkov
2010-08-18 20:55                           ` [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs tip-bot for H. Peter Anvin
2010-08-18 20:55                         ` [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir tip-bot for Joerg Roedel
2010-08-16 12:39                       ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
2010-08-18 19:03                         ` H. Peter Anvin
2010-08-18 19:28                           ` Borislav Petkov
2010-08-18 20:04                             ` H. Peter Anvin
2010-08-19 18:10                               ` [PATCH -v2.1 " Borislav Petkov
2010-08-19 22:58                                 ` [tip:x86/urgent] x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues tip-bot for Borislav Petkov
2010-08-04 16:45 ` [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.