* [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD @ 2010-08-04 16:45 Borislav Petkov 2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov 2010-08-04 16:45 ` [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov 0 siblings, 2 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-04 16:45 UTC (permalink / raw) To: hpa, mingo, tglx Cc: andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy, greg, x86, linux-kernel From: Borislav Petkov <borislav.petkov@amd.com> Hi, the following two patches fix an issue which one of our tests triggers here on a 32-bit kernel while exercising the CPU hotplug path. More details are to be found in the respective commit messages. Now, considering the seriousness of the issue, I'd very much like to backport the fixes to .32 and .34. And while the fixes are not oneliners, they're straightforward enough to not cause any problems IMHO. But I may be biased from staring at them for a while now so what do you guys, think, any objections to backporting those? Thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-04 16:45 [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD Borislav Petkov @ 2010-08-04 16:45 ` Borislav Petkov 2010-08-04 23:05 ` H. Peter Anvin 2010-08-04 16:45 ` [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov 1 sibling, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-04 16:45 UTC (permalink / raw) To: hpa, mingo, tglx Cc: andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy, greg, x86, linux-kernel From: Joerg Roedel <joerg.roedel@amd.com> This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> --- arch/x86/include/asm/pgtable_32.h | 1 + arch/x86/include/asm/trampoline.h | 3 +++ arch/x86/kernel/head_32.S | 8 +++++++- arch/x86/kernel/setup.c | 2 ++ arch/x86/kernel/smpboot.c | 18 +++--------------- arch/x86/kernel/trampoline.c | 18 ++++++++++++++++++ 6 files changed, 34 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index 2984a25..f686f49 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,6 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; +extern pgd_t trampoline_pg_dir[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index cb507bb..8f78fdf 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -13,14 +13,17 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; +extern unsigned long initial_page_table; extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) extern unsigned long setup_trampoline(void); +extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else static inline void reserve_trampoline_memory(void) {}; +extern void __init setup_trampoline_page_table(void) {}; #endif /* CONFIG_X86_TRAMPOLINE */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 37c3d4b..75e3981 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -328,7 +328,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl $pa(swapper_pg_dir),%eax + movl pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -608,6 +608,8 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel +ENTRY(initial_page_table) + .long pa(swapper_pg_dir) /* * BSS section @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir) #endif swapper_pg_fixmap: .fill 1024,4,0 +#ifdef CONFIG_X86_TRAMPOLINE +ENTRY(trampoline_pg_dir) + .fill 1024,4,0 +#endif ENTRY(empty_zero_page) .fill 4096,1,0 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b4ae4ac..6600cfd 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p) paging_init(); x86_init.paging.pagetable_setup_done(swapper_pg_dir); + setup_trampoline_page_table(); + tboot_probe(); #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c4f33b2..6859a42 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -73,7 +73,6 @@ #ifdef CONFIG_X86_32 u8 apicid_2_node[MAX_APICID]; -static int low_mappings; #endif /* State of each CPU */ @@ -300,8 +299,8 @@ notrace static void __cpuinit start_secondary(void *unused) } #ifdef CONFIG_X86_32 - while (low_mappings) - cpu_relax(); + /* switch away from the trampoline page-table */ + load_cr3(swapper_pg_dir); __flush_tlb_all(); #endif @@ -754,6 +753,7 @@ do_rest: #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); + initial_page_table = __pa(&trampoline_pg_dir); #else clear_tsk_thread_flag(c_idle.idle, TIF_FORK); initial_gs = per_cpu_offset(cpu); @@ -894,20 +894,8 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; -#ifdef CONFIG_X86_32 - /* init low mem mapping */ - clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); - flush_tlb_all(); - low_mappings = 1; - err = do_boot_cpu(apicid, cpu); - zap_low_mappings(false); - low_mappings = 0; -#else - err = do_boot_cpu(apicid, cpu); -#endif if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c index c652ef6..a874495 100644 --- a/arch/x86/kernel/trampoline.c +++ b/arch/x86/kernel/trampoline.c @@ -1,6 +1,7 @@ #include <linux/io.h> #include <asm/trampoline.h> +#include <asm/pgtable.h> #include <asm/e820.h> #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP) @@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void) memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE); return virt_to_phys(trampoline_base); } + +void __init setup_trampoline_page_table(void) +{ +#ifdef CONFIG_X86_32 + /* Copy kernel address range */ + clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); + + /* Initialize low mappings */ + clone_pgd_range(trampoline_pg_dir, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); +#endif +} -- 1.6.4.4 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov @ 2010-08-04 23:05 ` H. Peter Anvin 2010-08-05 4:48 ` Borislav Petkov 2010-08-05 7:45 ` Roedel, Joerg 0 siblings, 2 replies; 31+ messages in thread From: H. Peter Anvin @ 2010-08-04 23:05 UTC (permalink / raw) To: Borislav Petkov Cc: mingo, tglx, andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy, greg, x86, linux-kernel On 08/04/2010 09:45 AM, Borislav Petkov wrote: > > 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit > does. > > This patch implements solution 2. It introduces a trampoline_pg_dir > which has the same layout as swapper_pg_dir with low_mappings. This page > table is used as the initial page table of the booting CPU. Later in the > bringup process, it switches to swapper_pg_dir and does a global TLB > flush. This fixes the crashes in our test cases. > I would like to keep around a page directory with the low mappings around -- and not use it for kernel threads -- at all times *anyway*. This means we can remove any current hacks that we have to do around S3 entry and exit, for example. --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -328,7 +328,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl $pa(swapper_pg_dir),%eax + movl pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -608,6 +608,8 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel +ENTRY(initial_page_table) + .long pa(swapper_pg_dir) /* * BSS section @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir) #endif swapper_pg_fixmap: .fill 1024,4,0 +#ifdef CONFIG_X86_TRAMPOLINE +ENTRY(trampoline_pg_dir) + .fill 1024,4,0 +#endif I don't really see why this makes sense, though. It would make more sense that the initial page table we set up becomes trampoline_pg_dir; we can then set up and change to swapper_pg_dir almost immediately. I realize this isn't how the 64-bit code works at the moment, but in a lot of ways I think it would be better if it did. -hpa ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-04 23:05 ` H. Peter Anvin @ 2010-08-05 4:48 ` Borislav Petkov 2010-08-05 7:45 ` Roedel, Joerg 1 sibling, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-05 4:48 UTC (permalink / raw) To: H. Peter Anvin Cc: mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Roedel, Joerg, Sarathy, Bhavna, greg, x86, linux-kernel From: "H. Peter Anvin" <hpa@zytor.com> Date: Wed, Aug 04, 2010 at 07:05:47PM -0400 > On 08/04/2010 09:45 AM, Borislav Petkov wrote: > > > > 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit > > does. > > > > This patch implements solution 2. It introduces a trampoline_pg_dir > > which has the same layout as swapper_pg_dir with low_mappings. This page > > table is used as the initial page table of the booting CPU. Later in the > > bringup process, it switches to swapper_pg_dir and does a global TLB > > flush. This fixes the crashes in our test cases. > > > > I would like to keep around a page directory with the low mappings > around -- and not use it for kernel threads -- at all times *anyway*. > This means we can remove any current hacks that we have to do around S3 > entry and exit, for example. > > --- a/arch/x86/kernel/head_32.S > +++ b/arch/x86/kernel/head_32.S > @@ -328,7 +328,7 @@ ENTRY(startup_32_smp) > /* > * Enable paging > */ > - movl $pa(swapper_pg_dir),%eax > + movl pa(initial_page_table), %eax > movl %eax,%cr3 /* set the page table pointer.. */ > movl %cr0,%eax > orl $X86_CR0_PG,%eax > @@ -608,6 +608,8 @@ ignore_int: > .align 4 > ENTRY(initial_code) > .long i386_start_kernel > +ENTRY(initial_page_table) > + .long pa(swapper_pg_dir) > > /* > * BSS section > @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir) > #endif > swapper_pg_fixmap: > .fill 1024,4,0 > +#ifdef CONFIG_X86_TRAMPOLINE > +ENTRY(trampoline_pg_dir) > + .fill 1024,4,0 > +#endif > > I don't really see why this makes sense, though. It would make more > sense that the initial page table we set up becomes trampoline_pg_dir; > we can then set up and change to swapper_pg_dir almost immediately. Yeah, now we use swapper_pg_dir at all times and zap the low mappings. However, this is not perfectly clean, as this case in point shows how unrelated CPUs might establish TLB entries speculatively. Now imagine if they don't mcheck about it but silently and merrily continue on. In this particular case, there were no improper accesses due to the user/kernel permissions mismatch but imagine if suddenly kernel code started accessing userspace and this not through copy_to_user() et al. So it really does make sense to have an initial page table and copy swapper_pg_dir from it. Which would be a perfect exercise for someone who would like to play with the boot code a bit more, ^hint hint^, if Joerg doesn't beat me to it. But I'd suggest we get those fixes in now if there are no objections and later adjustments should come ontop after excessive testing. And what about backporting those fixes to .32 and .34, would you be ok with that? Greg, what about you? Thanks. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-04 23:05 ` H. Peter Anvin 2010-08-05 4:48 ` Borislav Petkov @ 2010-08-05 7:45 ` Roedel, Joerg 2010-08-05 14:14 ` H. Peter Anvin 1 sibling, 1 reply; 31+ messages in thread From: Roedel, Joerg @ 2010-08-05 7:45 UTC (permalink / raw) To: H. Peter Anvin Cc: Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On Wed, Aug 04, 2010 at 07:05:47PM -0400, H. Peter Anvin wrote: > On 08/04/2010 09:45 AM, Borislav Petkov wrote: > > > > 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit > > does. > > > > This patch implements solution 2. It introduces a trampoline_pg_dir > > which has the same layout as swapper_pg_dir with low_mappings. This page > > table is used as the initial page table of the booting CPU. Later in the > > bringup process, it switches to swapper_pg_dir and does a global TLB > > flush. This fixes the crashes in our test cases. > > > > I would like to keep around a page directory with the low mappings > around -- and not use it for kernel threads -- at all times *anyway*. > This means we can remove any current hacks that we have to do around S3 > entry and exit, for example. Yeah, the page table with the low mappings is trampoline_pg_dir introduced in this patch :-) > --- a/arch/x86/kernel/head_32.S > +++ b/arch/x86/kernel/head_32.S > @@ -328,7 +328,7 @@ ENTRY(startup_32_smp) > /* > * Enable paging > */ > - movl $pa(swapper_pg_dir),%eax > + movl pa(initial_page_table), %eax > movl %eax,%cr3 /* set the page table pointer.. */ > movl %cr0,%eax > orl $X86_CR0_PG,%eax > @@ -608,6 +608,8 @@ ignore_int: > .align 4 > ENTRY(initial_code) > .long i386_start_kernel > +ENTRY(initial_page_table) > + .long pa(swapper_pg_dir) > > /* > * BSS section > @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir) > #endif > swapper_pg_fixmap: > .fill 1024,4,0 > +#ifdef CONFIG_X86_TRAMPOLINE > +ENTRY(trampoline_pg_dir) > + .fill 1024,4,0 > +#endif > > I don't really see why this makes sense, though. It would make more > sense that the initial page table we set up becomes trampoline_pg_dir; > we can then set up and change to swapper_pg_dir almost immediately. To make sure I understand correctly, you suggest to initialize tramponline_pg_dir in the boot sequence of the first cpu and fork swapper_pg_dir from it later on? > I realize this isn't how the 64-bit code works at the moment, but in a > lot of ways I think it would be better if it did. Yeah, may make sense. This patch already brings the 32 bit implementation closer to the 64 bit one. On 64 bit things are somewhat simpler because the tramponline page table can be defined at compile-time there (contains only 2 pgd_t entries) while on 32 bit we have to initialize it at runtime. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-05 7:45 ` Roedel, Joerg @ 2010-08-05 14:14 ` H. Peter Anvin 2010-08-12 14:41 ` Joerg Roedel 0 siblings, 1 reply; 31+ messages in thread From: H. Peter Anvin @ 2010-08-05 14:14 UTC (permalink / raw) To: Roedel, Joerg Cc: Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/05/2010 12:45 AM, Roedel, Joerg wrote: > > To make sure I understand correctly, you suggest to initialize > tramponline_pg_dir in the boot sequence of the first cpu and fork > swapper_pg_dir from it later on? > Correct. >> I realize this isn't how the 64-bit code works at the moment, but in a >> lot of ways I think it would be better if it did. > > Yeah, may make sense. This patch already brings the 32 bit > implementation closer to the 64 bit one. On 64 bit things are somewhat > simpler because the tramponline page table can be defined at > compile-time there (contains only 2 pgd_t entries) while on 32 bit we > have to initialize it at runtime. Correct, again. It's unclear to me if we can get away with the very simple 64-bit approach -- in particular, not including all the 1:1 mappings in the kernel -- for all future users, though. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-05 14:14 ` H. Peter Anvin @ 2010-08-12 14:41 ` Joerg Roedel 2010-08-12 15:34 ` H. Peter Anvin 0 siblings, 1 reply; 31+ messages in thread From: Joerg Roedel @ 2010-08-12 14:41 UTC (permalink / raw) To: H. Peter Anvin Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On Thu, Aug 05, 2010 at 07:14:14AM -0700, H. Peter Anvin wrote: > On 08/05/2010 12:45 AM, Roedel, Joerg wrote: > Correct, again. It's unclear to me if we can get away with the very > simple 64-bit approach -- in particular, not including all the 1:1 > mappings in the kernel -- for all future users, though. Yeah, We could probably use the same symbol names for the trampoline and the swapper page-table on 32 and 64 bit and merge the code that handles it between the architectures. Shouldn't be too difficult. We cook something up :-) But do you mind to take this patch first? It fixes the occurence of an erratum on AMD hardware, it is a quite simple fix compared to the rework suggested. In fact, I would like to have a simple patch to fix the problem first (and backport it as necessary) and then go the way to rework the current code differences between 32 and 64 bit. Joerg ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 14:41 ` Joerg Roedel @ 2010-08-12 15:34 ` H. Peter Anvin 2010-08-12 15:47 ` Borislav Petkov 2010-08-12 17:06 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel 0 siblings, 2 replies; 31+ messages in thread From: H. Peter Anvin @ 2010-08-12 15:34 UTC (permalink / raw) To: Joerg Roedel Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/12/2010 07:41 AM, Joerg Roedel wrote: > On Thu, Aug 05, 2010 at 07:14:14AM -0700, H. Peter Anvin wrote: >> On 08/05/2010 12:45 AM, Roedel, Joerg wrote: >> Correct, again. It's unclear to me if we can get away with the very >> simple 64-bit approach -- in particular, not including all the 1:1 >> mappings in the kernel -- for all future users, though. > > Yeah, We could probably use the same symbol names for the trampoline and > the swapper page-table on 32 and 64 bit and merge the code that handles > it between the architectures. Shouldn't be too difficult. We cook > something up :-) > But do you mind to take this patch first? It fixes the occurence of an > erratum on AMD hardware, it is a quite simple fix compared to the rework > suggested. In fact, I would like to have a simple patch to fix the > problem first (and backport it as necessary) and then go the way to > rework the current code differences between 32 and 64 bit. > Agreed. I do have a concern about the kernel page tables not being synchronized into trampoline_pg_dir (it's only an issue for 32-bit !PAE), so at the very least we need to keep an eye out for it... -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 15:34 ` H. Peter Anvin @ 2010-08-12 15:47 ` Borislav Petkov 2010-08-12 15:47 ` H. Peter Anvin 2010-08-12 17:06 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel 1 sibling, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-12 15:47 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: "H. Peter Anvin" <hpa@zytor.com> Date: Thu, Aug 12, 2010 at 11:34:37AM -0400 > On 08/12/2010 07:41 AM, Joerg Roedel wrote: > > On Thu, Aug 05, 2010 at 07:14:14AM -0700, H. Peter Anvin wrote: > >> On 08/05/2010 12:45 AM, Roedel, Joerg wrote: > >> Correct, again. It's unclear to me if we can get away with the very > >> simple 64-bit approach -- in particular, not including all the 1:1 > >> mappings in the kernel -- for all future users, though. > > > > Yeah, We could probably use the same symbol names for the trampoline and > > the swapper page-table on 32 and 64 bit and merge the code that handles > > it between the architectures. Shouldn't be too difficult. We cook > > something up :-) > > But do you mind to take this patch first? It fixes the occurence of an > > erratum on AMD hardware, it is a quite simple fix compared to the rework > > suggested. In fact, I would like to have a simple patch to fix the > > problem first (and backport it as necessary) and then go the way to > > rework the current code differences between 32 and 64 bit. > > > > Agreed. I do have a concern about the kernel page tables not being > synchronized into trampoline_pg_dir (it's only an issue for 32-bit > !PAE), so at the very least we need to keep an eye out for it... I've been working on what you suggested and the patch boots on a 32-bit PAE kernel but keeps rebooting on non-PAE after setting the stack_start on the AP. I could send it out later for you to look at, if you like - you should be able to spot what I'm missing :)... -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 15:47 ` Borislav Petkov @ 2010-08-12 15:47 ` H. Peter Anvin 2010-08-12 17:38 ` Borislav Petkov 2010-08-24 7:33 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov 0 siblings, 2 replies; 31+ messages in thread From: H. Peter Anvin @ 2010-08-12 15:47 UTC (permalink / raw) To: Borislav Petkov Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/12/2010 08:47 AM, Borislav Petkov wrote: >> >> Agreed. I do have a concern about the kernel page tables not being >> synchronized into trampoline_pg_dir (it's only an issue for 32-bit >> !PAE), so at the very least we need to keep an eye out for it... > > I've been working on what you suggested and the patch boots on a 32-bit > PAE kernel but keeps rebooting on non-PAE after setting the stack_start > on the AP. I could send it out later for you to look at, if you like - > you should be able to spot what I'm missing :)... > Sounds good. This is obviously .37-or-higher material. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 15:47 ` H. Peter Anvin @ 2010-08-12 17:38 ` Borislav Petkov 2010-08-24 7:33 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov 1 sibling, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-12 17:38 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: "H. Peter Anvin" <hpa@zytor.com> Date: Thu, Aug 12, 2010 at 11:47:55AM -0400 > On 08/12/2010 08:47 AM, Borislav Petkov wrote: > >> > >> Agreed. I do have a concern about the kernel page tables not being > >> synchronized into trampoline_pg_dir (it's only an issue for 32-bit > >> !PAE), so at the very least we need to keep an eye out for it... > > > > I've been working on what you suggested and the patch boots on a 32-bit > > PAE kernel but keeps rebooting on non-PAE after setting the stack_start > > on the AP. I could send it out later for you to look at, if you like - > > you should be able to spot what I'm missing :)... > > > > Sounds good. This is obviously .37-or-higher material. Definitely, this needs a lot of hammering before going upstream. So here's what I got so far, I went and changed everything called swapper_* when used to prep the initial page table to initial_*. Then in setup_arch I switch to the swapper_pg_dir after copying the kernel address mappings into it so all other places touching swapper_pg_dir can remain unchanged. Then, initial_page_table is still used for booting the APs. The nice effect is that we drop all that swsusp_pg_dir for ACPI_SLEEP and unifying the do_boot_cpu() part of start_secondary() even more. So this boots fine on 32-bit PAE but reboots the machine on !PAE when the AP does /* Set up the stack pointer */ lss stack_start,%esp pushl $0 popfl in head_32.S. I've debugged this by adding endless loops in the manner of @@@ -274,6 -274,6 +274,7 @@@ ENTRY(startup_32_smp movl %eax,%es movl %eax,%fs movl %eax,%gs + movl $0xdeadbeef, %ebx #endif /* CONFIG_SMP */ so that this happens only on the AP and then did +667: + cmpl $0xdeadbeef, %ebx + je 667b + So when the loop is before the "pushl.. popfl" sequence above, the BSP would print "CPU stuck??" for the AP and continue booting by not bringing the AP online. If I put the endless loop after that sequence, the machine reboots. This is also nicely reproducible in kvm for even faster and safer experimenting with the code. Thanks. -- diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index 2984a25..8abde9e 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,6 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; +extern pgd_t initial_page_table[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 7f3eba0..169be89 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start, flush_tlb_all(); } -extern void zap_low_mappings(bool early); - #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index fcc3c61..48d89d6 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -12,6 +12,11 @@ #include <asm/segment.h> #include <asm/desc.h> +#ifdef CONFIG_X86_32 +#include <asm/pgtable.h> +#include <asm/pgtable_32.h> +#endif + #include "realmode/wakeup.h" #include "sleep.h" @@ -90,7 +95,7 @@ int acpi_save_state_mem(void) #ifndef CONFIG_64BIT header->pmode_entry = (u32)&wakeup_pmode_return; - header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET); + header->pmode_cr3 = (u32)(initial_page_table - __PAGE_OFFSET); saved_magic = 0x12345678; #else /* CONFIG_64BIT */ header->trampoline_segment = setup_trampoline() >> 4; diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 37c3d4b..80b0961 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -177,7 +177,7 @@ default_entry: #ifdef CONFIG_X86_PAE /* - * In PAE mode swapper_pg_dir is statically defined to contain enough + * In PAE mode initial_page_table is statically defined to contain enough * entries to cover the VMSPLIT option (that is the top 1, 2 or 3 * entries). The identity mapping is handled by pointing two PGD * entries to the first kernel PMD. @@ -191,7 +191,7 @@ default_entry: xorl %ebx,%ebx /* %ebx is kept at zero */ movl $pa(__brk_base), %edi - movl $pa(swapper_pg_pmd), %edx + movl $pa(initial_pg_pmd), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PMD entry */ @@ -220,14 +220,14 @@ default_entry: movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8) #else /* Not PAE */ page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(__brk_base), %edi - movl $pa(swapper_pg_dir), %edx + movl $pa(initial_page_table), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PDE entry */ @@ -251,8 +251,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_dir+0xffc) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_page_table+0xffc) #endif jmp 3f /* @@ -328,7 +328,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl $pa(swapper_pg_dir),%eax + movl $pa(initial_page_table),%eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -615,16 +615,18 @@ ENTRY(initial_code) __PAGE_ALIGNED_BSS .align PAGE_SIZE_asm #ifdef CONFIG_X86_PAE -swapper_pg_pmd: +initial_pg_pmd: .fill 1024*KPMDS,4,0 #else -ENTRY(swapper_pg_dir) +ENTRY(initial_page_table) .fill 1024,4,0 #endif -swapper_pg_fixmap: +initial_pg_fixmap: .fill 1024,4,0 ENTRY(empty_zero_page) .fill 4096,1,0 +ENTRY(swapper_pg_dir) + .fill 1024,4,0 /* * This starts the data section. @@ -633,20 +635,20 @@ ENTRY(empty_zero_page) __PAGE_ALIGNED_DATA /* Page-aligned for the benefit of paravirt? */ .align PAGE_SIZE_asm -ENTRY(swapper_pg_dir) - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ +ENTRY(initial_page_table) + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0 # elif KPMDS == 2 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 # elif KPMDS == 1 .long 0,0 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 # else # error "Kernel PMDs should be 1, 2 or 3" # endif diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b4ae4ac..ba19c9e 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -727,6 +727,15 @@ void __init setup_arch(char **cmdline_p) int k8 = 0; #ifdef CONFIG_X86_32 + /* Copy kernel address range */ + clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY, + initial_page_table + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); + + load_cr3(swapper_pg_dir); + __flush_tlb_all(); + memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data)); visws_early_detect(); #else diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c4f33b2..d1bd555 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -73,7 +73,6 @@ #ifdef CONFIG_X86_32 u8 apicid_2_node[MAX_APICID]; -static int low_mappings; #endif /* State of each CPU */ @@ -300,8 +299,7 @@ notrace static void __cpuinit start_secondary(void *unused) } #ifdef CONFIG_X86_32 - while (low_mappings) - cpu_relax(); + load_cr3(swapper_pg_dir); __flush_tlb_all(); #endif @@ -894,20 +892,7 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; -#ifdef CONFIG_X86_32 - /* init low mem mapping */ - clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); - flush_tlb_all(); - low_mappings = 1; - err = do_boot_cpu(apicid, cpu); - - zap_low_mappings(false); - low_mappings = 0; -#else - err = do_boot_cpu(apicid, cpu); -#endif if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index bca7909..adad48b 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -548,48 +548,6 @@ static void __init pagetable_init(void) permanent_kmaps_init(pgd_base); } -#ifdef CONFIG_ACPI_SLEEP -/* - * ACPI suspend needs this for resume, because things like the intel-agp - * driver might have split up a kernel 4MB mapping. - */ -char swsusp_pg_dir[PAGE_SIZE] - __attribute__ ((aligned(PAGE_SIZE))); - -static inline void save_pg_dir(void) -{ - memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE); -} -#else /* !CONFIG_ACPI_SLEEP */ -static inline void save_pg_dir(void) -{ -} -#endif /* !CONFIG_ACPI_SLEEP */ - -void zap_low_mappings(bool early) -{ - int i; - - /* - * Zap initial low-memory mappings. - * - * Note that "pgd_clear()" doesn't do it for - * us, because pgd_clear() is a no-op on i386. - */ - for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) { -#ifdef CONFIG_X86_PAE - set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page))); -#else - set_pgd(swapper_pg_dir+i, __pgd(0)); -#endif - } - - if (early) - __flush_tlb(); - else - flush_tlb_all(); -} - pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP); EXPORT_SYMBOL_GPL(__supported_pte_mask); @@ -958,9 +916,6 @@ void __init mem_init(void) if (boot_cpu_data.wp_works_ok < 0) test_wp_bit(); - - save_pg_dir(); - zap_low_mappings(true); } #ifdef CONFIG_MEMORY_HOTPLUG -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) 2010-08-12 15:47 ` H. Peter Anvin 2010-08-12 17:38 ` Borislav Petkov @ 2010-08-24 7:33 ` Borislav Petkov 2010-08-29 20:32 ` [PATCH] x86-32, mm: Add an initial page table for core bootstrapping Borislav Petkov 1 sibling, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-24 7:33 UTC (permalink / raw) To: H. Peter Anvin Cc: Borislav Petkov, Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: "H. Peter Anvin" <hpa@zytor.com> Date: Thu, Aug 12, 2010 at 08:47:55AM -0700 > > I've been working on what you suggested and the patch boots on a 32-bit > > PAE kernel but keeps rebooting on non-PAE after setting the stack_start > > on the AP. I could send it out later for you to look at, if you like - > > you should be able to spot what I'm missing :)... > > > > Sounds good. This is obviously .37-or-higher material. Ok, here we go, the version below boots a !PAE and a PAE kernel fine in kvm. The approach taken is, IMHO, the least intrusive in trying to touch code assuming swapper_pg_dir as little as possible and use initial_page_table for boot/acpi sleep/reboot only. For example, paths like init_memory_mapping -> kernel_physical_mapping_init -> ... which fill in the kernel mappings into swapper_pg_dir are left unchanged by switching to swapper_pg_dir at the beginning of setup_arch() BTW, I guess this could be done even earlier since I copy the mappings from initial_page_table so there's no difference between the two at the switch point. Later, after paging_init(), synching back all swapper_pg_dir changes into initial_page_table is done for the APs to boot and have a stack mapping in the initial_page_table so that they don't pagefault into a triple fault (this was the problem with the previous version, btw.) Please take a look and let me know if I've forgotten something before I start hammering on it on a real hardware. Thanks. --- diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index f686f49..8abde9e 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,7 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; -extern pgd_t trampoline_pg_dir[1024]; +extern pgd_t initial_page_table[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 7f3eba0..169be89 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start, flush_tlb_all(); } -extern void zap_low_mappings(bool early); - #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index 4dde797..f4500fb 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -13,16 +13,13 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; -extern unsigned long initial_page_table; extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) extern unsigned long setup_trampoline(void); -extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else -static inline void setup_trampoline_page_table(void) {} static inline void reserve_trampoline_memory(void) {} #endif /* CONFIG_X86_TRAMPOLINE */ diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index 33cec15..d04500e 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -12,6 +12,11 @@ #include <asm/segment.h> #include <asm/desc.h> +#ifdef CONFIG_X86_32 +#include <asm/pgtable.h> +#include <asm/pgtable_32.h> +#endif + #include "realmode/wakeup.h" #include "sleep.h" @@ -90,7 +95,7 @@ int acpi_save_state_mem(void) #ifndef CONFIG_64BIT header->pmode_entry = (u32)&wakeup_pmode_return; - header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET); + header->pmode_cr3 = (u32)(initial_page_table - __PAGE_OFFSET); saved_magic = 0x12345678; #else /* CONFIG_64BIT */ header->trampoline_segment = setup_trampoline() >> 4; diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c index 784360c..8b9c201 100644 --- a/arch/x86/kernel/head32.c +++ b/arch/x86/kernel/head32.c @@ -17,6 +17,7 @@ #include <asm/apic.h> #include <asm/io_apic.h> #include <asm/bios_ebda.h> +#include <asm/tlbflush.h> static void __init i386_default_early_setup(void) { diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index fa8c1b8..005646a 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -183,7 +183,7 @@ default_entry: #ifdef CONFIG_X86_PAE /* - * In PAE mode swapper_pg_dir is statically defined to contain enough + * In PAE mode initial_page_table is statically defined to contain enough * entries to cover the VMSPLIT option (that is the top 1, 2 or 3 * entries). The identity mapping is handled by pointing two PGD * entries to the first kernel PMD. @@ -197,7 +197,7 @@ default_entry: xorl %ebx,%ebx /* %ebx is kept at zero */ movl $pa(__brk_base), %edi - movl $pa(swapper_pg_pmd), %edx + movl $pa(initial_pg_pmd), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PMD entry */ @@ -226,14 +226,14 @@ default_entry: movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8) #else /* Not PAE */ page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(__brk_base), %edi - movl $pa(swapper_pg_dir), %edx + movl $pa(initial_page_table), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PDE entry */ @@ -257,8 +257,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_dir+0xffc) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_page_table+0xffc) #endif jmp 3f /* @@ -334,7 +334,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl pa(initial_page_table), %eax + movl $pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -614,8 +614,6 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel -ENTRY(initial_page_table) - .long pa(swapper_pg_dir) /* * BSS section @@ -623,20 +621,18 @@ ENTRY(initial_page_table) __PAGE_ALIGNED_BSS .align PAGE_SIZE_asm #ifdef CONFIG_X86_PAE -swapper_pg_pmd: +initial_pg_pmd: .fill 1024*KPMDS,4,0 #else -ENTRY(swapper_pg_dir) +ENTRY(initial_page_table) .fill 1024,4,0 #endif -swapper_pg_fixmap: +initial_pg_fixmap: .fill 1024,4,0 -#ifdef CONFIG_X86_TRAMPOLINE -ENTRY(trampoline_pg_dir) - .fill 1024,4,0 -#endif ENTRY(empty_zero_page) .fill 4096,1,0 +ENTRY(swapper_pg_dir) + .fill 1024,4,0 /* * This starts the data section. @@ -645,20 +641,20 @@ ENTRY(empty_zero_page) __PAGE_ALIGNED_DATA /* Page-aligned for the benefit of paravirt? */ .align PAGE_SIZE_asm -ENTRY(swapper_pg_dir) - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ +ENTRY(initial_page_table) + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0 # elif KPMDS == 2 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 # elif KPMDS == 1 .long 0,0 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 # else # error "Kernel PMDs should be 1, 2 or 3" # endif diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index e3af342..ce5566e 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -371,16 +371,10 @@ void machine_real_restart(const unsigned char *code, int length) CMOS_WRITE(0x00, 0x8f); spin_unlock(&rtc_lock); - /* Remap the kernel at virtual address zero, as well as offset zero - from the kernel segment. This assumes the kernel segment starts at - virtual address PAGE_OFFSET. */ - memcpy(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - sizeof(swapper_pg_dir [0]) * KERNEL_PGD_PTRS); - /* - * Use `swapper_pg_dir' as our page directory. + * Switch back to the initial page table. */ - load_cr3(swapper_pg_dir); + load_cr3(initial_page_table); /* Write 0x1234 to absolute memory location 0x472. The BIOS reads this on booting to tell it to "Bypass memory test (also warm diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index c3a4fbb..32322df 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -730,6 +730,17 @@ void __init setup_arch(char **cmdline_p) #ifdef CONFIG_X86_32 memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data)); visws_early_detect(); + + /* + * copy kernel address range established so far and switch + * to the proper swapper page table + */ + clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY, + initial_page_table + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); + + load_cr3(swapper_pg_dir); + __flush_tlb_all(); #else printk(KERN_INFO "Command line: %s\n", boot_command_line); #endif @@ -1014,7 +1025,12 @@ void __init setup_arch(char **cmdline_p) paging_init(); x86_init.paging.pagetable_setup_done(swapper_pg_dir); - setup_trampoline_page_table(); +#ifdef CONFIG_X86_32 + /* sync back kernel address range */ + clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); +#endif tboot_probe(); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 8b3bfc4..0d86d0b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -300,22 +300,17 @@ notrace static void __cpuinit start_secondary(void *unused) * most necessary things. */ -#ifdef CONFIG_X86_32 - /* - * Switch away from the trampoline page-table - * - * Do this before cpu_init() because it needs to access per-cpu - * data which may not be mapped in the trampoline page-table. - */ - load_cr3(swapper_pg_dir); - __flush_tlb_all(); -#endif - vmi_bringup(); cpu_init(); preempt_disable(); smp_callin(); +#ifdef CONFIG_X86_32 + /* switch away from the initial page table */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); +#endif + /* otherwise gcc will move up smp_processor_id before the cpu_init */ barrier(); /* @@ -774,7 +769,6 @@ do_rest: #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); - initial_page_table = __pa(&trampoline_pg_dir); #else clear_tsk_thread_flag(c_idle.idle, TIF_FORK); initial_gs = per_cpu_offset(cpu); @@ -923,7 +917,6 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; err = do_boot_cpu(apicid, cpu); - if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c index a874495..f1488a3 100644 --- a/arch/x86/kernel/trampoline.c +++ b/arch/x86/kernel/trampoline.c @@ -38,20 +38,3 @@ unsigned long __trampinit setup_trampoline(void) memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE); return virt_to_phys(trampoline_base); } - -void __init setup_trampoline_page_table(void) -{ -#ifdef CONFIG_X86_32 - /* Copy kernel address range */ - clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, - KERNEL_PGD_BOUNDARY)); - - /* Initialize low mappings */ - clone_pgd_range(trampoline_pg_dir, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, - KERNEL_PGD_BOUNDARY)); -#endif -} diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index bca7909..adad48b 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -548,48 +548,6 @@ static void __init pagetable_init(void) permanent_kmaps_init(pgd_base); } -#ifdef CONFIG_ACPI_SLEEP -/* - * ACPI suspend needs this for resume, because things like the intel-agp - * driver might have split up a kernel 4MB mapping. - */ -char swsusp_pg_dir[PAGE_SIZE] - __attribute__ ((aligned(PAGE_SIZE))); - -static inline void save_pg_dir(void) -{ - memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE); -} -#else /* !CONFIG_ACPI_SLEEP */ -static inline void save_pg_dir(void) -{ -} -#endif /* !CONFIG_ACPI_SLEEP */ - -void zap_low_mappings(bool early) -{ - int i; - - /* - * Zap initial low-memory mappings. - * - * Note that "pgd_clear()" doesn't do it for - * us, because pgd_clear() is a no-op on i386. - */ - for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) { -#ifdef CONFIG_X86_PAE - set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page))); -#else - set_pgd(swapper_pg_dir+i, __pgd(0)); -#endif - } - - if (early) - __flush_tlb(); - else - flush_tlb_all(); -} - pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP); EXPORT_SYMBOL_GPL(__supported_pte_mask); @@ -958,9 +916,6 @@ void __init mem_init(void) if (boot_cpu_data.wp_works_ok < 0) test_wp_bit(); - - save_pg_dir(); - zap_low_mappings(true); } #ifdef CONFIG_MEMORY_HOTPLUG -- Regards/Gruss, Boris. ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH] x86-32, mm: Add an initial page table for core bootstrapping 2010-08-24 7:33 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov @ 2010-08-29 20:32 ` Borislav Petkov 2010-09-02 9:10 ` [PATCH -v1.1] " Borislav Petkov 0 siblings, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-29 20:32 UTC (permalink / raw) To: H. Peter Anvin Cc: Borislav Petkov, Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs. Signed-off-by: Borislav Petkov <bp@alien8.de> --- Ok, here's the initial version. This patch has survived a whole weekend of randconfig builds and has been boot/reboot/suspend/resume tested on 32-bit kvm guests and bare metal in all possible configurations of PAE and !PAE kernels with all possible VMSPLIT_* config options. Patch is against tip/x86/urgent ontop of Hugh's CONFIG_VMSPLIT_1G and 2G_OPT fix. arch/x86/include/asm/pgtable_32.h | 2 +- arch/x86/include/asm/tlbflush.h | 2 - arch/x86/include/asm/trampoline.h | 3 -- arch/x86/kernel/acpi/sleep.c | 7 ++++- arch/x86/kernel/head32.c | 1 + arch/x86/kernel/head_32.S | 55 +++++++++++++++++-------------------- arch/x86/kernel/reboot.c | 10 +----- arch/x86/kernel/setup.c | 18 +++++++++++- arch/x86/kernel/smpboot.c | 19 ++++--------- arch/x86/kernel/trampoline.c | 16 ----------- arch/x86/mm/init_32.c | 45 ------------------------------ 11 files changed, 58 insertions(+), 120 deletions(-) diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index f686f49..8abde9e 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,7 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; -extern pgd_t trampoline_pg_dir[1024]; +extern pgd_t initial_page_table[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 7f3eba0..169be89 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start, flush_tlb_all(); } -extern void zap_low_mappings(bool early); - #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index 4dde797..f4500fb 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -13,16 +13,13 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; -extern unsigned long initial_page_table; extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) extern unsigned long setup_trampoline(void); -extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else -static inline void setup_trampoline_page_table(void) {} static inline void reserve_trampoline_memory(void) {} #endif /* CONFIG_X86_TRAMPOLINE */ diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index 33cec15..d04500e 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -12,6 +12,11 @@ #include <asm/segment.h> #include <asm/desc.h> +#ifdef CONFIG_X86_32 +#include <asm/pgtable.h> +#include <asm/pgtable_32.h> +#endif + #include "realmode/wakeup.h" #include "sleep.h" @@ -90,7 +95,7 @@ int acpi_save_state_mem(void) #ifndef CONFIG_64BIT header->pmode_entry = (u32)&wakeup_pmode_return; - header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET); + header->pmode_cr3 = (u32)(initial_page_table - __PAGE_OFFSET); saved_magic = 0x12345678; #else /* CONFIG_64BIT */ header->trampoline_segment = setup_trampoline() >> 4; diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c index 784360c..8b9c201 100644 --- a/arch/x86/kernel/head32.c +++ b/arch/x86/kernel/head32.c @@ -17,6 +17,7 @@ #include <asm/apic.h> #include <asm/io_apic.h> #include <asm/bios_ebda.h> +#include <asm/tlbflush.h> static void __init i386_default_early_setup(void) { diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index fa8c1b8..bcece91 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -183,13 +183,12 @@ default_entry: #ifdef CONFIG_X86_PAE /* - * In PAE mode swapper_pg_dir is statically defined to contain enough - * entries to cover the VMSPLIT option (that is the top 1, 2 or 3 - * entries). The identity mapping is handled by pointing two PGD - * entries to the first kernel PMD. + * In PAE mode initial_page_table is statically defined to contain + * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3 + * entries). The identity mapping is handled by pointing two PGD entries + * to the first kernel PMD. * - * Note the upper half of each PMD or PTE are always zero at - * this stage. + * Note the upper half of each PMD or PTE are always zero at this stage. */ #define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */ @@ -197,7 +196,7 @@ default_entry: xorl %ebx,%ebx /* %ebx is kept at zero */ movl $pa(__brk_base), %edi - movl $pa(swapper_pg_pmd), %edx + movl $pa(initial_pg_pmd), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PMD entry */ @@ -226,14 +225,14 @@ default_entry: movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8) #else /* Not PAE */ page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(__brk_base), %edi - movl $pa(swapper_pg_dir), %edx + movl $pa(initial_page_table), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PDE entry */ @@ -257,8 +256,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_dir+0xffc) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_page_table+0xffc) #endif jmp 3f /* @@ -334,7 +333,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl pa(initial_page_table), %eax + movl $pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -614,8 +613,6 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel -ENTRY(initial_page_table) - .long pa(swapper_pg_dir) /* * BSS section @@ -623,20 +620,18 @@ ENTRY(initial_page_table) __PAGE_ALIGNED_BSS .align PAGE_SIZE_asm #ifdef CONFIG_X86_PAE -swapper_pg_pmd: +initial_pg_pmd: .fill 1024*KPMDS,4,0 #else -ENTRY(swapper_pg_dir) +ENTRY(initial_page_table) .fill 1024,4,0 #endif -swapper_pg_fixmap: +initial_pg_fixmap: .fill 1024,4,0 -#ifdef CONFIG_X86_TRAMPOLINE -ENTRY(trampoline_pg_dir) - .fill 1024,4,0 -#endif ENTRY(empty_zero_page) .fill 4096,1,0 +ENTRY(swapper_pg_dir) + .fill 1024,4,0 /* * This starts the data section. @@ -645,20 +640,20 @@ ENTRY(empty_zero_page) __PAGE_ALIGNED_DATA /* Page-aligned for the benefit of paravirt? */ .align PAGE_SIZE_asm -ENTRY(swapper_pg_dir) - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ +ENTRY(initial_page_table) + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0 # elif KPMDS == 2 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 # elif KPMDS == 1 .long 0,0 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 # else # error "Kernel PMDs should be 1, 2 or 3" # endif diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index e3af342..ce5566e 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -371,16 +371,10 @@ void machine_real_restart(const unsigned char *code, int length) CMOS_WRITE(0x00, 0x8f); spin_unlock(&rtc_lock); - /* Remap the kernel at virtual address zero, as well as offset zero - from the kernel segment. This assumes the kernel segment starts at - virtual address PAGE_OFFSET. */ - memcpy(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - sizeof(swapper_pg_dir [0]) * KERNEL_PGD_PTRS); - /* - * Use `swapper_pg_dir' as our page directory. + * Switch back to the initial page table. */ - load_cr3(swapper_pg_dir); + load_cr3(initial_page_table); /* Write 0x1234 to absolute memory location 0x472. The BIOS reads this on booting to tell it to "Bypass memory test (also warm diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index c3a4fbb..b315b37 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -730,6 +730,17 @@ void __init setup_arch(char **cmdline_p) #ifdef CONFIG_X86_32 memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data)); visws_early_detect(); + + /* + * copy kernel address range established so far and switch + * to the proper swapper page table + */ + clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY, + initial_page_table + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); + + load_cr3(swapper_pg_dir); + __flush_tlb_all(); #else printk(KERN_INFO "Command line: %s\n", boot_command_line); #endif @@ -1014,7 +1025,12 @@ void __init setup_arch(char **cmdline_p) paging_init(); x86_init.paging.pagetable_setup_done(swapper_pg_dir); - setup_trampoline_page_table(); +#ifdef CONFIG_X86_32 + /* sync back kernel address range */ + clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); +#endif tboot_probe(); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 8b3bfc4..0d86d0b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -300,22 +300,17 @@ notrace static void __cpuinit start_secondary(void *unused) * most necessary things. */ -#ifdef CONFIG_X86_32 - /* - * Switch away from the trampoline page-table - * - * Do this before cpu_init() because it needs to access per-cpu - * data which may not be mapped in the trampoline page-table. - */ - load_cr3(swapper_pg_dir); - __flush_tlb_all(); -#endif - vmi_bringup(); cpu_init(); preempt_disable(); smp_callin(); +#ifdef CONFIG_X86_32 + /* switch away from the initial page table */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); +#endif + /* otherwise gcc will move up smp_processor_id before the cpu_init */ barrier(); /* @@ -774,7 +769,6 @@ do_rest: #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); - initial_page_table = __pa(&trampoline_pg_dir); #else clear_tsk_thread_flag(c_idle.idle, TIF_FORK); initial_gs = per_cpu_offset(cpu); @@ -923,7 +917,6 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; err = do_boot_cpu(apicid, cpu); - if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c index e2a5952..f1488a3 100644 --- a/arch/x86/kernel/trampoline.c +++ b/arch/x86/kernel/trampoline.c @@ -38,19 +38,3 @@ unsigned long __trampinit setup_trampoline(void) memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE); return virt_to_phys(trampoline_base); } - -void __init setup_trampoline_page_table(void) -{ -#ifdef CONFIG_X86_32 - /* Copy kernel address range */ - clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - KERNEL_PGD_PTRS); - - /* Initialize low mappings */ - clone_pgd_range(trampoline_pg_dir, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, - KERNEL_PGD_BOUNDARY)); -#endif -} diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index bca7909..adad48b 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -548,48 +548,6 @@ static void __init pagetable_init(void) permanent_kmaps_init(pgd_base); } -#ifdef CONFIG_ACPI_SLEEP -/* - * ACPI suspend needs this for resume, because things like the intel-agp - * driver might have split up a kernel 4MB mapping. - */ -char swsusp_pg_dir[PAGE_SIZE] - __attribute__ ((aligned(PAGE_SIZE))); - -static inline void save_pg_dir(void) -{ - memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE); -} -#else /* !CONFIG_ACPI_SLEEP */ -static inline void save_pg_dir(void) -{ -} -#endif /* !CONFIG_ACPI_SLEEP */ - -void zap_low_mappings(bool early) -{ - int i; - - /* - * Zap initial low-memory mappings. - * - * Note that "pgd_clear()" doesn't do it for - * us, because pgd_clear() is a no-op on i386. - */ - for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) { -#ifdef CONFIG_X86_PAE - set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page))); -#else - set_pgd(swapper_pg_dir+i, __pgd(0)); -#endif - } - - if (early) - __flush_tlb(); - else - flush_tlb_all(); -} - pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP); EXPORT_SYMBOL_GPL(__supported_pte_mask); @@ -958,9 +916,6 @@ void __init mem_init(void) if (boot_cpu_data.wp_works_ok < 0) test_wp_bit(); - - save_pg_dir(); - zap_low_mappings(true); } #ifdef CONFIG_MEMORY_HOTPLUG -- 1.7.1 -- Regards/Gruss, Boris. ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH -v1.1] x86-32, mm: Add an initial page table for core bootstrapping 2010-08-29 20:32 ` [PATCH] x86-32, mm: Add an initial page table for core bootstrapping Borislav Petkov @ 2010-09-02 9:10 ` Borislav Petkov 0 siblings, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2010-09-02 9:10 UTC (permalink / raw) To: H. Peter Anvin Cc: Borislav Petkov, Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: Borislav Petkov <bp@alien8.de> Date: Sun, Aug 29, 2010 at 10:32:05PM +0200 > This patch adds an initial page table with low mappings used exclusively > for booting APs/resuming after ACPI suspend/machine restart. After this, > there's no need to add low mappings to swapper_pg_dir and zap them later > or create own swsusp PGD page solely for ACPI sleep needs. > > Signed-off-by: Borislav Petkov <bp@alien8.de> > --- > > Ok, here's the initial version. This patch has survived a whole weekend > of randconfig builds and has been boot/reboot/suspend/resume tested on > 32-bit kvm guests and bare metal in all possible configurations of PAE > and !PAE kernels with all possible VMSPLIT_* config options. > > Patch is against tip/x86/urgent ontop of Hugh's CONFIG_VMSPLIT_1G and > 2G_OPT fix. Ok this one broke suspend-to-ram due to wrongly _not_ passing the _pointer_ to the initial_page_table as the pmode_cr3 pointer when resuming from ACPI sleep. Here's a better version. -- From: Borislav Petkov <bp@alien8.de> Date: Sat, 28 Aug 2010 15:58:33 +0200 Subject: [PATCH] x86-32, mm: Add an initial page table for core bootstrapping This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs. [ v1.1 Fix suspend-to-ram ] Signed-off-by: Borislav Petkov <bp@alien8.de> --- arch/x86/include/asm/pgtable_32.h | 2 +- arch/x86/include/asm/tlbflush.h | 2 - arch/x86/include/asm/trampoline.h | 3 -- arch/x86/kernel/acpi/sleep.c | 7 ++++- arch/x86/kernel/head32.c | 1 + arch/x86/kernel/head_32.S | 55 +++++++++++++++++-------------------- arch/x86/kernel/reboot.c | 10 +----- arch/x86/kernel/setup.c | 18 +++++++++++- arch/x86/kernel/smpboot.c | 19 ++++--------- arch/x86/kernel/trampoline.c | 16 ----------- arch/x86/mm/init_32.c | 45 ------------------------------ 11 files changed, 58 insertions(+), 120 deletions(-) diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index f686f49..8abde9e 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,7 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; -extern pgd_t trampoline_pg_dir[1024]; +extern pgd_t initial_page_table[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 7f3eba0..169be89 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -172,6 +172,4 @@ static inline void flush_tlb_kernel_range(unsigned long start, flush_tlb_all(); } -extern void zap_low_mappings(bool early); - #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index 4dde797..f4500fb 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -13,16 +13,13 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; -extern unsigned long initial_page_table; extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) extern unsigned long setup_trampoline(void); -extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else -static inline void setup_trampoline_page_table(void) {} static inline void reserve_trampoline_memory(void) {} #endif /* CONFIG_X86_TRAMPOLINE */ diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index 33cec15..b35e1ab 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -12,6 +12,11 @@ #include <asm/segment.h> #include <asm/desc.h> +#ifdef CONFIG_X86_32 +#include <asm/pgtable.h> +#include <asm/pgtable_32.h> +#endif + #include "realmode/wakeup.h" #include "sleep.h" @@ -90,7 +95,7 @@ int acpi_save_state_mem(void) #ifndef CONFIG_64BIT header->pmode_entry = (u32)&wakeup_pmode_return; - header->pmode_cr3 = (u32)(swsusp_pg_dir - __PAGE_OFFSET); + header->pmode_cr3 = (u32)__pa(&initial_page_table); saved_magic = 0x12345678; #else /* CONFIG_64BIT */ header->trampoline_segment = setup_trampoline() >> 4; diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c index 784360c..8b9c201 100644 --- a/arch/x86/kernel/head32.c +++ b/arch/x86/kernel/head32.c @@ -17,6 +17,7 @@ #include <asm/apic.h> #include <asm/io_apic.h> #include <asm/bios_ebda.h> +#include <asm/tlbflush.h> static void __init i386_default_early_setup(void) { diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index fa8c1b8..bcece91 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -183,13 +183,12 @@ default_entry: #ifdef CONFIG_X86_PAE /* - * In PAE mode swapper_pg_dir is statically defined to contain enough - * entries to cover the VMSPLIT option (that is the top 1, 2 or 3 - * entries). The identity mapping is handled by pointing two PGD - * entries to the first kernel PMD. + * In PAE mode initial_page_table is statically defined to contain + * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3 + * entries). The identity mapping is handled by pointing two PGD entries + * to the first kernel PMD. * - * Note the upper half of each PMD or PTE are always zero at - * this stage. + * Note the upper half of each PMD or PTE are always zero at this stage. */ #define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */ @@ -197,7 +196,7 @@ default_entry: xorl %ebx,%ebx /* %ebx is kept at zero */ movl $pa(__brk_base), %edi - movl $pa(swapper_pg_pmd), %edx + movl $pa(initial_pg_pmd), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PMD entry */ @@ -226,14 +225,14 @@ default_entry: movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8) #else /* Not PAE */ page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(__brk_base), %edi - movl $pa(swapper_pg_dir), %edx + movl $pa(initial_page_table), %edx movl $PTE_IDENT_ATTR, %eax 10: leal PDE_IDENT_ATTR(%edi),%ecx /* Create PDE entry */ @@ -257,8 +256,8 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax - movl %eax,pa(swapper_pg_dir+0xffc) + movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax + movl %eax,pa(initial_page_table+0xffc) #endif jmp 3f /* @@ -334,7 +333,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl pa(initial_page_table), %eax + movl $pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -614,8 +613,6 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel -ENTRY(initial_page_table) - .long pa(swapper_pg_dir) /* * BSS section @@ -623,20 +620,18 @@ ENTRY(initial_page_table) __PAGE_ALIGNED_BSS .align PAGE_SIZE_asm #ifdef CONFIG_X86_PAE -swapper_pg_pmd: +initial_pg_pmd: .fill 1024*KPMDS,4,0 #else -ENTRY(swapper_pg_dir) +ENTRY(initial_page_table) .fill 1024,4,0 #endif -swapper_pg_fixmap: +initial_pg_fixmap: .fill 1024,4,0 -#ifdef CONFIG_X86_TRAMPOLINE -ENTRY(trampoline_pg_dir) - .fill 1024,4,0 -#endif ENTRY(empty_zero_page) .fill 4096,1,0 +ENTRY(swapper_pg_dir) + .fill 1024,4,0 /* * This starts the data section. @@ -645,20 +640,20 @@ ENTRY(empty_zero_page) __PAGE_ALIGNED_DATA /* Page-aligned for the benefit of paravirt? */ .align PAGE_SIZE_asm -ENTRY(swapper_pg_dir) - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ +ENTRY(initial_page_table) + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0 # elif KPMDS == 2 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0 # elif KPMDS == 1 .long 0,0 .long 0,0 - .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 # else # error "Kernel PMDs should be 1, 2 or 3" # endif diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index e3af342..ce5566e 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -371,16 +371,10 @@ void machine_real_restart(const unsigned char *code, int length) CMOS_WRITE(0x00, 0x8f); spin_unlock(&rtc_lock); - /* Remap the kernel at virtual address zero, as well as offset zero - from the kernel segment. This assumes the kernel segment starts at - virtual address PAGE_OFFSET. */ - memcpy(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - sizeof(swapper_pg_dir [0]) * KERNEL_PGD_PTRS); - /* - * Use `swapper_pg_dir' as our page directory. + * Switch back to the initial page table. */ - load_cr3(swapper_pg_dir); + load_cr3(initial_page_table); /* Write 0x1234 to absolute memory location 0x472. The BIOS reads this on booting to tell it to "Bypass memory test (also warm diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index c3a4fbb..b315b37 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -730,6 +730,17 @@ void __init setup_arch(char **cmdline_p) #ifdef CONFIG_X86_32 memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data)); visws_early_detect(); + + /* + * copy kernel address range established so far and switch + * to the proper swapper page table + */ + clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY, + initial_page_table + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); + + load_cr3(swapper_pg_dir); + __flush_tlb_all(); #else printk(KERN_INFO "Command line: %s\n", boot_command_line); #endif @@ -1014,7 +1025,12 @@ void __init setup_arch(char **cmdline_p) paging_init(); x86_init.paging.pagetable_setup_done(swapper_pg_dir); - setup_trampoline_page_table(); +#ifdef CONFIG_X86_32 + /* sync back kernel address range */ + clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); +#endif tboot_probe(); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 8b3bfc4..0d86d0b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -300,22 +300,17 @@ notrace static void __cpuinit start_secondary(void *unused) * most necessary things. */ -#ifdef CONFIG_X86_32 - /* - * Switch away from the trampoline page-table - * - * Do this before cpu_init() because it needs to access per-cpu - * data which may not be mapped in the trampoline page-table. - */ - load_cr3(swapper_pg_dir); - __flush_tlb_all(); -#endif - vmi_bringup(); cpu_init(); preempt_disable(); smp_callin(); +#ifdef CONFIG_X86_32 + /* switch away from the initial page table */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); +#endif + /* otherwise gcc will move up smp_processor_id before the cpu_init */ barrier(); /* @@ -774,7 +769,6 @@ do_rest: #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); - initial_page_table = __pa(&trampoline_pg_dir); #else clear_tsk_thread_flag(c_idle.idle, TIF_FORK); initial_gs = per_cpu_offset(cpu); @@ -923,7 +917,6 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; err = do_boot_cpu(apicid, cpu); - if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c index e2a5952..f1488a3 100644 --- a/arch/x86/kernel/trampoline.c +++ b/arch/x86/kernel/trampoline.c @@ -38,19 +38,3 @@ unsigned long __trampinit setup_trampoline(void) memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE); return virt_to_phys(trampoline_base); } - -void __init setup_trampoline_page_table(void) -{ -#ifdef CONFIG_X86_32 - /* Copy kernel address range */ - clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - KERNEL_PGD_PTRS); - - /* Initialize low mappings */ - clone_pgd_range(trampoline_pg_dir, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, - KERNEL_PGD_BOUNDARY)); -#endif -} diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index bca7909..adad48b 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -548,48 +548,6 @@ static void __init pagetable_init(void) permanent_kmaps_init(pgd_base); } -#ifdef CONFIG_ACPI_SLEEP -/* - * ACPI suspend needs this for resume, because things like the intel-agp - * driver might have split up a kernel 4MB mapping. - */ -char swsusp_pg_dir[PAGE_SIZE] - __attribute__ ((aligned(PAGE_SIZE))); - -static inline void save_pg_dir(void) -{ - memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE); -} -#else /* !CONFIG_ACPI_SLEEP */ -static inline void save_pg_dir(void) -{ -} -#endif /* !CONFIG_ACPI_SLEEP */ - -void zap_low_mappings(bool early) -{ - int i; - - /* - * Zap initial low-memory mappings. - * - * Note that "pgd_clear()" doesn't do it for - * us, because pgd_clear() is a no-op on i386. - */ - for (i = 0; i < KERNEL_PGD_BOUNDARY; i++) { -#ifdef CONFIG_X86_PAE - set_pgd(swapper_pg_dir+i, __pgd(1 + __pa(empty_zero_page))); -#else - set_pgd(swapper_pg_dir+i, __pgd(0)); -#endif - } - - if (early) - __flush_tlb(); - else - flush_tlb_all(); -} - pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP); EXPORT_SYMBOL_GPL(__supported_pte_mask); @@ -958,9 +916,6 @@ void __init mem_init(void) if (boot_cpu_data.wp_works_ok < 0) test_wp_bit(); - - save_pg_dir(); - zap_low_mappings(true); } #ifdef CONFIG_MEMORY_HOTPLUG -- 1.7.1 -- Regards/Gruss, Boris. ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 15:34 ` H. Peter Anvin 2010-08-12 15:47 ` Borislav Petkov @ 2010-08-12 17:06 ` Joerg Roedel 2010-08-12 19:01 ` H. Peter Anvin 1 sibling, 1 reply; 31+ messages in thread From: Joerg Roedel @ 2010-08-12 17:06 UTC (permalink / raw) To: H. Peter Anvin Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On Thu, Aug 12, 2010 at 08:34:37AM -0700, H. Peter Anvin wrote: > Agreed. I do have a concern about the kernel page tables not being > synchronized into trampoline_pg_dir (it's only an issue for 32-bit > !PAE), so at the very least we need to keep an eye out for it... Great. The synchronization problem might be real. The cpu_init function uses per-cpu variables which might not be present in the early synced tramponline page table (iirc). The change to swapper_pg_dir and the global tlb flush should probably be moved to the very beginning of the start_secondary function. Joerg ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 17:06 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel @ 2010-08-12 19:01 ` H. Peter Anvin 2010-08-12 19:04 ` Joerg Roedel 0 siblings, 1 reply; 31+ messages in thread From: H. Peter Anvin @ 2010-08-12 19:01 UTC (permalink / raw) To: Joerg Roedel Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/12/2010 10:06 AM, Joerg Roedel wrote: > On Thu, Aug 12, 2010 at 08:34:37AM -0700, H. Peter Anvin wrote: > >> Agreed. I do have a concern about the kernel page tables not being >> synchronized into trampoline_pg_dir (it's only an issue for 32-bit >> !PAE), so at the very least we need to keep an eye out for it... > > Great. > > The synchronization problem might be real. The cpu_init function uses > per-cpu variables which might not be present in the early synced > tramponline page table (iirc). The change to swapper_pg_dir and the > global tlb flush should probably be moved to the very beginning of the > start_secondary function. > Are you going to submit an updated patch, then? -hpa ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 19:01 ` H. Peter Anvin @ 2010-08-12 19:04 ` Joerg Roedel 2010-08-13 12:35 ` Borislav Petkov 0 siblings, 1 reply; 31+ messages in thread From: Joerg Roedel @ 2010-08-12 19:04 UTC (permalink / raw) To: H. Peter Anvin Cc: Roedel, Joerg, Borislav Petkov, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On Thu, Aug 12, 2010 at 12:01:27PM -0700, H. Peter Anvin wrote: > On 08/12/2010 10:06 AM, Joerg Roedel wrote: > > The synchronization problem might be real. The cpu_init function uses > > per-cpu variables which might not be present in the early synced > > tramponline page table (iirc). The change to swapper_pg_dir and the > > global tlb flush should probably be moved to the very beginning of the > > start_secondary function. > > > > Are you going to submit an updated patch, then? Yes, I will send it next week when I am back in office and gave it some testing. Joerg ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-12 19:04 ` Joerg Roedel @ 2010-08-13 12:35 ` Borislav Petkov 2010-08-16 12:19 ` Borislav Petkov 0 siblings, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-13 12:35 UTC (permalink / raw) To: Joerg Roedel Cc: H. Peter Anvin, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: Joerg Roedel <joro@8bytes.org> Date: Thu, Aug 12, 2010 at 03:04:14PM -0400 > On Thu, Aug 12, 2010 at 12:01:27PM -0700, H. Peter Anvin wrote: > > On 08/12/2010 10:06 AM, Joerg Roedel wrote: > > > > The synchronization problem might be real. The cpu_init function uses > > > per-cpu variables which might not be present in the early synced > > > tramponline page table (iirc). The change to swapper_pg_dir and the > > > global tlb flush should probably be moved to the very beginning of the > > > start_secondary function. > > > > > > > Are you going to submit an updated patch, then? > > Yes, I will send it next week when I am back in office and gave it some > testing. currently testing... -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-13 12:35 ` Borislav Petkov @ 2010-08-16 12:19 ` Borislav Petkov 2010-08-16 12:38 ` [PATCH -v2 " Borislav Petkov 2010-08-16 12:39 ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov 0 siblings, 2 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-16 12:19 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: Borislav Petkov <bp@amd64.org> Date: Fri, Aug 13, 2010 at 08:35:43AM -0400 > > > > The synchronization problem might be real. The cpu_init function uses > > > > per-cpu variables which might not be present in the early synced > > > > tramponline page table (iirc). The change to swapper_pg_dir and the > > > > global tlb flush should probably be moved to the very beginning of the > > > > start_secondary function. > > > > > > > > > > Are you going to submit an updated patch, then? > > > > Yes, I will send it next week when I am back in office and gave it some > > testing. > > currently testing... Updated versions survived almost 48h of uninterrupted hammering, I'm sending them as a reply to this message... -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH -v2 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-16 12:19 ` Borislav Petkov @ 2010-08-16 12:38 ` Borislav Petkov 2010-08-18 18:41 ` H. Peter Anvin 2010-08-18 20:55 ` [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir tip-bot for Joerg Roedel 2010-08-16 12:39 ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov 1 sibling, 2 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-16 12:38 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: Joerg Roedel <joerg.roedel@amd.com> This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> --- arch/x86/include/asm/pgtable_32.h | 1 + arch/x86/include/asm/trampoline.h | 3 +++ arch/x86/kernel/head_32.S | 8 +++++++- arch/x86/kernel/setup.c | 2 ++ arch/x86/kernel/smpboot.c | 32 +++++++++++++------------------- arch/x86/kernel/trampoline.c | 18 ++++++++++++++++++ 6 files changed, 44 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index 2984a25..f686f49 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,6 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; +extern pgd_t trampoline_pg_dir[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index cb507bb..8f78fdf 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -13,14 +13,17 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; +extern unsigned long initial_page_table; extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) extern unsigned long setup_trampoline(void); +extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else static inline void reserve_trampoline_memory(void) {}; +extern void __init setup_trampoline_page_table(void) {}; #endif /* CONFIG_X86_TRAMPOLINE */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 37c3d4b..75e3981 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -328,7 +328,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl $pa(swapper_pg_dir),%eax + movl pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -608,6 +608,8 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel +ENTRY(initial_page_table) + .long pa(swapper_pg_dir) /* * BSS section @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir) #endif swapper_pg_fixmap: .fill 1024,4,0 +#ifdef CONFIG_X86_TRAMPOLINE +ENTRY(trampoline_pg_dir) + .fill 1024,4,0 +#endif ENTRY(empty_zero_page) .fill 4096,1,0 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b4ae4ac..6600cfd 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p) paging_init(); x86_init.paging.pagetable_setup_done(swapper_pg_dir); + setup_trampoline_page_table(); + tboot_probe(); #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c4f33b2..2d148a9 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -73,7 +73,6 @@ #ifdef CONFIG_X86_32 u8 apicid_2_node[MAX_APICID]; -static int low_mappings; #endif /* State of each CPU */ @@ -281,6 +280,18 @@ notrace static void __cpuinit start_secondary(void *unused) * fragile that we want to limit the things done here to the * most necessary things. */ + +#ifdef CONFIG_X86_32 + /* + * Switch away from the trampoline page-table + * + * Do this before cpu_init() because it needs to access per-cpu + * data which may not be mapped in the trampoline page-table. + */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); +#endif + vmi_bringup(); cpu_init(); preempt_disable(); @@ -299,12 +310,6 @@ notrace static void __cpuinit start_secondary(void *unused) legacy_pic->chip->unmask(0); } -#ifdef CONFIG_X86_32 - while (low_mappings) - cpu_relax(); - __flush_tlb_all(); -#endif - /* This must be done before setting cpu_online_mask */ set_cpu_sibling_map(raw_smp_processor_id()); wmb(); @@ -754,6 +759,7 @@ do_rest: #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); + initial_page_table = __pa(&trampoline_pg_dir); #else clear_tsk_thread_flag(c_idle.idle, TIF_FORK); initial_gs = per_cpu_offset(cpu); @@ -894,20 +900,8 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; -#ifdef CONFIG_X86_32 - /* init low mem mapping */ - clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); - flush_tlb_all(); - low_mappings = 1; - err = do_boot_cpu(apicid, cpu); - zap_low_mappings(false); - low_mappings = 0; -#else - err = do_boot_cpu(apicid, cpu); -#endif if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c index c652ef6..a874495 100644 --- a/arch/x86/kernel/trampoline.c +++ b/arch/x86/kernel/trampoline.c @@ -1,6 +1,7 @@ #include <linux/io.h> #include <asm/trampoline.h> +#include <asm/pgtable.h> #include <asm/e820.h> #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP) @@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void) memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE); return virt_to_phys(trampoline_base); } + +void __init setup_trampoline_page_table(void) +{ +#ifdef CONFIG_X86_32 + /* Copy kernel address range */ + clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); + + /* Initialize low mappings */ + clone_pgd_range(trampoline_pg_dir, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); +#endif +} -- 1.7.1 -- Regards/Gruss, Boris. ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH -v2 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-16 12:38 ` [PATCH -v2 " Borislav Petkov @ 2010-08-18 18:41 ` H. Peter Anvin 2010-08-18 19:09 ` Borislav Petkov 2010-08-18 20:55 ` [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs tip-bot for H. Peter Anvin 2010-08-18 20:55 ` [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir tip-bot for Joerg Roedel 1 sibling, 2 replies; 31+ messages in thread From: H. Peter Anvin @ 2010-08-18 18:41 UTC (permalink / raw) To: Borislav Petkov Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/16/2010 05:38 AM, Borislav Petkov wrote: > #else > static inline void reserve_trampoline_memory(void) {}; > +extern void __init setup_trampoline_page_table(void) {}; > #endif /* CONFIG_X86_TRAMPOLINE */ This breaks compilation in the !CONFIG_X86_TRAMPOLINE case; I presume you meant: static inline void setup_trampoline_page_table(void) {} ... here too? (No semicolon after a function body!) -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH -v2 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines 2010-08-18 18:41 ` H. Peter Anvin @ 2010-08-18 19:09 ` Borislav Petkov 2010-08-18 20:55 ` [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs tip-bot for H. Peter Anvin 1 sibling, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-18 19:09 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: "H. Peter Anvin" <hpa@zytor.com> Date: Wed, Aug 18, 2010 at 02:41:17PM -0400 > On 08/16/2010 05:38 AM, Borislav Petkov wrote: > > #else > > static inline void reserve_trampoline_memory(void) {}; > > +extern void __init setup_trampoline_page_table(void) {}; > > #endif /* CONFIG_X86_TRAMPOLINE */ > > This breaks compilation in the !CONFIG_X86_TRAMPOLINE case; I presume > you meant: > > static inline void setup_trampoline_page_table(void) {} > > ... here too? > > (No semicolon after a function body!) Yeah, thanks for catching that - this should be a static inline stub in the !CONFIG_X86_TRAMPOLINE case. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs 2010-08-18 18:41 ` H. Peter Anvin 2010-08-18 19:09 ` Borislav Petkov @ 2010-08-18 20:55 ` tip-bot for H. Peter Anvin 1 sibling, 0 replies; 31+ messages in thread From: tip-bot for H. Peter Anvin @ 2010-08-18 20:55 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, joerg.roedel, tglx, borislav.petkov Commit-ID: 8848a91068c018bc91f597038a0f41462a0f88a4 Gitweb: http://git.kernel.org/tip/8848a91068c018bc91f597038a0f41462a0f88a4 Author: H. Peter Anvin <hpa@zytor.com> AuthorDate: Wed, 18 Aug 2010 11:42:23 -0700 Committer: H. Peter Anvin <hpa@zytor.com> CommitDate: Wed, 18 Aug 2010 12:42:24 -0700 x86-32: Fix dummy trampoline-related inline stubs Fix dummy inline stubs for trampoline-related functions when no trampolines exist (until we get rid of the no-trampoline case entirely.) Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <4C6C294D.3030404@zytor.com> --- arch/x86/include/asm/trampoline.h | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index 8f78fdf..4dde797 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -22,8 +22,8 @@ extern unsigned long setup_trampoline(void); extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else -static inline void reserve_trampoline_memory(void) {}; -extern void __init setup_trampoline_page_table(void) {}; +static inline void setup_trampoline_page_table(void) {} +static inline void reserve_trampoline_memory(void) {} #endif /* CONFIG_X86_TRAMPOLINE */ #endif /* __ASSEMBLY__ */ ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir 2010-08-16 12:38 ` [PATCH -v2 " Borislav Petkov 2010-08-18 18:41 ` H. Peter Anvin @ 2010-08-18 20:55 ` tip-bot for Joerg Roedel 1 sibling, 0 replies; 31+ messages in thread From: tip-bot for Joerg Roedel @ 2010-08-18 20:55 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, joerg.roedel, tglx, borislav.petkov Commit-ID: fd89a137924e0710078c3ae855e7cec1c43cb845 Gitweb: http://git.kernel.org/tip/fd89a137924e0710078c3ae855e7cec1c43cb845 Author: Joerg Roedel <joerg.roedel@amd.com> AuthorDate: Mon, 16 Aug 2010 14:38:33 +0200 Committer: H. Peter Anvin <hpa@zytor.com> CommitDate: Wed, 18 Aug 2010 09:17:20 -0700 x86-32: Separate 1:1 pagetables from swapper_pg_dir This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20100816123833.GB28147@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> --- arch/x86/include/asm/pgtable_32.h | 1 + arch/x86/include/asm/trampoline.h | 3 +++ arch/x86/kernel/head_32.S | 8 +++++++- arch/x86/kernel/setup.c | 2 ++ arch/x86/kernel/smpboot.c | 32 +++++++++++++------------------- arch/x86/kernel/trampoline.c | 18 ++++++++++++++++++ 6 files changed, 44 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index 2984a25..f686f49 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -26,6 +26,7 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; +extern pgd_t trampoline_pg_dir[1024]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h index cb507bb..8f78fdf 100644 --- a/arch/x86/include/asm/trampoline.h +++ b/arch/x86/include/asm/trampoline.h @@ -13,14 +13,17 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; +extern unsigned long initial_page_table; extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) extern unsigned long setup_trampoline(void); +extern void __init setup_trampoline_page_table(void); extern void __init reserve_trampoline_memory(void); #else static inline void reserve_trampoline_memory(void) {}; +extern void __init setup_trampoline_page_table(void) {}; #endif /* CONFIG_X86_TRAMPOLINE */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index ff4c453..fa8c1b8 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -334,7 +334,7 @@ ENTRY(startup_32_smp) /* * Enable paging */ - movl $pa(swapper_pg_dir),%eax + movl pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl %cr0,%eax orl $X86_CR0_PG,%eax @@ -614,6 +614,8 @@ ignore_int: .align 4 ENTRY(initial_code) .long i386_start_kernel +ENTRY(initial_page_table) + .long pa(swapper_pg_dir) /* * BSS section @@ -629,6 +631,10 @@ ENTRY(swapper_pg_dir) #endif swapper_pg_fixmap: .fill 1024,4,0 +#ifdef CONFIG_X86_TRAMPOLINE +ENTRY(trampoline_pg_dir) + .fill 1024,4,0 +#endif ENTRY(empty_zero_page) .fill 4096,1,0 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b008e78..c3a4fbb 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1014,6 +1014,8 @@ void __init setup_arch(char **cmdline_p) paging_init(); x86_init.paging.pagetable_setup_done(swapper_pg_dir); + setup_trampoline_page_table(); + tboot_probe(); #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index a5e928b..abf4a86 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -73,7 +73,6 @@ #ifdef CONFIG_X86_32 u8 apicid_2_node[MAX_APICID]; -static int low_mappings; #endif /* State of each CPU */ @@ -281,6 +280,18 @@ notrace static void __cpuinit start_secondary(void *unused) * fragile that we want to limit the things done here to the * most necessary things. */ + +#ifdef CONFIG_X86_32 + /* + * Switch away from the trampoline page-table + * + * Do this before cpu_init() because it needs to access per-cpu + * data which may not be mapped in the trampoline page-table. + */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); +#endif + vmi_bringup(); cpu_init(); preempt_disable(); @@ -299,12 +310,6 @@ notrace static void __cpuinit start_secondary(void *unused) legacy_pic->chip->unmask(0); } -#ifdef CONFIG_X86_32 - while (low_mappings) - cpu_relax(); - __flush_tlb_all(); -#endif - /* This must be done before setting cpu_online_mask */ set_cpu_sibling_map(raw_smp_processor_id()); wmb(); @@ -750,6 +755,7 @@ do_rest: #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); + initial_page_table = __pa(&trampoline_pg_dir); #else clear_tsk_thread_flag(c_idle.idle, TIF_FORK); initial_gs = per_cpu_offset(cpu); @@ -897,20 +903,8 @@ int __cpuinit native_cpu_up(unsigned int cpu) per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; -#ifdef CONFIG_X86_32 - /* init low mem mapping */ - clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); - flush_tlb_all(); - low_mappings = 1; - err = do_boot_cpu(apicid, cpu); - zap_low_mappings(false); - low_mappings = 0; -#else - err = do_boot_cpu(apicid, cpu); -#endif if (err) { pr_debug("do_boot_cpu failed %d\n", err); return -EIO; diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c index c652ef6..a874495 100644 --- a/arch/x86/kernel/trampoline.c +++ b/arch/x86/kernel/trampoline.c @@ -1,6 +1,7 @@ #include <linux/io.h> #include <asm/trampoline.h> +#include <asm/pgtable.h> #include <asm/e820.h> #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP) @@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void) memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE); return virt_to_phys(trampoline_base); } + +void __init setup_trampoline_page_table(void) +{ +#ifdef CONFIG_X86_32 + /* Copy kernel address range */ + clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); + + /* Initialize low mappings */ + clone_pgd_range(trampoline_pg_dir, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min_t(unsigned long, KERNEL_PGD_PTRS, + KERNEL_PGD_BOUNDARY)); +#endif +} ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue 2010-08-16 12:19 ` Borislav Petkov 2010-08-16 12:38 ` [PATCH -v2 " Borislav Petkov @ 2010-08-16 12:39 ` Borislav Petkov 2010-08-18 19:03 ` H. Peter Anvin 1 sibling, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-16 12:39 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d: Stuck ??" message due to multiple cores concurrently accessing the cpu_callin_mask, among others. Since these codepaths are not protected from concurrent access due to the fact that there's no sane reason for making an already complex code unnecessarily more complex - we hit the issue only when insanely switching cores off- and online - serialize hotplugging cores on the sysfs level and be done with it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> --- arch/x86/Kconfig | 6 ++++++ arch/x86/kernel/smpboot.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+), 0 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index dcb0593..1598663 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -58,6 +58,7 @@ config X86 select ANON_INODES select HAVE_ARCH_KMEMCHECK select HAVE_USER_RETURN_NOTIFIER + select ARCH_CPU_PROBE_RELEASE config INSTRUCTION_DECODER def_bool (KPROBES || PERF_EVENTS) @@ -247,6 +248,11 @@ config ARCH_HWEIGHT_CFLAGS config KTIME_SCALAR def_bool X86_32 + +config ARCH_CPU_PROBE_RELEASE + def_bool y + depends on HOTPLUG_CPU + source "init/Kconfig" source "kernel/Kconfig.freezer" diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 2d148a9..733ad2b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 }; static DEFINE_PER_CPU(struct task_struct *, idle_thread_array); #define get_idle_for_cpu(x) (per_cpu(idle_thread_array, x)) #define set_idle_for_cpu(x, p) (per_cpu(idle_thread_array, x) = (p)) + +/* + * We need this for trampoline_base protection from concurrent accesses when + * off- and onlining cores wildly. + */ +static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex); + +void cpu_hotplug_driver_lock() +{ + mutex_lock(&x86_cpu_hotplug_driver_mutex); +} + +void cpu_hotplug_driver_unlock() +{ + mutex_unlock(&x86_cpu_hotplug_driver_mutex); +} + +ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; } +ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; } #else static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ; #define get_idle_for_cpu(x) (idle_thread_array[(x)]) -- 1.7.1 -- Regards/Gruss, Boris. ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue 2010-08-16 12:39 ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov @ 2010-08-18 19:03 ` H. Peter Anvin 2010-08-18 19:28 ` Borislav Petkov 0 siblings, 1 reply; 31+ messages in thread From: H. Peter Anvin @ 2010-08-18 19:03 UTC (permalink / raw) To: Borislav Petkov Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/16/2010 05:39 AM, Borislav Petkov wrote: > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index dcb0593..1598663 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -58,6 +58,7 @@ config X86 > select ANON_INODES > select HAVE_ARCH_KMEMCHECK > select HAVE_USER_RETURN_NOTIFIER > + select ARCH_CPU_PROBE_RELEASE > warning: (X86) selects ARCH_CPU_PROBE_RELEASE which has unmet direct dependencies (HOTPLUG_CPU) ... and build failure when compiling without CONFIG_HOTPLUG_CPU. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue 2010-08-18 19:03 ` H. Peter Anvin @ 2010-08-18 19:28 ` Borislav Petkov 2010-08-18 20:04 ` H. Peter Anvin 0 siblings, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-18 19:28 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel From: "H. Peter Anvin" <hpa@zytor.com> Date: Wed, Aug 18, 2010 at 03:03:33PM -0400 > On 08/16/2010 05:39 AM, Borislav Petkov wrote: > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index dcb0593..1598663 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -58,6 +58,7 @@ config X86 > > select ANON_INODES > > select HAVE_ARCH_KMEMCHECK > > select HAVE_USER_RETURN_NOTIFIER > > + select ARCH_CPU_PROBE_RELEASE > > > > warning: (X86) selects ARCH_CPU_PROBE_RELEASE which has unmet direct > dependencies (HOTPLUG_CPU) well I got +config ARCH_CPU_PROBE_RELEASE + def_bool y + depends on HOTPLUG_CPU but I guess ARCH_CPU_PROBE_RELEASE gets selected still. Why isn't that "depends on" forcing it, I guess select is stronger. > ... and build failure when compiling without CONFIG_HOTPLUG_CPU. This hunk fixes the !CONFIG_HOTPLUG_CPU build here but I should've caught it earlier, shame on me. Sorry for the inconvenience, can you please drop this second patch for now, I'll send you a revised version tomorrow. Thanks. diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 251acea..273cfb9 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -261,7 +261,7 @@ int __init cpu_dev_init(void) } static struct sysdev_class_attribute *cpu_sysdev_class_attrs[] = { -#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE +#if defined(CONFIG_ARCH_CPU_PROBE_RELEASE) && defined(CONFIG_HOTPLUG_CPU) &attr_probe, &attr_release, #endif -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue 2010-08-18 19:28 ` Borislav Petkov @ 2010-08-18 20:04 ` H. Peter Anvin 2010-08-19 18:10 ` [PATCH -v2.1 " Borislav Petkov 0 siblings, 1 reply; 31+ messages in thread From: H. Peter Anvin @ 2010-08-18 20:04 UTC (permalink / raw) To: Borislav Petkov Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel On 08/18/2010 12:28 PM, Borislav Petkov wrote: >> >> warning: (X86) selects ARCH_CPU_PROBE_RELEASE which has unmet direct >> dependencies (HOTPLUG_CPU) > > well I got > > +config ARCH_CPU_PROBE_RELEASE > + def_bool y > + depends on HOTPLUG_CPU > > but I guess ARCH_CPU_PROBE_RELEASE gets selected still. Why isn't that > "depends on" forcing it, I guess select is stronger. > >> ... and build failure when compiling without CONFIG_HOTPLUG_CPU. > > This hunk fixes the !CONFIG_HOTPLUG_CPU build here but I should've > caught it earlier, shame on me. Sorry for the inconvenience, can you > please drop this second patch for now, I'll send you a revised version > tomorrow. > I would prefer to not continue to have a piece of Kconfig code which warns, so I'll hold off on this until I have your revised patch. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH -v2.1 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue 2010-08-18 20:04 ` H. Peter Anvin @ 2010-08-19 18:10 ` Borislav Petkov 2010-08-19 22:58 ` [tip:x86/urgent] x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues tip-bot for Borislav Petkov 0 siblings, 1 reply; 31+ messages in thread From: Borislav Petkov @ 2010-08-19 18:10 UTC (permalink / raw) To: H. Peter Anvin Cc: Joerg Roedel, Roedel, Joerg, mingo, tglx, Herrmann3, Andreas, Seidel, Conny, Sarathy, Bhavna, greg, x86, linux-kernel When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d: Stuck ??" message due to multiple cores concurrently accessing the cpu_callin_mask, among others. Since these codepaths are not protected from concurrent access due to the fact that there's no sane reason for making an already complex code unnecessarily more complex - we hit the issue only when insanely switching cores off- and online - serialize hotplugging cores on the sysfs level and be done with it. [ v2.1: fix !HOTPLUG_CPU build ] Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> --- arch/x86/Kconfig | 5 +++++ arch/x86/kernel/smpboot.c | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+), 0 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a84fc34..ac7827f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -245,6 +245,11 @@ config ARCH_HWEIGHT_CFLAGS config KTIME_SCALAR def_bool X86_32 + +config ARCH_CPU_PROBE_RELEASE + def_bool y + depends on HOTPLUG_CPU + source "init/Kconfig" source "kernel/Kconfig.freezer" diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index abf4a86..8b3bfc4 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 }; static DEFINE_PER_CPU(struct task_struct *, idle_thread_array); #define get_idle_for_cpu(x) (per_cpu(idle_thread_array, x)) #define set_idle_for_cpu(x, p) (per_cpu(idle_thread_array, x) = (p)) + +/* + * We need this for trampoline_base protection from concurrent accesses when + * off- and onlining cores wildly. + */ +static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex); + +void cpu_hotplug_driver_lock() +{ + mutex_lock(&x86_cpu_hotplug_driver_mutex); +} + +void cpu_hotplug_driver_unlock() +{ + mutex_unlock(&x86_cpu_hotplug_driver_mutex); +} + +ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; } +ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; } #else static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ; #define get_idle_for_cpu(x) (idle_thread_array[(x)]) -- 1.7.1 -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [tip:x86/urgent] x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues 2010-08-19 18:10 ` [PATCH -v2.1 " Borislav Petkov @ 2010-08-19 22:58 ` tip-bot for Borislav Petkov 0 siblings, 0 replies; 31+ messages in thread From: tip-bot for Borislav Petkov @ 2010-08-19 22:58 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, bp, stable, tglx, hpa, borislav.petkov Commit-ID: d7c53c9e822a4fefa13a0cae76f3190bfd0d5c11 Gitweb: http://git.kernel.org/tip/d7c53c9e822a4fefa13a0cae76f3190bfd0d5c11 Author: Borislav Petkov <bp@amd64.org> AuthorDate: Thu, 19 Aug 2010 20:10:29 +0200 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Thu, 19 Aug 2010 14:47:43 -0700 x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d: Stuck ??" message due to multiple cores concurrently accessing the cpu_callin_mask, among others. Since these codepaths are not protected from concurrent access due to the fact that there's no sane reason for making an already complex code unnecessarily more complex - we hit the issue only when insanely switching cores off- and online - serialize hotplugging cores on the sysfs level and be done with it. [ v2.1: fix !HOTPLUG_CPU build ] Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100819181029.GC17171@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> --- arch/x86/Kconfig | 5 +++++ arch/x86/kernel/smpboot.c | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+), 0 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a84fc34..ac7827f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -245,6 +245,11 @@ config ARCH_HWEIGHT_CFLAGS config KTIME_SCALAR def_bool X86_32 + +config ARCH_CPU_PROBE_RELEASE + def_bool y + depends on HOTPLUG_CPU + source "init/Kconfig" source "kernel/Kconfig.freezer" diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index abf4a86..8b3bfc4 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 }; static DEFINE_PER_CPU(struct task_struct *, idle_thread_array); #define get_idle_for_cpu(x) (per_cpu(idle_thread_array, x)) #define set_idle_for_cpu(x, p) (per_cpu(idle_thread_array, x) = (p)) + +/* + * We need this for trampoline_base protection from concurrent accesses when + * off- and onlining cores wildly. + */ +static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex); + +void cpu_hotplug_driver_lock() +{ + mutex_lock(&x86_cpu_hotplug_driver_mutex); +} + +void cpu_hotplug_driver_unlock() +{ + mutex_unlock(&x86_cpu_hotplug_driver_mutex); +} + +ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; } +ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; } #else static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ; #define get_idle_for_cpu(x) (idle_thread_array[(x)]) ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue 2010-08-04 16:45 [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD Borislav Petkov 2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov @ 2010-08-04 16:45 ` Borislav Petkov 1 sibling, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2010-08-04 16:45 UTC (permalink / raw) To: hpa, mingo, tglx Cc: andreas.herrmann3, conny.seidel, joerg.roedel, Bhavna.Sarathy, greg, x86, linux-kernel From: Borislav Petkov <borislav.petkov@amd.com> When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d: Stuck ??" message due to multiple cores concurrently accessing the cpu_callin_mask, among others. Since these codepaths are not protected from concurrent access due to the fact that there's no sane reason for making an already complex code unnecessarily more complex - we hit the issue only when insanely switching cores off- and online - serialize hotplugging cores on the sysfs level and be done with it. Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> --- arch/x86/Kconfig | 6 ++++++ arch/x86/kernel/smpboot.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+), 0 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index dcb0593..1598663 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -58,6 +58,7 @@ config X86 select ANON_INODES select HAVE_ARCH_KMEMCHECK select HAVE_USER_RETURN_NOTIFIER + select ARCH_CPU_PROBE_RELEASE config INSTRUCTION_DECODER def_bool (KPROBES || PERF_EVENTS) @@ -247,6 +248,11 @@ config ARCH_HWEIGHT_CFLAGS config KTIME_SCALAR def_bool X86_32 + +config ARCH_CPU_PROBE_RELEASE + def_bool y + depends on HOTPLUG_CPU + source "init/Kconfig" source "kernel/Kconfig.freezer" diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 6859a42..9e8e61b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -90,6 +90,25 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 }; static DEFINE_PER_CPU(struct task_struct *, idle_thread_array); #define get_idle_for_cpu(x) (per_cpu(idle_thread_array, x)) #define set_idle_for_cpu(x, p) (per_cpu(idle_thread_array, x) = (p)) + +/* + * We need this for trampoline_base protection from concurrent accesses when + * off- and onlining cores wildly. + */ +static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex); + +void cpu_hotplug_driver_lock() +{ + mutex_lock(&x86_cpu_hotplug_driver_mutex); +} + +void cpu_hotplug_driver_unlock() +{ + mutex_unlock(&x86_cpu_hotplug_driver_mutex); +} + +ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; } +ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; } #else static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ; #define get_idle_for_cpu(x) (idle_thread_array[(x)]) -- 1.6.4.4 ^ permalink raw reply related [flat|nested] 31+ messages in thread
end of thread, other threads:[~2010-09-02 9:10 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-08-04 16:45 [PATCH 0/2] Fix 32-bit CPU hotplug issue on AMD Borislav Petkov 2010-08-04 16:45 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Borislav Petkov 2010-08-04 23:05 ` H. Peter Anvin 2010-08-05 4:48 ` Borislav Petkov 2010-08-05 7:45 ` Roedel, Joerg 2010-08-05 14:14 ` H. Peter Anvin 2010-08-12 14:41 ` Joerg Roedel 2010-08-12 15:34 ` H. Peter Anvin 2010-08-12 15:47 ` Borislav Petkov 2010-08-12 15:47 ` H. Peter Anvin 2010-08-12 17:38 ` Borislav Petkov 2010-08-24 7:33 ` Initial working version (Re: [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines) Borislav Petkov 2010-08-29 20:32 ` [PATCH] x86-32, mm: Add an initial page table for core bootstrapping Borislav Petkov 2010-09-02 9:10 ` [PATCH -v1.1] " Borislav Petkov 2010-08-12 17:06 ` [PATCH 1/2] x86-32: Fix crashes with CPU hotplug on AMD machines Joerg Roedel 2010-08-12 19:01 ` H. Peter Anvin 2010-08-12 19:04 ` Joerg Roedel 2010-08-13 12:35 ` Borislav Petkov 2010-08-16 12:19 ` Borislav Petkov 2010-08-16 12:38 ` [PATCH -v2 " Borislav Petkov 2010-08-18 18:41 ` H. Peter Anvin 2010-08-18 19:09 ` Borislav Petkov 2010-08-18 20:55 ` [tip:x86/urgent] x86-32: Fix dummy trampoline-related inline stubs tip-bot for H. Peter Anvin 2010-08-18 20:55 ` [tip:x86/urgent] x86-32: Separate 1:1 pagetables from swapper_pg_dir tip-bot for Joerg Roedel 2010-08-16 12:39 ` [PATCH -v2 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov 2010-08-18 19:03 ` H. Peter Anvin 2010-08-18 19:28 ` Borislav Petkov 2010-08-18 20:04 ` H. Peter Anvin 2010-08-19 18:10 ` [PATCH -v2.1 " Borislav Petkov 2010-08-19 22:58 ` [tip:x86/urgent] x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues tip-bot for Borislav Petkov 2010-08-04 16:45 ` [PATCH 2/2] x86, cpu hotplug: Fix cpu bringup concurrency issue Borislav Petkov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.