linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] x86 boot, pda and gdt cleanups
@ 2007-03-06 12:39 Rusty Russell
  2007-03-06 12:53 ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Rusty Russell
  2007-03-13 20:48 ` [PATCH 0/8] x86 boot, pda and gdt cleanups Jeremy Fitzhardinge
  0 siblings, 2 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 12:39 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Hi all,

	The GDT stuff on x86 is a little more complex than it need be, but
playing with boot code is always dangerous.  These compile and boot on
UP and SMP for me, but Andrew should let the cook in -mm for a while.

Thanks!
Rusty.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu
  2007-03-06 12:39 [PATCH 0/8] x86 boot, pda and gdt cleanups Rusty Russell
@ 2007-03-06 12:53 ` Rusty Russell
  2007-03-06 12:54   ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Rusty Russell
  2007-03-06 13:20   ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Ingo Molnar
  2007-03-13 20:48 ` [PATCH 0/8] x86 boot, pda and gdt cleanups Jeremy Fitzhardinge
  1 sibling, 2 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 12:53 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Linux uses three GDTs to boot: the boot_gdt_table, which contains only
the __BOOT_CS and __BOOT_DS entries is used first up, before kernel is
mapped at PAGE_OFFSET.  Then we transition to cpu_gdt_table during
boot, and finally we allocate a per-cpu GDT and switch to that.

We can simplify this by using the boot_gdt_table until switching to
the per-cpu GDT table.  As a bonus, this gets rid of the
horribly-named "cpu_gdt_table" (it's not per-cpu, and the T in GDT
already stands for table).

Finally, some old bogus comments are deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 9cf03cbf8bde arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Thu Mar 01 16:40:41 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c	Thu Mar 01 21:58:16 2007 +1100
@@ -694,7 +694,7 @@ __cpuinit int init_gdt(int cpu, struct t
 	 * Initialize the per-CPU GDT with the boot GDT,
 	 * and set up the GDT descriptor:
 	 */
- 	memcpy(gdt, cpu_gdt_table, GDT_SIZE);
+ 	memcpy(gdt, boot_gdt_table, GDT_SIZE);
 	cpu_gdt_descr->size = GDT_SIZE - 1;
 
 	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
diff -r 9cf03cbf8bde arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Thu Mar 01 16:40:41 2007 +1100
+++ b/arch/i386/kernel/head.S	Thu Mar 01 22:05:37 2007 +1100
@@ -595,31 +595,19 @@ idt_descr:
 	.word IDT_ENTRIES*8-1		# idt contains 256 entries
 	.long idt_table
 
-# boot GDT descriptor (later on used by CPU#0):
+# The boot GDT descriptor once paging is enabled.
 	.word 0				# 32 bit align gdt_desc.address
 ENTRY(early_gdt_descr)
 	.word GDT_ENTRIES*8-1
-	.long cpu_gdt_table
-
-/*
- * The boot_gdt_table must mirror the equivalent in setup.S and is
- * used only for booting.
- */
+	.long boot_gdt_table
+
+/* The boot Global Descriptor Table: after boot we allocate a per-cpu copy */
 	.align L1_CACHE_BYTES
 ENTRY(boot_gdt_table)
-	.fill GDT_ENTRY_BOOT_CS,8,0
-	.quad 0x00cf9a000000ffff	/* kernel 4GB code at 0x00000000 */
-	.quad 0x00cf92000000ffff	/* kernel 4GB data at 0x00000000 */
-
-/*
- * The Global Descriptor Table contains 28 quadwords, per-CPU.
- */
-	.align L1_CACHE_BYTES
-ENTRY(cpu_gdt_table)
 	.quad 0x0000000000000000	/* NULL descriptor */
 	.quad 0x0000000000000000	/* 0x0b reserved */
-	.quad 0x0000000000000000	/* 0x13 reserved */
-	.quad 0x0000000000000000	/* 0x1b reserved */
+	.quad 0x00cf9a000000ffff	/* boot: 4GB code at 0x00000000 */
+	.quad 0x00cf92000000ffff	/* boot: 4GB data at 0x00000000 */
 	.quad 0x0000000000000000	/* 0x20 unused */
 	.quad 0x0000000000000000	/* 0x28 unused */
 	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
diff -r 9cf03cbf8bde include/asm-i386/desc.h
--- a/include/asm-i386/desc.h	Thu Mar 01 16:40:41 2007 +1100
+++ b/include/asm-i386/desc.h	Thu Mar 01 21:57:37 2007 +1100
@@ -12,7 +12,7 @@
 
 #include <asm/mmu.h>
 
-extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
+extern struct desc_struct boot_gdt_table[GDT_ENTRIES];
 
 struct Xgt_desc_struct {
 	unsigned short size;



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h
  2007-03-06 12:53 ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Rusty Russell
@ 2007-03-06 12:54   ` Rusty Russell
  2007-03-06 12:55     ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Rusty Russell
  2007-03-06 13:21     ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Ingo Molnar
  2007-03-06 13:20   ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Ingo Molnar
  1 sibling, 2 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 12:54 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

I managed to get an error about NR_CPUS undefined after a make
allyesconfig.  Admittedly this was a patched kernel, but it's easy to
remove it and avoid the error in future even if it's OK at the moment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 4519e3475c4f include/asm-generic/percpu.h
--- a/include/asm-generic/percpu.h	Mon Mar 05 12:17:10 2007 +1100
+++ b/include/asm-generic/percpu.h	Mon Mar 05 16:14:05 2007 +1100
@@ -5,7 +5,7 @@
 #define __GENERIC_PER_CPU
 #ifdef CONFIG_SMP
 
-extern unsigned long __per_cpu_offset[NR_CPUS];
+extern unsigned long __per_cpu_offset[];
 
 #define per_cpu_offset(x) (__per_cpu_offset[x])
 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 3/8] Use per-cpu variables for GDT, PDA
  2007-03-06 12:54   ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Rusty Russell
@ 2007-03-06 12:55     ` Rusty Russell
  2007-03-06 12:57       ` [PATCH 4/8] Cleanup setup_pda Rusty Russell
  2007-03-06 18:14       ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Jeremy Fitzhardinge
  2007-03-06 13:21     ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Ingo Molnar
  1 sibling, 2 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 12:55 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Allocating PDA and GDT at boot is a pain.  Using simple per-cpu
variables adds happiness (although we need the GDT page-aligned for
Xen, see later).

Finally, we can simply call it "cpu_gdt" rather than enduring the
superfluous and unnecessarily redundant tautology of "cpu_gdt_table".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r c2b61e13394d arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Fri Mar 02 09:35:32 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c	Mon Mar 05 11:34:31 2007 +1100
@@ -25,8 +25,10 @@ DEFINE_PER_CPU(struct Xgt_desc_struct, c
 DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
-struct i386_pda *_cpu_pda[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(_cpu_pda);
+DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+
+DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
+EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -609,52 +611,6 @@ struct pt_regs * __devinit idle_regs(str
 	return regs;
 }
 
-static __cpuinit int alloc_gdt(int cpu)
-{
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-	struct desc_struct *gdt;
-	struct i386_pda *pda;
-
-	gdt = (struct desc_struct *)cpu_gdt_descr->address;
-	pda = cpu_pda(cpu);
-
-	/*
-	 * This is a horrible hack to allocate the GDT.  The problem
-	 * is that cpu_init() is called really early for the boot CPU
-	 * (and hence needs bootmem) but much later for the secondary
-	 * CPUs, when bootmem will have gone away
-	 */
-	if (NODE_DATA(0)->bdata->node_bootmem_map) {
-		BUG_ON(gdt != NULL || pda != NULL);
-
-		gdt = alloc_bootmem_pages(PAGE_SIZE);
-		pda = alloc_bootmem(sizeof(*pda));
-		/* alloc_bootmem(_pages) panics on failure, so no check */
-
-		memset(gdt, 0, PAGE_SIZE);
-		memset(pda, 0, sizeof(*pda));
-	} else {
-		/* GDT and PDA might already have been allocated if
-		   this is a CPU hotplug re-insertion. */
-		if (gdt == NULL)
-			gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
-
-		if (pda == NULL)
-			pda = kmalloc_node(sizeof(*pda), GFP_KERNEL, cpu_to_node(cpu));
-
-		if (unlikely(!gdt || !pda)) {
-			free_pages((unsigned long)gdt, 0);
-			kfree(pda);
-			return 0;
-		}
-	}
-
- 	cpu_gdt_descr->address = (unsigned long)gdt;
-	cpu_pda(cpu) = pda;
-
-	return 1;
-}
-
 /* Initial PDA used by boot CPU */
 struct i386_pda boot_pda = {
 	._pda = &boot_pda,
@@ -670,31 +626,17 @@ static inline void set_kernel_fs(void)
 	asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
 }
 
-/* Initialize the CPU's GDT and PDA.  The boot CPU does this for
-   itself, but secondaries find this done for them. */
-__cpuinit int init_gdt(int cpu, struct task_struct *idle)
+/* Initialize the CPU's GDT and PDA.  This is either the boot CPU doing itself
+   (still using boot_gdt_table), or a CPU doing it for a secondary which
+   will soon come up. */
+__cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-	struct desc_struct *gdt;
-	struct i386_pda *pda;
-
-	/* For non-boot CPUs, the GDT and PDA should already have been
-	   allocated. */
-	if (!alloc_gdt(cpu)) {
-		printk(KERN_CRIT "CPU%d failed to allocate GDT or PDA\n", cpu);
-		return 0;
-	}
-
-	gdt = (struct desc_struct *)cpu_gdt_descr->address;
-	pda = cpu_pda(cpu);
-
-	BUG_ON(gdt == NULL || pda == NULL);
-
-	/*
-	 * Initialize the per-CPU GDT with the boot GDT,
-	 * and set up the GDT descriptor:
-	 */
+	struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+	struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
+
  	memcpy(gdt, boot_gdt_table, GDT_SIZE);
+ 	cpu_gdt_descr->address = (unsigned long)gdt;
 	cpu_gdt_descr->size = GDT_SIZE - 1;
 
 	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
@@ -706,17 +648,13 @@ __cpuinit int init_gdt(int cpu, struct t
 	pda->_pda = pda;
 	pda->cpu_number = cpu;
 	pda->pcurrent = idle;
-
-	return 1;
-}
-
+}
+
+/* Move this CPU from boot_gdt_table & boot_pda to this cpu's proper one. */
 void __cpuinit cpu_set_gdt(int cpu)
 {
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 
-	/* Reinit these anyway, even if they've already been done (on
-	   the boot CPU, this will transition from the boot gdt+pda to
-	   the real ones). */
 	load_gdt(cpu_gdt_descr);
 	set_kernel_fs();
 }
@@ -804,13 +742,8 @@ void __cpuinit cpu_init(void)
 	struct task_struct *curr = current;
 
 	/* Set up the real GDT and PDA, so we can transition from the
-	   boot versions. */
-	if (!init_gdt(cpu, curr)) {
-		/* failed to allocate something; not much we can do... */
-		for (;;)
-			local_irq_enable();
-	}
-
+	   boot_gdt_table & boot_pda. */
+	init_gdt(cpu, curr);
 	cpu_set_gdt(cpu);
 	_cpu_init(cpu, curr);
 }
diff -r c2b61e13394d arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c	Fri Mar 02 09:35:32 2007 +1100
+++ b/arch/i386/kernel/smpboot.c	Fri Mar 02 10:59:18 2007 +1100
@@ -813,13 +813,7 @@ static int __cpuinit do_boot_cpu(int api
 	if (IS_ERR(idle))
 		panic("failed fork for CPU %d", cpu);
 
-	/* Pre-allocate and initialize the CPU's GDT and PDA so it
-	   doesn't have to do any memory allocation during the
-	   delicate CPU-bringup phase. */
-	if (!init_gdt(cpu, idle)) {
-		printk(KERN_INFO "Couldn't allocate GDT/PDA for CPU %d\n", cpu);
-		return -1;	/* ? */
-	}
+	init_gdt(cpu, idle);
 
 	idle->thread.eip = (unsigned long) start_secondary;
 	/* start_eip had better be page-aligned! */
@@ -945,24 +939,11 @@ static int __cpuinit __smp_prepare_cpu(i
 	DECLARE_COMPLETION_ONSTACK(done);
 	struct warm_boot_cpu_info info;
 	int	apicid, ret;
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 
 	apicid = x86_cpu_to_apicid[cpu];
 	if (apicid == BAD_APICID) {
 		ret = -ENODEV;
 		goto exit;
-	}
-
-	/*
-	 * the CPU isn't initialized at boot time, allocate gdt table here.
-	 * cpu_init will initialize it
-	 */
-	if (!cpu_gdt_descr->address) {
-		cpu_gdt_descr->address = get_zeroed_page(GFP_KERNEL);
-		if (!cpu_gdt_descr->address)
-			printk(KERN_CRIT "CPU%d failed to allocate GDT\n", cpu);
-			ret = -ENOMEM;
-			goto exit;
 	}
 
 	info.complete = &done;
diff -r c2b61e13394d arch/i386/mach-voyager/voyager_smp.c
--- a/arch/i386/mach-voyager/voyager_smp.c	Fri Mar 02 09:35:32 2007 +1100
+++ b/arch/i386/mach-voyager/voyager_smp.c	Fri Mar 02 10:28:14 2007 +1100
@@ -580,15 +580,7 @@ do_boot_cpu(__u8 cpu)
 	/* init_tasks (in sched.c) is indexed logically */
 	stack_start.esp = (void *) idle->thread.esp;
 
-	/* Pre-allocate and initialize the CPU's GDT and PDA so it
-	   doesn't have to do any memory allocation during the
-	   delicate CPU-bringup phase. */
-	if (!init_gdt(cpu, idle)) {
-		printk(KERN_INFO "Couldn't allocate GDT/PDA for CPU %d\n", cpu);
-		cpucount--;
-		return;
-	}
-
+	init_gdt(cpu, idle);
 	irq_ctx_init(cpu);
 
 	/* Note: Don't modify initial ss override */
diff -r c2b61e13394d include/asm-i386/desc.h
--- a/include/asm-i386/desc.h	Fri Mar 02 09:35:32 2007 +1100
+++ b/include/asm-i386/desc.h	Mon Mar 05 11:34:31 2007 +1100
@@ -22,6 +22,7 @@ struct Xgt_desc_struct {
 
 extern struct Xgt_desc_struct idt_descr;
 DECLARE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
+DECLARE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
 extern struct Xgt_desc_struct early_gdt_descr;
 
 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
diff -r c2b61e13394d include/asm-i386/pda.h
--- a/include/asm-i386/pda.h	Fri Mar 02 09:35:32 2007 +1100
+++ b/include/asm-i386/pda.h	Mon Mar 05 11:34:31 2007 +1100
@@ -8,6 +8,7 @@
 
 #include <linux/stddef.h>
 #include <linux/types.h>
+#include <asm/percpu.h>
 
 struct i386_pda
 {
@@ -18,9 +19,9 @@ struct i386_pda
 	struct pt_regs *irq_regs;
 };
 
-extern struct i386_pda *_cpu_pda[];
+DECLARE_PER_CPU(struct i386_pda, _cpu_pda);
 
-#define cpu_pda(i)	(_cpu_pda[i])
+#define cpu_pda(i)	(&per_cpu(_cpu_pda, (i)))
 
 #define pda_offset(field) offsetof(struct i386_pda, field)
 
diff -r c2b61e13394d include/asm-i386/processor.h
--- a/include/asm-i386/processor.h	Fri Mar 02 09:35:32 2007 +1100
+++ b/include/asm-i386/processor.h	Fri Mar 02 10:28:14 2007 +1100
@@ -750,7 +750,7 @@ extern void enable_sep_cpu(void);
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
-extern int init_gdt(int cpu, struct task_struct *idle);
+extern void init_gdt(int cpu, struct task_struct *idle);
 extern void cpu_set_gdt(int);
 extern void secondary_cpu_init(void);
 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 4/8] Cleanup setup_pda
  2007-03-06 12:55     ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Rusty Russell
@ 2007-03-06 12:57       ` Rusty Russell
  2007-03-06 12:58         ` [PATCH 5/8] Cleanup GDT access Rusty Russell
  2007-03-06 18:14       ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 12:57 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

The comment above this function has never been true in mainline:
talking to Jeremy it's from previous patch churn.  What setup_pda
actually does is to write the segment entry for boot_pda into the
boot_gdt_table.  Ideally, this would be a static initialization, but
because the addresses are split over multiple parts in the funk bit
layout of the GDT entry (thanks Intel!) that's not a valid C
initializer.

We in fact call this routine on every CPU bringup, but that's
harmless.  Non-boot CPUs switch to their own GDTs immediately in
initialize_secondary.

VMI certainly doesn't need to call it explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 2d9aa53572e2 arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Tue Mar 06 19:01:08 2007 +1100
+++ b/arch/i386/kernel/head.S	Tue Mar 06 19:01:12 2007 +1100
@@ -365,17 +365,10 @@ 1:	movb $1,X86_HARD_MATH
 	.byte 0xDB,0xE4		/* fsetpm for 287, ignored by 387 */
 	ret
 
-/*
- * Point the GDT at this CPU's PDA.  On boot this will be
- * cpu_gdt_table and boot_pda; for secondary CPUs, these will be
- * that CPU's GDT and PDA.
- */
-ENTRY(setup_pda)
-	/* get the PDA pointer */
-	movl start_pda, %eax
-
-	/* slot the PDA address into the GDT */
-	mov early_gdt_descr+2, %ecx
+/* Initialize the boot_gdt_table's GDT_ENTRY_PDA (can't be done statically) */
+setup_pda:
+	movl $boot_pda, %eax
+	movl $boot_gdt_table, %ecx
 	mov %ax, (__KERNEL_PDA+0+2)(%ecx)		/* base & 0x0000ffff */
 	shr $16, %eax
 	mov %al, (__KERNEL_PDA+4+0)(%ecx)		/* base & 0x00ff0000 */
@@ -554,9 +547,6 @@ ENTRY(empty_zero_page)
  * This starts the data section.
  */
 .data
-ENTRY(start_pda)
-	.long boot_pda
-
 ENTRY(stack_start)
 	.long init_thread_union+THREAD_SIZE
 	.long __BOOT_DS
diff -r 2d9aa53572e2 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.c	Tue Mar 06 19:01:08 2007 +1100
+++ b/arch/i386/kernel/vmi.c	Tue Mar 06 19:01:47 2007 +1100
@@ -525,8 +525,6 @@ void vmi_pmd_clear(pmd_t *pmd)
 #endif
 
 #ifdef CONFIG_SMP
-extern void setup_pda(void);
-
 static void __devinit
 vmi_startup_ipi_hook(int phys_apicid, unsigned long start_eip,
 		     unsigned long start_esp)
@@ -555,8 +553,6 @@ vmi_startup_ipi_hook(int phys_apicid, un
 	ap.gs = 0;
 
 	ap.eflags = 0;
-
-	setup_pda();
 
 #ifdef CONFIG_X86_PAE
 	/* efer should match BSP efer. */



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 5/8] Cleanup GDT access
  2007-03-06 12:57       ` [PATCH 4/8] Cleanup setup_pda Rusty Russell
@ 2007-03-06 12:58         ` Rusty Russell
  2007-03-06 13:00           ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Rusty Russell
  2007-03-06 18:16           ` [PATCH 5/8] Cleanup GDT access Jeremy Fitzhardinge
  0 siblings, 2 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 12:58 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Now we have an explicit per-cpu GDT variable, we don't need to keep
the descriptors around to use them to find the GDT.  Also remove the
cpu_pda() accessor: it's just a per-cpu variable.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r e9d5b8e6f5ee arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Tue Mar 06 16:20:48 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c	Tue Mar 06 17:04:37 2007 +1100
@@ -22,11 +22,8 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
-EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
-
 DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
-
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
@@ -631,14 +628,11 @@ static inline void set_kernel_fs(void)
    will soon come up. */
 __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 	struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
 	struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
 
+	/* Based on boot_gdt_table: set PDA so it can be used immediately */
  	memcpy(gdt, boot_gdt_table, GDT_SIZE);
- 	cpu_gdt_descr->address = (unsigned long)gdt;
-	cpu_gdt_descr->size = GDT_SIZE - 1;
-
 	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
 			(u32 *)&gdt[GDT_ENTRY_PDA].b,
 			(unsigned long)pda, sizeof(*pda) - 1,
@@ -653,9 +647,12 @@ __cpuinit void init_gdt(int cpu, struct 
 /* Move this CPU from boot_gdt_table & boot_pda to this cpu's proper one. */
 void __cpuinit cpu_set_gdt(int cpu)
 {
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-
-	load_gdt(cpu_gdt_descr);
+	struct Xgt_desc_struct gdt_descr;
+
+ 	gdt_descr.address = (unsigned long)per_cpu(cpu_gdt, cpu);
+	gdt_descr.size = GDT_SIZE - 1;
+
+	load_gdt(&gdt_descr);
 	set_kernel_fs();
 }
 
diff -r e9d5b8e6f5ee arch/i386/kernel/efi.c
--- a/arch/i386/kernel/efi.c	Tue Mar 06 16:20:48 2007 +1100
+++ b/arch/i386/kernel/efi.c	Tue Mar 06 17:05:41 2007 +1100
@@ -69,12 +69,10 @@ static void efi_call_phys_prelog(void) _
 {
 	unsigned long cr4;
 	unsigned long temp;
-	struct Xgt_desc_struct *cpu_gdt_descr;
+	struct Xgt_desc_struct gdt_descr;
 
 	spin_lock(&efi_rt_lock);
 	local_irq_save(efi_rt_eflags);
-
-	cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
 
 	/*
 	 * If I don't have PSE, I should just duplicate two entries in page
@@ -105,17 +103,19 @@ static void efi_call_phys_prelog(void) _
 	 */
 	local_flush_tlb();
 
-	cpu_gdt_descr->address = __pa(cpu_gdt_descr->address);
-	load_gdt(cpu_gdt_descr);
+	gdt_descr.address = __pa(get_cpu_gdt_table(0));
+	gdt_descr.size = GDT_SIZE - 1;
+	load_gdt(&gdt_descr);
 }
 
 static void efi_call_phys_epilog(void) __releases(efi_rt_lock)
 {
 	unsigned long cr4;
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
-
-	cpu_gdt_descr->address = (unsigned long)__va(cpu_gdt_descr->address);
-	load_gdt(cpu_gdt_descr);
+	struct Xgt_desc_struct gdt_descr;
+
+	gdt_descr.address = (unsigned long)get_cpu_gdt_table(0);
+	gdt_descr.size = GDT_SIZE - 1;
+	load_gdt(&gdt_descr);
 
 	cr4 = read_cr4();
 
diff -r e9d5b8e6f5ee arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S	Tue Mar 06 16:20:48 2007 +1100
+++ b/arch/i386/kernel/entry.S	Tue Mar 06 17:04:37 2007 +1100
@@ -561,8 +561,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
 	/* since we are on a wrong stack, we cant make it a C code :( */ \
 	movl %fs:PDA_cpu, %ebx; \
-	PER_CPU(cpu_gdt_descr, %ebx); \
-	movl GDS_address(%ebx), %ebx; \
+	PER_CPU(cpu_gdt, %ebx); \
 	GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
 	addl %esp, %eax; \
 	pushl $__KERNEL_DS; \
diff -r e9d5b8e6f5ee arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c	Tue Mar 06 16:20:48 2007 +1100
+++ b/arch/i386/kernel/traps.c	Tue Mar 06 17:04:37 2007 +1100
@@ -1018,9 +1018,7 @@ fastcall unsigned long patch_espfix_desc
 fastcall unsigned long patch_espfix_desc(unsigned long uesp,
 					  unsigned long kesp)
 {
-	int cpu = smp_processor_id();
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-	struct desc_struct *gdt = (struct desc_struct *)cpu_gdt_descr->address;
+	struct desc_struct *gdt = __get_cpu_var(cpu_gdt);
 	unsigned long base = (kesp - uesp) & -THREAD_SIZE;
 	unsigned long new_kesp = kesp - base;
 	unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT;
diff -r e9d5b8e6f5ee include/asm-i386/desc.h
--- a/include/asm-i386/desc.h	Tue Mar 06 16:20:48 2007 +1100
+++ b/include/asm-i386/desc.h	Tue Mar 06 17:04:37 2007 +1100
@@ -13,6 +13,7 @@
 #include <asm/mmu.h>
 
 extern struct desc_struct boot_gdt_table[GDT_ENTRIES];
+DECLARE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
 
 struct Xgt_desc_struct {
 	unsigned short size;
@@ -21,13 +22,9 @@ struct Xgt_desc_struct {
 } __attribute__ ((packed));
 
 extern struct Xgt_desc_struct idt_descr;
-DECLARE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
-DECLARE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
-extern struct Xgt_desc_struct early_gdt_descr;
-
 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
 {
-	return (struct desc_struct *)per_cpu(cpu_gdt_descr, cpu).address;
+	return per_cpu(cpu_gdt, cpu);
 }
 
 extern struct desc_struct idt_table[];
@@ -102,13 +99,12 @@ static inline fastcall void native_set_l
 	if (likely(entries == 0))
 		__asm__ __volatile__("lldt %w0"::"q" (0));
 	else {
-		unsigned cpu = smp_processor_id();
 		__u32 a, b;
 
 		pack_descriptor(&a, &b, (unsigned long)addr,
 				entries * sizeof(struct desc_struct) - 1,
 				DESCTYPE_LDT, 0);
-		write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_LDT, a, b);
+		write_gdt_entry(__get_cpu_var(cpu_gdt), GDT_ENTRY_LDT, a, b);
 		__asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8));
 	}
 }
diff -r e9d5b8e6f5ee include/asm-i386/pda.h
--- a/include/asm-i386/pda.h	Tue Mar 06 16:20:48 2007 +1100
+++ b/include/asm-i386/pda.h	Tue Mar 06 17:04:26 2007 +1100
@@ -20,8 +20,6 @@ struct i386_pda
 };
 
 DECLARE_PER_CPU(struct i386_pda, _cpu_pda);
-
-#define cpu_pda(i)	(&per_cpu(_cpu_pda, (i)))
 
 #define pda_offset(field) offsetof(struct i386_pda, field)
 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 6/8] Allow per-cpu variables to be page-aligned
  2007-03-06 12:58         ` [PATCH 5/8] Cleanup GDT access Rusty Russell
@ 2007-03-06 13:00           ` Rusty Russell
  2007-03-06 13:01             ` [PATCH 7/8] Page-align the GDT Rusty Russell
                               ` (2 more replies)
  2007-03-06 18:16           ` [PATCH 5/8] Cleanup GDT access Jeremy Fitzhardinge
  1 sibling, 3 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 13:00 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Xen wants page-aligned GDT (and PDA must not cross a page-boundary,
but that doesn't happen at the moment since it's so close to start of
page).  Let's allow page-alignment in general for per-cpu data.

Because larger alignments can use more room, we increase the max
per-cpu memory to 64k rather than 32k: it's getting a little tight.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 213b1ec27001 arch/alpha/kernel/vmlinux.lds.S
--- a/arch/alpha/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/alpha/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -69,7 +69,7 @@ SECTIONS
   . = ALIGN(8);
   SECURITY_INIT
 
-  . = ALIGN(64);
+  . = ALIGN(8192);
   __per_cpu_start = .;
   .data.percpu : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/arm/kernel/vmlinux.lds.S
--- a/arch/arm/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/arm/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -59,7 +59,7 @@ SECTIONS
 			usr/built-in.o(.init.ramfs)
 		__initramfs_end = .;
 #endif
-		. = ALIGN(64);
+		. = ALIGN(4096);
 		__per_cpu_start = .;
 			*(.data.percpu)
 		__per_cpu_end = .;
diff -r 213b1ec27001 arch/cris/arch-v32/vmlinux.lds.S
--- a/arch/cris/arch-v32/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/cris/arch-v32/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -91,6 +91,7 @@ SECTIONS
 	}
 	SECURITY_INIT
 
+	. =  ALIGN (8192);
 	__per_cpu_start = .;
 	.data.percpu  : { *(.data.percpu) }
 	__per_cpu_end = .;
diff -r 213b1ec27001 arch/frv/kernel/vmlinux.lds.S
--- a/arch/frv/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/frv/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -57,6 +57,7 @@ SECTIONS
   __alt_instructions_end = .;
  .altinstr_replacement : { *(.altinstr_replacement) }
 
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/i386/kernel/vmlinux.lds.S
--- a/arch/i386/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/i386/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -196,7 +196,7 @@ SECTIONS
 	__initramfs_end = .;
   }
 #endif
-  . = ALIGN(L1_CACHE_BYTES);
+  . = ALIGN(4096);
   .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
 	__per_cpu_start = .;
 	*(.data.percpu)
diff -r 213b1ec27001 arch/m32r/kernel/vmlinux.lds.S
--- a/arch/m32r/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/m32r/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -110,7 +110,7 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
-  . = ALIGN(32);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/mips/kernel/vmlinux.lds.S
--- a/arch/mips/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/mips/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -119,7 +119,7 @@ SECTIONS
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
 #endif
-  . = ALIGN(32);
+  . = ALIGN(_PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/parisc/kernel/vmlinux.lds.S
--- a/arch/parisc/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/parisc/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -181,7 +181,7 @@ SECTIONS
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
 #endif
-  . = ALIGN(32);
+  . = ALIGN(ASM_PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/powerpc/kernel/vmlinux.lds.S
--- a/arch/powerpc/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/powerpc/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -139,11 +139,7 @@ SECTIONS
 		__initramfs_end = .;
 	}
 #endif
-#ifdef CONFIG_PPC32
-	. = ALIGN(32);
-#else
-	. = ALIGN(128);
-#endif
+	. = ALIGN(PAGE_SIZE);
 	.data.percpu : {
 		__per_cpu_start = .;
 		*(.data.percpu)
diff -r 213b1ec27001 arch/ppc/kernel/vmlinux.lds.S
--- a/arch/ppc/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/ppc/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -130,7 +130,7 @@ SECTIONS
   __ftr_fixup : { *(__ftr_fixup) }
   __stop___ftr_fixup = .;
 
-  . = ALIGN(32);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/s390/kernel/vmlinux.lds.S
--- a/arch/s390/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/s390/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -99,7 +99,7 @@ SECTIONS
   . = ALIGN(2);
   __initramfs_end = .;
 #endif
-  . = ALIGN(256);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/sh/kernel/vmlinux.lds.S
--- a/arch/sh/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/sh/kernel/vmlinux.lds.S	Tue Mar 06 19:02:25 2007 +1100
@@ -54,7 +54,7 @@ SECTIONS
   . = ALIGN(PAGE_SIZE);
   .data.page_aligned : { *(.data.page_aligned) }
 
-  . = ALIGN(L1_CACHE_BYTES);
+  . = ALIGN(PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/sh64/kernel/vmlinux.lds.S
--- a/arch/sh64/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/sh64/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -85,7 +85,7 @@ SECTIONS
   . = ALIGN(PAGE_SIZE);
   .data.page_aligned : C_PHYS(.data.page_aligned) { *(.data.page_aligned) }
 
-  . = ALIGN(L1_CACHE_BYTES);
+  . = ALIGN(PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu : C_PHYS(.data.percpu) { *(.data.percpu) }
   __per_cpu_end = . ;
diff -r 213b1ec27001 arch/sparc/kernel/vmlinux.lds.S
--- a/arch/sparc/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/sparc/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -65,7 +65,7 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
-  . = ALIGN(32);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/x86_64/kernel/vmlinux.lds.S
--- a/arch/x86_64/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/x86_64/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -194,7 +194,7 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
-    . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 arch/xtensa/kernel/vmlinux.lds.S
--- a/arch/xtensa/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
+++ b/arch/xtensa/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
@@ -198,7 +198,7 @@ SECTIONS
   __ftr_fixup : { *(__ftr_fixup) }
   __stop___ftr_fixup = .;
 
-  . = ALIGN(32);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
diff -r 213b1ec27001 include/linux/percpu.h
--- a/include/linux/percpu.h	Tue Mar 06 19:01:59 2007 +1100
+++ b/include/linux/percpu.h	Tue Mar 06 23:32:35 2007 +1100
@@ -11,7 +11,7 @@
 
 /* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
 #ifndef PERCPU_ENOUGH_ROOM
-#define PERCPU_ENOUGH_ROOM 32768
+#define PERCPU_ENOUGH_ROOM 65536
 #endif
 
 /*
diff -r 213b1ec27001 init/main.c
--- a/init/main.c	Tue Mar 06 19:01:59 2007 +1100
+++ b/init/main.c	Tue Mar 06 19:02:03 2007 +1100
@@ -368,12 +368,12 @@ static void __init setup_per_cpu_areas(v
 	unsigned long nr_possible_cpus = num_possible_cpus();
 
 	/* Copy section for each CPU (we discard the original) */
-	size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
+	size = ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE);
 #ifdef CONFIG_MODULES
 	if (size < PERCPU_ENOUGH_ROOM)
 		size = PERCPU_ENOUGH_ROOM;
 #endif
-	ptr = alloc_bootmem(size * nr_possible_cpus);
+	ptr = alloc_bootmem_pages(size * nr_possible_cpus);
 
 	for_each_possible_cpu(i) {
 		__per_cpu_offset[i] = ptr - __per_cpu_start;
diff -r 213b1ec27001 kernel/module.c
--- a/kernel/module.c	Tue Mar 06 19:01:59 2007 +1100
+++ b/kernel/module.c	Tue Mar 06 19:02:03 2007 +1100
@@ -544,10 +544,10 @@ static void *percpu_modalloc(unsigned lo
 	unsigned int i;
 	void *ptr;
 
-	if (align > SMP_CACHE_BYTES) {
-		printk(KERN_WARNING "%s: per-cpu alignment %li > %i\n",
-		       name, align, SMP_CACHE_BYTES);
-		align = SMP_CACHE_BYTES;
+	if (align > PAGE_SIZE) {
+		printk(KERN_WARNING "%s: per-cpu alignment %li > %li\n",
+		       name, align, PAGE_SIZE);
+		align = PAGE_SIZE;
 	}
 
 	ptr = __per_cpu_start;
@@ -628,7 +628,7 @@ static int percpu_modinit(void)
 	pcpu_size = kmalloc(sizeof(pcpu_size[0]) * pcpu_num_allocated,
 			    GFP_KERNEL);
 	/* Static in-kernel percpu data (used). */
-	pcpu_size[0] = -ALIGN(__per_cpu_end-__per_cpu_start, SMP_CACHE_BYTES);
+	pcpu_size[0] = -ALIGN(__per_cpu_end-__per_cpu_start, PAGE_SIZE);
 	/* Free room. */
 	pcpu_size[1] = PERCPU_ENOUGH_ROOM + pcpu_size[0];
 	if (pcpu_size[1] < 0) {



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 7/8] Page-align the GDT
  2007-03-06 13:00           ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Rusty Russell
@ 2007-03-06 13:01             ` Rusty Russell
  2007-03-06 13:03               ` [PATCH 8/8] Convert PDA into the percpu section Rusty Russell
  2007-03-06 13:15             ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Ingo Molnar
  2007-03-06 18:17             ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 13:01 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 576929b5b43f arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Tue Mar 06 16:28:38 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c	Tue Mar 06 16:47:11 2007 +1100
@@ -22,8 +22,8 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
-EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
+DEFINE_PER_CPU(struct gdt_page, gdt_page);
+EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
@@ -628,7 +628,7 @@ static inline void set_kernel_fs(void)
    will soon come up. */
 __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
-	struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+	struct desc_struct *gdt = per_cpu(gdt_page, cpu).gdt;
 	struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
 
 	/* Based on boot_gdt_table: set PDA so it can be used immediately */
@@ -649,7 +649,7 @@ void __cpuinit cpu_set_gdt(int cpu)
 {
 	struct Xgt_desc_struct gdt_descr;
 
- 	gdt_descr.address = (unsigned long)per_cpu(cpu_gdt, cpu);
+ 	gdt_descr.address = (unsigned long)per_cpu(gdt_page, cpu).gdt;
 	gdt_descr.size = GDT_SIZE - 1;
 
 	load_gdt(&gdt_descr);
diff -r 576929b5b43f arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S	Tue Mar 06 16:28:38 2007 +1100
+++ b/arch/i386/kernel/entry.S	Tue Mar 06 16:47:11 2007 +1100
@@ -561,7 +561,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
 	/* since we are on a wrong stack, we cant make it a C code :( */ \
 	movl %fs:PDA_cpu, %ebx; \
-	PER_CPU(cpu_gdt, %ebx); \
+	PER_CPU(gdt_page, %ebx); \
 	GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
 	addl %esp, %eax; \
 	pushl $__KERNEL_DS; \
diff -r 576929b5b43f arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Tue Mar 06 16:28:38 2007 +1100
+++ b/arch/i386/kernel/head.S	Tue Mar 06 16:47:11 2007 +1100
@@ -592,7 +592,7 @@ ENTRY(early_gdt_descr)
 	.long boot_gdt_table
 
 /* The boot Global Descriptor Table: after boot we allocate a per-cpu copy */
-	.align L1_CACHE_BYTES
+	.p2align PAGE_SHIFT
 ENTRY(boot_gdt_table)
 	.quad 0x0000000000000000	/* NULL descriptor */
 	.quad 0x0000000000000000	/* 0x0b reserved */
diff -r 576929b5b43f arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c	Tue Mar 06 16:28:38 2007 +1100
+++ b/arch/i386/kernel/traps.c	Tue Mar 06 16:28:40 2007 +1100
@@ -1018,7 +1018,7 @@ fastcall unsigned long patch_espfix_desc
 fastcall unsigned long patch_espfix_desc(unsigned long uesp,
 					  unsigned long kesp)
 {
-	struct desc_struct *gdt = __get_cpu_var(cpu_gdt);
+	struct desc_struct *gdt = __get_cpu_var(gdt_page).gdt;
 	unsigned long base = (kesp - uesp) & -THREAD_SIZE;
 	unsigned long new_kesp = kesp - base;
 	unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT;
diff -r 576929b5b43f include/asm-i386/desc.h
--- a/include/asm-i386/desc.h	Tue Mar 06 16:28:38 2007 +1100
+++ b/include/asm-i386/desc.h	Tue Mar 06 16:47:22 2007 +1100
@@ -13,7 +13,11 @@
 #include <asm/mmu.h>
 
 extern struct desc_struct boot_gdt_table[GDT_ENTRIES];
-DECLARE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+struct gdt_page
+{
+	struct desc_struct gdt[GDT_ENTRIES];
+} __attribute__((aligned(PAGE_SIZE)));
+DECLARE_PER_CPU(struct gdt_page, gdt_page);
 
 struct Xgt_desc_struct {
 	unsigned short size;
@@ -24,7 +28,7 @@ extern struct Xgt_desc_struct idt_descr;
 extern struct Xgt_desc_struct idt_descr;
 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
 {
-	return per_cpu(cpu_gdt, cpu);
+	return per_cpu(gdt_page, cpu).gdt;
 }
 
 extern struct desc_struct idt_table[];
@@ -104,7 +108,8 @@ static inline fastcall void native_set_l
 		pack_descriptor(&a, &b, (unsigned long)addr,
 				entries * sizeof(struct desc_struct) - 1,
 				DESCTYPE_LDT, 0);
-		write_gdt_entry(__get_cpu_var(cpu_gdt), GDT_ENTRY_LDT, a, b);
+		write_gdt_entry(__get_cpu_var(gdt_page).gdt, GDT_ENTRY_LDT,
+				a, b);
 		__asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8));
 	}
 }



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 13:01             ` [PATCH 7/8] Page-align the GDT Rusty Russell
@ 2007-03-06 13:03               ` Rusty Russell
  2007-03-06 13:10                 ` Ingo Molnar
                                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-06 13:03 UTC (permalink / raw)
  To: lkml - Kernel Mailing List
  Cc: Zachary Amsden, Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton,
	Andi Kleen

Currently x86 (similar to x84-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
    can be used to calculate addresses for this CPU's variables.
(4) Moves the boot cpu's GDT/percpu setup to smp_prepare_boot_cpu(),
    immediately after the per-cpu areas are allocated.

The result is one less x86-specific concept.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r b30bbd45ba64 arch/i386/kernel/asm-offsets.c
--- a/arch/i386/kernel/asm-offsets.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/asm-offsets.c	Tue Mar 06 23:35:03 2007 +1100
@@ -15,7 +15,6 @@
 #include <asm/processor.h>
 #include <asm/thread_info.h>
 #include <asm/elf.h>
-#include <asm/pda.h>
 #ifdef CONFIG_LGUEST_GUEST
 #include <asm/lguest.h>
 #include "../lguest/lg.h"
@@ -105,10 +104,6 @@ void foo(void)
 
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
 
-	BLANK();
- 	OFFSET(PDA_cpu, i386_pda, cpu_number);
-	OFFSET(PDA_pcurrent, i386_pda, pcurrent);
-
 #ifdef CONFIG_PARAVIRT
 	BLANK();
 	OFFSET(PARAVIRT_enabled, paravirt_ops, paravirt_enabled);
diff -r b30bbd45ba64 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c	Tue Mar 06 23:37:51 2007 +1100
@@ -18,14 +18,11 @@
 #include <asm/apic.h>
 #include <mach_apic.h>
 #endif
-#include <asm/pda.h>
 
 #include "cpu.h"
 
 DEFINE_PER_CPU(struct gdt_page, gdt_page);
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
-DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
-EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -600,51 +597,44 @@ void __init early_cpu_init(void)
 #endif
 }
 
-/* Make sure %gs is initialized properly in idle threads */
+/* Make sure %fs is initialized properly in idle threads */
 struct pt_regs * __devinit idle_regs(struct pt_regs *regs)
 {
 	memset(regs, 0, sizeof(struct pt_regs));
-	regs->xfs = __KERNEL_PDA;
+	regs->xfs = __KERNEL_PERCPU;
 	return regs;
 }
 
-/* Initial PDA used by boot CPU */
-struct i386_pda boot_pda = {
-	._pda = &boot_pda,
-	.cpu_number = 0,
-	.pcurrent = &init_task,
-};
-
 static inline void set_kernel_fs(void)
 {
-	/* Set %fs for this CPU's PDA.  Memory clobber is to create a
-	   barrier with respect to any PDA operations, so the compiler
-	   doesn't move any before here. */
-	asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
-}
-
-/* Initialize the CPU's GDT and PDA.  This is either the boot CPU doing itself
+	/* Set %fs for this CPU's per-cpu area.  Memory clobber is to
+	   create a barrier with respect to any per-cpu operations, so the
+	   compiler doesn't move any before here. */
+	asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PERCPU) : "memory");
+}
+
+/* Initialize the CPU's GDT.  This is either the boot CPU doing itself
    (still using boot_gdt_table), or a CPU doing it for a secondary which
    will soon come up. */
 __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
 	struct desc_struct *gdt = per_cpu(gdt_page, cpu).gdt;
-	struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
-
-	/* Based on boot_gdt_table: set PDA so it can be used immediately */
+
+	/* Based on boot_gdt_table: set percpu so it can be used immediately */
  	memcpy(gdt, boot_gdt_table, GDT_SIZE);
-	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
-			(u32 *)&gdt[GDT_ENTRY_PDA].b,
-			(unsigned long)pda, sizeof(*pda) - 1,
+#ifdef CONFIG_SMP
+	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
+			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
+			__per_cpu_offset[cpu], 0xFFFFF,
 			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
-
-	memset(pda, 0, sizeof(*pda));
-	pda->_pda = pda;
-	pda->cpu_number = cpu;
-	pda->pcurrent = idle;
-}
-
-/* Move this CPU from boot_gdt_table & boot_pda to this cpu's proper one. */
+	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
+	per_cpu(cpu_number, cpu) = cpu;
+#endif /* SMP*/
+
+	per_cpu(current_task, cpu) = idle;
+}
+
+/* Move this CPU from boot_gdt_table to this cpu's proper one. */
 void __cpuinit cpu_set_gdt(int cpu)
 {
 	struct Xgt_desc_struct gdt_descr;
@@ -738,8 +728,7 @@ void __cpuinit cpu_init(void)
 	int cpu = smp_processor_id();
 	struct task_struct *curr = current;
 
-	/* Set up the real GDT and PDA, so we can transition from the
-	   boot_gdt_table & boot_pda. */
+	/* Set up the real GDT so we can transition from the boot_gdt_table. */
 	init_gdt(cpu, curr);
 	cpu_set_gdt(cpu);
 	_cpu_init(cpu, curr);
diff -r b30bbd45ba64 arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/entry.S	Tue Mar 06 23:35:03 2007 +1100
@@ -132,7 +132,7 @@ 1:
 	movl $(__USER_DS), %edx; \
 	movl %edx, %ds; \
 	movl %edx, %es; \
-	movl $(__KERNEL_PDA), %edx; \
+	movl $(__KERNEL_PERCPU), %edx; \
 	movl %edx, %fs
 
 #define RESTORE_INT_REGS \
@@ -557,7 +557,6 @@ END(syscall_badsys)
 
 #define FIXUP_ESPFIX_STACK \
 	/* since we are on a wrong stack, we cant make it a C code :( */ \
-	movl %fs:PDA_cpu, %ebx; \
 	PER_CPU(gdt_page, %ebx); \
 	GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
 	addl %esp, %eax; \
@@ -682,7 +681,7 @@ error_code:
 	pushl %fs
 	CFI_ADJUST_CFA_OFFSET 4
 	/*CFI_REL_OFFSET fs, 0*/
-	movl $(__KERNEL_PDA), %ecx
+	movl $(__KERNEL_PERCPU), %ecx
 	movl %ecx, %fs
 	UNWIND_ESPFIX_STACK
 	popl %ecx
diff -r b30bbd45ba64 arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/head.S	Tue Mar 06 23:35:03 2007 +1100
@@ -318,7 +318,6 @@ 2:	movl %cr0,%eax
 	movl %eax,%cr0
 
 	call check_x87
-	call setup_pda
 	lgdt early_gdt_descr
 	lidt idt_descr
 	ljmp $(__KERNEL_CS),$1f
@@ -333,7 +332,7 @@ 1:	movl $(__KERNEL_DS),%eax	# reload all
 	movl %eax,%gs
 	lldt %ax
 
-	movl $(__KERNEL_PDA),%eax
+	movl $(__KERNEL_PERCPU),%eax
 	mov  %eax,%fs
 
 	cld			# gcc2 wants the direction flag cleared at all times
@@ -363,16 +362,6 @@ check_x87:
 	ALIGN
 1:	movb $1,X86_HARD_MATH
 	.byte 0xDB,0xE4		/* fsetpm for 287, ignored by 387 */
-	ret
-
-/* Initialize the boot_gdt_table's GDT_ENTRY_PDA (can't be done statically) */
-setup_pda:
-	movl $boot_pda, %eax
-	movl $boot_gdt_table, %ecx
-	mov %ax, (__KERNEL_PDA+0+2)(%ecx)		/* base & 0x0000ffff */
-	shr $16, %eax
-	mov %al, (__KERNEL_PDA+4+0)(%ecx)		/* base & 0x00ff0000 */
-	mov %ah, (__KERNEL_PDA+4+3)(%ecx)		/* base & 0xff000000 */
 	ret
 
 /*
@@ -635,7 +624,7 @@ ENTRY(boot_gdt_table)
 	.quad 0x004092000000ffff	/* 0xc8 APM DS    data */
 
 	.quad 0x00c0920000000000	/* 0xd0 - ESPFIX SS */
-	.quad 0x00cf92000000ffff	/* 0xd8 - PDA */
+	.quad 0x00cf92000000ffff	/* 0xd8 - per-cpu */
 	.quad 0x0000000000000000	/* 0xe0 - unused */
 	.quad 0x0000000000000000	/* 0xe8 - unused */
 	.quad 0x0000000000000000	/* 0xf0 - unused */
diff -r b30bbd45ba64 arch/i386/kernel/irq.c
--- a/arch/i386/kernel/irq.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/irq.c	Tue Mar 06 23:35:03 2007 +1100
@@ -23,6 +23,9 @@
 
 DEFINE_PER_CPU(irq_cpustat_t, irq_stat) ____cacheline_internodealigned_in_smp;
 EXPORT_PER_CPU_SYMBOL(irq_stat);
+
+DEFINE_PER_CPU(struct pt_regs *, irq_regs);
+EXPORT_PER_CPU_SYMBOL(irq_regs);
 
 /*
  * 'what should we do if we get a hw irq event on an illegal vector'.
diff -r b30bbd45ba64 arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/process.c	Tue Mar 06 23:35:03 2007 +1100
@@ -39,6 +39,7 @@
 #include <linux/random.h>
 #include <linux/personality.h>
 #include <linux/tick.h>
+#include <linux/percpu.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -57,7 +58,6 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
-#include <asm/pda.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -65,6 +65,12 @@ static int hlt_counter;
 
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
+
+DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
+EXPORT_PER_CPU_SYMBOL(current_task);
+
+DEFINE_PER_CPU(int, cpu_number);
+EXPORT_PER_CPU_SYMBOL(cpu_number);
 
 /*
  * Return saved PC of a blocked thread.
@@ -343,7 +349,7 @@ int kernel_thread(int (*fn)(void *), voi
 
 	regs.xds = __USER_DS;
 	regs.xes = __USER_DS;
-	regs.xfs = __KERNEL_PDA;
+	regs.xfs = __KERNEL_PERCPU;
 	regs.orig_eax = -1;
 	regs.eip = (unsigned long) kernel_thread_helper;
 	regs.xcs = __KERNEL_CS | get_kernel_rpl();
@@ -712,7 +718,7 @@ struct task_struct fastcall * __switch_t
 	if (prev->gs | next->gs)
 		loadsegment(gs, next->gs);
 
-	write_pda(pcurrent, next_p);
+	x86_write_percpu(current_task, next_p);
 
 	return prev_p;
 }
diff -r b30bbd45ba64 arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/smpboot.c	Tue Mar 06 23:35:03 2007 +1100
@@ -52,7 +52,6 @@
 #include <asm/desc.h>
 #include <asm/arch_hooks.h>
 #include <asm/nmi.h>
-#include <asm/pda.h>
 
 #include <mach_apic.h>
 #include <mach_wakecpu.h>
@@ -97,6 +96,9 @@ EXPORT_SYMBOL(x86_cpu_to_apicid);
 EXPORT_SYMBOL(x86_cpu_to_apicid);
 
 u8 apicid_2_node[MAX_APICID];
+
+DEFINE_PER_CPU(unsigned long, this_cpu_off);
+EXPORT_PER_CPU_SYMBOL(this_cpu_off);
 
 /*
  * Trampoline 80x86 program as an array.
@@ -461,7 +463,6 @@ extern struct {
 	void * esp;
 	unsigned short ss;
 } stack_start;
-extern struct i386_pda *start_pda;
 
 #ifdef CONFIG_NUMA
 
@@ -1167,6 +1168,10 @@ void __devinit smp_prepare_boot_cpu(void
 	cpu_set(smp_processor_id(), cpu_present_map);
 	cpu_set(smp_processor_id(), cpu_possible_map);
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+
+	/* Set up %fs to point to our per-CPU area now it's allocated */
+	init_gdt(smp_processor_id(), &init_task);
+	cpu_set_gdt(smp_processor_id());
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -r b30bbd45ba64 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/kernel/vmi.c	Tue Mar 06 23:35:04 2007 +1100
@@ -549,7 +549,7 @@ vmi_startup_ipi_hook(int phys_apicid, un
 
 	ap.ds = __USER_DS;
 	ap.es = __USER_DS;
-	ap.fs = __KERNEL_PDA;
+	ap.fs = __KERNEL_PERCPU;
 	ap.gs = 0;
 
 	ap.eflags = 0;
diff -r b30bbd45ba64 arch/i386/lguest/core.c
--- a/arch/i386/lguest/core.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/lguest/core.c	Tue Mar 06 23:35:04 2007 +1100
@@ -302,7 +302,7 @@ int run_guest(struct lguest *lg, char *_
 
 		/* Host state to be restored after the guest returns. */
 		asm("sidt %0":"=m"(lg->state->host.idt));
-		lg->state->host.gdt = __get_cpu_var(cpu_gdt_descr);
+		asm("sgdt %0":"=m"(lg->state->host.gdt));
 
 		/* Even if *we* don't want FPU trap, guest might... */
 		set_ts(lg->ts);
diff -r b30bbd45ba64 arch/i386/lguest/lguest.c
--- a/arch/i386/lguest/lguest.c	Tue Mar 06 23:35:01 2007 +1100
+++ b/arch/i386/lguest/lguest.c	Tue Mar 06 23:35:04 2007 +1100
@@ -34,7 +34,6 @@
 #include <asm/desc.h>
 #include <asm/setup.h>
 #include <asm/e820.h>
-#include <asm/pda.h>
 #include <asm/asm-offsets.h>
 #include <asm/mce.h>
 
@@ -509,13 +508,8 @@ static __attribute_used__ __init void lg
 	/* We use top of mem for initial pagetables. */
 	init_pg_tables_end = __pa(pg0);
 
-	/* set up PDA descriptor */
-	pack_descriptor((u32 *)&cpu_gdt_table[GDT_ENTRY_PDA].a,
-			(u32 *)&cpu_gdt_table[GDT_ENTRY_PDA].b,
-			(unsigned)&boot_pda, sizeof(boot_pda)-1,
-			0x80 | DESCTYPE_S | 0x02, 0);
-	load_gdt(&early_gdt_descr);
-	asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
+	init_gdt(0, &init_task);
+	cpu_set_gdt(0);
 
 	reserve_top_address(lguest_data.reserve_mem);
 
diff -r b30bbd45ba64 include/asm-i386/current.h
--- a/include/asm-i386/current.h	Tue Mar 06 23:35:01 2007 +1100
+++ b/include/asm-i386/current.h	Tue Mar 06 23:35:04 2007 +1100
@@ -1,14 +1,15 @@
 #ifndef _I386_CURRENT_H
 #define _I386_CURRENT_H
 
-#include <asm/pda.h>
 #include <linux/compiler.h>
+#include <asm/percpu.h>
 
 struct task_struct;
 
+DECLARE_PER_CPU(struct task_struct *, current_task);
 static __always_inline struct task_struct *get_current(void)
 {
-	return read_pda(pcurrent);
+	return x86_read_percpu(current_task);
 }
  
 #define current get_current()
diff -r b30bbd45ba64 include/asm-i386/irq_regs.h
--- a/include/asm-i386/irq_regs.h	Tue Mar 06 23:35:01 2007 +1100
+++ b/include/asm-i386/irq_regs.h	Tue Mar 06 23:35:04 2007 +1100
@@ -1,25 +1,27 @@
 /*
  * Per-cpu current frame pointer - the location of the last exception frame on
- * the stack, stored in the PDA.
+ * the stack, stored in the per-cpu area.
  *
  * Jeremy Fitzhardinge <jeremy@goop.org>
  */
 #ifndef _ASM_I386_IRQ_REGS_H
 #define _ASM_I386_IRQ_REGS_H
 
-#include <asm/pda.h>
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU(struct pt_regs *, irq_regs);
 
 static inline struct pt_regs *get_irq_regs(void)
 {
-	return read_pda(irq_regs);
+	return x86_read_percpu(irq_regs);
 }
 
 static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
 {
 	struct pt_regs *old_regs;
 
-	old_regs = read_pda(irq_regs);
-	write_pda(irq_regs, new_regs);
+	old_regs = get_irq_regs();
+	x86_write_percpu(irq_regs, new_regs);
 
 	return old_regs;
 }
diff -r b30bbd45ba64 include/asm-i386/pda.h
--- a/include/asm-i386/pda.h	Tue Mar 06 23:35:01 2007 +1100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,100 +0,0 @@
-/*
-   Per-processor Data Areas
-   Jeremy Fitzhardinge <jeremy@goop.org> 2006
-   Based on asm-x86_64/pda.h by Andi Kleen.
- */
-#ifndef _I386_PDA_H
-#define _I386_PDA_H
-
-#include <linux/stddef.h>
-#include <linux/types.h>
-#include <asm/percpu.h>
-
-struct i386_pda
-{
-	struct i386_pda *_pda;		/* pointer to self */
-
-	int cpu_number;
-	struct task_struct *pcurrent;	/* current process */
-	struct pt_regs *irq_regs;
-};
-
-DECLARE_PER_CPU(struct i386_pda, _cpu_pda);
-
-#define pda_offset(field) offsetof(struct i386_pda, field)
-
-extern void __bad_pda_field(void);
-
-/* This variable is never instantiated.  It is only used as a stand-in
-   for the real per-cpu PDA memory, so that gcc can understand what
-   memory operations the inline asms() below are performing.  This
-   eliminates the need to make the asms volatile or have memory
-   clobbers, so gcc can readily analyse them. */
-extern struct i386_pda _proxy_pda;
-
-#define pda_to_op(op,field,val)						\
-	do {								\
-		typedef typeof(_proxy_pda.field) T__;			\
-		if (0) { T__ tmp__; tmp__ = (val); }			\
-		switch (sizeof(_proxy_pda.field)) {			\
-		case 1:							\
-			asm(op "b %1,%%fs:%c2"				\
-			    : "+m" (_proxy_pda.field)			\
-			    :"ri" ((T__)val),				\
-			     "i"(pda_offset(field)));			\
-			break;						\
-		case 2:							\
-			asm(op "w %1,%%fs:%c2"				\
-			    : "+m" (_proxy_pda.field)			\
-			    :"ri" ((T__)val),				\
-			     "i"(pda_offset(field)));			\
-			break;						\
-		case 4:							\
-			asm(op "l %1,%%fs:%c2"				\
-			    : "+m" (_proxy_pda.field)			\
-			    :"ri" ((T__)val),				\
-			     "i"(pda_offset(field)));			\
-			break;						\
-		default: __bad_pda_field();				\
-		}							\
-	} while (0)
-
-#define pda_from_op(op,field)						\
-	({								\
-		typeof(_proxy_pda.field) ret__;				\
-		switch (sizeof(_proxy_pda.field)) {			\
-		case 1:							\
-			asm(op "b %%fs:%c1,%0"				\
-			    : "=r" (ret__)				\
-			    : "i" (pda_offset(field)),			\
-			      "m" (_proxy_pda.field));			\
-			break;						\
-		case 2:							\
-			asm(op "w %%fs:%c1,%0"				\
-			    : "=r" (ret__)				\
-			    : "i" (pda_offset(field)),			\
-			      "m" (_proxy_pda.field));			\
-			break;						\
-		case 4:							\
-			asm(op "l %%fs:%c1,%0"				\
-			    : "=r" (ret__)				\
-			    : "i" (pda_offset(field)),			\
-			      "m" (_proxy_pda.field));			\
-			break;						\
-		default: __bad_pda_field();				\
-		}							\
-		ret__; })
-
-/* Return a pointer to a pda field */
-#define pda_addr(field)							\
-	((typeof(_proxy_pda.field) *)((unsigned char *)read_pda(_pda) + \
-				      pda_offset(field)))
-
-#define read_pda(field) pda_from_op("mov",field)
-#define write_pda(field,val) pda_to_op("mov",field,val)
-#define add_pda(field,val) pda_to_op("add",field,val)
-#define sub_pda(field,val) pda_to_op("sub",field,val)
-#define or_pda(field,val) pda_to_op("or",field,val)
-
-extern struct i386_pda boot_pda;
-#endif	/* _I386_PDA_H */
diff -r b30bbd45ba64 include/asm-i386/percpu.h
--- a/include/asm-i386/percpu.h	Tue Mar 06 23:35:01 2007 +1100
+++ b/include/asm-i386/percpu.h	Tue Mar 06 23:35:04 2007 +1100
@@ -1,31 +1,133 @@
 #ifndef __ARCH_I386_PERCPU__
 #define __ARCH_I386_PERCPU__
 
-#ifndef __ASSEMBLY__
-#include <asm-generic/percpu.h>
-#else
+#ifdef __ASSEMBLY__
 
 /*
  * PER_CPU finds an address of a per-cpu variable.
  *
  * Args:
  *    var - variable name
- *    cpu - 32bit register containing the current CPU number
+ *    reg - 32bit register
  *
- * The resulting address is stored in the "cpu" argument.
+ * The resulting address is stored in the "reg" argument.
  *
  * Example:
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-#define PER_CPU(var, cpu) \
-	movl __per_cpu_offset(,cpu,4), cpu;	\
-	addl $per_cpu__##var, cpu;
+#define PER_CPU(var, reg)			\
+	movl %fs:per_cpu__this_cpu_off, reg;		\
+	addl $per_cpu__##var, reg
 #else /* ! SMP */
-#define PER_CPU(var, cpu) \
-	movl $per_cpu__##var, cpu;
+#define PER_CPU(var, reg) \
+	movl $per_cpu__##var, reg
 #endif	/* SMP */
 
+#else /* ...!ASSEMBLY */
+
+#ifdef CONFIG_SMP
+/* Same as generic implementation except for optimized local access. */
+#define __GENERIC_PER_CPU
+
+/* This is used for other cpus to find our section. */
+extern unsigned long __per_cpu_offset[];
+
+/* Separate out the type, so (int[3], foo) works. */
+#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
+#define DEFINE_PER_CPU(type, name) \
+    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
+/* var is in discarded region: offset to particular copy we want */
+#define per_cpu(var, cpu) (*({				\
+	extern int simple_indentifier_##var(void);	\
+	RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]); }))
+
+#define __raw_get_cpu_var(var) (*({					\
+	extern int simple_indentifier_##var(void);			\
+	RELOC_HIDE(&per_cpu__##var, x86_read_percpu(this_cpu_off));	\
+}))
+
+#define __get_cpu_var(var) __raw_get_cpu_var(var)
+
+/* A macro to avoid #include hell... */
+#define percpu_modcopy(pcpudst, src, size)			\
+do {								\
+	unsigned int __i;					\
+	for_each_possible_cpu(__i)				\
+		memcpy((pcpudst)+__per_cpu_offset[__i],		\
+		       (src), (size));				\
+} while (0)
+
+#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
+#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
+
+/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
+#define __percpu_seg "%%fs:"
+#else  /* !SMP */
+#include <asm-generic/percpu.h>
+#define __percpu_seg ""
+#endif	/* SMP */
+
+/* For arch-specific code, we can use direct single-insn ops (they
+ * don't give an lvalue though). */
+extern void __bad_percpu_size(void);
+
+#define percpu_to_op(op,var,val)				\
+	do {							\
+		typedef typeof(var) T__;			\
+		if (0) { T__ tmp__; tmp__ = (val); }		\
+		switch (sizeof(var)) {				\
+		case 1:						\
+			asm(op "b %1,"__percpu_seg"%0"		\
+			    : "+m" (var)			\
+			    :"ri" ((T__)val));			\
+			break;					\
+		case 2:						\
+			asm(op "w %1,"__percpu_seg"%0"		\
+			    : "+m" (var)			\
+			    :"ri" ((T__)val));			\
+			break;					\
+		case 4:						\
+			asm(op "l %1,"__percpu_seg"%0"		\
+			    : "+m" (var)			\
+			    :"ri" ((T__)val));			\
+			break;					\
+		default: __bad_percpu_size();			\
+		}						\
+	} while (0)
+
+#define percpu_from_op(op,var)					\
+	({							\
+		typeof(var) ret__;				\
+		switch (sizeof(var)) {				\
+		case 1:						\
+			asm(op "b "__percpu_seg"%1,%0"		\
+			    : "=r" (ret__)			\
+			    : "m" (var));			\
+			break;					\
+		case 2:						\
+			asm(op "w "__percpu_seg"%1,%0"		\
+			    : "=r" (ret__)			\
+			    : "m" (var));			\
+			break;					\
+		case 4:						\
+			asm(op "l "__percpu_seg"%1,%0"		\
+			    : "=r" (ret__)			\
+			    : "m" (var));			\
+			break;					\
+		default: __bad_percpu_size();			\
+		}						\
+		ret__; })
+
+#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
+#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
+#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
+#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ARCH_I386_PERCPU__ */
diff -r b30bbd45ba64 include/asm-i386/processor.h
--- a/include/asm-i386/processor.h	Tue Mar 06 23:35:01 2007 +1100
+++ b/include/asm-i386/processor.h	Tue Mar 06 23:35:04 2007 +1100
@@ -425,7 +425,7 @@ struct thread_struct {
 	.vm86_info = NULL,						\
 	.sysenter_cs = __KERNEL_CS,					\
 	.io_bitmap_ptr = NULL,						\
-	.fs = __KERNEL_PDA,						\
+	.fs = __KERNEL_PERCPU,						\
 }
 
 /*
diff -r b30bbd45ba64 include/asm-i386/segment.h
--- a/include/asm-i386/segment.h	Tue Mar 06 23:35:01 2007 +1100
+++ b/include/asm-i386/segment.h	Tue Mar 06 23:35:04 2007 +1100
@@ -39,7 +39,7 @@
  *  25 - APM BIOS support 
  *
  *  26 - ESPFIX small SS
- *  27 - PDA				[ per-cpu private data area ]
+ *  27 - per-cpu			[ offset to per-cpu data area ]
  *  28 - unused
  *  29 - unused
  *  30 - unused
@@ -74,8 +74,8 @@
 #define GDT_ENTRY_ESPFIX_SS		(GDT_ENTRY_KERNEL_BASE + 14)
 #define __ESPFIX_SS (GDT_ENTRY_ESPFIX_SS * 8)
 
-#define GDT_ENTRY_PDA			(GDT_ENTRY_KERNEL_BASE + 15)
-#define __KERNEL_PDA (GDT_ENTRY_PDA * 8)
+#define GDT_ENTRY_PERCPU			(GDT_ENTRY_KERNEL_BASE + 15)
+#define __KERNEL_PERCPU (GDT_ENTRY_PERCPU * 8)
 
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
diff -r b30bbd45ba64 include/asm-i386/smp.h
--- a/include/asm-i386/smp.h	Tue Mar 06 23:35:01 2007 +1100
+++ b/include/asm-i386/smp.h	Tue Mar 06 23:35:04 2007 +1100
@@ -8,7 +8,6 @@
 #include <linux/kernel.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
-#include <asm/pda.h>
 #endif
 
 #if defined(CONFIG_X86_LOCAL_APIC) && !defined(__ASSEMBLY__)
@@ -59,7 +58,8 @@ do { } while (0)
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (read_pda(cpu_number))
+DECLARE_PER_CPU(int, cpu_number);
+#define raw_smp_processor_id() (x86_read_percpu(cpu_number))
 
 extern cpumask_t cpu_callout_map;
 extern cpumask_t cpu_callin_map;



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 13:03               ` [PATCH 8/8] Convert PDA into the percpu section Rusty Russell
@ 2007-03-06 13:10                 ` Ingo Molnar
  2007-03-07  0:12                   ` Rusty Russell
  2007-03-06 18:28                 ` Jeremy Fitzhardinge
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2007-03-06 13:10 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Currently x86 (similar to x84-64) has a special per-cpu structure 
> called "i386_pda" which can be easily and efficiently referenced via 
> the %fs register.  An ELF section is more flexible than a structure, 
> allowing any piece of code to use this area.  Indeed, such a section 
> already exists: the per-cpu area.
> 
> So this patch
> (1) Removes the PDA and uses per-cpu variables for each current member.

hmm ... i very much like this, but its needs performance and kernel-size 
testing before it can move from -mm into mainline. We are now exposing 
wide ranges of the kernel to segment prefixes again. (Btw., i'd expect 
there to be a kernel size reduction.)

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/8] Allow per-cpu variables to be page-aligned
  2007-03-06 13:00           ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Rusty Russell
  2007-03-06 13:01             ` [PATCH 7/8] Page-align the GDT Rusty Russell
@ 2007-03-06 13:15             ` Ingo Molnar
  2007-03-07  0:16               ` Rusty Russell
  2007-03-06 18:17             ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2007-03-06 13:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Xen wants page-aligned GDT (and PDA must not cross a page-boundary, 
> but that doesn't happen at the moment since it's so close to start of 
> page).  Let's allow page-alignment in general for per-cpu data.
> 
> Because larger alignments can use more room, we increase the max 
> per-cpu memory to 64k rather than 32k: it's getting a little tight.

i recently needed page-aligned per-cpu data too for KVM-paravirt. Btw., 
what's the size increase of the native kernel?

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu
  2007-03-06 12:53 ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Rusty Russell
  2007-03-06 12:54   ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Rusty Russell
@ 2007-03-06 13:20   ` Ingo Molnar
  2007-03-06 13:26     ` Ingo Molnar
  2007-03-07  0:22     ` Rusty Russell
  1 sibling, 2 replies; 36+ messages in thread
From: Ingo Molnar @ 2007-03-06 13:20 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> +/* The boot Global Descriptor Table: after boot we allocate a per-cpu copy */
>  	.align L1_CACHE_BYTES
>  ENTRY(boot_gdt_table)
> -	.fill GDT_ENTRY_BOOT_CS,8,0
> -	.quad 0x00cf9a000000ffff	/* kernel 4GB code at 0x00000000 */
> -	.quad 0x00cf92000000ffff	/* kernel 4GB data at 0x00000000 */
> -
> -/*
> - * The Global Descriptor Table contains 28 quadwords, per-CPU.
> - */
> -	.align L1_CACHE_BYTES
> -ENTRY(cpu_gdt_table)
>  	.quad 0x0000000000000000	/* NULL descriptor */
>  	.quad 0x0000000000000000	/* 0x0b reserved */
> -	.quad 0x0000000000000000	/* 0x13 reserved */
> -	.quad 0x0000000000000000	/* 0x1b reserved */
> +	.quad 0x00cf9a000000ffff	/* boot: 4GB code at 0x00000000 */
> +	.quad 0x00cf92000000ffff	/* boot: 4GB data at 0x00000000 */
>  	.quad 0x0000000000000000	/* 0x20 unused */
>  	.quad 0x0000000000000000	/* 0x28 unused */
>  	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */

actually, the reason for the small boot GDT was that some systems 
wouldnt even boot with a larger GDT. (there was some BIOS interaction, 
forgot what it was - iirc it was mach-visws and also some other older 
box)

the 'simplification' you do here is actually how Linux worked before it 
was fixed for those systems.

so i'd be quite nervous to touch this area of the GDT code. (I'd support 
a renaming though, gdt_table is indeed ugly.)

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h
  2007-03-06 12:54   ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Rusty Russell
  2007-03-06 12:55     ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Rusty Russell
@ 2007-03-06 13:21     ` Ingo Molnar
  1 sibling, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2007-03-06 13:21 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> I managed to get an error about NR_CPUS undefined after a make 
> allyesconfig.  Admittedly this was a patched kernel, but it's easy to 
> remove it and avoid the error in future even if it's OK at the moment.

> -extern unsigned long __per_cpu_offset[NR_CPUS];
> +extern unsigned long __per_cpu_offset[];

i'd much rather have us fix that include file dependency problem than to 
work it around by hiding the true dimension of the __per_cpu_offset 
array

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu
  2007-03-06 13:20   ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Ingo Molnar
@ 2007-03-06 13:26     ` Ingo Molnar
  2007-03-07  0:22     ` Rusty Russell
  1 sibling, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2007-03-06 13:26 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen


* Ingo Molnar <mingo@elte.hu> wrote:

> > +	.quad 0x00cf9a000000ffff	/* boot: 4GB code at 0x00000000 */
> > +	.quad 0x00cf92000000ffff	/* boot: 4GB data at 0x00000000 */
> >  	.quad 0x0000000000000000	/* 0x20 unused */
> >  	.quad 0x0000000000000000	/* 0x28 unused */
> >  	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
> 
> actually, the reason for the small boot GDT was that some systems 
> wouldnt even boot with a larger GDT. (there was some BIOS interaction, 
> forgot what it was - iirc it was mach-visws and also some other older 
> box)
> 
> the 'simplification' you do here is actually how Linux worked before 
> it was fixed for those systems.
> 
> so i'd be quite nervous to touch this area of the GDT code. (I'd 
> support a renaming though, gdt_table is indeed ugly.)

and this seems to have a ripple effect on the rest of your GDT changes - 
i'll wait for this to be cleared before doing a line by line review of 
them. I commented on the mostly-unaffected ones already.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/8] Use per-cpu variables for GDT, PDA
  2007-03-06 12:55     ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Rusty Russell
  2007-03-06 12:57       ` [PATCH 4/8] Cleanup setup_pda Rusty Russell
@ 2007-03-06 18:14       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:14 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> Allocating PDA and GDT at boot is a pain.  Using simple per-cpu
> variables adds happiness (although we need the GDT page-aligned for
> Xen, see later).
>   

Not just Xen.  Apparently it leads to general goodness.

> Finally, we can simply call it "cpu_gdt" rather than enduring the
> superfluous and unnecessarily redundant tautology of "cpu_gdt_table".
>   

Ah, well, at least it isn't excessive.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/8] Cleanup GDT access
  2007-03-06 12:58         ` [PATCH 5/8] Cleanup GDT access Rusty Russell
  2007-03-06 13:00           ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Rusty Russell
@ 2007-03-06 18:16           ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:16 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> Now we have an explicit per-cpu GDT variable, we don't need to keep
> the descriptors around to use them to find the GDT.  Also remove the
> cpu_pda() accessor: it's just a per-cpu variable.
>   
Looks good.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/8] Allow per-cpu variables to be page-aligned
  2007-03-06 13:00           ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Rusty Russell
  2007-03-06 13:01             ` [PATCH 7/8] Page-align the GDT Rusty Russell
  2007-03-06 13:15             ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Ingo Molnar
@ 2007-03-06 18:17             ` Jeremy Fitzhardinge
  2007-03-07  0:29               ` Rusty Russell
  2 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:17 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> Xen wants page-aligned GDT (and PDA must not cross a page-boundary,
> but that doesn't happen at the moment since it's so close to start of
> page).  Let's allow page-alignment in general for per-cpu data.
>
> Because larger alignments can use more room, we increase the max
> per-cpu memory to 64k rather than 32k: it's getting a little tight.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> diff -r 213b1ec27001 arch/alpha/kernel/vmlinux.lds.S
> --- a/arch/alpha/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
> +++ b/arch/alpha/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
> @@ -69,7 +69,7 @@ SECTIONS
>    . = ALIGN(8);
>    SECURITY_INIT
>  
> -  . = ALIGN(64);
> +  . = ALIGN(8192);
>   

Isn't there a PAGE_SIZE we can use here?  PAGE_SIZE_asm?
(ditto all archs)

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 13:03               ` [PATCH 8/8] Convert PDA into the percpu section Rusty Russell
  2007-03-06 13:10                 ` Ingo Molnar
@ 2007-03-06 18:28                 ` Jeremy Fitzhardinge
  2007-03-06 19:34                 ` Andi Kleen
  2007-03-13 17:15                 ` Jeremy Fitzhardinge
  3 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:28 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> Currently x86 (similar to x84-64) has a special per-cpu structure
> called "i386_pda" which can be easily and efficiently referenced via
> the %fs register.  An ELF section is more flexible than a structure,
> allowing any piece of code to use this area.  Indeed, such a section
> already exists: the per-cpu area.
>
> So this patch
> (1) Removes the PDA and uses per-cpu variables for each current member.
> (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
> (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
>     can be used to calculate addresses for this CPU's variables.
> (4) Moves the boot cpu's GDT/percpu setup to smp_prepare_boot_cpu(),
>     immediately after the per-cpu areas are allocated.
>
> The result is one less x86-specific concept.
>   

Looks good.  I think you can drop x86_add/sub/or_percpu; there are no users.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 19:34                 ` Andi Kleen
@ 2007-03-06 18:37                   ` Jeremy Fitzhardinge
  2007-03-07  0:33                   ` Rusty Russell
  1 sibling, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:37 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rusty Russell, lkml - Kernel Mailing List, Zachary Amsden,
	Jeremy Fitzhardinge, Ingo Molnar, Andrew Morton

Andi Kleen wrote:
> Sigh -- i had hoped this had settled down because it was a
> merging nightmare last time. Ok. 
>
>   
>> +#define percpu_to_op(op,var,val)				\
>> +	do {							\
>> +		typedef typeof(var) T__;			\
>> +		if (0) { T__ tmp__; tmp__ = (val); }		\
>> +		switch (sizeof(var)) {				\
>> +		case 1:						\
>> +			asm(op "b %1,"__percpu_seg"%0"		\
>> +			    : "+m" (var)			\
>> +			    :"ri" ((T__)val));			\
>>     
>
> Perhaps I'm blind but I can't see where the %fs reference is there.
>   
__percpu_seg.  It's defined to nothing in the UP case.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 13:03               ` [PATCH 8/8] Convert PDA into the percpu section Rusty Russell
  2007-03-06 13:10                 ` Ingo Molnar
  2007-03-06 18:28                 ` Jeremy Fitzhardinge
@ 2007-03-06 19:34                 ` Andi Kleen
  2007-03-06 18:37                   ` Jeremy Fitzhardinge
  2007-03-07  0:33                   ` Rusty Russell
  2007-03-13 17:15                 ` Jeremy Fitzhardinge
  3 siblings, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2007-03-06 19:34 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen


Sigh -- i had hoped this had settled down because it was a
merging nightmare last time. Ok. 

> +#define percpu_to_op(op,var,val)				\
> +	do {							\
> +		typedef typeof(var) T__;			\
> +		if (0) { T__ tmp__; tmp__ = (val); }		\
> +		switch (sizeof(var)) {				\
> +		case 1:						\
> +			asm(op "b %1,"__percpu_seg"%0"		\
> +			    : "+m" (var)			\
> +			    :"ri" ((T__)val));			\

Perhaps I'm blind but I can't see where the %fs reference is there.
How does this work? 

Do you have text size comparisons before/after and possible lmbench? 

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 13:10                 ` Ingo Molnar
@ 2007-03-07  0:12                   ` Rusty Russell
  2007-03-07  0:35                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-07  0:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen

On Tue, 2007-03-06 at 14:10 +0100, Ingo Molnar wrote:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > Currently x86 (similar to x84-64) has a special per-cpu structure 
> > called "i386_pda" which can be easily and efficiently referenced via 
> > the %fs register.  An ELF section is more flexible than a structure, 
> > allowing any piece of code to use this area.  Indeed, such a section 
> > already exists: the per-cpu area.
> > 
> > So this patch
> > (1) Removes the PDA and uses per-cpu variables for each current member.
> 
> hmm ... i very much like this, but its needs performance and kernel-size 
> testing before it can move from -mm into mainline. We are now exposing 
> wide ranges of the kernel to segment prefixes again. (Btw., i'd expect 
> there to be a kernel size reduction.)

Hi Ingo,

	Thanks!  There are some interesting issues.  Because __get_cpu_var()
returns an lvalue, we don't use the %fs:value directly, but calculate
offset (%fs:this_cpu_off + &value).  So previously there was only a tiny
code reduction.

	If we used __thread, then gcc could do this optimization for us when it
knows an rvalue is needed, however:

1) gcc wants to use %gs, not %fs, which is measurably slower for the
kernel,
2) gcc wants to use huge offsets to store the address of the per-cpu
space, and this breaks Xen (and current lguest, but new lguest no longer
uses segments for protection)

One solution would be to expose x86_read_percpu() as read_percpu() and
implement it in asm-generic/percpu.h as well, then use it in places
where only an rvalue is required.

Cheers!
Rusty.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/8] Allow per-cpu variables to be page-aligned
  2007-03-06 13:15             ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Ingo Molnar
@ 2007-03-07  0:16               ` Rusty Russell
  2007-03-07  0:44                 ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-07  0:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen

On Tue, 2007-03-06 at 14:15 +0100, Ingo Molnar wrote:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > Xen wants page-aligned GDT (and PDA must not cross a page-boundary, 
> > but that doesn't happen at the moment since it's so close to start of 
> > page).  Let's allow page-alignment in general for per-cpu data.
> > 
> > Because larger alignments can use more room, we increase the max 
> > per-cpu memory to 64k rather than 32k: it's getting a little tight.
> 
> i recently needed page-aligned per-cpu data too for KVM-paravirt. Btw., 
> what's the size increase of the native kernel?

To clarify for those reading fast, this patch doesn't increase at all,
since noone uses page-alignment.  For the next patch which aligns the
gdt, it's about +12k (21k -> 33k), but then we take away the pda in the
next patch and it drops back to +8k (the pda is in the same file as the
gdt, and on my system at least it ends up getting its own 4k alignment
too).

AFAICT, we could save the wasted partial page in three ways: by trying
to link it first, if it's only a case of one file, by having a special
page-aligned percpu data section, or by having a smarter linker.

Cheers,
Rusty.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu
  2007-03-06 13:20   ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Ingo Molnar
  2007-03-06 13:26     ` Ingo Molnar
@ 2007-03-07  0:22     ` Rusty Russell
  1 sibling, 0 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-07  0:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Andrew Morton, Andi Kleen, pazke

On Tue, 2007-03-06 at 14:20 +0100, Ingo Molnar wrote:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > +/* The boot Global Descriptor Table: after boot we allocate a per-cpu copy */
> >  	.align L1_CACHE_BYTES
> >  ENTRY(boot_gdt_table)
> > -	.fill GDT_ENTRY_BOOT_CS,8,0
> > -	.quad 0x00cf9a000000ffff	/* kernel 4GB code at 0x00000000 */
> > -	.quad 0x00cf92000000ffff	/* kernel 4GB data at 0x00000000 */
> > -
> > -/*
> > - * The Global Descriptor Table contains 28 quadwords, per-CPU.
> > - */
> > -	.align L1_CACHE_BYTES
> > -ENTRY(cpu_gdt_table)
> >  	.quad 0x0000000000000000	/* NULL descriptor */
> >  	.quad 0x0000000000000000	/* 0x0b reserved */
> > -	.quad 0x0000000000000000	/* 0x13 reserved */
> > -	.quad 0x0000000000000000	/* 0x1b reserved */
> > +	.quad 0x00cf9a000000ffff	/* boot: 4GB code at 0x00000000 */
> > +	.quad 0x00cf92000000ffff	/* boot: 4GB data at 0x00000000 */
> >  	.quad 0x0000000000000000	/* 0x20 unused */
> >  	.quad 0x0000000000000000	/* 0x28 unused */
> >  	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
> 
> actually, the reason for the small boot GDT was that some systems 
> wouldnt even boot with a larger GDT. (there was some BIOS interaction, 
> forgot what it was - iirc it was mach-visws and also some other older 
> box)

Thanks for the explanation: I did wonder!  I left the descriptor
truncated to just the first few entries, so I'm *pretty sure* it won't
matter: the layout is identical.

I'd really like to try it and see, though; Andre?
Rusty.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/8] Allow per-cpu variables to be page-aligned
  2007-03-06 18:17             ` Jeremy Fitzhardinge
@ 2007-03-07  0:29               ` Rusty Russell
  0 siblings, 0 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-07  0:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

On Tue, 2007-03-06 at 10:17 -0800, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > diff -r 213b1ec27001 arch/alpha/kernel/vmlinux.lds.S
> > --- a/arch/alpha/kernel/vmlinux.lds.S	Tue Mar 06 19:01:59 2007 +1100
> > +++ b/arch/alpha/kernel/vmlinux.lds.S	Tue Mar 06 19:02:03 2007 +1100
> > @@ -69,7 +69,7 @@ SECTIONS
> >    . = ALIGN(8);
> >    SECURITY_INIT
> >  
> > -  . = ALIGN(64);
> > +  . = ALIGN(8192);
> >   
> 
> Isn't there a PAGE_SIZE we can use here?  PAGE_SIZE_asm?
> (ditto all archs)

Probably, but I carefully copied the style from other parts of the same
file for the same arch.  So, that cleanup would be another patch 8)

Rusty.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 19:34                 ` Andi Kleen
  2007-03-06 18:37                   ` Jeremy Fitzhardinge
@ 2007-03-07  0:33                   ` Rusty Russell
  2007-03-07 11:55                     ` Rusty Russell
  1 sibling, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-07  0:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton

On Tue, 2007-03-06 at 20:34 +0100, Andi Kleen wrote:
> Sigh -- i had hoped this had settled down because it was a
> merging nightmare last time. Ok. 

Indeed, that's why I waited until everything else was fully merged and
accepted.

> > +#define percpu_to_op(op,var,val)				\
> > +	do {							\
> > +		typedef typeof(var) T__;			\
> > +		if (0) { T__ tmp__; tmp__ = (val); }		\
> > +		switch (sizeof(var)) {				\
> > +		case 1:						\
> > +			asm(op "b %1,"__percpu_seg"%0"		\
> > +			    : "+m" (var)			\
> > +			    :"ri" ((T__)val));			\
> 
> Perhaps I'm blind but I can't see where the %fs reference is there.
> How does this work? 

Here:

+/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
+#define __percpu_seg "%%fs:"
+#else  /* !SMP */
+#include <asm-generic/percpu.h>
+#define __percpu_seg ""
+#endif /* SMP */

That's how we get SMP & non-SMP unification in that code.

> Do you have text size comparisons before/after and possible lmbench? 

No, but I'll run them this evening.  Last time the size reduction was
slight, and there was no measurable performance improvement in
microbenchmarks.

Thanks,
Rusty.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-07  0:12                   ` Rusty Russell
@ 2007-03-07  0:35                     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-07  0:35 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, lkml - Kernel Mailing List, Zachary Amsden,
	Jeremy Fitzhardinge, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> 	If we used __thread, then gcc could do this optimization for us when it
> knows an rvalue is needed, however:
>
> 1) gcc wants to use %gs, not %fs, which is measurably slower for the
> kernel,
> 2) gcc wants to use huge offsets to store the address of the per-cpu
> space, and this breaks Xen (and current lguest, but new lguest no longer
> uses segments for protection)
>   

Well, if we go to the effort of teaching gcc how to use %fs, we can
probably convince it to generate positive offset TLS relocs too.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/8] Allow per-cpu variables to be page-aligned
  2007-03-07  0:16               ` Rusty Russell
@ 2007-03-07  0:44                 ` H. Peter Anvin
  0 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-03-07  0:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, lkml - Kernel Mailing List, Zachary Amsden,
	Jeremy Fitzhardinge, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> 
> AFAICT, we could save the wasted partial page in three ways: by trying
> to link it first, if it's only a case of one file, by having a special
> page-aligned percpu data section, or by having a smarter linker.
> 

We already do the second of those for conventional data, so it would 
make sense for percpu as well.  Smarter linker would be better, but ugh, 
I don't see that happening any time soon.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-07  0:33                   ` Rusty Russell
@ 2007-03-07 11:55                     ` Rusty Russell
  0 siblings, 0 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-07 11:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton

On Wed, 2007-03-07 at 11:33 +1100, Rusty Russell wrote:
> On Tue, 2007-03-06 at 20:34 +0100, Andi Kleen wrote:
> > Do you have text size comparisons before/after and possible lmbench? 
> 
> No, but I'll run them this evening.  Last time the size reduction was
> slight, and there was no measurable performance improvement in
> microbenchmarks.

Here are the size results, for a start:


UP:
Before:
size vmlinux
   text    data     bss     dec     hex filename
3094881  243110  221184 3559175  364f07 vmlinux

After:
size vmlinux 
   text    data     bss     dec     hex filename
3093409  243142  221184 3557735  364967 vmlinux

SMP:
Before:
size vmlinux
   text    data     bss     dec     hex filename
3222269  318770  237568 3778607  39a82f vmlinux

After:
size vmlinux
   text    data     bss     dec     hex filename
3221421  314674  237568 3773663  3994df vmlinux

(The data size changes are moving from pda -> percpu, and on SMP
removing the page-aligned PDA).

So, a slight win.  lmbench tomorrow...

Rusty.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-06 13:03               ` [PATCH 8/8] Convert PDA into the percpu section Rusty Russell
                                   ` (2 preceding siblings ...)
  2007-03-06 19:34                 ` Andi Kleen
@ 2007-03-13 17:15                 ` Jeremy Fitzhardinge
  2007-03-14  2:27                   ` Rusty Russell
  3 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-13 17:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> +	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
> +			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
> +			__per_cpu_offset[cpu], 0xFFFFF,
>  			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
>   

Why testing with qemu is not enough.

diff -r 8dcd1dc9b298 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Tue Mar 13 00:33:37 2007 -0700
+++ b/arch/i386/kernel/cpu/common.c	Tue Mar 13 08:33:42 2007 -0700
@@ -627,7 +627,7 @@ __cpuinit void init_gdt(int cpu, struct 
 	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
 			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
 			__per_cpu_offset[cpu], 0xFFFFF,
-			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
+			0x80 | DESCTYPE_S | 0x2, 0x8); /* present read-write data segment, G */
 	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
 	per_cpu(cpu_number, cpu) = cpu;
 #endif /* SMP*/

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
  2007-03-06 12:39 [PATCH 0/8] x86 boot, pda and gdt cleanups Rusty Russell
  2007-03-06 12:53 ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Rusty Russell
@ 2007-03-13 20:48 ` Jeremy Fitzhardinge
  2007-03-14  2:25   ` Rusty Russell
  2007-03-14 23:55   ` Rusty Russell
  1 sibling, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-13 20:48 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> Hi all,
>
> 	The GDT stuff on x86 is a little more complex than it need be, but
> playing with boot code is always dangerous.  These compile and boot on
> UP and SMP for me, but Andrew should let the cook in -mm for a while.
>   
Hi Rusty,

This is my rough hacking patch I needed to get things into a Xen-shape
state.

There are a few of things here:

    * init_gdt should always use write_gdt_entry when touching the gdt;
      if it doesn't and it ends up touching an already-installed gdt
      under Xen, it will get a write fault.  This happens because
      init_gdt ends up getting called twice in SMP (see below).
    * init_gdt should always be called before bringing up the cpu,
      rather than by the cpu itself (and therefore, cpu_init() shouldn't
      call it).  Obviously the the boot cpu is an exception.
    * secondary_cpu_init stops being necessary.
    * On SMP, init_gdt can get called twice: first time in
      smp_prepare_boot_cpu, and a second time in  trap_init.  On UP,
      trap_init is the only caller.

diff -r 8dcd1dc9b298 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Tue Mar 13 00:33:37 2007 -0700
+++ b/arch/i386/kernel/cpu/common.c	Tue Mar 13 13:13:08 2007 -0700
@@ -620,14 +620,24 @@ __cpuinit void init_gdt(int cpu, struct 
 __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
 	struct desc_struct *gdt = per_cpu(gdt_page, cpu).gdt;
+	int i;
 
 	/* Based on boot_gdt_table: set percpu so it can be used immediately */
- 	memcpy(gdt, boot_gdt_table, GDT_SIZE);
+	for(i = 0; i < GDT_ENTRIES; i++) {
+		const struct desc_struct *bgdtt = &boot_gdt_table[i];
+		write_gdt_entry(gdt, i, bgdtt->a, bgdtt->b);
+	}
+
 #ifdef CONFIG_SMP
-	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
-			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
-			__per_cpu_offset[cpu], 0xFFFFF,
-			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
+	{
+		u32 a, b;
+
+		/* present read-write data segment, pagesize granularity */
+		pack_descriptor(&a, &b, __per_cpu_offset[cpu], 0xFFFFF,
+				0x80 | DESCTYPE_S | 0x2, 0x8);
+		write_gdt_entry(gdt, GDT_ENTRY_PERCPU, a, b);
+	}
+
 	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
 	per_cpu(cpu_number, cpu) = cpu;
 #endif /* SMP*/
@@ -709,30 +719,21 @@ static void __cpuinit _cpu_init(int cpu,
 	mxcsr_feature_mask_init();
 }
 
-/* Entrypoint to initialize secondary CPU */
-void __cpuinit secondary_cpu_init(void)
-{
-	int cpu = smp_processor_id();
-	struct task_struct *curr = current;
-
-	_cpu_init(cpu, curr);
-}
-
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
  * and IDT. We reload them nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
+ * 
+ * Note: we expect the gdt itself to be already set up for this
+ * processor; we just point the cpu at it.
  */
 void __cpuinit cpu_init(void)
 {
 	int cpu = smp_processor_id();
-	struct task_struct *curr = current;
-
-	/* Set up the real GDT so we can transition from the boot_gdt_table. */
-	init_gdt(cpu, curr);
+
 	cpu_set_gdt(cpu);
-	_cpu_init(cpu, curr);
+	_cpu_init(cpu, current);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -r 8dcd1dc9b298 arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c	Tue Mar 13 00:33:37 2007 -0700
+++ b/arch/i386/kernel/smpboot.c	Tue Mar 13 13:13:08 2007 -0700
@@ -381,14 +381,14 @@ static void __cpuinit start_secondary(vo
 static void __cpuinit start_secondary(void *unused)
 {
 	/*
-	 * Don't put *anything* before secondary_cpu_init(), SMP
-	 * booting is too fragile that we want to limit the
-	 * things done here to the most necessary things.
+	 * Don't put *anything* before cpu_init(), SMP booting is too
+	 * fragile that we want to limit the things done here to the
+	 * most necessary things.
 	 */
 #ifdef CONFIG_VMI
 	vmi_bringup();
 #endif
-	secondary_cpu_init();
+	cpu_init();
 	preempt_disable();
 	smp_callin();
 	while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
@@ -1165,15 +1165,17 @@ void __init native_smp_prepare_cpus(unsi
 
 void __devinit native_smp_prepare_boot_cpu(void)
 {
-	cpu_set(smp_processor_id(), cpu_online_map);
-	cpu_set(smp_processor_id(), cpu_callout_map);
-	cpu_set(smp_processor_id(), cpu_present_map);
-	cpu_set(smp_processor_id(), cpu_possible_map);
-	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+	int cpu = smp_processor_id();
+
+	cpu_set(cpu, cpu_online_map);
+	cpu_set(cpu, cpu_callout_map);
+	cpu_set(cpu, cpu_present_map);
+	cpu_set(cpu, cpu_possible_map);
+	per_cpu(cpu_state, cpu) = CPU_ONLINE;
 
 	/* Set up %fs to point to our per-CPU area now it's allocated */
-	init_gdt(smp_processor_id(), &init_task);
-	cpu_set_gdt(smp_processor_id());
+	init_gdt(cpu, &init_task);
+	cpu_set_gdt(cpu);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -r 8dcd1dc9b298 arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c	Tue Mar 13 00:33:37 2007 -0700
+++ b/arch/i386/kernel/traps.c	Tue Mar 13 13:13:08 2007 -0700
@@ -1181,6 +1181,7 @@ void __init trap_init(void)
 	/*
 	 * Should be a barrier for any external CPU state.
 	 */
+	init_gdt(smp_processor_id(), current);
 	cpu_init();
 
 	trap_init_hook();
diff -r 8dcd1dc9b298 include/asm-i386/processor.h
--- a/include/asm-i386/processor.h	Tue Mar 13 00:33:37 2007 -0700
+++ b/include/asm-i386/processor.h	Tue Mar 13 13:13:08 2007 -0700
@@ -744,6 +744,6 @@ extern int sysenter_setup(void);
 
 extern void init_gdt(int cpu, struct task_struct *idle);
 extern void cpu_set_gdt(int);
-extern void secondary_cpu_init(void);
+extern void cpu_init(void);
 
 #endif /* __ASM_I386_PROCESSOR_H */



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
  2007-03-13 20:48 ` [PATCH 0/8] x86 boot, pda and gdt cleanups Jeremy Fitzhardinge
@ 2007-03-14  2:25   ` Rusty Russell
  2007-03-14  4:39     ` Jeremy Fitzhardinge
  2007-03-14 23:55   ` Rusty Russell
  1 sibling, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-14  2:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

On Tue, 2007-03-13 at 13:48 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > Hi all,
> >
> > 	The GDT stuff on x86 is a little more complex than it need be, but
> > playing with boot code is always dangerous.  These compile and boot on
> > UP and SMP for me, but Andrew should let the cook in -mm for a while.
> >   
> Hi Rusty,
> 
> This is my rough hacking patch I needed to get things into a Xen-shape
> state.

Looks good.  Just one thing:

>  void __devinit native_smp_prepare_boot_cpu(void)
>  {
> -	cpu_set(smp_processor_id(), cpu_online_map);
> -	cpu_set(smp_processor_id(), cpu_callout_map);
> -	cpu_set(smp_processor_id(), cpu_present_map);
> -	cpu_set(smp_processor_id(), cpu_possible_map);
> -	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
> +	int cpu = smp_processor_id();
> +
> +	cpu_set(cpu, cpu_online_map);
> +	cpu_set(cpu, cpu_callout_map);
> +	cpu_set(cpu, cpu_present_map);
> +	cpu_set(cpu, cpu_possible_map);
> +	per_cpu(cpu_state, cpu) = CPU_ONLINE;
>  
>  	/* Set up %fs to point to our per-CPU area now it's allocated */
> -	init_gdt(smp_processor_id(), &init_task);
> -	cpu_set_gdt(smp_processor_id());
> +	init_gdt(cpu, &init_task);
> +	cpu_set_gdt(cpu);
>  }

This is called "pissing in the corners".  Don't do it: we don't need to
touch that code and I actually prefer the original anyway (explicit is
*good*).  

The habit of extracting cpu number once then using it is an optimization
which we should be aiming to get rid of (it simply hurts archs with
efficient per-cpu implementations).

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] Convert PDA into the percpu section
  2007-03-13 17:15                 ` Jeremy Fitzhardinge
@ 2007-03-14  2:27                   ` Rusty Russell
  0 siblings, 0 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-14  2:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

On Tue, 2007-03-13 at 10:15 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > +	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
> > +			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
> > +			__per_cpu_offset[cpu], 0xFFFFF,
> >  			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
> >   
> 
> Why testing with qemu is not enough.

Indeed 8(.

Thanks!
Rusty.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
  2007-03-14  2:25   ` Rusty Russell
@ 2007-03-14  4:39     ` Jeremy Fitzhardinge
  2007-03-14  6:54       ` Rusty Russell
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-14  4:39 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> This is called "pissing in the corners".  Don't do it: we don't need to
> touch that code and I actually prefer the original anyway (explicit is
> *good*).  
>
> The habit of extracting cpu number once then using it is an optimization
> which we should be aiming to get rid of (it simply hurts archs with
> efficient per-cpu implementations).

No, that was for a reason.  I was worried about smp_processor_id() not
returning valid values between init_gdt and cpu_set_gdt.  It's not
actually a problem, but relying on smp_processor_id() while we're moving
the foundations its based on seems fragile.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
  2007-03-14  4:39     ` Jeremy Fitzhardinge
@ 2007-03-14  6:54       ` Rusty Russell
  0 siblings, 0 replies; 36+ messages in thread
From: Rusty Russell @ 2007-03-14  6:54 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

On Tue, 2007-03-13 at 21:39 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > This is called "pissing in the corners".  Don't do it: we don't need to
> > touch that code and I actually prefer the original anyway (explicit is
> > *good*).  
> >
> > The habit of extracting cpu number once then using it is an optimization
> > which we should be aiming to get rid of (it simply hurts archs with
> > efficient per-cpu implementations).
> 
> No, that was for a reason.  I was worried about smp_processor_id() not
> returning valid values between init_gdt and cpu_set_gdt.  It's not
> actually a problem, but relying on smp_processor_id() while we're moving
> the foundations its based on seems fragile.

smp_processor_id() always works, so it's fundamental, not fragile.

However, we *should* remove the arg from cpu_set_gdt, since we have such
faith in smp_processor_id() 8)

Cheers,
Rusty.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
  2007-03-13 20:48 ` [PATCH 0/8] x86 boot, pda and gdt cleanups Jeremy Fitzhardinge
  2007-03-14  2:25   ` Rusty Russell
@ 2007-03-14 23:55   ` Rusty Russell
  2007-03-15  1:57     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Rusty Russell @ 2007-03-14 23:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

On Tue, 2007-03-13 at 13:48 -0700, Jeremy Fitzhardinge wrote:
>     * init_gdt should always use write_gdt_entry when touching the gdt;
>       if it doesn't and it ends up touching an already-installed gdt
>       under Xen, it will get a write fault.  This happens because
>       init_gdt ends up getting called twice in SMP (see below).

Hmm, this invalidated my assumption that write_gdt_entry is always a
write to this cpu's active gdt.  Better fix is not to call it twice
anyway...

>     * init_gdt should always be called before bringing up the cpu,
>       rather than by the cpu itself (and therefore, cpu_init() shouldn't
>       call it).  Obviously the the boot cpu is an exception.

Makes sense.

>     * secondary_cpu_init stops being necessary.

Indeed.

>     * On SMP, init_gdt can get called twice: first time in
>       smp_prepare_boot_cpu, and a second time in  trap_init.  On UP,
>       trap_init is the only caller.

Getting rid of the call in smp_prepare_boot_cpu currently works, but
it's fragile:  __get_cpu_var(x) && per_cpu(x, smp_processor_id()) will
differ, and changes made to __get_cpu_var(x) will vanish...

Fortunately, UP doesn't have to call init_gdt at all, so I think it's
better to place it in smp_prepare_boot_cpu only and then clean up the UP
code.  I'll try now...

Rusty.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
  2007-03-14 23:55   ` Rusty Russell
@ 2007-03-15  1:57     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-15  1:57 UTC (permalink / raw)
  To: Rusty Russell
  Cc: lkml - Kernel Mailing List, Zachary Amsden, Jeremy Fitzhardinge,
	Ingo Molnar, Andrew Morton, Andi Kleen

Rusty Russell wrote:
> Hmm, this invalidated my assumption that write_gdt_entry is always a
> write to this cpu's active gdt.  Better fix is not to call it twice
> anyway...
>   

No, I don't think that's true.  I implemented the write_*_entry
functions with the assumption they could be called either on setup or on
an in-use entry.  I think its good policy to use it all the time anyway,
since the pv_ops backend might want to fiddle with the values on the way
through.

I tried to avoid calling init_gdt twice, but it seemed cleaner to just
let it happen.

> Getting rid of the call in smp_prepare_boot_cpu currently works, but
> it's fragile:  __get_cpu_var(x) && per_cpu(x, smp_processor_id()) will
> differ, and changes made to __get_cpu_var(x) will vanish...
>   

Yes.  I think its definitely a good idea to call init_gdt asap after
doing the percpu setup.

> Fortunately, UP doesn't have to call init_gdt at all, so I think it's
> better to place it in smp_prepare_boot_cpu only and then clean up the UP
> code.  I'll try now...
>   
It doesn't?  The per-cpu gdt is the same as the boot gdt?


    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2007-03-15  1:57 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-06 12:39 [PATCH 0/8] x86 boot, pda and gdt cleanups Rusty Russell
2007-03-06 12:53 ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Rusty Russell
2007-03-06 12:54   ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Rusty Russell
2007-03-06 12:55     ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Rusty Russell
2007-03-06 12:57       ` [PATCH 4/8] Cleanup setup_pda Rusty Russell
2007-03-06 12:58         ` [PATCH 5/8] Cleanup GDT access Rusty Russell
2007-03-06 13:00           ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Rusty Russell
2007-03-06 13:01             ` [PATCH 7/8] Page-align the GDT Rusty Russell
2007-03-06 13:03               ` [PATCH 8/8] Convert PDA into the percpu section Rusty Russell
2007-03-06 13:10                 ` Ingo Molnar
2007-03-07  0:12                   ` Rusty Russell
2007-03-07  0:35                     ` Jeremy Fitzhardinge
2007-03-06 18:28                 ` Jeremy Fitzhardinge
2007-03-06 19:34                 ` Andi Kleen
2007-03-06 18:37                   ` Jeremy Fitzhardinge
2007-03-07  0:33                   ` Rusty Russell
2007-03-07 11:55                     ` Rusty Russell
2007-03-13 17:15                 ` Jeremy Fitzhardinge
2007-03-14  2:27                   ` Rusty Russell
2007-03-06 13:15             ` [PATCH 6/8] Allow per-cpu variables to be page-aligned Ingo Molnar
2007-03-07  0:16               ` Rusty Russell
2007-03-07  0:44                 ` H. Peter Anvin
2007-03-06 18:17             ` Jeremy Fitzhardinge
2007-03-07  0:29               ` Rusty Russell
2007-03-06 18:16           ` [PATCH 5/8] Cleanup GDT access Jeremy Fitzhardinge
2007-03-06 18:14       ` [PATCH 3/8] Use per-cpu variables for GDT, PDA Jeremy Fitzhardinge
2007-03-06 13:21     ` [PATCH 2/8] Remove NR_CPUS from asm-generic/percpu.h Ingo Molnar
2007-03-06 13:20   ` [PATCH 1/8] Remove cpu_gdt_table: use boot_gdt_table until migration to per-cpu Ingo Molnar
2007-03-06 13:26     ` Ingo Molnar
2007-03-07  0:22     ` Rusty Russell
2007-03-13 20:48 ` [PATCH 0/8] x86 boot, pda and gdt cleanups Jeremy Fitzhardinge
2007-03-14  2:25   ` Rusty Russell
2007-03-14  4:39     ` Jeremy Fitzhardinge
2007-03-14  6:54       ` Rusty Russell
2007-03-14 23:55   ` Rusty Russell
2007-03-15  1:57     ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).