linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] percpu: Optimize percpu accesses
@ 2008-02-01 19:14 travis
  2008-02-01 19:14 ` [PATCH 1/4] generic: Percpu infrastructure to rebase the per cpu area to zero travis
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: travis @ 2008-02-01 19:14 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, Ingo Molnar, Thomas Gleixner
  Cc: Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel


This patchset provides the following:

  * Generic: Percpu infrastructure to rebase the per cpu area to zero

    This provides the capability of accessing the percpu variables
    using a local register instead of having to go through a table
    on node 0 to find the cpu-specific offsets.  It would also allow
    atomic operations on percpu variables to reduce the required
    locking (a small sketch follows this list).

  * Init: Move setup of nr_cpu_ids to as early as possible for usage
    by early boot functions.

  * x86_64: Fold pda into per cpu area

    Declare the pda as a per cpu variable.  This moves the pda
    area to an address accessible by the x86_64 per cpu macros.
    Subtracting __per_cpu_start makes the offset relative to the
    beginning of the per cpu area.  Since %gs points to the pda,
    it then also points to the per cpu variables, which can be
    accessed as:

	%gs:[&per_cpu_xxxx - __per_cpu_start]

  * x86_64: Rebase per cpu variables to zero

    Take advantage of the zero-based per cpu area provided above;
    we can then directly use the x86_32 percpu operations.  x86_32
    offsets %fs by __per_cpu_start, while x86_64 has %gs pointing
    directly to the pda and the per cpu area, allowing access to
    the pda with the x86_64 pda operations and to the per cpu
    variables with the x86_32 percpu operations.  After rebasing,
    the access becomes:

	%gs:[&per_cpu_xxxx]

    Introduces a new DEFINE_PER_CPU_FIRST to locate the percpu
    variable (pda in this case) at the beginning of the percpu
    .data section.

  * x86_64: Cleanup non-smp usage of cpu maps

    Cleanup references to the early cpu maps for the non-SMP configuration
    and remove some functions called for SMP configurations only.
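
To make the effect concrete, here is a rough sketch of what a local per
cpu read looks like before and after the series ("myvar" is a made-up
variable name and the expansions are simplified, not the literal macro
output):

	/* before: the cpu's offset must first be looked up (offset
	 * table or pda field) before the variable can be reached    */
	val = *SHIFT_PERCPU_PTR(&per_cpu__myvar, per_cpu_offset(cpu));

	/* after: the zero-based area is reachable through %gs, so a
	 * local read becomes a single segment-relative instruction,
	 * roughly:  movl %gs:per_cpu__myvar, %eax                    */
	val = x86_read_percpu(myvar);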

Based on linux-2.6.git + x86.git

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
Notes:

(1 - had to disable CONFIG_SIS190 to build)
(2 - no modules)

Configs built and booted:

    x86_64-default
    x86_64-defconfig (2)
    x86_64-nonuma (2)
    x86_64-nosmp (2)
    x86_64-"Ingo Stress Test" (1,2)

Configs built with no errors:

    arm-default
    i386-allyesconfig (1)
    i386-allmodconfig (1)
    i386-defconfig
    i386-nosmp
    ppc-pmac32
    ppc-smp
    sparc64-default
    sparc64-smp
    x86_64-allmodconfig (1)
    x86_64-allyesconfig (1)
    x86_64-maxsmp (NR_CPUS=4k, MAXNODES=512)

Configs with errors prior to patch (preventing full build checkout):

    ia64-sn2: undefined reference to `mem_map' (more)
    ia64-default: (same error)
    ia64-nosmp: `per_cpu__kstat' truncated in .bss (more)
    s390-default: implicit declaration of '__raw_spin_is_contended'
    sparc-default: include/asm/pgtable.h: syntax '___f___swp_entry'

Memory Effects (using x86_64-maxsmp config):

    Note that 1/2MB has been moved from permanent data to the
    init data section (which is removed after bootup), while the
    per cpu section is only increased by 128 bytes per cpu.  Text
    size is also reduced, which helps cache performance.

    4k-cpus-before                  4k-cpus-after
       6588928 .data.cacheline_alig     -524288 -7%
	 48072 .data.percpu                +128 +0%
       4804576 .data.read_mostly         -32656 +0%
	854048 .init.data               +557056 +65%
	160382 .init.text                   +62 +0%
       1254214 .rodata                     +274 +0%
       3915552 .text                      -1632 +0%
	 11040 __param                     -272 -2%

       3915552 Text                       -1632 +0%
       1085440 InitData                 +557056 +51%
      11454056 OtherData                -557056 -4%
	 48072 PerCpu                      +128 +0%
      20459748 Total                      -1330 +0%

-- 


* [PATCH 1/4] generic: Percpu infrastructure to rebase the per cpu area to zero
  2008-02-01 19:14 [PATCH 0/4] percpu: Optimize percpu accesses travis
@ 2008-02-01 19:14 ` travis
  2008-02-01 19:14 ` [PATCH 2/4] init: move setup of nr_cpu_ids to as early as possible travis
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: travis @ 2008-02-01 19:14 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, Ingo Molnar, Thomas Gleixner
  Cc: Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

[-- Attachment #1: zero_based_infrastructure --]
[-- Type: text/plain, Size: 5517 bytes --]

    * Support an option

	CONFIG_HAVE_ZERO_BASED_PER_CPU

      that makes offsets for per cpu variables start at zero.

      If a percpu area starts at zero then:

	-  We do not need RELOC_HIDE anymore

	-  Allows for the future capability of architectures providing
	   a per cpu allocator that returns offsets instead of pointers.
	   The offsets would be independent of the processor, so that
	   address calculations can be done in a processor-independent
	   way.  Per cpu instructions can then add the processor-specific
	   offset at the last minute, possibly in an atomic instruction.

      The data the linker provides is different for zero based percpu segments:

	__per_cpu_load	-> The address at which the percpu area was loaded
	__per_cpu_size	-> The length of the per cpu area

      For non-zero-based percpu segments, equivalent definitions of the
      above symbols are provided to maintain compatibility with existing
      architectures.  (A sketch of typical use follows this list.)

    * Removes the &__per_cpu_x in lockdep. The __per_cpu_x are already
      pointers. There is no need to take the address.
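
As a sketch of how an architecture might consume the zero-based symbols
above (loosely modeled on the x86_64 setup code in patch 3/4; variable
declarations and error handling are omitted, so treat it as illustrative
only):

	for_each_possible_cpu(i) {
		char *ptr = alloc_bootmem_pages(PERCPU_ENOUGH_ROOM);

		/* copy the zero-based template from its load address */
		memcpy(ptr, __per_cpu_load, __per_cpu_size);

		/* variables start at offset 0, so the cpu's offset is
		 * simply the address of its copy of the area          */
		__per_cpu_offset[i] = (unsigned long)ptr;
	}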

Based on linux-2.6.git + x86.git

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 include/asm-generic/percpu.h      |    5 +++++
 include/asm-generic/sections.h    |   10 ++++++++++
 include/asm-generic/vmlinux.lds.h |   14 ++++++++++++++
 kernel/lockdep.c                  |    4 ++--
 kernel/module.c                   |   10 ++++------
 5 files changed, 35 insertions(+), 8 deletions(-)

--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -43,7 +43,12 @@ extern unsigned long __per_cpu_offset[NR
  * Only S390 provides its own means of moving the pointer.
  */
 #ifndef SHIFT_PERCPU_PTR
+#ifndef CONFIG_HAVE_ZERO_BASED_PER_CPU
 #define SHIFT_PERCPU_PTR(__p, __offset)	RELOC_HIDE((__p), (__offset))
+#else
+#define SHIFT_PERCPU_PTR(__p, __offset) \
+	((__typeof(__p))(((void *)(__p)) + (__offset)))
+#endif
 #endif
 
 /*
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -11,7 +11,17 @@ extern char _sinittext[], _einittext[];
 extern char _sextratext[] __attribute__((weak));
 extern char _eextratext[] __attribute__((weak));
 extern char _end[];
+#ifdef CONFIG_HAVE_ZERO_BASED_PER_CPU
+extern char __per_cpu_load[];
+extern char ____per_cpu_size[];
+#define __per_cpu_size ((unsigned long)&____per_cpu_size)
+#define __per_cpu_start ((char *)0)
+#define __per_cpu_end ((char *)__per_cpu_size)
+#else
 extern char __per_cpu_start[], __per_cpu_end[];
+#define __per_cpu_load __per_cpu_start
+#define __per_cpu_size (__per_cpu_end - __per_cpu_start)
+#endif
 extern char __kprobes_text_start[], __kprobes_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -341,6 +341,19 @@
   	*(.initcall7.init)						\
   	*(.initcall7s.init)
 
+#ifdef CONFIG_HAVE_ZERO_BASED_PER_CPU
+#define PERCPU(align)							\
+	. = ALIGN(align);						\
+	percpu : { } :percpu						\
+	__per_cpu_load = .;						\
+	.data.percpu 0 : AT(__per_cpu_load - LOAD_OFFSET) {		\
+		*(.data.percpu)						\
+		*(.data.percpu.shared_aligned)				\
+		____per_cpu_size = .;					\
+	}								\
+	. = __per_cpu_load + ____per_cpu_size;				\
+	data : { } :data
+#else
 #define PERCPU(align)							\
 	. = ALIGN(align);						\
 	__per_cpu_start = .;						\
@@ -349,3 +362,4 @@
 		*(.data.percpu.shared_aligned)				\
 	}								\
 	__per_cpu_end = .;
+#endif
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -609,8 +609,8 @@ static int static_obj(void *obj)
 	 * percpu var?
 	 */
 	for_each_possible_cpu(i) {
-		start = (unsigned long) &__per_cpu_start + per_cpu_offset(i);
-		end   = (unsigned long) &__per_cpu_start + PERCPU_ENOUGH_ROOM
+		start = (unsigned long) __per_cpu_start + per_cpu_offset(i);
+		end   = (unsigned long) __per_cpu_start + PERCPU_ENOUGH_ROOM
 					+ per_cpu_offset(i);
 
 		if ((addr >= start) && (addr < end))
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -45,6 +45,7 @@
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
 #include <asm/cacheflush.h>
+#include <asm/sections.h>
 #include <linux/license.h>
 
 #if 0
@@ -344,9 +345,6 @@ static inline unsigned int block_size(in
 	return val;
 }
 
-/* Created by linker magic */
-extern char __per_cpu_start[], __per_cpu_end[];
-
 static void *percpu_modalloc(unsigned long size, unsigned long align,
 			     const char *name)
 {
@@ -360,7 +358,7 @@ static void *percpu_modalloc(unsigned lo
 		align = PAGE_SIZE;
 	}
 
-	ptr = __per_cpu_start;
+	ptr = __per_cpu_load;
 	for (i = 0; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
 		/* Extra for alignment requirement. */
 		extra = ALIGN((unsigned long)ptr, align) - (unsigned long)ptr;
@@ -395,7 +393,7 @@ static void *percpu_modalloc(unsigned lo
 static void percpu_modfree(void *freeme)
 {
 	unsigned int i;
-	void *ptr = __per_cpu_start + block_size(pcpu_size[0]);
+	void *ptr = __per_cpu_load + block_size(pcpu_size[0]);
 
 	/* First entry is core kernel percpu data. */
 	for (i = 1; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
@@ -446,7 +444,7 @@ static int percpu_modinit(void)
 	pcpu_size = kmalloc(sizeof(pcpu_size[0]) * pcpu_num_allocated,
 			    GFP_KERNEL);
 	/* Static in-kernel percpu data (used). */
-	pcpu_size[0] = -(__per_cpu_end-__per_cpu_start);
+	pcpu_size[0] = -__per_cpu_size;
 	/* Free room. */
 	pcpu_size[1] = PERCPU_ENOUGH_ROOM + pcpu_size[0];
 	if (pcpu_size[1] < 0) {

-- 


* [PATCH 2/4] init: move setup of nr_cpu_ids to as early as possible
  2008-02-01 19:14 [PATCH 0/4] percpu: Optimize percpu accesses travis
  2008-02-01 19:14 ` [PATCH 1/4] generic: Percpu infrastructure to rebase the per cpu area to zero travis
@ 2008-02-01 19:14 ` travis
  2008-02-01 19:14 ` [PATCH 3/4] x86_64: Fold pda into per cpu area travis
  2008-02-01 19:14 ` [PATCH 4/4] x86_64: Cleanup non-smp usage of cpu maps travis
  3 siblings, 0 replies; 10+ messages in thread
From: travis @ 2008-02-01 19:14 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, Ingo Molnar, Thomas Gleixner
  Cc: Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

[-- Attachment #1: mv-set-nr_cpu_ids --]
[-- Type: text/plain, Size: 2251 bytes --]

Move the setting of nr_cpu_ids from sched_init() to init/main.c,
so that it's available as early as possible.
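
For example (a sketch mirroring the use made of nr_cpu_ids in patch 3/4
of this series), an early bootmem allocation can then be sized by the
real number of cpu ids instead of NR_CPUS:

	/* size the cpu_pda pointer table by nr_cpu_ids, not NR_CPUS */
	_cpu_pda = alloc_bootmem_low(nr_cpu_ids * sizeof(void *));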

Based on linux-2.6.git + x86.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 init/main.c    |   21 +++++++++++++++++++++
 kernel/sched.c |    7 -------
 2 files changed, 21 insertions(+), 7 deletions(-)

--- a/init/main.c
+++ b/init/main.c
@@ -363,10 +363,30 @@ static void __init smp_init(void)
 #endif
 
 static inline void setup_per_cpu_areas(void) { }
+static inline void setup_nr_cpu_ids(void) { }
 static inline void smp_prepare_cpus(unsigned int maxcpus) { }
 
 #else
 
+/*
+ * Setup number of possible processor ids.
+ * This is different than setup_max_cpus as it accounts
+ * for zero bits embedded between one bits in the cpu
+ * possible map due to disabled cpu cores.
+ */
+int nr_cpu_ids __read_mostly = NR_CPUS;
+EXPORT_SYMBOL(nr_cpu_ids);
+
+static void __init setup_nr_cpu_ids(void)
+{
+	int cpu, highest_cpu = 0;
+
+	for_each_possible_cpu(cpu)
+		highest_cpu = cpu;
+
+	nr_cpu_ids = highest_cpu + 1;
+}
+
 #ifndef CONFIG_HAVE_SETUP_PER_CPU_AREA
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
 
@@ -542,6 +562,7 @@ asmlinkage void __init start_kernel(void
 	setup_arch(&command_line);
 	setup_command_line(command_line);
 	unwind_setup();
+	setup_nr_cpu_ids();
 	setup_per_cpu_areas();
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
 
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5925,10 +5925,6 @@ void __init migration_init(void)
 
 #ifdef CONFIG_SMP
 
-/* Number of possible processor ids */
-int nr_cpu_ids __read_mostly = NR_CPUS;
-EXPORT_SYMBOL(nr_cpu_ids);
-
 #ifdef CONFIG_SCHED_DEBUG
 
 static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level)
@@ -7161,7 +7157,6 @@ static void init_tg_rt_entry(struct rq *
 
 void __init sched_init(void)
 {
-	int highest_cpu = 0;
 	int i, j;
 
 #ifdef CONFIG_SMP
@@ -7213,7 +7208,6 @@ void __init sched_init(void)
 #endif
 		init_rq_hrtick(rq);
 		atomic_set(&rq->nr_iowait, 0);
-		highest_cpu = i;
 	}
 
 	set_load_weight(&init_task);
@@ -7223,7 +7217,6 @@ void __init sched_init(void)
 #endif
 
 #ifdef CONFIG_SMP
-	nr_cpu_ids = highest_cpu + 1;
 	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
 #endif
 

-- 


* [PATCH 3/4] x86_64: Fold pda into per cpu area
  2008-02-01 19:14 [PATCH 0/4] percpu: Optimize percpu accesses travis
  2008-02-01 19:14 ` [PATCH 1/4] generic: Percpu infrastructure to rebase the per cpu area to zero travis
  2008-02-01 19:14 ` [PATCH 2/4] init: move setup of nr_cpu_ids to as early as possible travis
@ 2008-02-01 19:14 ` travis
  2008-02-15 20:16   ` Ingo Molnar
  2008-02-01 19:14 ` [PATCH 4/4] x86_64: Cleanup non-smp usage of cpu maps travis
  3 siblings, 1 reply; 10+ messages in thread
From: travis @ 2008-02-01 19:14 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, Ingo Molnar, Thomas Gleixner
  Cc: Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

[-- Attachment #1: x86_64_fold_pda --]
[-- Type: text/plain, Size: 12490 bytes --]

  * Declare the pda as a per cpu variable.  This moves the pda area
    to an address accessible by the x86_64 per cpu macros.  Subtracting
    __per_cpu_start makes the offset relative to the beginning of the
    per cpu area.  Since %gs points to the pda, it then also points to
    the per cpu variables, which can be accessed as:

	%gs:[&per_cpu_xxxx - __per_cpu_start]

  * The boot_pdas are only needed in head64.c, so move the declaration
    there.  Since boot_cpu_pda is only used during bootup and is then
    copied to the per_cpu areas during init, it can be discarded
    afterwards.  In addition, the initial cpu_pda pointer table is
    reallocated to the correct size for the actual number of cpus.

  * Remove the code that allocates special pda data structures.
    Since the percpu area is currently maintained for all possible
    cpus, the pda regions stay intact in case cpus are hotplugged
    off and then back on.

  * Relocate the x86_64 percpu variables to begin at zero; we can
    then directly use the x86_32 percpu operations.  x86_32 offsets
    %fs by __per_cpu_start, while x86_64 has %gs pointing directly
    to the pda and the per cpu area, allowing access to the pda with
    the x86_64 pda operations and to the per cpu variables with the
    x86_32 percpu operations (see the sketch after this list).

  * Introduces a new DEFINE_PER_CPU_FIRST to locate the percpu
    variable (cpu_pda in this case) at the beginning of the percpu
    .data section.

  * This also supports further integration of x86_32/64.
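
A small usage sketch of how the two access styles coexist after this
patch ("myvar" is a hypothetical variable and the commented expansions
are approximate):

	/* pda fields keep using the existing x86_64 pda operations */
	unsigned long off = read_pda(data_offset);

	/* ... while ordinary per cpu variables can now use the x86_32
	 * style operations, since %gs also reaches the zero-based
	 * per cpu area:  addl $1, %gs:per_cpu__myvar                 */
	x86_add_percpu(myvar, 1);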

Based on linux-2.6.git + x86.git

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/Kconfig                  |    3 +
 arch/x86/kernel/head64.c          |   41 +++++++++++++++++++++++
 arch/x86/kernel/setup64.c         |   67 +++++++++++++++++++++++---------------
 arch/x86/kernel/smpboot_64.c      |   16 ---------
 arch/x86/kernel/vmlinux_64.lds.S  |    1 
 include/asm-generic/vmlinux.lds.h |    2 +
 include/asm-x86/pda.h             |   13 +++++--
 include/asm-x86/percpu.h          |   33 +++++++++++-------
 include/linux/percpu.h            |    9 ++++-
 9 files changed, 126 insertions(+), 59 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -103,6 +103,9 @@ config GENERIC_TIME_VSYSCALL
 config HAVE_SETUP_PER_CPU_AREA
 	def_bool X86_64
 
+config HAVE_ZERO_BASED_PER_CPU
+	def_bool X86_64
+
 config ARCH_SUPPORTS_OPROFILE
 	bool
 	default y
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -11,6 +11,7 @@
 #include <linux/string.h>
 #include <linux/percpu.h>
 #include <linux/start_kernel.h>
+#include <linux/bootmem.h>
 
 #include <asm/processor.h>
 #include <asm/proto.h>
@@ -23,6 +24,12 @@
 #include <asm/kdebug.h>
 #include <asm/e820.h>
 
+#ifdef CONFIG_SMP
+/* Only used before the per cpu areas are setup. */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __initdata;
+static struct x8664_pda *_cpu_pda_init[NR_CPUS] __initdata;
+#endif
+
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
@@ -99,8 +106,14 @@ void __init x86_64_start_kernel(char * r
 
 	early_printk("Kernel alive\n");
 
+#ifdef CONFIG_SMP
+	_cpu_pda = (void *)_cpu_pda_init;
  	for (i = 0; i < NR_CPUS; i++)
  		cpu_pda(i) = &boot_cpu_pda[i];
+#endif
+
+	/* setup percpu segment offset for cpu 0 */
+	cpu_pda(0)->data_offset = (unsigned long)__per_cpu_load;
 
 	pda_init(0);
 	copy_bootdata(__va(real_mode_data));
@@ -125,3 +138,31 @@ void __init x86_64_start_kernel(char * r
 
 	start_kernel();
 }
+
+#ifdef	CONFIG_SMP
+/*
+ * Remove initial boot_cpu_pda array and cpu_pda pointer table.
+ *
+ * This depends on setup_per_cpu_areas relocating the pda to the beginning
+ * of the per_cpu area so that (_cpu_pda[i] != &boot_cpu_pda[i]).  If it
+ * is equal then the new pda has not been setup for this cpu, and the pda
+ * table will have a NULL address for this cpu.
+ */
+void __init x86_64_cleanup_pda(void)
+{
+	int i;
+
+	_cpu_pda = alloc_bootmem_low(nr_cpu_ids * sizeof(void *));
+
+	if (!_cpu_pda)
+		panic("Cannot allocate cpu pda table\n");
+
+	/* cpu_pda() now points to allocated cpu_pda_table */
+
+	for (i = 0; i < NR_CPUS; i++)
+ 		if (_cpu_pda_init[i] == &boot_cpu_pda[i])
+			cpu_pda(i) = NULL;
+		else
+			cpu_pda(i) = _cpu_pda_init[i];
+}
+#endif
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -32,9 +32,13 @@ struct boot_params boot_params;
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
-struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
+#ifdef CONFIG_SMP
+struct x8664_pda **_cpu_pda __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+#endif
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -95,22 +99,14 @@ static void __init setup_per_cpu_maps(vo
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-#ifdef CONFIG_SMP
-		if (per_cpu_offset(cpu)) {
-#endif
-			per_cpu(x86_cpu_to_apicid, cpu) =
-						x86_cpu_to_apicid_init[cpu];
-			per_cpu(x86_bios_cpu_apicid, cpu) =
-						x86_bios_cpu_apicid_init[cpu];
+		per_cpu(x86_cpu_to_apicid, cpu) =
+					x86_cpu_to_apicid_init[cpu];
+
+		per_cpu(x86_bios_cpu_apicid, cpu) =
+					x86_bios_cpu_apicid_init[cpu];
 #ifdef CONFIG_NUMA
-			per_cpu(x86_cpu_to_node_map, cpu) =
-						x86_cpu_to_node_map_init[cpu];
-#endif
-#ifdef CONFIG_SMP
-		}
-		else
-			printk(KERN_NOTICE "per_cpu_offset zero for cpu %d\n",
-									cpu);
+		per_cpu(x86_cpu_to_node_map, cpu) =
+					x86_cpu_to_node_map_init[cpu];
 #endif
 	}
 
@@ -139,25 +135,46 @@ void __init setup_per_cpu_areas(void)
 	/* Copy section for each CPU (we discard the original) */
 	size = PERCPU_ENOUGH_ROOM;
 
-	printk(KERN_INFO "PERCPU: Allocating %lu bytes of per cpu data\n", size);
-	for_each_cpu_mask (i, cpu_possible_map) {
+	printk(KERN_INFO
+		"PERCPU: Allocating %lu bytes of per cpu data\n", size);
+
+	for_each_possible_cpu(i) {
+
+#ifndef CONFIG_NEED_MULTIPLE_NODES
+		char *ptr = alloc_bootmem_pages(size);
+#else
 		char *ptr;
 
-		if (!NODE_DATA(early_cpu_to_node(i))) {
-			printk("cpu with no node %d, num_online_nodes %d\n",
-			       i, num_online_nodes());
+		if (NODE_DATA(early_cpu_to_node(i)))
+			ptr = alloc_bootmem_pages_node
+				(NODE_DATA(early_cpu_to_node(i)), size);
+
+		else {
+			printk(KERN_INFO
+				"cpu %d has no node, num_online_nodes %d\n",
+				i, num_online_nodes());
 			ptr = alloc_bootmem_pages(size);
-		} else { 
-			ptr = alloc_bootmem_pages_node(NODE_DATA(early_cpu_to_node(i)), size);
 		}
+#endif
 		if (!ptr)
 			panic("Cannot allocate cpu data for CPU %d\n", i);
+
+		memcpy(ptr, __per_cpu_load, __per_cpu_size);
+
+		/* Relocate the pda */
+		memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+		cpu_pda(i) = (struct x8664_pda *)ptr;
 		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
-		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 	}
 
 	/* setup percpu data maps early */
 	setup_per_cpu_maps();
+
+	/* clean up early cpu_pda pointer array */
+	x86_64_cleanup_pda();
+
+	/* Fix up pda for this processor .... */
+	pda_init(0);
 } 
 
 void pda_init(int cpu)
--- a/arch/x86/kernel/smpboot_64.c
+++ b/arch/x86/kernel/smpboot_64.c
@@ -566,22 +566,6 @@ static int __cpuinit do_boot_cpu(int cpu
 		return -1;
 	}
 
-	/* Allocate node local memory for AP pdas */
-	if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
-		struct x8664_pda *newpda, *pda;
-		int node = cpu_to_node(cpu);
-		pda = cpu_pda(cpu);
-		newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
-				      node);
-		if (newpda) {
-			memcpy(newpda, pda, sizeof (struct x8664_pda));
-			cpu_pda(cpu) = newpda;
-		} else
-			printk(KERN_ERR
-		"Could not allocate node local PDA for CPU %d on node %d\n",
-				cpu, node);
-	}
-
 	alternatives_smp_switch(1);
 
 	c_idle.idle = get_idle_for_cpu(cpu);
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -16,6 +16,7 @@ jiffies_64 = jiffies;
 _proxy_pda = 1;
 PHDRS {
 	text PT_LOAD FLAGS(5);	/* R_E */
+	percpu PT_LOAD FLAGS(4);	/* R__ */
 	data PT_LOAD FLAGS(7);	/* RWE */
 	user PT_LOAD FLAGS(7);	/* RWE */
 	data.init PT_LOAD FLAGS(7);	/* RWE */
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -347,6 +347,7 @@
 	percpu : { } :percpu						\
 	__per_cpu_load = .;						\
 	.data.percpu 0 : AT(__per_cpu_load - LOAD_OFFSET) {		\
+		*(.data.percpu.first)					\
 		*(.data.percpu)						\
 		*(.data.percpu.shared_aligned)				\
 		____per_cpu_size = .;					\
@@ -358,6 +359,7 @@
 	. = ALIGN(align);						\
 	__per_cpu_start = .;						\
 	.data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {		\
+		*(.data.percpu.first)					\
 		*(.data.percpu)						\
 		*(.data.percpu.shared_aligned)				\
 	}								\
--- a/include/asm-x86/pda.h
+++ b/include/asm-x86/pda.h
@@ -38,11 +38,16 @@ struct x8664_pda {
 	unsigned irq_spurious_count;
 } ____cacheline_aligned_in_smp;
 
-extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
-extern void pda_init(int);
-
+#ifdef CONFIG_SMP
 #define cpu_pda(i) (_cpu_pda[i])
+extern struct x8664_pda **_cpu_pda;
+extern void x86_64_cleanup_pda(void);
+#else
+#define	cpu_pda(i)	(&per_cpu(pda, i))
+static inline void x86_64_cleanup_pda(void) { }
+#endif
+
+extern void pda_init(int);
 
 /*
  * There is no fast way to get the base address of the PDA, all the accesses
--- a/include/asm-x86/percpu.h
+++ b/include/asm-x86/percpu.h
@@ -13,13 +13,19 @@
 #include <asm/pda.h>
 
 #define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
-#define __my_cpu_offset read_pda(data_offset)
-
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
+#define __my_cpu_offset read_pda(data_offset)
+#define __percpu_seg "%%gs:"
+
+#else
+#define __percpu_seg ""
 #endif
 #include <asm-generic/percpu.h>
 
+/* Calculate the offset to use with the segment register */
+#define seg_offset(name)   per_cpu_var(name)
+
 DECLARE_PER_CPU(struct x8664_pda, pda);
 
 #else /* CONFIG_X86_64 */
@@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-
 #define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
-
 #else  /* !SMP */
-
 #define __percpu_seg ""
-
 #endif	/* SMP */
 
 #include <asm-generic/percpu.h>
@@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#define seg_offset(name)	per_cpu_var(name)
+
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -132,11 +140,10 @@ extern void __bad_percpu_size(void);
 		}						\
 		ret__; })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
+#define x86_read_percpu(var) percpu_from_op("mov", seg_offset(var))
+#define x86_write_percpu(var,val) percpu_to_op("mov", seg_offset(var), val)
+#define x86_add_percpu(var,val) percpu_to_op("add", seg_offset(var), val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", seg_offset(var), val)
+#define x86_or_percpu(var,val) percpu_to_op("or", seg_offset(var), val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -18,11 +18,18 @@
 	__attribute__((__section__(".data.percpu.shared_aligned")))	\
 	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name		\
 	____cacheline_aligned_in_smp
+
+#define DEFINE_PER_CPU_FIRST(type, name)				\
+	__attribute__((__section__(".data.percpu.first")))		\
+	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name
 #else
 #define DEFINE_PER_CPU(type, name)					\
 	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name
 
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		      \
+#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
+	DEFINE_PER_CPU(type, name)
+
+#define DEFINE_PER_CPU_FIRST(type, name)				\
 	DEFINE_PER_CPU(type, name)
 #endif
 

-- 


* [PATCH 4/4] x86_64: Cleanup non-smp usage of cpu maps
  2008-02-01 19:14 [PATCH 0/4] percpu: Optimize percpu accesses travis
                   ` (2 preceding siblings ...)
  2008-02-01 19:14 ` [PATCH 3/4] x86_64: Fold pda into per cpu area travis
@ 2008-02-01 19:14 ` travis
  2008-02-15 20:17   ` Ingo Molnar
  3 siblings, 1 reply; 10+ messages in thread
From: travis @ 2008-02-01 19:14 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, Ingo Molnar, Thomas Gleixner
  Cc: Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

[-- Attachment #1: cleanup_nonsmp_maps --]
[-- Type: text/plain, Size: 6165 bytes --]

Cleanup references to the early cpu maps for the non-SMP configuration
and remove some functions called for SMP configurations only.
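
The recurring pattern (paraphrasing the hunks below rather than adding
anything new) is to hide the __initdata early maps behind CONFIG_SMP and
give UP builds a NULL early pointer, so the common code keeps compiling
unchanged:

	#ifdef CONFIG_SMP
	extern u16 __initdata x86_cpu_to_apicid_init[];
	extern void *x86_cpu_to_apicid_early_ptr;
	#else
	#define x86_cpu_to_apicid_early_ptr NULL
	#endif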

Based on linux-2.6.git + x86.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/genapic_64.c |    2 ++
 arch/x86/kernel/mpparse_64.c |    2 ++
 arch/x86/kernel/setup64.c    |    3 +++
 arch/x86/kernel/smpboot_32.c |    2 ++
 arch/x86/mm/numa_64.c        |   12 ++++++------
 include/asm-x86/pda.h        |    1 -
 include/asm-x86/smp_32.h     |    4 ++++
 include/asm-x86/smp_64.h     |    5 +++++
 include/asm-x86/topology.h   |   16 ++++++++++++----
 9 files changed, 36 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/genapic_64.c
+++ b/arch/x86/kernel/genapic_64.c
@@ -25,9 +25,11 @@
 #endif
 
 /* which logical CPU number maps to which CPU (physical APIC ID) */
+#ifdef CONFIG_SMP
 u16 x86_cpu_to_apicid_init[NR_CPUS] __initdata
 					= { [0 ... NR_CPUS-1] = BAD_APICID };
 void *x86_cpu_to_apicid_early_ptr;
+#endif
 DEFINE_PER_CPU(u16, x86_cpu_to_apicid) = BAD_APICID;
 EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 
--- a/arch/x86/kernel/mpparse_64.c
+++ b/arch/x86/kernel/mpparse_64.c
@@ -67,9 +67,11 @@ unsigned disabled_cpus __cpuinitdata;
 /* Bitmask of physically existing CPUs */
 physid_mask_t phys_cpu_present_map = PHYSID_MASK_NONE;
 
+#ifdef CONFIG_SMP
 u16 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
 				= { [0 ... NR_CPUS-1] = BAD_APICID };
 void *x86_bios_cpu_apicid_early_ptr;
+#endif
 DEFINE_PER_CPU(u16, x86_bios_cpu_apicid) = BAD_APICID;
 EXPORT_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
 
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -89,6 +89,8 @@ static int __init nonx32_setup(char *str
 }
 __setup("noexec32=", nonx32_setup);
 
+
+#ifdef CONFIG_SMP
 /*
  * Copy data used in early init routines from the initial arrays to the
  * per cpu data areas.  These arrays then become expendable and the
@@ -176,6 +178,7 @@ void __init setup_per_cpu_areas(void)
 	/* Fix up pda for this processor .... */
 	pda_init(0);
 } 
+#endif /* CONFIG_SMP */
 
 void pda_init(int cpu)
 { 
--- a/arch/x86/kernel/smpboot_32.c
+++ b/arch/x86/kernel/smpboot_32.c
@@ -92,9 +92,11 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct cpu
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
 /* which logical CPU number maps to which CPU (physical APIC ID) */
+#ifdef CONFIG_SMP
 u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata =
 			{ [0 ... NR_CPUS-1] = BAD_APICID };
 void *x86_cpu_to_apicid_early_ptr;
+#endif
 DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
 EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -31,13 +31,15 @@ bootmem_data_t plat_node_bdata[MAX_NUMNO
 
 struct memnode memnode;
 
+#ifdef CONFIG_SMP
 int x86_cpu_to_node_map_init[NR_CPUS] = {
 	[0 ... NR_CPUS-1] = NUMA_NO_NODE
 };
 void *x86_cpu_to_node_map_early_ptr;
+EXPORT_SYMBOL(x86_cpu_to_node_map_early_ptr);
+#endif
 DEFINE_PER_CPU(int, x86_cpu_to_node_map) = NUMA_NO_NODE;
 EXPORT_PER_CPU_SYMBOL(x86_cpu_to_node_map);
-EXPORT_SYMBOL(x86_cpu_to_node_map_early_ptr);
 
 s16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
@@ -536,13 +538,11 @@ void __cpuinit numa_set_node(int cpu, in
 {
 	int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
 
-	cpu_pda(cpu)->nodenumber = node;
-
-	if(cpu_to_node_map)
+	if (cpu_to_node_map)
 		cpu_to_node_map[cpu] = node;
-	else if(per_cpu_offset(cpu))
+	else if (per_cpu_offset(cpu))
 		per_cpu(x86_cpu_to_node_map, cpu) = node;
-	else
+ 	else
 		Dprintk(KERN_INFO "Setting node for non-present cpu %d\n", cpu);
 }
 
--- a/include/asm-x86/pda.h
+++ b/include/asm-x86/pda.h
@@ -22,7 +22,6 @@ struct x8664_pda {
 					   offset 40!!! */
 #endif
 	char *irqstackptr;
-	unsigned int nodenumber;	/* number of current node */
 	unsigned int __softirq_pending;
 	unsigned int __nmi_count;	/* number of NMI on this CPUs */
 	short mmu_state;
--- a/include/asm-x86/smp_32.h
+++ b/include/asm-x86/smp_32.h
@@ -29,8 +29,12 @@ extern void unlock_ipi_call_lock(void);
 extern void (*mtrr_hook) (void);
 extern void zap_low_mappings (void);
 
+#ifdef CONFIG_SMP
 extern u8 __initdata x86_cpu_to_apicid_init[];
 extern void *x86_cpu_to_apicid_early_ptr;
+#else
+#define x86_cpu_to_apicid_early_ptr NULL
+#endif
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
--- a/include/asm-x86/smp_64.h
+++ b/include/asm-x86/smp_64.h
@@ -26,10 +26,15 @@ extern void unlock_ipi_call_lock(void);
 extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
 				  void *info, int wait);
 
+#ifdef CONFIG_SMP
 extern u16 __initdata x86_cpu_to_apicid_init[];
 extern u16 __initdata x86_bios_cpu_apicid_init[];
 extern void *x86_cpu_to_apicid_early_ptr;
 extern void *x86_bios_cpu_apicid_early_ptr;
+#else
+#define x86_cpu_to_apicid_early_ptr NULL
+#define x86_bios_cpu_apicid_early_ptr NULL
+#endif
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -35,8 +35,14 @@ extern int cpu_to_node_map[];
 
 #else
 DECLARE_PER_CPU(int, x86_cpu_to_node_map);
+
+#ifdef CONFIG_SMP
 extern int x86_cpu_to_node_map_init[];
 extern void *x86_cpu_to_node_map_early_ptr;
+#else
+#define x86_cpu_to_node_map_early_ptr NULL
+#endif
+
 /* Returns the number of the current Node. */
 #define numa_node_id()		(early_cpu_to_node(raw_smp_processor_id()))
 #endif
@@ -54,6 +60,8 @@ static inline int cpu_to_node(int cpu)
 }
 
 #else /* CONFIG_X86_64 */
+
+#ifdef CONFIG_SMP
 static inline int early_cpu_to_node(int cpu)
 {
 	int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
@@ -65,6 +73,9 @@ static inline int early_cpu_to_node(int 
 	else
 		return NUMA_NO_NODE;
 }
+#else
+#define	early_cpu_to_node(cpu)	cpu_to_node(cpu)
+#endif
 
 static inline int cpu_to_node(int cpu)
 {
@@ -76,10 +87,7 @@ static inline int cpu_to_node(int cpu)
 		return ((int *)x86_cpu_to_node_map_early_ptr)[cpu];
 	}
 #endif
-	if (per_cpu_offset(cpu))
-		return per_cpu(x86_cpu_to_node_map, cpu);
-	else
-		return NUMA_NO_NODE;
+	return per_cpu(x86_cpu_to_node_map, cpu);
 }
 #endif /* CONFIG_X86_64 */
 

-- 


* Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
  2008-02-01 19:14 ` [PATCH 3/4] x86_64: Fold pda into per cpu area travis
@ 2008-02-15 20:16   ` Ingo Molnar
  2008-02-15 22:43     ` Christoph Lameter
  2008-02-17  6:22     ` Yinghai Lu
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-02-15 20:16 UTC (permalink / raw)
  To: travis
  Cc: Andrew Morton, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge,
	Christoph Lameter, Jack Steiner, linux-mm, linux-kernel


* travis@sgi.com <travis@sgi.com> wrote:

>  include/asm-generic/vmlinux.lds.h |    2 +
>  include/linux/percpu.h            |    9 ++++-

couldnt these two generic bits be done separately (perhaps a preparatory 
but otherwise NOP patch pushed upstream straight away) to make 
subsequent patches only touch x86 architecture files?

	Ingo


* Re: [PATCH 4/4] x86_64: Cleanup non-smp usage of cpu maps
  2008-02-01 19:14 ` [PATCH 4/4] x86_64: Cleanup non-smp usage of cpu maps travis
@ 2008-02-15 20:17   ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-02-15 20:17 UTC (permalink / raw)
  To: travis
  Cc: Andrew Morton, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge,
	Christoph Lameter, Jack Steiner, linux-mm, linux-kernel


* travis@sgi.com <travis@sgi.com> wrote:

> Cleanup references to the early cpu maps for the non-SMP configuration 
> and remove some functions called for SMP configurations only.

thanks, applied.

	Ingo


* Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
  2008-02-15 20:16   ` Ingo Molnar
@ 2008-02-15 22:43     ` Christoph Lameter
  2008-02-17  6:22     ` Yinghai Lu
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2008-02-15 22:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: travis, Andrew Morton, Andi Kleen, Thomas Gleixner,
	Jeremy Fitzhardinge, Jack Steiner, linux-mm, linux-kernel

On Fri, 15 Feb 2008, Ingo Molnar wrote:

> 
> * travis@sgi.com <travis@sgi.com> wrote:
> 
> >  include/asm-generic/vmlinux.lds.h |    2 +
> >  include/linux/percpu.h            |    9 ++++-
> 
> couldnt these two generic bits be done separately (perhaps a preparatory 
> but otherwise NOP patch pushed upstream straight away) to make 
> subsequent patches only touch x86 architecture files?

Yes, those modifications could be folded into the generic patch for 
zero-based percpu configurations.


* Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
  2008-02-15 20:16   ` Ingo Molnar
  2008-02-15 22:43     ` Christoph Lameter
@ 2008-02-17  6:22     ` Yinghai Lu
  2008-02-17  7:36       ` Yinghai Lu
  1 sibling, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2008-02-17  6:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: travis, Andrew Morton, Andi Kleen, Thomas Gleixner,
	Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

On Feb 15, 2008 12:16 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * travis@sgi.com <travis@sgi.com> wrote:
>
> >  include/asm-generic/vmlinux.lds.h |    2 +
> >  include/linux/percpu.h            |    9 ++++-
>
> couldnt these two generic bits be done separately (perhaps a preparatory
> but otherwise NOP patch pushed upstream straight away) to make
> subsequent patches only touch x86 architecture files?

This patch needs to go to mainline ASAP.

Otherwise you need to revert the patch to include/asm-x86/percpu.h:

+#ifdef CONFIG_X86_64
+#include <linux/compiler.h>
+
+/* Same as asm-generic/percpu.h, except that we store the per cpu offset
+   in the PDA. Longer term the PDA and every per cpu variable
+   should be just put into a single section and referenced directly
+   from %gs */
+
+#ifdef CONFIG_SMP
+#include <asm/pda.h>
+
+#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
+#define __my_cpu_offset read_pda(data_offset)
+
+#define per_cpu_offset(x) (__per_cpu_offset(x))
+
 #endif
+#include <asm-generic/percpu.h>
+
+DECLARE_PER_CPU(struct x8664_pda, pda);
+
+#else /* CONFIG_X86_64 */

because in the current tree,
setup_per_cpu_areas() will have
     cpu_pda(i)->data_offset = ptr - __per_cpu_start;

but at that time all APs will still use the boot cpu's cpu_pda..., and APs will
only get their own pda in do_boot_cpu().

The result is that all cpus will have the same data_offset and will share
one per_cpu data area.. that is totally wrong!!

That could explain a lot of the strange panics seen recently with NUMA...

YH


* Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
  2008-02-17  6:22     ` Yinghai Lu
@ 2008-02-17  7:36       ` Yinghai Lu
  0 siblings, 0 replies; 10+ messages in thread
From: Yinghai Lu @ 2008-02-17  7:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: travis, Andrew Morton, Andi Kleen, Thomas Gleixner,
	Jeremy Fitzhardinge, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

On Feb 16, 2008 10:22 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> On Feb 15, 2008 12:16 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * travis@sgi.com <travis@sgi.com> wrote:
> >
> > >  include/asm-generic/vmlinux.lds.h |    2 +
> > >  include/linux/percpu.h            |    9 ++++-
> >
> > couldnt these two generic bits be done separately (perhaps a preparatory
> > but otherwise NOP patch pushed upstream straight away) to make
> > subsequent patches only touch x86 architecture files?
>
> This patch needs to go to mainline ASAP.
>
> Otherwise you need to revert the patch to include/asm-x86/percpu.h:
>
> +#ifdef CONFIG_X86_64
> +#include <linux/compiler.h>
> +
> +/* Same as asm-generic/percpu.h, except that we store the per cpu offset
> +   in the PDA. Longer term the PDA and every per cpu variable
> +   should be just put into a single section and referenced directly
> +   from %gs */
> +
> +#ifdef CONFIG_SMP
> +#include <asm/pda.h>
> +
> +#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
> +#define __my_cpu_offset read_pda(data_offset)
> +
> +#define per_cpu_offset(x) (__per_cpu_offset(x))
> +
>  #endif
> +#include <asm-generic/percpu.h>
> +
> +DECLARE_PER_CPU(struct x8664_pda, pda);
> +
> +#else /* CONFIG_X86_64 */
>
> because in the current tree,
> setup_per_cpu_areas() will have
>      cpu_pda(i)->data_offset = ptr - __per_cpu_start;
>
> but at that time all APs will still use the boot cpu's cpu_pda..., and APs will
> only get their own pda in do_boot_cpu().

Sorry, boot_cpu_pda is an array... so that is safe.

YH


