linux-kernel.vger.kernel.org archive mirror
* [patch 00/14] Per cpu code simplification
@ 2007-11-27  0:14 Christoph Lameter
  2007-11-27  0:14 ` [patch 01/14] Modules: Handle symbols that have a zero value Christoph Lameter
                   ` (13 more replies)
  0 siblings, 14 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

This patchset simplifies the code that arches need to maintain to support
per cpu functionality. Most of the code is moved into arch-independent
code; only a minimal set of definitions is kept for each arch.

The patchset also unifies the x86 arch so that there is only a single
asm-x86/percpu.h.

-- 


* [patch 01/14] Modules: Handle symbols that have a zero value
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 02/14] Modules: Include sections.h to avoid defining linker variables explicitly Christoph Lameter
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Mathieu Desnoyers

[-- Attachment #1: weaky --]
[-- Type: text/plain, Size: 2630 bytes --]

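The module subsystem cannot handle symbols whose value is zero.
__find_symbol() returns 0 to signal failure, so if a symbol that
legitimately resolves to 0 is present, the module resolver reports it
as unresolved. Return negative error codes (-ENOENT, -EINVAL) instead
and test the results with IS_ERR_VALUE().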
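The sentinel problem can be illustrated with a small userspace sketch. It assumes a re-creation of the kernel's IS_ERR_VALUE() (the topmost 4095 values of unsigned long are reserved for negative errno codes) and a hypothetical symbol table; demo_find_symbol_old() and demo_find_symbol_new() are illustrative names, not kernel functions. With the old convention a zero-valued symbol is indistinguishable from a missing one; with the -ENOENT convention it is not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace re-creation (assumption) of IS_ERR_VALUE() from <linux/err.h>:
 * the topmost MAX_ERRNO values of unsigned long are reserved for -errno. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical symbol table; "zero_sym" legitimately resolves to 0. */
struct demo_sym {
	const char *name;
	unsigned long value;
};

static const struct demo_sym demo_syms[] = {
	{ "zero_sym", 0 },
	{ "other_sym", 0x1000 },
};

static const struct demo_sym *demo_lookup(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++)
		if (strcmp(demo_syms[i].name, name) == 0)
			return &demo_syms[i];
	return NULL;
}

/* Old convention: 0 doubles as "not found", so zero_sym looks unresolved. */
static unsigned long demo_find_symbol_old(const char *name)
{
	const struct demo_sym *s = demo_lookup(name);

	return s ? s->value : 0;
}

/* New convention, as in the patch: failure is -ENOENT and callers test
 * with IS_ERR_VALUE(), so a zero value is a valid result. */
static unsigned long demo_find_symbol_new(const char *name)
{
	const struct demo_sym *s = demo_lookup(name);

	return s ? s->value : (unsigned long)-ENOENT;
}
```

The same reasoning is behind the change in simplify_symbols(), where the "resolved" test moves from `st_value != 0` to `!IS_ERR_VALUE(st_value)`.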

[patch already in mm]

Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 kernel/module.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c	2007-11-21 12:58:33.095608448 -0800
+++ linux-2.6/kernel/module.c	2007-11-21 13:00:30.199108674 -0800
@@ -285,7 +285,7 @@ static unsigned long __find_symbol(const
 		}
 	}
 	DEBUGP("Failed to find symbol %s\n", name);
-	return 0;
+	return -ENOENT;
 }
 
 /* Search for module by name: must hold module_mutex. */
@@ -756,7 +756,7 @@ void __symbol_put(const char *symbol)
 	const unsigned long *crc;
 
 	preempt_disable();
-	if (!__find_symbol(symbol, &owner, &crc, 1))
+	if (IS_ERR_VALUE(__find_symbol(symbol, &owner, &crc, 1)))
 		BUG();
 	module_put(owner);
 	preempt_enable();
@@ -902,7 +902,8 @@ static inline int check_modstruct_versio
 	const unsigned long *crc;
 	struct module *owner;
 
-	if (!__find_symbol("struct_module", &owner, &crc, 1))
+	if (IS_ERR_VALUE(__find_symbol("struct_module",
+						&owner, &crc, 1)))
 		BUG();
 	return check_version(sechdrs, versindex, "struct_module", mod,
 			     crc);
@@ -955,7 +956,7 @@ static unsigned long resolve_symbol(Elf_
 		/* use_module can fail due to OOM, or module unloading */
 		if (!check_version(sechdrs, versindex, name, mod, crc) ||
 		    !use_module(mod, owner))
-			ret = 0;
+			ret = -EINVAL;
 	}
 	return ret;
 }
@@ -1348,14 +1349,16 @@ static int verify_export_symbols(struct 
 	const unsigned long *crc;
 
 	for (i = 0; i < mod->num_syms; i++)
-		if (__find_symbol(mod->syms[i].name, &owner, &crc, 1)) {
+		if (!IS_ERR_VALUE(__find_symbol(mod->syms[i].name,
+							&owner, &crc, 1))) {
 			name = mod->syms[i].name;
 			ret = -ENOEXEC;
 			goto dup;
 		}
 
 	for (i = 0; i < mod->num_gpl_syms; i++)
-		if (__find_symbol(mod->gpl_syms[i].name, &owner, &crc, 1)) {
+		if (!IS_ERR_VALUE(__find_symbol(mod->gpl_syms[i].name,
+							&owner, &crc, 1))) {
 			name = mod->gpl_syms[i].name;
 			ret = -ENOEXEC;
 			goto dup;
@@ -1405,7 +1408,7 @@ static int simplify_symbols(Elf_Shdr *se
 					   strtab + sym[i].st_name, mod);
 
 			/* Ok if resolved.  */
-			if (sym[i].st_value != 0)
+			if (!IS_ERR_VALUE(sym[i].st_value))
 				break;
 			/* Ok if weak.  */
 			if (ELF_ST_BIND(sym[i].st_info) == STB_WEAK)

-- 


* [patch 02/14] Modules: Include sections.h to avoid defining linker variables explicitly
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
  2007-11-27  0:14 ` [patch 01/14] Modules: Handle symbols that have a zero value Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 03/14] Modules: Fold percpu_modcopy into module.c and get rid of the macro from hell Christoph Lameter
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, Andi Kleen

[-- Attachment #1: mod_sections --]
[-- Type: text/plain, Size: 974 bytes --]

module.c should not declare linker-generated symbols on its own:
asm/sections.h already declares __per_cpu_start and __per_cpu_end,
so include it instead.
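The idea of declaring linker-created section bounds in one shared place can be sketched in userspace. This assumes a GNU/Linux ELF toolchain: GNU ld and lld synthesize `__start_<sec>` and `__stop_<sec>` for sections whose names are valid C identifiers. The section name `demo_percpu` and the variables are hypothetical stand-ins for `.data.percpu` and `__per_cpu_start`/`__per_cpu_end`.

```c
#include <assert.h>
#include <stddef.h>

/* Two variables in a custom ELF section, loosely mimicking how
 * DEFINE_PER_CPU places variables in .data.percpu. "used" keeps the
 * compiler from discarding them. */
__attribute__((section("demo_percpu"), used)) int demo_a = 1;
__attribute__((section("demo_percpu"), used)) long demo_b = 2;

/* Provided by the linker (GNU ld/lld behavior, assumed here). Declaring
 * them once is the role <asm/sections.h> plays for __per_cpu_start and
 * __per_cpu_end in the kernel. */
extern char __start_demo_percpu[], __stop_demo_percpu[];

static size_t demo_percpu_size(void)
{
	assert(__stop_demo_percpu >= __start_demo_percpu);
	return (size_t)(__stop_demo_percpu - __start_demo_percpu);
}
```

Keeping such declarations in a single header avoids each user repeating (and possibly mistyping) them.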

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 kernel/module.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c	2007-11-21 13:11:28.495858392 -0800
+++ linux-2.6/kernel/module.c	2007-11-21 13:11:29.322858718 -0800
@@ -46,6 +46,7 @@
 #include <asm/semaphore.h>
 #include <asm/cacheflush.h>
 #include <linux/license.h>
+#include <asm/sections.h>
 
 extern int module_sysfs_initialized;
 
@@ -338,9 +339,6 @@ static inline unsigned int block_size(in
 	return val;
 }
 
-/* Created by linker magic */
-extern char __per_cpu_start[], __per_cpu_end[];
-
 static void *percpu_modalloc(unsigned long size, unsigned long align,
 			     const char *name)
 {

-- 


* [patch 03/14] Modules: Fold percpu_modcopy into module.c and get rid of the macro from hell
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
  2007-11-27  0:14 ` [patch 01/14] Modules: Handle symbols that have a zero value Christoph Lameter
  2007-11-27  0:14 ` [patch 02/14] Modules: Include sections.h to avoid defining linker variables explicitly Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access Christoph Lameter
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, Andi Kleen

[-- Attachment #1: modcopy --]
[-- Type: text/plain, Size: 7633 bytes --]

percpu_modcopy() is defined separately in several arch files, yet its
only user is kernel/module.c. Put a single static definition into
module.c and remove the definitions from the arch files.
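What percpu_modcopy() does can be simulated in userspace: the module's initial per-cpu data is copied to the same offset in every CPU's area. In this sketch one flat buffer stands in for the per-cpu areas, and demo_per_cpu_offset() is a hypothetical stand-in for per_cpu_offset(); it iterates over all simulated CPUs, where the kernel uses for_each_possible_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NR_CPUS	4
#define DEMO_AREA_SIZE	64

/* One flat buffer holding a simulated per-cpu area for every CPU; CPU 0's
 * area doubles as the address the module code is handed. */
static char demo_mem[DEMO_NR_CPUS * DEMO_AREA_SIZE];

/* Stand-in for per_cpu_offset(cpu): distance from CPU 0's copy. */
static size_t demo_per_cpu_offset(int cpu)
{
	return (size_t)cpu * DEMO_AREA_SIZE;
}

/* Same shape as the static percpu_modcopy() added to module.c:
 * replicate the module's initial per-cpu data into every CPU's copy. */
static void demo_percpu_modcopy(void *pcpudest, const void *from, size_t size)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		memcpy((char *)pcpudest + demo_per_cpu_offset(cpu), from, size);
}

/* Read back an int stored at offset off within a given CPU's area. */
static int demo_read_int(int cpu, size_t off)
{
	int v;

	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	assert(off + sizeof(v) <= DEMO_AREA_SIZE);
	memcpy(&v, demo_mem + demo_per_cpu_offset(cpu) + off, sizeof(v));
	return v;
}
```

A plain function is enough here because module.c already has everything it needs in scope, which is why the "avoid #include hell" macros in the arch headers become unnecessary.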

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/ia64/kernel/module.c    |   10 ----------
 include/asm-generic/percpu.h |    8 --------
 include/asm-ia64/percpu.h    |    5 -----
 include/asm-powerpc/percpu.h |    9 ---------
 include/asm-s390/percpu.h    |    9 ---------
 include/asm-sparc64/percpu.h |    8 --------
 include/asm-x86/percpu_32.h  |    9 ---------
 include/asm-x86/percpu_64.h  |    9 ---------
 kernel/module.c              |    8 ++++++++
 9 files changed, 8 insertions(+), 67 deletions(-)

Index: linux-2.6/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-generic/percpu.h	2007-11-21 13:11:18.430858642 -0800
+++ linux-2.6/include/asm-generic/percpu.h	2007-11-21 13:11:42.871108294 -0800
@@ -26,14 +26,6 @@ extern unsigned long __per_cpu_offset[NR
 #define __get_cpu_var(var) per_cpu(var, smp_processor_id())
 #define __raw_get_cpu_var(var) per_cpu(var, raw_smp_processor_id())
 
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size)			\
-do {								\
-	unsigned int __i;					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset[__i],		\
-		       (src), (size));				\
-} while (0)
 #else /* ! SMP */
 
 #define DEFINE_PER_CPU(type, name) \
Index: linux-2.6/arch/ia64/kernel/module.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/module.c	2007-11-21 13:13:06.587858751 -0800
+++ linux-2.6/arch/ia64/kernel/module.c	2007-11-21 13:13:19.527309025 -0800
@@ -941,13 +941,3 @@ module_arch_cleanup (struct module *mod)
 		unw_remove_unwind_table(mod->arch.core_unw_table);
 }
 
-#ifdef CONFIG_SMP
-void
-percpu_modcopy (void *pcpudst, const void *src, unsigned long size)
-{
-	unsigned int i;
-	for_each_possible_cpu(i) {
-		memcpy(pcpudst + __per_cpu_offset[i], src, size);
-	}
-}
-#endif /* CONFIG_SMP */
Index: linux-2.6/include/asm-ia64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/percpu.h	2007-11-21 13:12:37.140358213 -0800
+++ linux-2.6/include/asm-ia64/percpu.h	2007-11-21 13:12:55.271731039 -0800
@@ -39,10 +39,6 @@
 	DEFINE_PER_CPU(type, name)
 #endif
 
-/*
- * Pretty much a literal copy of asm-generic/percpu.h, except that percpu_modcopy() is an
- * external routine, to avoid include-hell.
- */
 #ifdef CONFIG_SMP
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
@@ -55,7 +51,6 @@ DECLARE_PER_CPU(unsigned long, local_per
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __ia64_per_cpu_var(local_per_cpu_offset)))
 #define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __ia64_per_cpu_var(local_per_cpu_offset)))
 
-extern void percpu_modcopy(void *pcpudst, const void *src, unsigned long size);
 extern void setup_per_cpu_areas (void);
 extern void *per_cpu_init(void);
 
Index: linux-2.6/include/asm-powerpc/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/percpu.h	2007-11-21 13:14:21.754859049 -0800
+++ linux-2.6/include/asm-powerpc/percpu.h	2007-11-21 13:14:33.651108379 -0800
@@ -30,15 +30,6 @@
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))
 #define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, local_paca->data_offset))
 
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size)			\
-do {								\
-	unsigned int __i;					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset(__i),		\
-		       (src), (size));				\
-} while (0)
-
 extern void setup_per_cpu_areas(void);
 
 #else /* ! SMP */
Index: linux-2.6/include/asm-s390/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-s390/percpu.h	2007-11-21 13:14:39.835108493 -0800
+++ linux-2.6/include/asm-s390/percpu.h	2007-11-21 13:14:48.590858137 -0800
@@ -51,15 +51,6 @@ extern unsigned long __per_cpu_offset[NR
 #define per_cpu(var,cpu) __reloc_hide(var,__per_cpu_offset[cpu])
 #define per_cpu_offset(x) (__per_cpu_offset[x])
 
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size)			\
-do {								\
-	unsigned int __i;					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset[__i],		\
-		       (src), (size));				\
-} while (0)
-
 #else /* ! SMP */
 
 #define DEFINE_PER_CPU(type, name) \
Index: linux-2.6/include/asm-sparc64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/percpu.h	2007-11-21 13:15:04.043858836 -0800
+++ linux-2.6/include/asm-sparc64/percpu.h	2007-11-21 13:15:11.330973138 -0800
@@ -30,14 +30,6 @@ extern unsigned long __per_cpu_shift;
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __local_per_cpu_offset))
 #define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __local_per_cpu_offset))
 
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size)			\
-do {								\
-	unsigned int __i;					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset(__i),		\
-		       (src), (size));				\
-} while (0)
 #else /* ! SMP */
 
 #define real_setup_per_cpu_areas()		do { } while (0)
Index: linux-2.6/include/asm-x86/percpu_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_32.h	2007-11-21 13:15:19.947608444 -0800
+++ linux-2.6/include/asm-x86/percpu_32.h	2007-11-21 13:15:26.562798940 -0800
@@ -74,15 +74,6 @@ DECLARE_PER_CPU(unsigned long, this_cpu_
 
 #define __get_cpu_var(var) __raw_get_cpu_var(var)
 
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size)			\
-do {								\
-	unsigned int __i;					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset[__i],		\
-		       (src), (size));				\
-} while (0)
-
 #define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
 #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
 
Index: linux-2.6/include/asm-x86/percpu_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_64.h	2007-11-21 13:15:29.222858444 -0800
+++ linux-2.6/include/asm-x86/percpu_64.h	2007-11-21 13:15:34.703108217 -0800
@@ -36,15 +36,6 @@
 	extern int simple_identifier_##var(void);	\
 	RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()); }))
 
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size)			\
-do {								\
-	unsigned int __i;					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset(__i),		\
-		       (src), (size));				\
-} while (0)
-
 extern void setup_per_cpu_areas(void);
 
 #else /* ! SMP */
Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c	2007-11-21 13:18:06.743108472 -0800
+++ linux-2.6/kernel/module.c	2007-11-21 13:23:29.394688088 -0800
@@ -423,6 +423,14 @@ static unsigned int find_pcpusec(Elf_Ehd
 	return find_sec(hdr, sechdrs, secstrings, ".data.percpu");
 }
 
+static void percpu_modcopy(void *pcpudest, const void *from, unsigned long size)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		memcpy(pcpudest + per_cpu_offset(cpu), from, size);
+}
+
 static int percpu_modinit(void)
 {
 	pcpu_num_used = 2;

-- 


* [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-11-27  0:14 ` [patch 03/14] Modules: Fold percpu_modcopy into module.c and get rid of the macro from hell Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  5:20   ` David Mosberger-Tang
  2007-11-27  9:30   ` Andreas Schwab
  2007-11-27  0:14 ` [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup Christoph Lameter
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-ia64, tony.luck

[-- Attachment #1: remove_ia64_model_section --]
[-- Type: text/plain, Size: 2586 bytes --]

gcc 4.x does not support the model(small) attribute, so the toolchain
check for it always fails today. Remove the check and the
__SMALL_ADDR_AREA annotations that depend on it.

Cc: linux-ia64@vger.kernel.org
Cc: tony.luck@intel.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/ia64/scripts/check-model.c   |    1 -
 arch/ia64/scripts/toolchain-flags |    6 ------
 include/asm-ia64/percpu.h         |   12 +++---------
 3 files changed, 3 insertions(+), 16 deletions(-)

Index: linux-2.6/include/asm-ia64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/percpu.h	2007-11-22 15:55:47.634454755 -0800
+++ linux-2.6/include/asm-ia64/percpu.h	2007-11-22 15:56:15.974704716 -0800
@@ -15,24 +15,18 @@
 
 #include <linux/threads.h>
 
-#ifdef HAVE_MODEL_SMALL_ATTRIBUTE
-# define __SMALL_ADDR_AREA	__attribute__((__model__ (__small__)))
-#else
-# define __SMALL_ADDR_AREA
-#endif
-
 #define DECLARE_PER_CPU(type, name)				\
-	extern __SMALL_ADDR_AREA __typeof__(type) per_cpu__##name
+	extern __typeof__(type) per_cpu__##name
 
 /* Separate out the type, so (int[3], foo) works. */
 #define DEFINE_PER_CPU(type, name)				\
 	__attribute__((__section__(".data.percpu")))		\
-	__SMALL_ADDR_AREA __typeof__(type) per_cpu__##name
+	__typeof__(type) per_cpu__##name
 
 #ifdef CONFIG_SMP
 #define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
 	__attribute__((__section__(".data.percpu.shared_aligned")))	\
-	__SMALL_ADDR_AREA __typeof__(type) per_cpu__##name		\
+	__typeof__(type) per_cpu__##name				\
 	____cacheline_aligned_in_smp
 #else
 #define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
Index: linux-2.6/arch/ia64/scripts/check-model.c
===================================================================
--- linux-2.6.orig/arch/ia64/scripts/check-model.c	2007-11-22 15:56:40.890455063 -0800
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1 +0,0 @@
-int __attribute__ ((__model__ (__small__))) x;
Index: linux-2.6/arch/ia64/scripts/toolchain-flags
===================================================================
--- linux-2.6.orig/arch/ia64/scripts/toolchain-flags	2007-11-22 15:57:07.329204964 -0800
+++ linux-2.6/arch/ia64/scripts/toolchain-flags	2007-11-22 15:57:27.229018356 -0800
@@ -35,12 +35,6 @@ if [ $res -eq 0 ]; then
     CPPFLAGS="$CPPFLAGS -DHAVE_WORKING_TEXT_ALIGN"
 fi
 
-if ! $CC -c $dir/check-model.c -o $out 2>&1 | grep  __model__ | grep -q attrib
-then
-    CPPFLAGS="$CPPFLAGS -DHAVE_MODEL_SMALL_ATTRIBUTE"
-fi
-rm -f $out
-
 # Check whether assembler supports .serialize.{data,instruction} directive.
 
 $CC -c $dir/check-serialize.S -o $out 2>/dev/null

-- 


* [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-11-27  0:14 ` [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  4:30   ` Rusty Russell
  2007-11-27 23:40   ` Randy Dunlap
  2007-11-27  0:14 ` [patch 06/14] percpu: Move arch XX_PER_CPU_XX definitions into linux/percpu.h Christoph Lameter
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, Andi Kleen

[-- Attachment #1: arch_sets_up_per_cpu_areas --]
[-- Type: text/plain, Size: 4556 bytes --]

The use of __GENERIC_PER_CPU is problematic because arches may want to
run their own percpu setup while still using the generic percpu
definitions. Replace it with the Kconfig variable
CONFIG_ARCH_SETS_UP_PER_CPU_AREA, which arches that set up their own
per cpu areas enable.
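The effect of the Kconfig switch can be sketched in userspace: unless the arch opts out, a generic setup routine allocates one copy of the per-cpu template per CPU, copies the template into each, and records each copy's offset. The struct, DEMO_NR_CPUS and the DEMO_ARCH_SETS_UP_PER_CPU_AREA macro are hypothetical stand-ins for .data.percpu, NR_CPUS and the Kconfig symbol; malloc() replaces the boot-time allocator, and the offset arithmetic through uintptr_t assumes a flat address space, as the kernel does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NR_CPUS 4

/* Stand-in for the .data.percpu template bracketed by
 * __per_cpu_start/__per_cpu_end; the values are its initial contents. */
static struct {
	int counter;
	long stats[2];
} demo_template = { 7, { 0, 0 } };

static uintptr_t demo_per_cpu_offset[DEMO_NR_CPUS];

/* Define DEMO_ARCH_SETS_UP_PER_CPU_AREA to drop the generic version, the
 * way CONFIG_ARCH_SETS_UP_PER_CPU_AREA does in init/main.c. */
#ifndef DEMO_ARCH_SETS_UP_PER_CPU_AREA
static int demo_setup_per_cpu_areas(void)
{
	size_t size = sizeof(demo_template);
	/* Never freed: like the boot-time areas, it lives for the whole run. */
	char *ptr = malloc(size * DEMO_NR_CPUS);

	if (!ptr)
		return -1;
	for (int i = 0; i < DEMO_NR_CPUS; i++, ptr += size) {
		/* Unsigned wraparound keeps this correct even if ptr < template. */
		demo_per_cpu_offset[i] = (uintptr_t)ptr - (uintptr_t)&demo_template;
		memcpy(ptr, &demo_template, size);
	}
	return 0;
}
#endif /* !DEMO_ARCH_SETS_UP_PER_CPU_AREA */

/* Roughly what per_cpu(var, cpu) does: template address plus CPU offset. */
#define demo_per_cpu(var, cpu) \
	(*(__typeof__(&(var)))((uintptr_t)&(var) + demo_per_cpu_offset[cpu]))

static int *demo_counter_ptr(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return &demo_per_cpu(demo_template.counter, cpu);
}
```

Because the choice is a config symbol rather than a define in asm/percpu.h, an arch can keep the generic accessors while supplying its own setup routine.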

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/ia64/Kconfig            |    4 ++++
 arch/powerpc/Kconfig         |    4 ++++
 arch/sparc64/Kconfig         |    4 ++++
 arch/x86/Kconfig             |    6 +++---
 include/asm-generic/percpu.h |    1 -
 include/asm-s390/percpu.h    |    2 --
 include/asm-x86/percpu_32.h  |    2 --
 init/main.c                  |    4 ++--
 8 files changed, 17 insertions(+), 10 deletions(-)

Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c	2007-11-26 15:38:56.407111768 -0800
+++ linux-2.6/init/main.c	2007-11-26 15:40:10.425862722 -0800
@@ -363,7 +363,7 @@ static inline void smp_prepare_cpus(unsi
 
 #else
 
-#ifdef __GENERIC_PER_CPU
+#ifndef CONFIG_ARCH_SETS_UP_PER_CPU_AREA
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
 
 EXPORT_SYMBOL(__per_cpu_offset);
@@ -384,7 +384,7 @@ static void __init setup_per_cpu_areas(v
 		ptr += size;
 	}
 }
-#endif /* !__GENERIC_PER_CPU */
+#endif /* !CONFIG_ARCH_SETS_UP_PER_CPU_AREA */
 
 /* Called by boot processor to activate the rest. */
 static void __init smp_init(void)
Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig	2007-11-26 15:38:56.415112360 -0800
+++ linux-2.6/arch/ia64/Kconfig	2007-11-26 15:40:10.425862722 -0800
@@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
 	bool
 	default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default y
+
 config DMI
 	bool
 	default y
Index: linux-2.6/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/Kconfig	2007-11-26 15:38:56.427111914 -0800
+++ linux-2.6/arch/powerpc/Kconfig	2007-11-26 15:40:10.425862722 -0800
@@ -42,6 +42,10 @@ config GENERIC_HARDIRQS
 	bool
 	default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default PPC64
+
 config IRQ_PER_CPU
 	bool
 	default y
Index: linux-2.6/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.orig/arch/sparc64/Kconfig	2007-11-26 15:38:56.447111936 -0800
+++ linux-2.6/arch/sparc64/Kconfig	2007-11-26 15:40:10.425862722 -0800
@@ -66,6 +66,10 @@ config AUDIT_ARCH
 	bool
 	default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default y
+
 config ARCH_NO_VIRT_TO_BUS
 	def_bool y
 
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig	2007-11-26 15:38:58.234361975 -0800
+++ linux-2.6/arch/x86/Kconfig	2007-11-26 15:40:52.465611449 -0800
@@ -112,9 +112,9 @@ config GENERIC_TIME_VSYSCALL
 	bool
 	default X86_64
 
-
-
-
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default X86_64
 
 config ZONE_DMA32
 	bool
Index: linux-2.6/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-generic/percpu.h	2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-generic/percpu.h	2007-11-26 15:40:10.437861790 -0800
@@ -3,7 +3,6 @@
 #include <linux/compiler.h>
 #include <linux/threads.h>
 
-#define __GENERIC_PER_CPU
 #ifdef CONFIG_SMP
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
Index: linux-2.6/include/asm-x86/percpu_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_32.h	2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-x86/percpu_32.h	2007-11-26 15:40:10.441861845 -0800
@@ -41,8 +41,6 @@
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-/* Same as generic implementation except for optimized local access. */
-#define __GENERIC_PER_CPU
 
 /* This is used for other cpus to find our section. */
 extern unsigned long __per_cpu_offset[];
Index: linux-2.6/include/asm-s390/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-s390/percpu.h	2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-s390/percpu.h	2007-11-26 15:40:10.441861845 -0800
@@ -4,8 +4,6 @@
 #include <linux/compiler.h>
 #include <asm/lowcore.h>
 
-#define __GENERIC_PER_CPU
-
 /*
  * s390 uses its own implementation for per cpu data, the offset of
  * the cpu local data area is cached in the cpu's lowcore memory.

-- 


* [patch 06/14] percpu: Move arch XX_PER_CPU_XX definitions into linux/percpu.h
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-11-27  0:14 ` [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 07/14] percpu: Make the asm-generic/percpu.h more generic Christoph Lameter
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, Andi Kleen

[-- Attachment #1: move_percpu_declarations --]
[-- Type: text/plain, Size: 12883 bytes --]

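The arch definitions of DEFINE_PER_CPU, DEFINE_PER_CPU_SHARED_ALIGNED,
EXPORT_PER_CPU_SYMBOL and EXPORT_PER_CPU_SYMBOL_GPL are all identical,
so move them into linux/percpu.h.

DECLARE_PER_CPU cannot be moved, because some header files include only
asm/percpu.h to avoid include recursion problems.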
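The "Separate out the type, so (int[3], foo) works." comment in these macros refers to the `__typeof__(type)` trick. The userspace sketch below copies the macro shape under hypothetical DEMO_ names, with a section called `demo_percpu` in place of `.data.percpu`. Pasting `int[3] per_cpu__foo` directly would not parse, while `__typeof__(int[3]) per_cpu__foo` declares an array of three ints.

```c
#include <assert.h>

/* Userspace copies of the macro shape consolidated into <linux/percpu.h>;
 * "demo_percpu" replaces .data.percpu. */
#define DEMO_DECLARE_PER_CPU(type, name) \
	extern __typeof__(type) per_cpu__##name

#define DEMO_DEFINE_PER_CPU(type, name)			\
	__attribute__((__section__("demo_percpu")))	\
	__typeof__(type) per_cpu__##name

/* A header would carry the declarations ... */
DEMO_DECLARE_PER_CPU(int[3], foo);
DEMO_DECLARE_PER_CPU(unsigned long, hits);

/* ... and exactly one .c file the definitions. */
DEMO_DEFINE_PER_CPU(int[3], foo);
DEMO_DEFINE_PER_CPU(unsigned long, hits);

/* __typeof__(int[3]) is the array type itself, so per_cpu__foo is int[3]. */
static_assert(sizeof(per_cpu__foo) == 3 * sizeof(int), "array type preserved");
```

Since the macro bodies do not depend on the arch, a single copy in linux/percpu.h serves all of them.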

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-generic/percpu.h |   18 ------------------
 include/asm-ia64/percpu.h    |   18 ------------------
 include/asm-powerpc/percpu.h |   17 -----------------
 include/asm-s390/percpu.h    |   18 ------------------
 include/asm-sparc64/percpu.h |   16 ----------------
 include/asm-x86/percpu_32.h  |   12 ------------
 include/asm-x86/percpu_64.h  |   17 -----------------
 include/linux/percpu.h       |   17 +++++++++++++++++
 8 files changed, 17 insertions(+), 116 deletions(-)

Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h	2007-11-24 19:25:19.781850716 -0800
+++ linux-2.6/include/linux/percpu.h	2007-11-24 19:33:55.416103196 -0800
@@ -9,6 +9,23 @@
 
 #include <asm/percpu.h>
 
+#define DEFINE_PER_CPU(type, name)					\
+	 __attribute__((__section__(".data.percpu")))			\
+	__typeof__(type) per_cpu__##name
+
+#ifdef CONFIG_SMP
+#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
+	__attribute__((__section__(".data.percpu.shared_aligned")))	\
+	__typeof__(type) per_cpu__##name				\
+	____cacheline_aligned_in_smp
+#else
+#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		      \
+	DEFINE_PER_CPU(type, name)
+#endif
+
+#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
+#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
+
 /* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
 #ifndef PERCPU_ENOUGH_ROOM
 #ifdef CONFIG_MODULES
Index: linux-2.6/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-generic/percpu.h	2007-11-24 19:33:13.676100523 -0800
+++ linux-2.6/include/asm-generic/percpu.h	2007-11-24 19:33:55.416103196 -0800
@@ -9,15 +9,6 @@ extern unsigned long __per_cpu_offset[NR
 
 #define per_cpu_offset(x) (__per_cpu_offset[x])
 
-/* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		\
-    __attribute__((__section__(".data.percpu.shared_aligned"))) \
-    __typeof__(type) per_cpu__##name				\
-    ____cacheline_aligned_in_smp
-
 /* var is in discarded region: offset to particular copy we want */
 #define per_cpu(var, cpu) (*({				\
 	extern int simple_identifier_##var(void);	\
@@ -27,12 +18,6 @@ extern unsigned long __per_cpu_offset[NR
 
 #else /* ! SMP */
 
-#define DEFINE_PER_CPU(type, name) \
-    __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
-    DEFINE_PER_CPU(type, name)
-
 #define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
 #define __get_cpu_var(var)			per_cpu__##var
 #define __raw_get_cpu_var(var)			per_cpu__##var
@@ -41,7 +26,4 @@ extern unsigned long __per_cpu_offset[NR
 
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
 
-#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 #endif /* _ASM_GENERIC_PERCPU_H_ */
Index: linux-2.6/include/asm-ia64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/percpu.h	2007-11-24 19:33:13.641850404 -0800
+++ linux-2.6/include/asm-ia64/percpu.h	2007-11-24 19:33:55.416103196 -0800
@@ -18,21 +18,6 @@
 #define DECLARE_PER_CPU(type, name)				\
 	extern __typeof__(type) per_cpu__##name
 
-/* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name)				\
-	__attribute__((__section__(".data.percpu")))		\
-	__typeof__(type) per_cpu__##name
-
-#ifdef CONFIG_SMP
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
-	__attribute__((__section__(".data.percpu.shared_aligned")))	\
-	__typeof__(type) per_cpu__##name				\
-	____cacheline_aligned_in_smp
-#else
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
-	DEFINE_PER_CPU(type, name)
-#endif
-
 #ifdef CONFIG_SMP
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
@@ -57,9 +42,6 @@ extern void *per_cpu_init(void);
 
 #endif	/* SMP */
 
-#define EXPORT_PER_CPU_SYMBOL(var)		EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var)		EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 /*
  * Be extremely careful when taking the address of this variable!  Due to virtual
  * remapping, it is different from the canonical address returned by __get_cpu_var(var)!
Index: linux-2.6/include/asm-powerpc/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/percpu.h	2007-11-24 19:33:13.617850616 -0800
+++ linux-2.6/include/asm-powerpc/percpu.h	2007-11-24 19:33:55.416103196 -0800
@@ -16,15 +16,6 @@
 #define __my_cpu_offset() get_paca()->data_offset
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
-/* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		\
-    __attribute__((__section__(".data.percpu.shared_aligned"))) \
-    __typeof__(type) per_cpu__##name				\
-    ____cacheline_aligned_in_smp
-
 /* var is in discarded region: offset to particular copy we want */
 #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))
@@ -34,11 +25,6 @@ extern void setup_per_cpu_areas(void);
 
 #else /* ! SMP */
 
-#define DEFINE_PER_CPU(type, name) \
-    __typeof__(type) per_cpu__##name
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
-    DEFINE_PER_CPU(type, name)
-
 #define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
 #define __get_cpu_var(var)			per_cpu__##var
 #define __raw_get_cpu_var(var)			per_cpu__##var
@@ -47,9 +33,6 @@ extern void setup_per_cpu_areas(void);
 
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
 
-#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 #else
 #include <asm-generic/percpu.h>
 #endif
Index: linux-2.6/include/asm-s390/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-s390/percpu.h	2007-11-24 19:33:13.676100523 -0800
+++ linux-2.6/include/asm-s390/percpu.h	2007-11-24 19:33:55.416103196 -0800
@@ -34,16 +34,6 @@
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
 
-/* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) \
-    __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		\
-    __attribute__((__section__(".data.percpu.shared_aligned"))) \
-    __typeof__(type) per_cpu__##name				\
-    ____cacheline_aligned_in_smp
-
 #define __get_cpu_var(var) __reloc_hide(var,S390_lowcore.percpu_offset)
 #define __raw_get_cpu_var(var) __reloc_hide(var,S390_lowcore.percpu_offset)
 #define per_cpu(var,cpu) __reloc_hide(var,__per_cpu_offset[cpu])
@@ -51,11 +41,6 @@ extern unsigned long __per_cpu_offset[NR
 
 #else /* ! SMP */
 
-#define DEFINE_PER_CPU(type, name) \
-    __typeof__(type) per_cpu__##name
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
-    DEFINE_PER_CPU(type, name)
-
 #define __get_cpu_var(var) __reloc_hide(var,0)
 #define __raw_get_cpu_var(var) __reloc_hide(var,0)
 #define per_cpu(var,cpu) __reloc_hide(var,0)
@@ -64,7 +49,4 @@ extern unsigned long __per_cpu_offset[NR
 
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
 
-#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 #endif /* __ARCH_S390_PERCPU__ */
Index: linux-2.6/include/asm-sparc64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/percpu.h	2007-11-24 19:33:13.617850616 -0800
+++ linux-2.6/include/asm-sparc64/percpu.h	2007-11-24 19:33:55.416103196 -0800
@@ -16,15 +16,6 @@ extern unsigned long __per_cpu_shift;
 	(__per_cpu_base + ((unsigned long)(__cpu) << __per_cpu_shift))
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
-/* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		\
-    __attribute__((__section__(".data.percpu.shared_aligned"))) \
-    __typeof__(type) per_cpu__##name				\
-    ____cacheline_aligned_in_smp
-
 /* var is in discarded region: offset to particular copy we want */
 #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __local_per_cpu_offset))
@@ -33,10 +24,6 @@ extern unsigned long __per_cpu_shift;
 #else /* ! SMP */
 
 #define real_setup_per_cpu_areas()		do { } while (0)
-#define DEFINE_PER_CPU(type, name) \
-    __typeof__(type) per_cpu__##name
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
-    DEFINE_PER_CPU(type, name)
 
 #define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
 #define __get_cpu_var(var)			per_cpu__##var
@@ -46,7 +33,4 @@ extern unsigned long __per_cpu_shift;
 
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
 
-#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 #endif /* __ARCH_SPARC64_PERCPU__ */
Index: linux-2.6/include/asm-x86/percpu_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_32.h	2007-11-24 19:33:13.676100523 -0800
+++ linux-2.6/include/asm-x86/percpu_32.h	2007-11-24 19:33:55.416103196 -0800
@@ -47,16 +47,7 @@ extern unsigned long __per_cpu_offset[];
 
 #define per_cpu_offset(x) (__per_cpu_offset[x])
 
-/* Separate out the type, so (int[3], foo) works. */
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
-#define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		\
-    __attribute__((__section__(".data.percpu.shared_aligned"))) \
-    __typeof__(type) per_cpu__##name				\
-    ____cacheline_aligned_in_smp
-
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
@@ -72,9 +63,6 @@ DECLARE_PER_CPU(unsigned long, this_cpu_
 
 #define __get_cpu_var(var) __raw_get_cpu_var(var)
 
-#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
 #else  /* !SMP */
Index: linux-2.6/include/asm-x86/percpu_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_64.h	2007-11-24 19:33:13.617850616 -0800
+++ linux-2.6/include/asm-x86/percpu_64.h	2007-11-24 19:33:55.416103196 -0800
@@ -16,15 +16,6 @@
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
-/* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
-
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		\
-    __attribute__((__section__(".data.percpu.shared_aligned"))) \
-    __typeof__(type) per_cpu__##name				\
-    ____cacheline_internodealigned_in_smp
-
 /* var is in discarded region: offset to particular copy we want */
 #define per_cpu(var, cpu) (*({				\
 	extern int simple_identifier_##var(void);	\
@@ -40,11 +31,6 @@ extern void setup_per_cpu_areas(void);
 
 #else /* ! SMP */
 
-#define DEFINE_PER_CPU(type, name) \
-    __typeof__(type) per_cpu__##name
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)	\
-    DEFINE_PER_CPU(type, name)
-
 #define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
 #define __get_cpu_var(var)			per_cpu__##var
 #define __raw_get_cpu_var(var)			per_cpu__##var
@@ -53,7 +39,4 @@ extern void setup_per_cpu_areas(void);
 
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
 
-#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
-#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
-
 #endif /* _ASM_X8664_PERCPU_H_ */

-- 

* [patch 07/14] percpu: Make the asm-generic/percpu.h more generic
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-11-27  0:14 ` [patch 06/14] percpu: Move arch XX_PER_CPU_XX definitions into linux/percpu.h Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 08/14] x86_32: Use generic percpu.h Christoph Lameter
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, Andi Kleen

[-- Attachment #1: genericize-percpu.h --]
[-- Type: text/plain, Size: 4008 bytes --]

Allow an arch to use asm-generic/percpu.h even if it needs to override
several aspects of its operation. This makes it possible for all arches
to use the generic percpu.h.

An arch may define:

__per_cpu_offset	Do not use the generic offset array. The arch must
			define per_cpu_offset(cpu) (used by x86_64, s390).

__my_cpu_offset		Provides an optimized way to determine the offset
			for variables of the currently executing processor.
			Used by ia64, x86_64, x86_32, sparc64, s390.

SHIFT_PTR(ptr, offset)	Lets an arch implement special handling of the
			pointer arithmetic. Used by s390.


(Some of these special percpu arch implementations may later be consolidated
so that there are fewer cases to deal with.)
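A userspace sketch of how the overridable hooks fit together, assuming a toy model in which the per cpu "section" is a struct (percpu_template) and each processor's copy is an element of percpu_area[]. current_cpu, percpu_template, percpu_area and the setup function are hypothetical stand-ins, and SHIFT_PTR uses plain integer arithmetic in place of RELOC_HIDE. Since no hook is predefined here, every generic fallback is used:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4

/* Toy per cpu "section": all variables laid out contiguously. */
struct percpu_vars {
	int counter;
	long hits;
};
static struct percpu_vars percpu_template;	/* master copy */
static struct percpu_vars percpu_area[NR_CPUS];	/* one copy per cpu */

/* Hypothetical stand-in for smp_processor_id(). */
static int current_cpu;
#define smp_processor_id()	(current_cpu)
#define raw_smp_processor_id()	(current_cpu)

/* Map the source-level name to the underlying object. */
#define per_cpu_var(var) (percpu_template.var)

/* Generic fallbacks; an arch header defined earlier could replace each. */
#ifndef __per_cpu_offset
static uintptr_t __per_cpu_offset[NR_CPUS];
#define per_cpu_offset(x) (__per_cpu_offset[x])
#endif

#ifndef __my_cpu_offset
#define __my_cpu_offset per_cpu_offset(raw_smp_processor_id())
#define my_cpu_offset per_cpu_offset(smp_processor_id())
#else
#define my_cpu_offset __my_cpu_offset
#endif

#ifndef SHIFT_PTR
/* Keeps the pointer's type; the kernel hides this with RELOC_HIDE. */
#define SHIFT_PTR(p, off) ((__typeof__(p))((uintptr_t)(p) + (off)))
#endif

#define per_cpu(var, cpu) (*SHIFT_PTR(&per_cpu_var(var), per_cpu_offset(cpu)))
#define __get_cpu_var(var) (*SHIFT_PTR(&per_cpu_var(var), my_cpu_offset))

/* Copy the template and record each copy's distance from it. */
static void setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		percpu_area[cpu] = percpu_template;
		__per_cpu_offset[cpu] = (uintptr_t)&percpu_area[cpu] -
					(uintptr_t)&percpu_template;
	}
}
```

An arch header that #defines __per_cpu_offset, __my_cpu_offset or SHIFT_PTR before this point replaces only the corresponding fallback; per_cpu() and __get_cpu_var() stay unchanged.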

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-generic/percpu.h |   64 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 11 deletions(-)

Index: linux-2.6/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-generic/percpu.h	2007-11-24 19:33:55.416103196 -0800
+++ linux-2.6/include/asm-generic/percpu.h	2007-11-24 19:36:35.016350539 -0800
@@ -3,27 +3,69 @@
 #include <linux/compiler.h>
 #include <linux/threads.h>
 
+/*
+ * Determine the real variable name from the name visible in the
+ * kernel sources.
+ */
+#define per_cpu_var(var) per_cpu__##var
+
 #ifdef CONFIG_SMP
 
+/*
+ * per_cpu_offset() is the offset that has to be added to a
+ * percpu variable to get to the instance for a certain processor.
+ *
+ * Most arches use the __per_cpu_offset array for those offsets but
+ * some arches have their own ways of determining the offset (x86_64, s390).
+ */
+#ifndef __per_cpu_offset
 extern unsigned long __per_cpu_offset[NR_CPUS];
-
 #define per_cpu_offset(x) (__per_cpu_offset[x])
+#endif
 
-/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*({				\
-	extern int simple_identifier_##var(void);	\
-	RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]); }))
-#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
-#define __raw_get_cpu_var(var) per_cpu(var, raw_smp_processor_id())
+/*
+ * Determine the offset for the currently active processor.
+ * An arch may define __my_cpu_offset to provide a more efficient
+ * means of obtaining the offset to the per cpu variables of the
+ * current processor.
+ */
+#ifndef __my_cpu_offset
+#define __my_cpu_offset per_cpu_offset(raw_smp_processor_id())
+#define my_cpu_offset per_cpu_offset(smp_processor_id())
+#else
+#define my_cpu_offset __my_cpu_offset
+#endif
+
+/*
+ * Add an offset to a pointer while preserving the pointer's type.
+ *
+ * Only s390 provides its own means of moving the pointer.
+ */
+#ifndef SHIFT_PTR
+#define SHIFT_PTR(__p, __offset)	RELOC_HIDE((__p), (__offset))
+#endif
+
+/*
+ * The percpu variable itself is in a discarded region. The following
+ * macros produce a usable pointer by adding the offset of the copy
+ * that belongs to the desired processor.
+ */
+#define per_cpu(var, cpu) (*SHIFT_PTR(&per_cpu_var(var), per_cpu_offset(cpu)))
+#define __get_cpu_var(var) (*SHIFT_PTR(&per_cpu_var(var), my_cpu_offset))
+#define __raw_get_cpu_var(var) (*SHIFT_PTR(&per_cpu_var(var), __my_cpu_offset))
+
+#ifdef CONFIG_ARCH_SETS_UP_PER_CPU_AREA
+extern void setup_per_cpu_areas(void);
+#endif
 
 #else /* ! SMP */
 
-#define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
-#define __get_cpu_var(var)			per_cpu__##var
-#define __raw_get_cpu_var(var)			per_cpu__##var
+#define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu_var(var)))
+#define __get_cpu_var(var)			per_cpu_var(var)
+#define __raw_get_cpu_var(var)			per_cpu_var(var)
 
 #endif	/* SMP */
 
-#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
+#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu_var(name)
 
 #endif /* _ASM_GENERIC_PERCPU_H_ */

-- 

* [patch 08/14] x86_32: Use generic percpu.h
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (6 preceding siblings ...)
  2007-11-27  0:14 ` [patch 07/14] percpu: Make the asm-generic/percpu.h more generic Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 09/14] x86_64: Use generic percpu Christoph Lameter
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, tglx, mingo, ak

[-- Attachment #1: x86_32_use_generic_percpu --]
[-- Type: text/plain, Size: 2009 bytes --]

x86_32 only needs a special way to obtain the offset of the local per cpu
area (via x86_read_percpu(this_cpu_off)). Otherwise it can use the generic
handling as is.
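A toy sketch of why defining __my_cpu_offset is enough for x86_32: each processor's copy of this_cpu_off holds that processor's own offset, so one load relative to the segment base yields the local offset without knowing the cpu number. fs_base and switch_to_cpu() are hypothetical stand-ins for the %fs segment base, not kernel APIs:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2

struct percpu_vars {
	uintptr_t this_cpu_off;	/* each copy holds its own cpu's offset */
	int counter;
};
static struct percpu_vars percpu_template;
static struct percpu_vars percpu_area[NR_CPUS];

/* Hypothetical stand-in for the %fs segment base: it points at the
 * executing cpu's copy, so a "segment-relative" load reads local data. */
static struct percpu_vars *fs_base;

/* Analogue of x86_read_percpu(this_cpu_off): one load, no cpu number. */
#define __my_cpu_offset (fs_base->this_cpu_off)

#define SHIFT_PTR(p, off) ((__typeof__(p))((uintptr_t)(p) + (off)))
#define __get_cpu_var(var) (*SHIFT_PTR(&percpu_template.var, __my_cpu_offset))

static void setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		percpu_area[cpu] = percpu_template;
		percpu_area[cpu].this_cpu_off = (uintptr_t)&percpu_area[cpu] -
						(uintptr_t)&percpu_template;
	}
}

/* Model a context switch onto another processor. */
static void switch_to_cpu(int cpu)
{
	fs_base = &percpu_area[cpu];
}
```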

Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: ak@suse.de
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-x86/percpu_32.h |   30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

Index: linux-2.6/include/asm-x86/percpu_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_32.h	2007-11-22 17:33:35.194455426 -0800
+++ linux-2.6/include/asm-x86/percpu_32.h	2007-11-22 17:39:20.472955100 -0800
@@ -42,34 +42,22 @@
  */
 #ifdef CONFIG_SMP
 
-/* This is used for other cpus to find our section. */
-extern unsigned long __per_cpu_offset[];
-
-#define per_cpu_offset(x) (__per_cpu_offset[x])
-
-#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
-/* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU(unsigned long, this_cpu_off);
-
-/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*({				\
-	extern int simple_indentifier_##var(void);	\
-	RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]); }))
-
-#define __raw_get_cpu_var(var) (*({					\
-	extern int simple_indentifier_##var(void);			\
-	RELOC_HIDE(&per_cpu__##var, x86_read_percpu(this_cpu_off));	\
-}))
-
-#define __get_cpu_var(var) __raw_get_cpu_var(var)
+#define __my_cpu_offset x86_read_percpu(this_cpu_off)
 
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
+
 #else  /* !SMP */
-#include <asm-generic/percpu.h>
+
 #define __percpu_seg ""
+
 #endif	/* SMP */
 
+#include <asm-generic/percpu.h>
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);

-- 

* [patch 09/14] x86_64: Use generic percpu
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (7 preceding siblings ...)
  2007-11-27  0:14 ` [patch 08/14] x86_32: Use generic percpu.h Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 10/14] s390: " Christoph Lameter
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, Andi Kleen, tglx, mingo

[-- Attachment #1: x86_64_use_generic_percpu --]
[-- Type: text/plain, Size: 1887 bytes --]

x86_64 provides an optimized way to determine the local per cpu area
offset through the pda, and determines the offset of another processor's
per cpu area by reading that processor's pda.
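A sketch of the x86_64 arrangement under simplified assumptions: the pda is reduced to its data_offset field, and current_cpu is a hypothetical stand-in for %gs pointing at the local pda. Because the arch defines __per_cpu_offset (as a function-like macro) and __my_cpu_offset before the generic part, both #ifndef blocks are skipped and no offset array is ever declared:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2

struct percpu_vars { int counter; };
static struct percpu_vars percpu_template;
static struct percpu_vars percpu_area[NR_CPUS];

/* Simplified, hypothetical pda holding only the per cpu offset. */
struct x8664_pda { uintptr_t data_offset; };
static struct x8664_pda pda[NR_CPUS];
static int current_cpu;			/* stand-in for %gs -> local pda */
#define cpu_pda(cpu) (&pda[cpu])
#define read_pda(field) (pda[current_cpu].field)

/* Arch part, as in the x86_64 patch. */
#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
#define __my_cpu_offset read_pda(data_offset)
#define per_cpu_offset(x) (__per_cpu_offset(x))

/* Generic part: skipped, since #ifndef also sees function-like macros. */
#ifndef __per_cpu_offset
extern unsigned long __per_cpu_offset[NR_CPUS];
#define per_cpu_offset(x) (__per_cpu_offset[x])
#endif
#ifndef __my_cpu_offset
#define __my_cpu_offset per_cpu_offset(raw_smp_processor_id())
#endif

#define SHIFT_PTR(p, off) ((__typeof__(p))((uintptr_t)(p) + (off)))
#define per_cpu(var, cpu) (*SHIFT_PTR(&percpu_template.var, per_cpu_offset(cpu)))
#define __raw_get_cpu_var(var) (*SHIFT_PTR(&percpu_template.var, __my_cpu_offset))

/* Store each cpu's offset in its pda instead of in an array. */
static void setup_pdas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		pda[cpu].data_offset = (uintptr_t)&percpu_area[cpu] -
				       (uintptr_t)&percpu_template;
}
```

This is also why the patch turns __my_cpu_offset() into the object-like __my_cpu_offset: the generic header uses the name without parentheses, and a function-like macro name that is not followed by "(" is not expanded.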

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-x86/percpu_64.h |   28 +++-------------------------
 1 file changed, 3 insertions(+), 25 deletions(-)

Index: linux-2.6/include/asm-x86/percpu_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_64.h	2007-11-24 10:27:31.088350556 -0800
+++ linux-2.6/include/asm-x86/percpu_64.h	2007-11-24 10:28:51.020600562 -0800
@@ -8,35 +8,13 @@
    from %gs */
 
 #ifdef CONFIG_SMP
-
 #include <asm/pda.h>
 
 #define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
-#define __my_cpu_offset() read_pda(data_offset)
+#define __my_cpu_offset read_pda(data_offset)
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
-/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*({				\
-	extern int simple_identifier_##var(void);	\
-	RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)); }))
-#define __get_cpu_var(var) (*({				\
-	extern int simple_identifier_##var(void);	\
-	RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()); }))
-#define __raw_get_cpu_var(var) (*({			\
-	extern int simple_identifier_##var(void);	\
-	RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()); }))
-
-extern void setup_per_cpu_areas(void);
-
-#else /* ! SMP */
-
-#define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
-#define __get_cpu_var(var)			per_cpu__##var
-#define __raw_get_cpu_var(var)			per_cpu__##var
-
-#endif	/* SMP */
-
-#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
-
+#endif
+#include <asm-generic/percpu.h>
 #endif /* _ASM_X8664_PERCPU_H_ */

-- 

* [patch 10/14] s390: Use generic percpu
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (8 preceding siblings ...)
  2007-11-27  0:14 ` [patch 09/14] x86_64: Use generic percpu Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 11/14] Powerpc: Use generic per cpu Christoph Lameter
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, schwidefsky

[-- Attachment #1: s390_generic_percpu --]
[-- Type: text/plain, Size: 2317 bytes --]

s390 needs special handling to compute the pointer to a per cpu variable,
and it can access the per cpu offset of the currently executing processor
through the lowcore.

Note: I had to make a minor change to the asm code. Please check that
it was done correctly.
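A sketch of the non-module SHIFT_PTR variant, assuming GCC or Clang (statement expressions and inline asm). The empty asm launders the pointer through a register so the compiler cannot reason about which object the result points into. The s390-specific "=a" (address register) constraint is replaced by the generic "=r" so the sketch builds on other targets, and the simple_identifier_##var declaration is dropped because var is not a parameter of SHIFT_PTR:

```c
#include <assert.h>

/*
 * Modelled on the non-module s390 SHIFT_PTR: copy the pointer into a
 * register through an empty asm, then add the offset and cast back to
 * the original pointer type.
 */
#define SHIFT_PTR(ptr, offset) ({				\
	unsigned long __ptr;					\
	__asm__ ("" : "=r" (__ptr) : "0" (ptr));		\
	(__typeof__(ptr)) (__ptr + (offset)); })
```

As in the generic per_cpu(), the caller dereferences the result, e.g. *SHIFT_PTR(&per_cpu_var(var), offset).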

Cc: schwidefsky@de.ibm.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-s390/percpu.h |   33 +++++++++------------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

Index: linux-2.6/include/asm-s390/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-s390/percpu.h	2007-11-24 16:20:38.776100537 -0800
+++ linux-2.6/include/asm-s390/percpu.h	2007-11-24 16:21:11.536101444 -0800
@@ -13,40 +13,25 @@
  */
 #if defined(__s390x__) && defined(MODULE)
 
-#define __reloc_hide(var,offset) (*({			\
+#define SHIFT_PTR(ptr,offset) (({			\
 	extern int simple_identifier_##var(void);	\
 	unsigned long *__ptr;				\
-	asm ( "larl %0,per_cpu__"#var"@GOTENT"		\
-	    : "=a" (__ptr) : "X" (per_cpu__##var) );	\
-	(typeof(&per_cpu__##var))((*__ptr) + (offset));	}))
+	asm ( "larl %0, %1@GOTENT"		\
+	    : "=a" (__ptr) : "X" (ptr) );		\
+	(typeof(ptr))((*__ptr) + (offset));	}))
 
 #else
 
-#define __reloc_hide(var, offset) (*({				\
+#define SHIFT_PTR(ptr, offset) (({				\
 	extern int simple_identifier_##var(void);		\
 	unsigned long __ptr;					\
-	asm ( "" : "=a" (__ptr) : "0" (&per_cpu__##var) );	\
-	(typeof(&per_cpu__##var)) (__ptr + (offset)); }))
+	asm ( "" : "=a" (__ptr) : "0" (ptr) );			\
+	(typeof(ptr)) (__ptr + (offset)); }))
 
 #endif
 
-#ifdef CONFIG_SMP
+#define __my_cpu_offset S390_lowcore.percpu_offset
 
-extern unsigned long __per_cpu_offset[NR_CPUS];
-
-#define __get_cpu_var(var) __reloc_hide(var,S390_lowcore.percpu_offset)
-#define __raw_get_cpu_var(var) __reloc_hide(var,S390_lowcore.percpu_offset)
-#define per_cpu(var,cpu) __reloc_hide(var,__per_cpu_offset[cpu])
-#define per_cpu_offset(x) (__per_cpu_offset[x])
-
-#else /* ! SMP */
-
-#define __get_cpu_var(var) __reloc_hide(var,0)
-#define __raw_get_cpu_var(var) __reloc_hide(var,0)
-#define per_cpu(var,cpu) __reloc_hide(var,0)
-
-#endif /* SMP */
-
-#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
+#include <asm-generic/percpu.h>
 
 #endif /* __ARCH_S390_PERCPU__ */

-- 

* [patch 11/14] Powerpc: Use generic per cpu
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (9 preceding siblings ...)
  2007-11-27  0:14 ` [patch 10/14] s390: " Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  7:41   ` Kumar Gala
  2007-11-27  0:14 ` [patch 12/14] Sparc64: Use generic percpu Christoph Lameter
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Paul Mackerras

[-- Attachment #1: power_generic_percpu --]
[-- Type: text/plain, Size: 1545 bytes --]

Powerpc determines the offset of the per cpu area of the currently
executing processor via the paca. Like x86_64, it avoids the array of
per cpu offsets by looking up another processor's offset in that
processor's paca.

Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-powerpc/percpu.h |   19 -------------------
 1 file changed, 19 deletions(-)

Index: linux-2.6/include/asm-powerpc/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/percpu.h	2007-11-24 10:27:31.088350556 -0800
+++ linux-2.6/include/asm-powerpc/percpu.h	2007-11-24 10:29:20.752350757 -0800
@@ -16,25 +16,6 @@
 #define __my_cpu_offset() get_paca()->data_offset
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
-/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
-#define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))
-#define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, local_paca->data_offset))
-
-extern void setup_per_cpu_areas(void);
-
-#else /* ! SMP */
-
-#define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
-#define __get_cpu_var(var)			per_cpu__##var
-#define __raw_get_cpu_var(var)			per_cpu__##var
-
 #endif	/* SMP */
-
-#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
-
-#else
 #include <asm-generic/percpu.h>
-#endif
-
 #endif /* _ASM_POWERPC_PERCPU_H_ */

-- 

* [patch 12/14] Sparc64: Use generic percpu
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (10 preceding siblings ...)
  2007-11-27  0:14 ` [patch 11/14] Powerpc: Use generic per cpu Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  0:14 ` [patch 13/14] ia64: " Christoph Lameter
  2007-11-27  0:14 ` [patch 14/14] x86: Unify percpu.h Christoph Lameter
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, David Miller

[-- Attachment #1: sparc64_generic_percpu --]
[-- Type: text/plain, Size: 2493 bytes --]

Sparc64 keeps the base address of the per cpu area of the currently
executing processor in a global register (__local_per_cpu_offset).

Sparc64 also calculates the address of another processor's per cpu area
from a base address and a shift instead of performing an array lookup.
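A worked sketch of the sparc64 offset calculation, assuming (hypothetically) that all per cpu areas are allocated back to back in one block, each 1 << __per_cpu_shift bytes long, and that __per_cpu_base is the distance from the template section to cpu 0's copy. setup_areas() and PERCPU_SHIFT are illustrations of that assumption, not the sparc64 code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 4
#define PERCPU_SHIFT 6	/* hypothetical: 64-byte per cpu areas */

static char percpu_template[1 << PERCPU_SHIFT];		/* stand-in for .data.percpu */
static char percpu_block[NR_CPUS << PERCPU_SHIFT];	/* all areas, back to back */

static uintptr_t __per_cpu_base;
static unsigned long __per_cpu_shift;

/* Same shape as the sparc64 macro: base plus cpu scaled by the area size. */
#define __per_cpu_offset(__cpu) \
	(__per_cpu_base + ((uintptr_t)(__cpu) << __per_cpu_shift))

/* Hypothetical setup: record base and shift, then copy the template. */
static void setup_areas(void)
{
	__per_cpu_shift = PERCPU_SHIFT;
	__per_cpu_base = (uintptr_t)percpu_block - (uintptr_t)percpu_template;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		memcpy(percpu_block + ((size_t)cpu << PERCPU_SHIFT),
		       percpu_template, sizeof(percpu_template));
}
```

With a shift of 6, cpu 3's offset is __per_cpu_base + (3 << 6) = __per_cpu_base + 192, so adding it to a variable's template address lands on that variable inside cpu 3's 64-byte slice.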

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/sparc64/mm/init.c       |    5 +++++
 include/asm-sparc64/percpu.h |   12 ++----------
 2 files changed, 7 insertions(+), 10 deletions(-)

Index: linux-2.6/include/asm-sparc64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/percpu.h	2007-11-24 10:27:31.088350556 -0800
+++ linux-2.6/include/asm-sparc64/percpu.h	2007-11-24 10:29:52.133100730 -0800
@@ -7,7 +7,6 @@ register unsigned long __local_per_cpu_o
 
 #ifdef CONFIG_SMP
 
-#define setup_per_cpu_areas()			do { } while (0)
 extern void real_setup_per_cpu_areas(void);
 
 extern unsigned long __per_cpu_base;
@@ -16,21 +15,14 @@ extern unsigned long __per_cpu_shift;
 	(__per_cpu_base + ((unsigned long)(__cpu) << __per_cpu_shift))
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
-/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
-#define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __local_per_cpu_offset))
-#define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __local_per_cpu_offset))
+#define __my_cpu_offset __local_per_cpu_offset
 
 #else /* ! SMP */
 
 #define real_setup_per_cpu_areas()		do { } while (0)
 
-#define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
-#define __get_cpu_var(var)			per_cpu__##var
-#define __raw_get_cpu_var(var)			per_cpu__##var
-
 #endif	/* SMP */
 
-#define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
+#include <asm-generic/percpu.h>
 
 #endif /* __ARCH_SPARC64_PERCPU__ */
Index: linux-2.6/arch/sparc64/mm/init.c
===================================================================
--- linux-2.6.orig/arch/sparc64/mm/init.c	2007-11-24 10:30:40.908850808 -0800
+++ linux-2.6/arch/sparc64/mm/init.c	2007-11-24 10:31:16.464100581 -0800
@@ -1323,6 +1323,11 @@ pgd_t swapper_pg_dir[2048];
 static void sun4u_pgprot_init(void);
 static void sun4v_pgprot_init(void);
 
+/* Dummy function */
+void __init setup_per_cpu_areas(void)
+{
+}
+
 void __init paging_init(void)
 {
 	unsigned long end_pfn, pages_avail, shift, phys_base;

-- 

* [patch 13/14] ia64: Use generic percpu
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (11 preceding siblings ...)
  2007-11-27  0:14 ` [patch 12/14] Sparc64: Use generic percpu Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  2007-11-27  1:37   ` Christoph Lameter
  2007-11-27  0:14 ` [patch 14/14] x86: Unify percpu.h Christoph Lameter
  13 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-ia64, tony.luck

[-- Attachment #1: ia64_generic_percpu --]
[-- Type: text/plain, Size: 2460 bytes --]

ia64 has a special per processor mapping (PERCPU_ADDR) that can be used to
locate the offset of the current processor's per cpu area.

Cc: linux-ia64@vger.kernel.org
Cc: tony.luck@intel.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-ia64/percpu.h |   24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

Index: linux-2.6/include/asm-ia64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/percpu.h	2007-11-24 10:27:31.088350556 -0800
+++ linux-2.6/include/asm-ia64/percpu.h	2007-11-24 10:31:54.053187674 -0800
@@ -9,10 +9,9 @@
 #define PERCPU_ENOUGH_ROOM PERCPU_PAGE_SIZE
 
 #ifdef __ASSEMBLY__
-# define THIS_CPU(var)	(per_cpu__##var)  /* use this to mark accesses to per-CPU variables... */
+# define THIS_CPU(var)	(PERCPU_ADDR + per_cpu__##var)  /* use this to mark accesses to per-CPU variables... */
 #else /* !__ASSEMBLY__ */
 
-
 #include <linux/threads.h>
 
 #define DECLARE_PER_CPU(type, name)				\
@@ -20,24 +19,12 @@
 
 #ifdef CONFIG_SMP
 
-extern unsigned long __per_cpu_offset[NR_CPUS];
-#define per_cpu_offset(x) (__per_cpu_offset[x])
-
-/* Equal to __per_cpu_offset[smp_processor_id()], but faster to access: */
-DECLARE_PER_CPU(unsigned long, local_per_cpu_offset);
-
-#define per_cpu(var, cpu)  (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
-#define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __ia64_per_cpu_var(local_per_cpu_offset)))
-#define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __ia64_per_cpu_var(local_per_cpu_offset)))
+#define __my_cpu_offset	__ia64_per_cpu_var(local_per_cpu_offset)
 
-extern void setup_per_cpu_areas (void);
 extern void *per_cpu_init(void);
 
 #else /* ! SMP */
 
-#define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu__##var))
-#define __get_cpu_var(var)			per_cpu__##var
-#define __raw_get_cpu_var(var)			per_cpu__##var
 #define per_cpu_init()				(__phys_per_cpu_start)
 
 #endif	/* SMP */
@@ -48,7 +35,12 @@ extern void *per_cpu_init(void);
  * On the positive side, using __ia64_per_cpu_var() instead of __get_cpu_var() is slightly
  * more efficient.
  */
-#define __ia64_per_cpu_var(var)	(per_cpu__##var)
+#define __ia64_per_cpu_var(var)	(*SHIFT_PTR(&per_cpu__##var, PERCPU_ADDR))
+
+#include <asm-generic/percpu.h>
+
+/* Equal to __per_cpu_offset[smp_processor_id()], but faster to access: */
+DECLARE_PER_CPU(unsigned long, local_per_cpu_offset);
 
 #endif /* !__ASSEMBLY__ */
 

-- 

* [patch 14/14] x86: Unify percpu.h
  2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
                   ` (12 preceding siblings ...)
  2007-11-27  0:14 ` [patch 13/14] ia64: " Christoph Lameter
@ 2007-11-27  0:14 ` Christoph Lameter
  13 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Rusty Russell, tglx, mingo

[-- Attachment #1: unification --]
[-- Type: text/plain, Size: 8777 bytes --]

Form a single percpu.h from percpu_32.h and percpu_64.h. Both are now
small, so this patch simply puts them together.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-x86/percpu.h    |  145 ++++++++++++++++++++++++++++++++++++++++++--
 include/asm-x86/percpu_32.h |  119 ------------------------------------
 include/asm-x86/percpu_64.h |   20 ------
 3 files changed, 141 insertions(+), 143 deletions(-)

Index: linux-2.6/include/asm-x86/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu.h	2007-11-24 19:25:15.104600720 -0800
+++ linux-2.6/include/asm-x86/percpu.h	2007-11-26 11:39:26.698432045 -0800
@@ -1,5 +1,142 @@
-#ifdef CONFIG_X86_32
-# include "percpu_32.h"
-#else
-# include "percpu_64.h"
+#ifndef _ASM_X86_PERCPU_H_
+#define _ASM_X86_PERCPU_H_
+
+#ifdef CONFIG_X86_64
+#include <linux/compiler.h>
+
+/* Same as asm-generic/percpu.h, except that we store the per cpu offset
+   in the PDA. Longer term the PDA and every per cpu variable
+   should be just put into a single section and referenced directly
+   from %gs */
+
+#ifdef CONFIG_SMP
+#include <asm/pda.h>
+
+#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
+#define __my_cpu_offset read_pda(data_offset)
+
+#define per_cpu_offset(x) (__per_cpu_offset(x))
+
 #endif
+#include <asm-generic/percpu.h>
+
+DECLARE_PER_CPU(struct x8664_pda, pda);
+
+#else /* CONFIG_X86_64 */
+
+#ifdef __ASSEMBLY__
+
+/*
+ * PER_CPU finds an address of a per-cpu variable.
+ *
+ * Args:
+ *    var - variable name
+ *    reg - 32bit register
+ *
+ * The resulting address is stored in the "reg" argument.
+ *
+ * Example:
+ *    PER_CPU(cpu_gdt_descr, %ebx)
+ */
+#ifdef CONFIG_SMP
+#define PER_CPU(var, reg)				\
+	movl %fs:per_cpu__##this_cpu_off, reg;		\
+	lea per_cpu__##var(reg), reg
+#define PER_CPU_VAR(var)	%fs:per_cpu__##var
+#else /* ! SMP */
+#define PER_CPU(var, reg)			\
+	movl $per_cpu__##var, reg
+#define PER_CPU_VAR(var)	per_cpu__##var
+#endif	/* SMP */
+
+#else /* ...!ASSEMBLY */
+
+/*
+ * PER_CPU finds an address of a per-cpu variable.
+ *
+ * Args:
+ *    var - variable name
+ *    cpu - 32bit register containing the current CPU number
+ *
+ * The resulting address is stored in the "cpu" argument.
+ *
+ * Example:
+ *    PER_CPU(cpu_gdt_descr, %ebx)
+ */
+#ifdef CONFIG_SMP
+
+#define __my_cpu_offset x86_read_percpu(this_cpu_off)
+
+/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
+#define __percpu_seg "%%fs:"
+
+#else  /* !SMP */
+
+#define __percpu_seg ""
+
+#endif	/* SMP */
+
+#include <asm-generic/percpu.h>
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
+/* For arch-specific code, we can use direct single-insn ops (they
+ * don't give an lvalue though). */
+extern void __bad_percpu_size(void);
+
+#define percpu_to_op(op,var,val)				\
+	do {							\
+		typedef typeof(var) T__;			\
+		if (0) { T__ tmp__; tmp__ = (val); }		\
+		switch (sizeof(var)) {				\
+		case 1:						\
+			asm(op "b %1,"__percpu_seg"%0"		\
+			    : "+m" (var)			\
+			    :"ri" ((T__)val));			\
+			break;					\
+		case 2:						\
+			asm(op "w %1,"__percpu_seg"%0"		\
+			    : "+m" (var)			\
+			    :"ri" ((T__)val));			\
+			break;					\
+		case 4:						\
+			asm(op "l %1,"__percpu_seg"%0"		\
+			    : "+m" (var)			\
+			    :"ri" ((T__)val));			\
+			break;					\
+		default: __bad_percpu_size();			\
+		}						\
+	} while (0)
+
+#define percpu_from_op(op,var)					\
+	({							\
+		typeof(var) ret__;				\
+		switch (sizeof(var)) {				\
+		case 1:						\
+			asm(op "b "__percpu_seg"%1,%0"		\
+			    : "=r" (ret__)			\
+			    : "m" (var));			\
+			break;					\
+		case 2:						\
+			asm(op "w "__percpu_seg"%1,%0"		\
+			    : "=r" (ret__)			\
+			    : "m" (var));			\
+			break;					\
+		case 4:						\
+			asm(op "l "__percpu_seg"%1,%0"		\
+			    : "=r" (ret__)			\
+			    : "m" (var));			\
+			break;					\
+		default: __bad_percpu_size();			\
+		}						\
+		ret__; })
+
+#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
+#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
+#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
+#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
+#endif /* !__ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+#endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6/include/asm-x86/percpu_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_32.h	2007-11-24 19:36:39.168850491 -0800
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,119 +0,0 @@
-#ifndef __ARCH_I386_PERCPU__
-#define __ARCH_I386_PERCPU__
-
-#ifdef __ASSEMBLY__
-
-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- *    var - variable name
- *    reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
- */
-#ifdef CONFIG_SMP
-#define PER_CPU(var, reg)				\
-	movl %fs:per_cpu__##this_cpu_off, reg;		\
-	lea per_cpu__##var(reg), reg
-#define PER_CPU_VAR(var)	%fs:per_cpu__##var
-#else /* ! SMP */
-#define PER_CPU(var, reg)			\
-	movl $per_cpu__##var, reg
-#define PER_CPU_VAR(var)	per_cpu__##var
-#endif	/* SMP */
-
-#else /* ...!ASSEMBLY */
-
-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- *    var - variable name
- *    cpu - 32bit register containing the current CPU number
- *
- * The resulting address is stored in the "cpu" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
- */
-#ifdef CONFIG_SMP
-
-#define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
-/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
-#define __percpu_seg "%%fs:"
-
-#else  /* !SMP */
-
-#define __percpu_seg ""
-
-#endif	/* SMP */
-
-#include <asm-generic/percpu.h>
-
-/* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU(unsigned long, this_cpu_off);
-
-/* For arch-specific code, we can use direct single-insn ops (they
- * don't give an lvalue though). */
-extern void __bad_percpu_size(void);
-
-#define percpu_to_op(op,var,val)				\
-	do {							\
-		typedef typeof(var) T__;			\
-		if (0) { T__ tmp__; tmp__ = (val); }		\
-		switch (sizeof(var)) {				\
-		case 1:						\
-			asm(op "b %1,"__percpu_seg"%0"		\
-			    : "+m" (var)			\
-			    :"ri" ((T__)val));			\
-			break;					\
-		case 2:						\
-			asm(op "w %1,"__percpu_seg"%0"		\
-			    : "+m" (var)			\
-			    :"ri" ((T__)val));			\
-			break;					\
-		case 4:						\
-			asm(op "l %1,"__percpu_seg"%0"		\
-			    : "+m" (var)			\
-			    :"ri" ((T__)val));			\
-			break;					\
-		default: __bad_percpu_size();			\
-		}						\
-	} while (0)
-
-#define percpu_from_op(op,var)					\
-	({							\
-		typeof(var) ret__;				\
-		switch (sizeof(var)) {				\
-		case 1:						\
-			asm(op "b "__percpu_seg"%1,%0"		\
-			    : "=r" (ret__)			\
-			    : "m" (var));			\
-			break;					\
-		case 2:						\
-			asm(op "w "__percpu_seg"%1,%0"		\
-			    : "=r" (ret__)			\
-			    : "m" (var));			\
-			break;					\
-		case 4:						\
-			asm(op "l "__percpu_seg"%1,%0"		\
-			    : "=r" (ret__)			\
-			    : "m" (var));			\
-			break;					\
-		default: __bad_percpu_size();			\
-		}						\
-		ret__; })
-
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
-#endif /* !__ASSEMBLY__ */
-
-#endif /* __ARCH_I386_PERCPU__ */
Index: linux-2.6/include/asm-x86/percpu_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_64.h	2007-11-24 19:36:40.156103380 -0800
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,20 +0,0 @@
-#ifndef _ASM_X8664_PERCPU_H_
-#define _ASM_X8664_PERCPU_H_
-#include <linux/compiler.h>
-
-/* Same as asm-generic/percpu.h, except that we store the per cpu offset
-   in the PDA. Longer term the PDA and every per cpu variable
-   should be just put into a single section and referenced directly
-   from %gs */
-
-#ifdef CONFIG_SMP
-#include <asm/pda.h>
-
-#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
-#define __my_cpu_offset read_pda(data_offset)
-
-#define per_cpu_offset(x) (__per_cpu_offset(x))
-
-#endif
-#include <asm-generic/percpu.h>
-#endif /* _ASM_X8664_PERCPU_H_ */

-- 

* Re: [patch 13/14] ia64: Use generic percpu
  2007-11-27  0:14 ` [patch 13/14] ia64: " Christoph Lameter
@ 2007-11-27  1:37   ` Christoph Lameter
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  1:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-ia64, tony.luck

Duh. This particular patch assumes already relocated per cpu areas, which 
does not work with ia64's per cpu area mappings. This fix is needed:

---
 include/asm-ia64/percpu.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/include/asm-ia64/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/percpu.h	2007-11-26 17:14:22.823022434 -0800
+++ linux-2.6/include/asm-ia64/percpu.h	2007-11-26 17:18:34.063021793 -0800
@@ -9,7 +9,7 @@
 #define PERCPU_ENOUGH_ROOM PERCPU_PAGE_SIZE
 
 #ifdef __ASSEMBLY__
-# define THIS_CPU(var)	(PERCPU_ADDR + per_cpu__##var)  /* use this to mark accesses to per-CPU variables... */
+# define THIS_CPU(var)	per_cpu__##var  /* use this to mark accesses to per-CPU variables... */
 #else /* !__ASSEMBLY__ */
 
 #include <linux/threads.h>
@@ -35,7 +35,7 @@ extern void *per_cpu_init(void);
  * On the positive side, using __ia64_per_cpu_var() instead of __get_cpu_var() is slightly
  * more efficient.
  */
-#define __ia64_per_cpu_var(var)	(*SHIFT_PTR(&per_cpu__##var, PERCPU_ADDR))
+#define __ia64_per_cpu_var(var)	per_cpu__##var
 
 #include <asm-generic/percpu.h>
 

* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-27  0:14 ` [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup Christoph Lameter
@ 2007-11-27  4:30   ` Rusty Russell
  2007-11-27 18:14     ` Christoph Lameter
  2007-11-27 23:40   ` Randy Dunlap
  1 sibling, 1 reply; 59+ messages in thread
From: Rusty Russell @ 2007-11-27  4:30 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Andi Kleen

On Tuesday 27 November 2007 11:14:12 Christoph Lameter wrote:
> The use of the __GENERIC_PERCPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it with a Kconfig variable.

Thanks for this Christoph!

These patches are great: the early experiments are obviously over, and so this 
consolidation is overdue.

Have you considered moving x86-64's setup_per_cpu_areas into generic code?  
It's a bit messier because some archs might not have set up NUMA stuff yet, 
but it's logically generic...

Thanks!
Rusty.

* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27  0:14 ` [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access Christoph Lameter
@ 2007-11-27  5:20   ` David Mosberger-Tang
  2007-11-27 18:15     ` Christoph Lameter
  2007-11-27  9:30   ` Andreas Schwab
  1 sibling, 1 reply; 59+ messages in thread
From: David Mosberger-Tang @ 2007-11-27  5:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On 11/26/07, Christoph Lameter <clameter@sgi.com> wrote:
> The model(small) attribute is not supported by gcc 4.X. The tests
> will always be negative today.

What was the rationale for removing this attribute?

  --david

* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-27  0:14 ` [patch 11/14] Powerpc: Use generic per cpu Christoph Lameter
@ 2007-11-27  7:41   ` Kumar Gala
  2007-11-27 18:16     ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: Kumar Gala @ 2007-11-27  7:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Paul Mackerras


On Nov 26, 2007, at 6:14 PM, Christoph Lameter wrote:

> Powerpc has a way to determine the address of the per cpu area of the
> currently executing processor via the paca and the array of per cpu
> offsets is avoided by looking up the per cpu area from the remote
> paca's (copying x86_64).
>
> Cc: Paul Mackerras <paulus@samba.org>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> ---
> include/asm-powerpc/percpu.h |   19 -------------------
> 1 file changed, 19 deletions(-)
>
> Index: linux-2.6/include/asm-powerpc/percpu.h
> ===================================================================
> --- linux-2.6.orig/include/asm-powerpc/percpu.h	2007-11-24 10:27:31.088350556 -0800
> +++ linux-2.6/include/asm-powerpc/percpu.h	2007-11-24 10:29:20.752350757 -0800
> @@ -16,25 +16,6 @@
> #define __my_cpu_offset() get_paca()->data_offset
> #define per_cpu_offset(x) (__per_cpu_offset(x))

This concerns me.  paca doesn't exist on all PPC platforms.

- k

* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27  0:14 ` [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access Christoph Lameter
  2007-11-27  5:20   ` David Mosberger-Tang
@ 2007-11-27  9:30   ` Andreas Schwab
  2007-11-27 18:17     ` Christoph Lameter
  1 sibling, 1 reply; 59+ messages in thread
From: Andreas Schwab @ 2007-11-27  9:30 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

Christoph Lameter <clameter@sgi.com> writes:

> The model(small) attribute is not supported by gcc 4.X.

Which gcc 4.X are you talking about?

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-27  4:30   ` Rusty Russell
@ 2007-11-27 18:14     ` Christoph Lameter
  2007-11-28  1:36       ` Rusty Russell
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 18:14 UTC (permalink / raw)
  To: Rusty Russell; +Cc: akpm, linux-kernel, Andi Kleen

On Tue, 27 Nov 2007, Rusty Russell wrote:

> Have you considered moving x86-64's setup_per_cpu_areas into generic code?  
> It's a bit messier because some archs might not have set up NUMA stuff yet, 
> but it's logically generic...

Yes that will happen later. This is just the early cleanup work. I 
plan to generally bring the two x86 arches in line. The pda will be 
folded into the per cpu area and after that it's easy to do.


* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27  5:20   ` David Mosberger-Tang
@ 2007-11-27 18:15     ` Christoph Lameter
  2007-11-27 21:10       ` David Mosberger-Tang
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 18:15 UTC (permalink / raw)
  To: David Mosberger-Tang; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On Mon, 26 Nov 2007, David Mosberger-Tang wrote:

> On 11/26/07, Christoph Lameter <clameter@sgi.com> wrote:
> > The model(small) attribute is not supported by gcc 4.X. The tests
> > will always be negative today.
> 
> What was the rationale for removing this attribute?

The code is then similar across all architectures and can be moved into 
generic code.


* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-27  7:41   ` Kumar Gala
@ 2007-11-27 18:16     ` Christoph Lameter
  2007-11-27 20:58       ` Paul Mackerras
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 18:16 UTC (permalink / raw)
  To: Kumar Gala; +Cc: akpm, linux-kernel, Paul Mackerras

On Tue, 27 Nov 2007, Kumar Gala wrote:

> > Index: linux-2.6/include/asm-powerpc/percpu.h
> > ===================================================================
> > --- linux-2.6.orig/include/asm-powerpc/percpu.h	2007-11-24 10:27:31.088350556 -0800
> > +++ linux-2.6/include/asm-powerpc/percpu.h	2007-11-24 10:29:20.752350757 -0800
> > @@ -16,25 +16,6 @@
> > #define __my_cpu_offset() get_paca()->data_offset
> > #define per_cpu_offset(x) (__per_cpu_offset(x))
> 
> This concerns me.  paca doesn't exist on all PPC platforms.

I wonder why the current code is working then.


* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27  9:30   ` Andreas Schwab
@ 2007-11-27 18:17     ` Christoph Lameter
  2007-11-27 21:24       ` Andreas Schwab
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 18:17 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On Tue, 27 Nov 2007, Andreas Schwab wrote:

> Christoph Lameter <clameter@sgi.com> writes:
> 
> > The model(small) attribute is not supported by gcc 4.X.
> 
> Which gcc 4.X are you talking about?

All. Last gcc that supported this was 3.4.


* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-27 18:16     ` Christoph Lameter
@ 2007-11-27 20:58       ` Paul Mackerras
  2007-11-27 21:13         ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Mackerras @ 2007-11-27 20:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Kumar Gala, akpm, linux-kernel

Christoph Lameter writes:
> On Tue, 27 Nov 2007, Kumar Gala wrote:
> 
> > > Index: linux-2.6/include/asm-powerpc/percpu.h
> > > ===================================================================
> > > --- linux-2.6.orig/include/asm-powerpc/percpu.h	2007-11-24 10:27:31.088350556 -0800
> > > +++ linux-2.6/include/asm-powerpc/percpu.h	2007-11-24 10:29:20.752350757 -0800
> > > @@ -16,25 +16,6 @@
> > > #define __my_cpu_offset() get_paca()->data_offset
> > > #define per_cpu_offset(x) (__per_cpu_offset(x))
> > 
> > This concerns me.  paca doesn't exist on all PPC platforms.
> 
> I wonder why the current code is working then.

Did you try both 32-bit (CONFIG_64BIT=n) and 64-bit (CONFIG_64BIT=y)
configurations?  The paca only exists in 64-bit kernels.

Paul.

* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 18:15     ` Christoph Lameter
@ 2007-11-27 21:10       ` David Mosberger-Tang
  2007-11-27 21:18         ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: David Mosberger-Tang @ 2007-11-27 21:10 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On 11/27/07, Christoph Lameter <clameter@sgi.com> wrote:
> On Mon, 26 Nov 2007, David Mosberger-Tang wrote:
>
> > On 11/26/07, Christoph Lameter <clameter@sgi.com> wrote:
> > > The model(small) attribute is not supported by gcc 4.X. The tests
> > > will always be negative today.
> >
> > What was the rationale for removing this attribute?
>
> The code is then similar across all architectures and can be moved into
> generic code.

Uniformity for the sake of uniformity?  The small data addressing is
really elegant and I don't think it should be dropped just for the
sake of uniformity.

  --david

* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-27 20:58       ` Paul Mackerras
@ 2007-11-27 21:13         ` Christoph Lameter
  2007-11-28  2:35           ` Paul Mackerras
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 21:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Kumar Gala, akpm, linux-kernel

On Wed, 28 Nov 2007, Paul Mackerras wrote:

> Did you try both 32-bit (CONFIG_64BIT=n) and 64-bit (CONFIG_64BIT=y)
> configurations?  The paca only exists in 64-bit kernels.

I built both and there is no dependency on 32-bit or 64-bit in 
include/asm-powerpc/percpu.h. Both access the paca (whatever that is).


* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 21:10       ` David Mosberger-Tang
@ 2007-11-27 21:18         ` Christoph Lameter
  2007-11-27 21:27           ` David Mosberger-Tang
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 21:18 UTC (permalink / raw)
  To: David Mosberger-Tang; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On Tue, 27 Nov 2007, David Mosberger-Tang wrote:

> Uniformity for the sake of uniformity?  The small data addressing is
> really elegant and I don't think it should be dropped just for the
> sake of uniformity.

Uniformity for the sake of code size reduction and easier maintenance. 

Yes I think it would be great to have this feature on all arches if 
possible. If someone could work with the gcc /linker folks to get this 
done that would be great.

But the feature has been removed from gcc and so it's not usable for IA64 
with a current compiler anymore. This is basically removing useless code.



* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 18:17     ` Christoph Lameter
@ 2007-11-27 21:24       ` Andreas Schwab
  2007-11-27 21:38         ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: Andreas Schwab @ 2007-11-27 21:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

Christoph Lameter <clameter@sgi.com> writes:

> On Tue, 27 Nov 2007, Andreas Schwab wrote:
>
>> Christoph Lameter <clameter@sgi.com> writes:
>> 
>> > The model(small) attribute is not supported by gcc 4.X.
>> 
>> Which gcc 4.X are you talking about?
>
> All. Last gcc that supported this was 3.4.

Strange.  Works fine here.

$ arch/ia64/scripts/toolchain-flags gcc objdump readelf
-DHAVE_WORKING_TEXT_ALIGN -DHAVE_MODEL_SMALL_ATTRIBUTE -DHAVE_SERIALIZE_DIRECTIVE
$ gcc --version | head -n 1
gcc (GCC) 4.2.1 (SUSE Linux)
$ grep ia64_handle_model_attribute config/ia64/*.c
config/ia64/ia64.c:static tree ia64_handle_model_attribute (tree *, tree, tree, int, bool *);
config/ia64/ia64.c:  { "model",	       1, 1, true, false, false, ia64_handle_model_attribute },
config/ia64/ia64.c:ia64_handle_model_attribute (tree *node, tree name, tree args,
$ grep small_addr_symbolic_operand config/ia64/*.md
config/ia64/constraints.md:  (match_operand 0 "small_addr_symbolic_operand"))
config/ia64/predicates.md:(define_predicate "small_addr_symbolic_operand" 

Andreas.


* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 21:18         ` Christoph Lameter
@ 2007-11-27 21:27           ` David Mosberger-Tang
  2007-11-27 22:02             ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: David Mosberger-Tang @ 2007-11-27 21:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On 11/27/07, Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 27 Nov 2007, David Mosberger-Tang wrote:
>
> > Uniformity for the sake of uniformity?  The small data addressing is
> > really elegant and I don't think it should be dropped just for the
> > sake of uniformity.
>
> Uniformity for the sake of code size reduction and easier maintenance.

Code-size reduction?  You must be talking *source* code size
reduction.  Surely the small-data access-method decreases object code
size.

  --david

* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 21:24       ` Andreas Schwab
@ 2007-11-27 21:38         ` Christoph Lameter
  2007-11-27 22:14           ` Adrian Bunk
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 21:38 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On Tue, 27 Nov 2007, Andreas Schwab wrote:

> Strange.  Works fine here.
> 
> $ arch/ia64/scripts/toolchain-flags gcc objdump readelf
> -DHAVE_WORKING_TEXT_ALIGN -DHAVE_MODEL_SMALL_ATTRIBUTE -DHAVE_SERIALIZE_DIRECTIVE
> $ gcc --version | head -n 1
> gcc (GCC) 4.2.1 (SUSE Linux)
> $ grep ia64_handle_model_attribute config/ia64/*.c
> config/ia64/ia64.c:static tree ia64_handle_model_attribute (tree *, tree, tree, int, bool *);
> config/ia64/ia64.c:  { "model",	       1, 1, true, false, false, ia64_handle_model_attribute },
> config/ia64/ia64.c:ia64_handle_model_attribute (tree *node, tree name, tree args,
> $ grep small_addr_symbolic_operand config/ia64/*.md
> config/ia64/constraints.md:  (match_operand 0 "small_addr_symbolic_operand"))
> config/ia64/predicates.md:(define_predicate "small_addr_symbolic_operand" 

Hmmm...

http://www.ohse.de/uwe/articles/gcc-attributes.html

says:

model (MODEL-NAME)
    Found in versions: 2.8-3.4

But true my compiler still takes it. Ok, I am going to add an option to 
add attributes to percpu definitions.


* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 21:27           ` David Mosberger-Tang
@ 2007-11-27 22:02             ` Christoph Lameter
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27 22:02 UTC (permalink / raw)
  To: David Mosberger-Tang; +Cc: akpm, linux-kernel, linux-ia64, tony.luck

On Tue, 27 Nov 2007, David Mosberger-Tang wrote:

> Code-size reduction?  You must be talking *source* code size
> reduction.  Surely the small-data access-method decreases object code
> size.

Yes source code reduction. I just added the attribute back but in such a 
way that any arch can add attributes to per cpu definitions.


* Re: [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access
  2007-11-27 21:38         ` Christoph Lameter
@ 2007-11-27 22:14           ` Adrian Bunk
  0 siblings, 0 replies; 59+ messages in thread
From: Adrian Bunk @ 2007-11-27 22:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andreas Schwab, akpm, linux-kernel, linux-ia64, tony.luck

On Tue, Nov 27, 2007 at 01:38:02PM -0800, Christoph Lameter wrote:
>...
> Hmmm...
> 
> http://www.ohse.de/uwe/articles/gcc-attributes.html
> 
> says:
> 
> model (MODEL-NAME)
>     Found in versions: 2.8-3.4
>...

This site says at the top it used gcc versions up to 3.4, so it 
obviously can't find anything in gcc >= 4.0 ...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-27  0:14 ` [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup Christoph Lameter
  2007-11-27  4:30   ` Rusty Russell
@ 2007-11-27 23:40   ` Randy Dunlap
  2007-11-28  0:03     ` Christoph Lameter
  1 sibling, 1 reply; 59+ messages in thread
From: Randy Dunlap @ 2007-11-27 23:40 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Rusty Russell, Andi Kleen

On Mon, 26 Nov 2007 16:14:12 -0800 Christoph Lameter wrote:

> The use of the __GENERIC_PERCPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it with a Kconfig variable.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Andi Kleen <ak@suse.de>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> ---
> 
> Index: linux-2.6/arch/ia64/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/ia64/Kconfig	2007-11-26 15:38:56.415112360 -0800
> +++ linux-2.6/arch/ia64/Kconfig	2007-11-26 15:40:10.425862722 -0800
> @@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
>  	bool
>  	default y
>  
> +config ARCH_SETS_UP_PER_CPU_AREA
> +	bool
> +	default y
> +
>  config DMI
>  	bool
>  	default y

> Index: linux-2.6/arch/sparc64/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/sparc64/Kconfig	2007-11-26 15:38:56.447111936 -0800
> +++ linux-2.6/arch/sparc64/Kconfig	2007-11-26 15:40:10.425862722 -0800
> @@ -66,6 +66,10 @@ config AUDIT_ARCH
>  	bool
>  	default y
>  
> +config ARCH_SETS_UP_PER_CPU_AREA
> +	bool
> +	default y

	def_bool y
  is the preferred form for those 2-liners above...


> +
>  config ARCH_NO_VIRT_TO_BUS
>  	def_bool y
>  


---
~Randy

* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-27 23:40   ` Randy Dunlap
@ 2007-11-28  0:03     ` Christoph Lameter
  2007-11-28  0:05       ` Randy Dunlap
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-28  0:03 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: akpm, linux-kernel, Rusty Russell, Andi Kleen

On Tue, 27 Nov 2007, Randy Dunlap wrote:

> > +config ARCH_SETS_UP_PER_CPU_AREA
> > +	bool
> > +	default y
> 
> 	def_bool y
>   is the preferred form for those 2-liners above...
> 
> 
> > +
> >  config ARCH_NO_VIRT_TO_BUS
> >  	def_bool y
> >  

Ok. Changed.

x86 should use

config ARCH_SETS_UP_PER_CPU_AREA
        def_bool X86_64

?


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28  0:03     ` Christoph Lameter
@ 2007-11-28  0:05       ` Randy Dunlap
  0 siblings, 0 replies; 59+ messages in thread
From: Randy Dunlap @ 2007-11-28  0:05 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Rusty Russell, Andi Kleen

Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Randy Dunlap wrote:
> 
>>> +config ARCH_SETS_UP_PER_CPU_AREA
>>> +	bool
>>> +	default y
>> 	def_bool y
>>   is the preferred form for those 2-liners above...
>>
>>
>>> +
>>>  config ARCH_NO_VIRT_TO_BUS
>>>  	def_bool y
>>>  
> 
> Ok. Changed.
> 
> x86 should use
> 
> config ARCH_SETS_UP_PER_CPU_AREA
>         def_bool X86_64
> 
> ?

Yes, you can do
	def_bool <config symbol>
as well to make the new symbol be variable instead of constant.


-- 
~Randy

* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-27 18:14     ` Christoph Lameter
@ 2007-11-28  1:36       ` Rusty Russell
  2007-11-28 18:51         ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: Rusty Russell @ 2007-11-28  1:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Andi Kleen, Ben Elliston

On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Rusty Russell wrote:
> > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > code? It's a bit messier because some archs might not have set up NUMA
> > stuff yet, but it's logically generic...
>
> Yes that will happen later. This is just the early cleanup work. I
> plan to generally bring the two x86 arches in line. The pda will be
> folded into the per cpu area and after that it's easy to do.

Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose 
the ability to use the stack protection config option.  That's because it 
assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc 
change to make this gs:__builtin_stack_canary_off (where gcc can emit 
__builtin_stack_canary_off as a weak absolute symbol, so we can override it 
for the kernel).

Cheers,
Rusty.

* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-27 21:13         ` Christoph Lameter
@ 2007-11-28  2:35           ` Paul Mackerras
  2007-11-28 18:54             ` Christoph Lameter
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Mackerras @ 2007-11-28  2:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Kumar Gala, akpm, linux-kernel

Christoph Lameter writes:

> On Wed, 28 Nov 2007, Paul Mackerras wrote:
> 
> > Did you try both 32-bit (CONFIG_64BIT=n) and 64-bit (CONFIG_64BIT=y)
> > configurations?  The paca only exists in 64-bit kernels.
> 
> I built both and there is no dependency on 32-bit or 64-bit in 
> include/asm-powerpc/percpu.h. Both access the paca (whatever that is).

Look at line 3 of include/asm-powerpc/percpu.h:

#ifdef __powerpc64__

As far as I can see, after applying your series of patches, I end up
with an unbalanced #ifdef in include/asm-powerpc/percpu.h.  I see 3
#ifdef/#ifndef, but only two #endifs.  It needs another #endif after
the #endif /* SMP */ to match the #ifdef __powerpc64__.  With that
change it looks OK, since 32-bit uses asm-generic/percpu.h for both
SMP and UP.

Paul.

* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28  1:36       ` Rusty Russell
@ 2007-11-28 18:51         ` Christoph Lameter
  2007-11-28 23:17           ` Rusty Russell
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-28 18:51 UTC (permalink / raw)
  To: Rusty Russell; +Cc: akpm, linux-kernel, Andi Kleen, Ben Elliston

On Wed, 28 Nov 2007, Rusty Russell wrote:

> On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > > code? It's a bit messier because some archs might not have set up NUMA
> > > stuff yet, but it's logically generic...
> >
> > Yes that will happen later. This is just the early cleanup work. I
> > plan to generally bring the two x86 arches in line. The pda will be
> > folded into the per cpu area and after that it's easy to do.
> 
> Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose 
> the ability to use the stack protection config option.  That's because it 
> assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc 
> change to make this gs:__builtin_stack_canary_off (where gcc can emit 
> __builtin_stack_canary_off as a weak absolute symbol, so we can override it 
> for the kernel).

This works if you rebase the per cpu area at zero. gs:0x68 is still the 
stack canary.

The i386 method does not work because the segment register does not 
directly point to the pda.



* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-28  2:35           ` Paul Mackerras
@ 2007-11-28 18:54             ` Christoph Lameter
  2007-12-02 20:55               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-28 18:54 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Kumar Gala, akpm, linux-kernel

On Wed, 28 Nov 2007, Paul Mackerras wrote:

> Christoph Lameter writes:
> 
> > On Wed, 28 Nov 2007, Paul Mackerras wrote:
> > 
> > > Did you try both 32-bit (CONFIG_64BIT=n) and 64-bit (CONFIG_64BIT=y)
> > > configurations?  The paca only exists in 64-bit kernels.
> > 
> > I built both and there is no dependency on 32bit or 64 bit in 
> > include/asm-powerpc/percpu.h. Both access the paca (whatever that is).
> 
> Look at line 3 of include/asm-powerpc/percpu.h:
> 
> #ifdef __powerpc64__
> 
> As far as I can see, after applying your series of patches, I end up
> with an unbalanced #ifdef in include/asm-powerpc/percpu.h.  I see 3
> #ifdef/#ifndef, but only two #endifs.  It needs another #endif after
> the #endif /* SMP */ to match the #ifdef __powerpc64__.  With that
> change it looks OK, since 32-bit uses asm-generic/percpu.h for both
> SMP and UP.

Ahhh.. Ok. Fixed.
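The following is a hedged sketch of the balanced conditional structure Paul describes. It is reduced to a hypothetical header that only records which path supplies the per cpu offset. `PERCPU_SKETCH_H_` and `PERCPU_OFFSET_SOURCE` are invented for illustration. The real header defines offset macros and then includes asm-generic/percpu.h.

```c
#include <assert.h>
#include <string.h>

#ifndef PERCPU_SKETCH_H_
#define PERCPU_SKETCH_H_

#ifdef __powerpc64__
#ifdef CONFIG_SMP
/* 64-bit SMP: the per cpu offset lives in the paca */
#define PERCPU_OFFSET_SOURCE "paca"
#endif /* CONFIG_SMP */
#endif /* __powerpc64__ */	/* the #endif the series was missing */

/* 32-bit (SMP and UP) and 64-bit UP fall back to asm-generic/percpu.h */
#ifndef PERCPU_OFFSET_SOURCE
#define PERCPU_OFFSET_SOURCE "asm-generic"
#endif

#endif /* PERCPU_SKETCH_H_ */

static const char *percpu_offset_source(void)
{
	return PERCPU_OFFSET_SOURCE;
}
```

Each `#ifdef`/`#ifndef` now has a matching `#endif`. Building without `CONFIG_SMP` or off powerpc64 selects the generic path.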

Do you know where to get a ppc64 cross-compiler? I
tried to build gcc for ppc64, but the build failed on x86_64.


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28 18:51         ` Christoph Lameter
@ 2007-11-28 23:17           ` Rusty Russell
  2007-11-28 23:36             ` Christoph Lameter
  2007-11-28 23:45             ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 59+ messages in thread
From: Rusty Russell @ 2007-11-28 23:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Andi Kleen, Jeremy Fitzhardinge

On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Rusty Russell wrote:
> > On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > > > code? It's a bit messier because some archs might not have set up
> > > > NUMA stuff yet, but it's logically generic...
> > >
> > > Yes that will happen later. This is just the early cleanup work. I
> > > plan to generally bring the two x86 arches in line. The pda will be
> > > folded into the per cpu area and after that it's easy to do.
> >
> > Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you
> > lose the ability to use the stack protection config option.  That's
> > because it assumes that gs:0x68 (or something) is the stack canary; we
> > need a YA gcc change to make this gs:__builtin_stack_canary_off (where
> > gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we
> > can override it for the kernel).
>
> This works if you rebase the per cpu area at zero. gs:0x68 is still the
> stack canary.
>
> The i386 method does not work because the segment register does not
> directly point to the pda.

But the PDA itself is silly (Jeremy ported it to i386 and I balked).  We have 
a generic one: it's called the per-cpu data.  Having a completely separate 
per-cpu structure for x86-64 is a mistake.

Setting up gs as the per-cpu offset has lovely properties and avoids YA 
arch-specific concept; see the i386 code.  Introducing a generic 
read_percpu()/write_percpu() would even make it optimal.

Cheers,
Rusty.


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28 23:17           ` Rusty Russell
@ 2007-11-28 23:36             ` Christoph Lameter
  2007-11-30  2:23               ` Rusty Russell
  2007-11-28 23:45             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-28 23:36 UTC (permalink / raw)
  To: Rusty Russell; +Cc: akpm, linux-kernel, Andi Kleen, Jeremy Fitzhardinge

On Thu, 29 Nov 2007, Rusty Russell wrote:

> But the PDA itself is silly (Jeremy ported it to i386 and I balked).  We have 
> a generic one: it's called the per-cpu data.  Having a completely separate 
> per-cpu structure for x86-64 is a mistake.

Yes ultimately the pda can be dissolved. However, the stack canary 
probably has to be kept for backward compatibility.
 
> Setting up gs as the per-cpu offset has lovely properties and avoids YA 
> arch-specific concept; see the i386 code.  Introducing a generic 
> read_percpu()/write_percpu() would even make it optimal.

The code becomes much simpler if gs points to the beginning of the 
per cpu area and __per_cpu_offset[i] does the same. No more weird 
__per_cpu_start offsetting.
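The following small C simulation contrasts the two addressing schemes discussed here. `proto` stands in for the section between __per_cpu_start and __per_cpu_end, and `areas[]` stands in for the per cpu copies. All names are hypothetical, and the pointer/integer conversions assume a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical contents of the .data.percpu prototype section. */
struct percpu_section {
	long vm_events;
	int counter;
};

static struct percpu_section proto;                 /* at __per_cpu_start */
static struct percpu_section areas[NR_CPUS_SKETCH]; /* per cpu copies */

/* Current i386 style: offset = area - __per_cpu_start. */
static uintptr_t offset_from_start(int cpu)
{
	return (uintptr_t)&areas[cpu] - (uintptr_t)&proto;
}

/* Access = link address of the variable + that cpu's offset. */
static int *counter_via_start_offset(int cpu)
{
	return (int *)((uintptr_t)&proto.counter + offset_from_start(cpu));
}

/* Proposed style: base = start of the area, variable = small offset into it. */
static int *counter_via_base(int cpu)
{
	char *base = (char *)&areas[cpu];
	return (int *)(base + offsetof(struct percpu_section, counter));
}
```

Both paths reach the same object. The base-relative form needs only the offset of the variable inside the area, and that offset stays small.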

The offsets are smaller if they are relative to the per cpu areas, which 
makes more compact instructions possible.

The generic read/write percpu functionality introduced by the cpu_alloc 
patchset works best with offsets relative to an arch dependent 
register. All per cpu data (pda, percpu and allocpercpu) is handled as an 
offset relative to the start of the per cpu data.

If the current offset by __per_cpu_start is kept then a per cpu allocator 
may have to dish out addresses that go beyond __per_cpu_end.
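One way to read the __per_cpu_end remark is sketched below in C. It uses a hypothetical bump allocator that hands out offsets located after the static per cpu data. If handles stay relative to __per_cpu_start, a dynamically allocated object gets a handle at or past __per_cpu_end. That handle is an address with nothing behind it in the prototype section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATIC_SIZE  256   /* hypothetical size of the static per cpu data */
#define DYNAMIC_SIZE 256   /* hypothetical room reserved for allocations */

static char proto_section[STATIC_SIZE];   /* __per_cpu_start .. __per_cpu_end */
static size_t next_dynamic = STATIC_SIZE; /* first free offset in each area */

/* Bump allocator: returns an offset valid in every cpu's area, or (size_t)-1. */
static size_t percpu_alloc_offset(size_t size)
{
	size_t off = next_dynamic;

	if (off + size > STATIC_SIZE + DYNAMIC_SIZE)
		return (size_t)-1;
	next_dynamic += size;
	return off;
}

/* A __per_cpu_start-relative handle for an area offset. */
static uintptr_t start_relative_handle(size_t off)
{
	return (uintptr_t)proto_section + off;
}

static uintptr_t per_cpu_end(void)
{
	return (uintptr_t)proto_section + STATIC_SIZE;
}
```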

I think treating a per cpu variable as an offset relative to a base is 
natural, since cpus typically address per cpu data through an offset 
relative to some register.



* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28 23:17           ` Rusty Russell
  2007-11-28 23:36             ` Christoph Lameter
@ 2007-11-28 23:45             ` Jeremy Fitzhardinge
  2007-11-29  0:11               ` Christoph Lameter
  1 sibling, 1 reply; 59+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-28 23:45 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Lameter, akpm, linux-kernel, Andi Kleen

Rusty Russell wrote:
> On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
>   
>> On Wed, 28 Nov 2007, Rusty Russell wrote:
>>     
>>> On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
>>>       
>>>> On Tue, 27 Nov 2007, Rusty Russell wrote:
>>>>         
>>>>> Have you considered moving x86-64's setup_per_cpu_areas into generic
>>>>> code? It's a bit messier because some archs might not have set up
>>>>> NUMA stuff yet, but it's logically generic...
>>>>>           
>>>> Yes that will happen later. This is just the early cleanup work. I
>>>> plan to generally bring the two x86 arches in line. The pda will be
>>>> folded into the per cpu area and after that it's easy to do.
>>>>         
>>> Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you
>>> lose the ability to use the stack protection config option.  That's
>>> because it assumes that gs:0x68 (or something) is the stack canary; we
>>> need a YA gcc change to make this gs:__builtin_stack_canary_off (where
>>> gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we
>>> can override it for the kernel).
>>>       
>> This works if you rebase the per cpu area at zero. gs:0x68 is still the
>> stack canary.
>>
>> The i386 method does not work because the segment register does not
>> directly point to the pda.
>>     
>
> But the PDA itself is silly (Jeremy ported it to i386 and I balked).  We have 
> a generic one: it's called the per-cpu data.  Having a completely separate 
> per-cpu structure for x86-64 is a mistake.
>   

Yes, I would like to convert x86_64 to match i386's percpu, and drop the
pda altogether.  The only thing preventing this is the stack canary, and
I'm wondering how much value there is in keeping it, given the
disadvantages of having this divergence between 32 and 64 bit.

    J


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28 23:45             ` Jeremy Fitzhardinge
@ 2007-11-29  0:11               ` Christoph Lameter
  2007-11-29  1:18                 ` Andi Kleen
  2007-11-29  1:30                 ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  0:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
> pda altogether.  The only thing preventing this is the stack canary, and
> I'm wondering how much value there is in keeping it, given the
> disadvantages of having this divergence between 32 and 64 bit.

I think most of the PDA could be gotten rid of. The problems are

1. The stack canary

2. The PDA is used to store per cpu data before the per cpu areas
   are set up.

The i386 way of referring to per cpu data is not optimal because it is 
always offset by __per_cpu_start. per cpu data offsets need to be relative 
to the beginning of the per cpu area. per cpu data is less than 64k so 2 
byte offsets would be enough.

That way the __per_cpu_offset array and the registers that are used on 
various platforms are pointing to the actual data and can be loaded
directly into a register and then a load with a small offset to that 
register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
on ia64 a fixed address stands in for the register. In loops over all per 
cpu variables this will also simplify the code.

And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply 
becomes the adding of the base address in a register to a per cpu offset.
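For comparison, here is a sketch of RELOC_HIDE roughly as include/linux/compiler-gcc.h defines it, using a GNU C statement expression plus an empty asm. It sits next to a hypothetical base-plus-offset macro of the kind proposed here. `PER_CPU_PTR_SKETCH` and `struct area_sketch` are invented for illustration. The block needs gcc or clang.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Roughly RELOC_HIDE: the empty asm launders the pointer so gcc cannot
 * make assumptions about arithmetic that leaves the original object.
 */
#define RELOC_HIDE(ptr, off)					\
	({ unsigned long __ptr;					\
	   __asm__ ("" : "=r"(__ptr) : "0"(ptr));		\
	   (__typeof__(ptr)) (__ptr + (off)); })

/* Hypothetical base-relative form: a per cpu variable is just an offset. */
#define PER_CPU_PTR_SKETCH(type, base, off) \
	((type *)((char *)(base) + (off)))

/* Hypothetical per cpu area for one cpu. */
struct area_sketch { long first; long second; };
static struct area_sketch cpu_area = { 1, 2 };
```

With the base-relative form there is no object whose bounds the compiler could misuse. There is only a base pointer and a byte offset, so the asm trick is unnecessary.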








* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  0:11               ` Christoph Lameter
@ 2007-11-29  1:18                 ` Andi Kleen
  2007-11-29  1:27                   ` Christoph Lameter
  2007-11-29  1:30                 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2007-11-29  1:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jeremy Fitzhardinge, Rusty Russell, akpm, linux-kernel, Andi Kleen

On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> 1. The stack canary

You would need to change gcc with a new option and only allow the stack
checking when the compiler supports the new option. However the problem
is still how to get a reasonable fixed offset. Or perhaps just change
gcc to use a linker symbol relative to %gs that could be set to anything?

-Andi


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:18                 ` Andi Kleen
@ 2007-11-29  1:27                   ` Christoph Lameter
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  1:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Rusty Russell, akpm, linux-kernel, Andi Kleen

On Thu, 29 Nov 2007, Andi Kleen wrote:

> On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> > 1. The stack canary
> 
> You would need to change gcc with a new option and only allow the stack
> checking when the compiler supports the new option. However the problem
> is still how to get a reasonable fixed offset. Or perhaps just change
> gcc to use a linker symbol relative to %gs that could be set to anything?

I still think we should leave the canary as is.


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  0:11               ` Christoph Lameter
  2007-11-29  1:18                 ` Andi Kleen
@ 2007-11-29  1:30                 ` Jeremy Fitzhardinge
  2007-11-29  1:32                   ` Andi Kleen
  2007-11-29  1:35                   ` Christoph Lameter
  1 sibling, 2 replies; 59+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-29  1:30 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
>> pda altogether.  The only thing preventing this is the stack canary, and
>> I'm wondering how much value there is in keeping it, given the
>> disadvantages of having this divergence between 32 and 64 bit.
>>     
>
> I think most of the PDA could be gotten rid of. The problems are
>
> 1. The stack canary
>   

Yes, this is a biggie.  It needs one of:

    * fix gcc
    * post-process the .s file
    * drop support for stack-protector (does it really help? do people
      use it?)


> 2. The PDA is used to store per cpu data before the per cpu areas
>    are set up.
>   

I don't see the problem.  The way i386 does it inherently supports
per-cpu data very early on (it uses the prototype percpu section until
the real percpu values are set up).
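The following C simulation models the i386 behaviour Jeremy describes under simplified assumptions. Every cpu's base initially points at the prototype section. A hypothetical `setup_per_cpu_areas_sketch()` then copies the prototype into the real areas and switches the bases. Because the prototype is the template for every copy, early writes end up in all areas, not only the boot cpu's.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS_SKETCH 2

/* Hypothetical contents of the per cpu section. */
struct percpu_section { int early_flag; long counter; };

static struct percpu_section proto;                 /* linked-in prototype */
static struct percpu_section areas[NR_CPUS_SKETCH]; /* real per cpu areas */

/* Before setup, every cpu's base points at the prototype section. */
static struct percpu_section *percpu_base[NR_CPUS_SKETCH] = { &proto, &proto };

static void setup_per_cpu_areas_sketch(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		/* copy whatever early boot code already wrote */
		memcpy(&areas[cpu], &proto, sizeof(proto));
		percpu_base[cpu] = &areas[cpu];
	}
}

/* Early write on the boot cpu, then setup; returns cpu 1's copy of the flag. */
static int early_write_survives_setup(void)
{
	percpu_base[0]->early_flag = 1;   /* lands in the prototype */
	setup_per_cpu_areas_sketch();
	return percpu_base[1]->early_flag;
}
```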

> The i386 way of referring to per cpu data is not optimal because it is 
> always offset by __per_cpu_start. per cpu data offsets need to be relative 
> to the beginning of the per cpu area. per cpu data is less than 64k so 2 
> byte offsets would be enough.
>   

I don't see that's terribly important.  percpu references aren't all
that common overall, and - at least on x86 - using a 16-bit offset
(assuming its possible) would require a prefix anyway, so it would only
save 1 byte per reference.  But I can't convince gas to generate a
16-bit offset anyway.

> That way the __per_cpu_offset array and the registers that are used on 
> various platforms are pointing to the actual data and can be loaded
> directly into a register and then a load with a small offset to that 
> register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
> on ia64 a fixed address stands in for the register.

The asm used to generate these references is inherently arch-specific
anyway, so the type and size of offset needed from the per-cpu base
register to the data itself can be arch-dependent without loss of
generality.  

I definitely see that small offsets might be useful for other
architectures, but for x86 it doesn't help and makes things more
complex.  The only difference between 32- and 64-bit is whether we
generate an offset from %fs, %gs or nothing (for the UP case).
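The following sketch selects the segment prefix at compile time. It is modelled loosely on the `__percpu_seg` macro that appears later in this thread. The `PERCPU_SEG_SKETCH` name and the exact config tests are assumptions. The string is an inline asm template, so gcc turns `%%` into a single `%` when it processes it.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical prefix selection: %gs on 64-bit SMP, %fs on 32-bit SMP,
 * no segment at all on UP.
 */
#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
#define PERCPU_SEG_SKETCH "%%gs:"
#elif defined(CONFIG_SMP)
#define PERCPU_SEG_SKETCH "%%fs:"
#else
#define PERCPU_SEG_SKETCH ""
#endif

/* Inline asm template that a per cpu read would be built from. */
static const char *percpu_read_template(void)
{
	return "movl " PERCPU_SEG_SKETCH "%1,%0";
}
```

Only the prefix differs between the three cases. The rest of the access is the same asm template.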


>  In loops over all per 
> cpu variables this will also simplify the code.
>   

Why's that?

> And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply 
> becomes the adding of the base address in a register to a per cpu offset.
>   

I was never quite sure what that was for.

    J


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:30                 ` Jeremy Fitzhardinge
@ 2007-11-29  1:32                   ` Andi Kleen
  2007-11-29  1:35                   ` Christoph Lameter
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2007-11-29  1:32 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Christoph Lameter, Rusty Russell, akpm, linux-kernel


>     * drop support for stack-protector (does it really help? do people
>       use it?)

AFAIK we have only ever had a single classic stack buffer overflow in the kernel.
It certainly doesn't seem to be solving a common security problem.

-Andi


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:30                 ` Jeremy Fitzhardinge
  2007-11-29  1:32                   ` Andi Kleen
@ 2007-11-29  1:35                   ` Christoph Lameter
  2007-11-29  1:42                     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  1:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> I don't see the problem.  The way i386 does it inherently supports
> per-cpu data very early on (it uses the prototype percpu section until
> the real percpu values are set up).

OK, so we could do that for x86_64 as well? The bootstrap is more 
complicated on x86_64, since i386 does not support NUMA-aware placement 
of per cpu areas.

> > The i386 way of referring to per cpu data is not optimal because it is 
> > always offset by __per_cpu_start. per cpu data offsets need to be relative 
> > to the beginning of the per cpu area. per cpu data is less than 64k so 2 
> > byte offsets would be enough.
> >   
> 
> I don't see that's terribly important.  percpu references aren't all
> that common overall, and - at least on x86 - using a 16-bit offset
> (assuming its possible) would require a prefix anyway, so it would only
> save 1 byte per reference.  But I can't convince gas to generate a
> 16-bit offset anyway.

percpu references are quite frequent already (vm statistics) and will be 
more frequent after we have converted the per cpu arrays to per cpu 
allocations.


> > That way the __per_cpu_offset array and the registers that are used on 
> > various platforms are pointing to the actual data and can be loaded
> > directly into a register and then a load with a small offset to that 
> > register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
> > on ia64 a fixed address stands in for the register.
> 
> The asm used to generate these references is inherently arch-specific
> anyway, so the type and size of offset needed from the per-cpu base
> register to the data itself can be arch-dependent without loss of
> generality.  

Well, yes, that is already the case, and the percpu cleanup done so far 
makes it explicit. Several architectures already use an offset from a base.




* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:35                   ` Christoph Lameter
@ 2007-11-29  1:42                     ` Jeremy Fitzhardinge
  2007-11-29  1:48                       ` Christoph Lameter
  2007-11-29  2:06                       ` Christoph Lameter
  0 siblings, 2 replies; 59+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-29  1:42 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>   
>> I don't see the problem.  The way i386 does it inherently supports
>> per-cpu data very early on (it uses the prototype percpu section until
>> the real percpu values are set up).
>>     
>
> OK, so we could do that for x86_64 as well? The bootstrap is more 
> complicated on x86_64, since i386 does not support NUMA-aware placement 
> of per cpu areas.
>   

Don't think it matters either way.  Before percpu is allocated, NUMA
issues don't matter.  Once they are - by whatever mechanism - you can
set the segment bases up appropriately.  The fact that you chose to put
percpu data at address X doesn't affect the percpu mechanism one way or
the other.

> percpu references are quite frequent already (vm statistics) and will be 
> more frequent after we have converted the per cpu arrays to per cpu 
> allocations.
>   

Well, I think the point is moot, because x86 will always use 32-bit
offsets.  Each reference will only be 1 byte bigger than a normal
variable reference.

    J


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:42                     ` Jeremy Fitzhardinge
@ 2007-11-29  1:48                       ` Christoph Lameter
  2007-11-29  1:54                         ` Jeremy Fitzhardinge
  2007-11-29  2:06                       ` Christoph Lameter
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  1:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Don't think it matters either way.  Before percpu is allocated, NUMA
> issues don't matter.  Once they are - by whatever mechanism - you can
> set the segment bases up appropriately.  The fact that you chose to put
> percpu data at address X doesn't affect the percpu mechanism one way or
> the other.

The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
you use distant memory for the most performance sensitive areas. The NUMA 
subsystem must be initialized far enough that these allocations can be performed in the 
right way. And this means at least you need to know on which node each 
processor is located. That is what the PDA is currently used for and i386 
has no other way of doing that. I think we could use an array [NR_CPUS] 
for this one but we want to avoid these arrays because NR_CPUS may get 
very big.
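The following is a hedged C sketch of the bootstrap ordering described above. To place each cpu's area on its node, the code needs a cpu-to-node mapping before any per cpu data exists. Here that mapping is a plain `[NR_CPUS]`-style array, which is exactly the kind of table the mail would rather avoid. The pools and the allocator are hypothetical stand-ins for a node-local boot allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH  4
#define NR_NODES_SKETCH 2
#define AREA_SIZE       64

/* Hypothetical node-local memory pools, one per NUMA node. */
static char node_pool[NR_NODES_SKETCH][NR_CPUS_SKETCH * AREA_SIZE];
static size_t node_used[NR_NODES_SKETCH];

/* Early cpu -> node map: the [NR_CPUS] array the mail wants to avoid. */
static const int early_cpu_to_node[NR_CPUS_SKETCH] = { 0, 0, 1, 1 };

static char *percpu_area[NR_CPUS_SKETCH];

/* Bump allocator standing in for a node-local boot allocator. */
static char *alloc_on_node(int node, size_t size)
{
	if (node_used[node] + size > sizeof(node_pool[node]))
		return NULL;
	char *p = &node_pool[node][node_used[node]];
	node_used[node] += size;
	return p;
}

/* Place each cpu's area on that cpu's node; returns how many were placed. */
static int setup_per_cpu_areas_numa_sketch(void)
{
	int placed = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		percpu_area[cpu] = alloc_on_node(early_cpu_to_node[cpu], AREA_SIZE);
		if (percpu_area[cpu])
			placed++;
	}
	return placed;
}

/* Does cpu's area lie inside node's pool? (integer compare avoids UB) */
static int area_is_on_node(int cpu, int node)
{
	uintptr_t p = (uintptr_t)percpu_area[cpu];
	uintptr_t lo = (uintptr_t)node_pool[node];

	return p >= lo && p < lo + sizeof(node_pool[node]);
}
```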



* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:48                       ` Christoph Lameter
@ 2007-11-29  1:54                         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 59+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-29  1:54 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

Christoph Lameter wrote:
> The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
> you use distant memory for the most performance sensitive areas. The NUMA 
> subsystem must be initialized far enough that these allocations can be performed in the 
> right way. And this means at least you need to know on which node each 
> processor is located. That is what the PDA is currently used for and i386 
> has no other way of doing that. I think we could use an array [NR_CPUS] 
> for this one but we want to avoid these arrays because NR_CPUS may get 
> very big.
>   

Oh, you mean there needs to be some percpu data mechanism operating in
order to do numa-aware allocations, which would be necessary to allocate
the percpu memory itself?

I can see how that would be awkward.

    J



* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  1:42                     ` Jeremy Fitzhardinge
  2007-11-29  1:48                       ` Christoph Lameter
@ 2007-11-29  2:06                       ` Christoph Lameter
  2007-11-29  5:29                         ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  2:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> > percpu references are quite frequent already (vm statistics) and will be 
> > more frequent after we have converted the per cpu arrays to per cpu 
> > allocations.
> >   
> 
> Well, I think the point is moot, because x86 will always use 32-bit
> offsets.  Each reference will only be 1 byte bigger than a normal
> variable reference.

Just because i386 is not able to use it does not mean that other arches 
cannot. For example, IA64 can embed offsets in the actual instruction 
(though of course not 64-bit ones).

x86_64 can use a 32 bit offset instead of a 64 bit address because it uses 
the small model. A load of a 64 bit address would require much more 
expensive instructions. A load of a 64 bit address is currently avoided 
through the use of the pda that contains the full 64 bit address in the
data_offset field. Operations on per cpu data on x86_64 must therefore 
first load data_offset via gs and then add the per cpu address to this
offset. Then the per cpu operation is performed on that address.

To replace this sequence with a single instruction, we need a small 
32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
and the use of data_offset.
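The following simulation models the two access paths. It assumes, as in the follow-up patch, that the pda sits at the front of the per cpu area, and that `gs_base` stands in for the gs segment base. All names are hypothetical. On real hardware, the two-step path is a gs-relative load of data_offset, followed by an add and then the actual access. The one-step path is a single gs-relative instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: pda at the front of the per cpu section. */
struct pda_sketch { uintptr_t data_offset; };
struct percpu_section { struct pda_sketch pda; long vm_event; };

static struct percpu_section proto;   /* link-time copy at __per_cpu_start */
static struct percpu_section area;    /* this cpu's copy */
static char *gs_base = (char *)&area; /* stand-in for the gs segment base */

/* data_offset = area - __per_cpu_start, as the current x86_64 code keeps it */
static int init_pda_sketch(void)
{
	area.pda.data_offset = (uintptr_t)&area - (uintptr_t)&proto;
	return 1;
}

/* Today: load data_offset via gs, then add the link address of the variable. */
static long *vm_event_two_step(void)
{
	uintptr_t off = ((struct pda_sketch *)gs_base)->data_offset; /* load */
	return (long *)((uintptr_t)&proto.vm_event + off);           /* add */
}

/* Wanted: one gs-relative access with a small offset from the area start. */
static long *vm_event_one_step(void)
{
	return (long *)(gs_base + offsetof(struct percpu_section, vm_event));
}
```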

 


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  2:06                       ` Christoph Lameter
@ 2007-11-29  5:29                         ` Jeremy Fitzhardinge
  2007-11-29  6:08                           ` Christoph Lameter
  2007-11-29  6:10                           ` Christoph Lameter
  0 siblings, 2 replies; 59+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-29  5:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

Christoph Lameter wrote:
> x86_64 can use a 32 bit offset instead of a 64 bit address because it uses 
> the small model. A load of a 64 bit address would require much more 
> expensive instructions. A load of a 64 bit address is currently avoided 
> through the use of the pda that contains the full 64 bit address in the
> data_offset field. Operations on per cpu data on x86_64 must therefore 
> first load data_offset via gs and then add the per cpu address to this
> offset. Then the per cpu operation is performed on that address.
>   

Hm.  Certainly a non-one-instruction access would be considerably less
useful than one that is, because of preemption issues.

(In general you need to pin yourself to a cpu if you're using percpu
data, but sometimes it doesn't matter.  In particular, the reason I'm
interested in this at all is because Xen puts its interrupt mask flag in
per-cpu data, and a single instruction means that masking interrupts
[=disable preemption] can be done in one instruction with no scope for
preemption in the middle doing something unexpected.)
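The following deterministic C simulation models the preemption window mentioned above. A hypothetical `maybe_migrate()` stands in for the scheduler moving the task between computing the per cpu address and storing through it. A single segment-relative instruction has no such window. Portable C cannot express that, so only the two-step hazard is shown.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 2

/* Hypothetical per cpu flag (e.g. an interrupt mask), one slot per cpu. */
static int irq_masked[NR_CPUS_SKETCH];
static int current_cpu;   /* cpu the task is currently running on */

/* Stand-in for the scheduler migrating the task to the next cpu. */
static void maybe_migrate(int do_migrate)
{
	if (do_migrate)
		current_cpu = (current_cpu + 1) % NR_CPUS_SKETCH;
}

/*
 * Two steps: compute this cpu's slot, then store. A migration in between
 * makes the store hit the previous cpu's flag. Returns the slot written.
 */
static int *mask_irqs_two_step(int migrate_in_between)
{
	int *flag = &irq_masked[current_cpu];   /* step 1: address */
	maybe_migrate(migrate_in_between);      /* preemption window */
	*flag = 1;                              /* step 2: store */
	return flag;
}
```

After a migration, the store lands on cpu 0's flag while the task now runs on cpu 1, whose flag is still clear.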

> To replace this sequence with a single instruction, we need a small 
> 32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
> and the use of data_offset.
>   

Hm, yes, I see.  Dratted large address space.  What's wrong with 4G
anyway? ;)

Anyway, I can see the problem with my thinking about this so far.

    J


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  5:29                         ` Jeremy Fitzhardinge
@ 2007-11-29  6:08                           ` Christoph Lameter
  2007-11-29  6:10                           ` Christoph Lameter
  1 sibling, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  6:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

Here is the first of two patches for x86_64 that move the pda into the per 
cpu area and then make the x86 percpu macros work for x86_64. This needs 
to be generalized for other arches. The __per_cpu_start offsets can be 
taken care of by the linker. We can also tell the linker to completely 
relocate the percpu area to 0.



X86_64: Declare pda as per cpu data thereby moving it into the cpu area

Declare the pda as a per cpu variable. This will have the effect of moving
the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration
over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area and gs points to the
pda, so %gs: now points to the per cpu area of the current processor. A
per cpu variable can then be reached at

%gs:[&per_cpu_xxxx - __per_cpu_start]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/x86/kernel/head64.c          |    6 ++++++
 arch/x86/kernel/setup64.c         |   13 ++++++++++---
 arch/x86/kernel/smpboot_64.c      |   16 ----------------
 include/asm-generic/vmlinux.lds.h |    1 +
 include/asm-x86/pda.h             |    1 -
 include/linux/percpu.h            |    4 ++++
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c	2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c	2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata 
 
 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
 		}
 		if (!ptr)
 			panic("Cannot allocate cpu data for CPU %d\n", i);
-		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		/* Relocate the pda */
+		memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+		cpu_pda(i) = (struct x8664_pda *)ptr;
+		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
 	}
-} 
+	/* Fix up pda for this processor .... */
+	pda_init(0);
+}
 
 void pda_init(int cpu)
 { 
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c	2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c	2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
 		return -1;
 	}
 
-	/* Allocate node local memory for AP pdas */
-	if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
-		struct x8664_pda *newpda, *pda;
-		int node = cpu_to_node(cpu);
-		pda = cpu_pda(cpu);
-		newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
-				      node);
-		if (newpda) {
-			memcpy(newpda, pda, sizeof (struct x8664_pda));
-			cpu_pda(cpu) = newpda;
-		} else
-			printk(KERN_ERR
-		"Could not allocate node local PDA for CPU %d on node %d\n",
-				cpu, node);
-	}
-
 	alternatives_smp_switch(1);
 
 	c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c	2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c	2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include <asm/sections.h>
 #include <asm/kdebug.h>
 
+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h	2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h	2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } ____cacheline_aligned_in_smp;
 
 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 extern void pda_init(int);
 
 #define cpu_pda(i) (_cpu_pda[i])
Index: linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/vmlinux.lds.h	2007-11-28 20:59:13.176187886 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h	2007-11-28 20:59:35.403937534 -0800
@@ -259,6 +259,7 @@
 	. = ALIGN(align);						\
 	__per_cpu_start = .;						\
 	.data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {		\
+		*(.data.percpu.first)					\
 		*(.data.percpu)						\
 		*(.data.percpu.shared_aligned)				\
 	}								\
Index: linux-2.6.24-rc3-mm2/include/linux/percpu.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/linux/percpu.h	2007-11-28 20:59:13.188187940 -0800
+++ linux-2.6.24-rc3-mm2/include/linux/percpu.h	2007-11-28 21:09:23.399307556 -0800
@@ -23,6 +23,10 @@
 	DEFINE_PER_CPU(type, name)
 #endif
 
+#define DEFINE_PER_CPU_FIRST(type, name)				\
+	__attribute__((__section__(".data.percpu.first")))		\
+	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name
+
 #define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
 #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
 


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-29  5:29                         ` Jeremy Fitzhardinge
  2007-11-29  6:08                           ` Christoph Lameter
@ 2007-11-29  6:10                           ` Christoph Lameter
  1 sibling, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-29  6:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Rusty Russell, akpm, linux-kernel, Andi Kleen

Second portion. Add a new seg_offset macro to calculate the offset. This
can be avoided if the linker relocates the per cpu area to zero. Also
included is a patch that reads trickle_count via both methods to verify
that it actually works. Both patches apply on top of the per cpu cleanup
patches that I sent today.


x86_64: Make the x86_32 percpu operations usable on x86_64

Calculate the offset relative to gs so that the x86_*_percpu() macros
can address per cpu data on x86_64.

Subtracting __per_cpu_start makes the offset relative to the beginning
of the per cpu area, which is where gs points.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 drivers/char/random.c    |    2 +-
 include/asm-x86/percpu.h |   29 ++++++++++++++++++-----------
 init/main.c              |    5 +++++
 3 files changed, 24 insertions(+), 12 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h	2007-11-28 17:50:01.861182410 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h	2007-11-28 21:22:50.845872906 -0800
@@ -16,7 +16,13 @@
 #define __my_cpu_offset read_pda(data_offset)
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
+#define __percpu_seg "%%gs:"
+/* Calculate the offset to use with the segment register */
+#define seg_offset(name)   (*SHIFT_PTR(&per_cpu_var(name), - (unsigned long)__per_cpu_start))
 
+#else
+#define __percpu_seg ""
+#define seg_offset(name)   per_cpu_var(name)
 #endif
 #include <asm-generic/percpu.h>
 
@@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-
 #define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
-
 #else  /* !SMP */
-
 #define __percpu_seg ""
-
 #endif	/* SMP */
 
 #include <asm-generic/percpu.h>
@@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#define seg_offset(name)	per_cpu_var(name)
+
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -132,11 +140,10 @@ extern void __bad_percpu_size(void);
 		}						\
 		ret__; })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
+#define x86_read_percpu(var) percpu_from_op("mov", seg_offset(var))
+#define x86_write_percpu(var,val) percpu_to_op("mov", seg_offset(var), val)
+#define x86_add_percpu(var,val) percpu_to_op("add", seg_offset(var), val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", seg_offset(var), val)
+#define x86_or_percpu(var,val) percpu_to_op("or", seg_offset(var), val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6.24-rc3-mm2/drivers/char/random.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/drivers/char/random.c	2007-11-28 21:20:58.225804398 -0800
+++ linux-2.6.24-rc3-mm2/drivers/char/random.c	2007-11-28 21:28:38.967363573 -0800
@@ -272,7 +272,7 @@ static int random_write_wakeup_thresh = 
 
 static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28;
 
-static DEFINE_PER_CPU(int, trickle_count) = 0;
+DEFINE_PER_CPU(int, trickle_count) = 55;
 
 /*
  * A pool of size .poolwords is stirred with a primitive polynomial
Index: linux-2.6.24-rc3-mm2/init/main.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/init/main.c	2007-11-28 21:10:54.245804225 -0800
+++ linux-2.6.24-rc3-mm2/init/main.c	2007-11-28 21:22:17.769053628 -0800
@@ -504,6 +504,8 @@ void __init __attribute__((weak)) smp_se
 {
 }
 
+DECLARE_PER_CPU(int, trickle_count);
+
 asmlinkage void __init start_kernel(void)
 {
 	char * command_line;
@@ -645,6 +647,9 @@ asmlinkage void __init start_kernel(void
 
 	acpi_early_init(); /* before LAPIC and SMP init */
 
+	printk("Reading trickle count = %d. Is %d\n",
+		x86_read_percpu(trickle_count),
+		__raw_get_cpu_var(trickle_count));
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
 }


* Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
  2007-11-28 23:36             ` Christoph Lameter
@ 2007-11-30  2:23               ` Rusty Russell
  0 siblings, 0 replies; 59+ messages in thread
From: Rusty Russell @ 2007-11-30  2:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Andi Kleen, Jeremy Fitzhardinge

On Thursday 29 November 2007 10:36:06 Christoph Lameter wrote:
> The code becomes much simpler if gs would point to the beginning of the
> per cpu area and if the __per_cpu_offset[i] would do the same. No weird
> __per_cpu_start offsetting anymore.

It is a little weird, but it gave flexibility for most archs.

ISTR I had issues relocating the percpu area to 0, but I look forward to your 
code!

> The generic write/readpercpu functionality introduced by the cpu_alloc
> patchset works best with offsets relative to an arch dependent
> register. All per cpu data (pda, percpu and allocpercpu) is handled as an
> offset relative to the start of the per cpu data.

Hmm, did someone cc me on the patchset and I missed it?

> If the current offset by __per_cpu_start is kept then a per cpu allocator
> may have to dish out addresses that go beyond __per_cpu_end.

Of course; you just need congruence in your allocation across CPUs.  It's 
possible, but no worse than the requirements on other schemes where you can 
reach a variable with a single addition for the CPU.

> I think dealing with a per cpu variable as if it would be an offset
> relative to a base is natural for the typical addressing of cpus based on
> an offset relative to some register.

We've had practical problems getting the compiler to eke out the potential 
benefit.  That's why we settled for an offset between where the compiler 
expected and where the variable actually was.

Cheers,
Rusty.


* Re: [patch 11/14] Powerpc: Use generic per cpu
  2007-11-28 18:54             ` Christoph Lameter
@ 2007-12-02 20:55               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 59+ messages in thread
From: Benjamin Herrenschmidt @ 2007-12-02 20:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Mackerras, Kumar Gala, akpm, linux-kernel


On Wed, 2007-11-28 at 10:54 -0800, Christoph Lameter wrote:
> > As far as I can see, after applying your series of patches, I end up
> > with an unbalanced #ifdef in include/asm-powerpc/percpu.h.  I see 3
> > #ifdef/#ifndef, but only two #endifs.  It needs another #endif after
> > the #endif /* SMP */ to match the #ifdef __powerpc64__.  With that
> > change it looks OK, since 32-bit uses asm-generic/percpu.h for both
> > SMP and UP.
> 
> Ahhh.. Ok. Fixed.
> 
> Do you know where to get a ppc64 crosscompiler? I 
> tried to build gcc for ppc64 but the build failed on x86_64.

Usually, we build biarch... Have you checked whether your existing gcc
happens to work with -m64?

Ben.


* [patch 00/14] Per cpu code simplification
@ 2007-11-27  0:11 Christoph Lameter
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Lameter @ 2007-11-27  0:11 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

This patchset simplifies the code that arches need to maintain to support
per cpu functionality. Most of the code is moved into arch independent
code. Only a set of minimal definitions is kept for each arch.

The patch also unifies the x86 arch so that there is only a single
asm-x86/percpu.h

-- 


end of thread

Thread overview: 59+ messages
2007-11-27  0:14 [patch 00/14] Per cpu code simplification Christoph Lameter
2007-11-27  0:14 ` [patch 01/14] Modules: Handle symbols that have a zero value Christoph Lameter
2007-11-27  0:14 ` [patch 02/14] Modules: Include sections.h to avoid defining linker variables explicitly Christoph Lameter
2007-11-27  0:14 ` [patch 03/14] Modules: Fold percpu_modcopy into module.c and get rid of the macro from hell Christoph Lameter
2007-11-27  0:14 ` [patch 04/14] ia64: Remove the __SMALL_ADDR_AREA attribute for per cpu access Christoph Lameter
2007-11-27  5:20   ` David Mosberger-Tang
2007-11-27 18:15     ` Christoph Lameter
2007-11-27 21:10       ` David Mosberger-Tang
2007-11-27 21:18         ` Christoph Lameter
2007-11-27 21:27           ` David Mosberger-Tang
2007-11-27 22:02             ` Christoph Lameter
2007-11-27  9:30   ` Andreas Schwab
2007-11-27 18:17     ` Christoph Lameter
2007-11-27 21:24       ` Andreas Schwab
2007-11-27 21:38         ` Christoph Lameter
2007-11-27 22:14           ` Adrian Bunk
2007-11-27  0:14 ` [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup Christoph Lameter
2007-11-27  4:30   ` Rusty Russell
2007-11-27 18:14     ` Christoph Lameter
2007-11-28  1:36       ` Rusty Russell
2007-11-28 18:51         ` Christoph Lameter
2007-11-28 23:17           ` Rusty Russell
2007-11-28 23:36             ` Christoph Lameter
2007-11-30  2:23               ` Rusty Russell
2007-11-28 23:45             ` Jeremy Fitzhardinge
2007-11-29  0:11               ` Christoph Lameter
2007-11-29  1:18                 ` Andi Kleen
2007-11-29  1:27                   ` Christoph Lameter
2007-11-29  1:30                 ` Jeremy Fitzhardinge
2007-11-29  1:32                   ` Andi Kleen
2007-11-29  1:35                   ` Christoph Lameter
2007-11-29  1:42                     ` Jeremy Fitzhardinge
2007-11-29  1:48                       ` Christoph Lameter
2007-11-29  1:54                         ` Jeremy Fitzhardinge
2007-11-29  2:06                       ` Christoph Lameter
2007-11-29  5:29                         ` Jeremy Fitzhardinge
2007-11-29  6:08                           ` Christoph Lameter
2007-11-29  6:10                           ` Christoph Lameter
2007-11-27 23:40   ` Randy Dunlap
2007-11-28  0:03     ` Christoph Lameter
2007-11-28  0:05       ` Randy Dunlap
2007-11-27  0:14 ` [patch 06/14] percpu: Move arch XX_PER_CPU_XX definitions into linux/percpu.h Christoph Lameter
2007-11-27  0:14 ` [patch 07/14] percpu: Make the asm-generic/percpu.h more generic Christoph Lameter
2007-11-27  0:14 ` [patch 08/14] x86_32: Use generic percpu.h Christoph Lameter
2007-11-27  0:14 ` [patch 09/14] x86_64: Use generic percpu Christoph Lameter
2007-11-27  0:14 ` [patch 10/14] s390: " Christoph Lameter
2007-11-27  0:14 ` [patch 11/14] Powerpc: Use generic per cpu Christoph Lameter
2007-11-27  7:41   ` Kumar Gala
2007-11-27 18:16     ` Christoph Lameter
2007-11-27 20:58       ` Paul Mackerras
2007-11-27 21:13         ` Christoph Lameter
2007-11-28  2:35           ` Paul Mackerras
2007-11-28 18:54             ` Christoph Lameter
2007-12-02 20:55               ` Benjamin Herrenschmidt
2007-11-27  0:14 ` [patch 12/14] Sparc64: Use generic percpu Christoph Lameter
2007-11-27  0:14 ` [patch 13/14] ia64: " Christoph Lameter
2007-11-27  1:37   ` Christoph Lameter
2007-11-27  0:14 ` [patch 14/14] x86: Unify percpu.h Christoph Lameter
  -- strict thread matches above, loose matches on Subject: below --
2007-11-27  0:11 [patch 00/14] Per cpu code simplification Christoph Lameter