[PATCH/RFC 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id()
From: Lee Schermerhorn @ 2010-03-04 17:06 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

Use Generic Per cpu infrastructure for numa_*_id() V3

Series against 2.6.33-mmotm-100302-1838

I'm just getting back to this series, as the original issue that got me on this
track is still a problem for us.  For context, see my initial posting:

	http://marc.info/?l=linux-arch&m=125814673120678&w=4
	http://marc.info/?l=linux-mm&m=125814674120706&w=4
	http://marc.info/?l=linux-mm&m=125814677520784&w=4
	http://marc.info/?l=linux-mm&m=125814678120803&w=4
	http://marc.info/?l=linux-arch&m=125814678120803&w=4
	...

This series resolved the problem on our ia64 platforms and caused no
regression in x86_64 [a slight improvement even] for the admittedly
few tests that I ran.  However, reviewers raised a couple of issues:

1) I hacked up {linux|asm-generic}/percpu.h quite heavily to break a
circular header dependency.  Tejun reviewed my patches and sent
suggestions off-list.  I'll respond to his comments below.

2) Using the generic percpu.h would require all archs to adjust their
asm/percpu.h to utilize the generic percpu version of numa_*_id().
However, I think my series did not require changes to other archs just
to build with the old behavior.

On Tue, 2009-12-15 at 17:05 +0900, Tejun Heo wrote:

>Hello,
>On 12/01/2009 09:49 PM, Lee Schermerhorn wrote:
> On Tue, 2009-12-01 at 14:56 +0900, Tejun Heo wrote:
>> Hello,
>>
>> (private reply)
>>
>> On 12/01/2009 05:28 AM, Lee Schermerhorn wrote:
>>> So here's what happened:
>>>
>>> linux/topology.h now depends on */percpu.h to implement numa_node_id()
>>> and numa_mem_id().  Not so much an issue for x86 because its
>>> asm/topology.h already depended on its asm/percpu.h.  But ia64, for
>>> instance--maybe any arch that doesn't already implement numa_node_id()
>>> as a percpu variable--didn't define this_cpu_read() for
>>> linux/topology.h.
>>
>> Can you please send me the patches?
>>
>> Tejun:
>>
>> I have attached the entire series as a tarball.  If you'd like me to
>> send you the patches as separate messages, let me know.
>>
>> I should have copied you directly on the original posting [13nov on -mm
>> and -arch].  I intended to, but forgot last minute.

> Sorry about the delay.  Several comments.

Thank you for the review.  No problem with the "delay".  I've been busy
with other matters myself.

> nid-01:
>
> * Is moving this_cpu ops to asm-generic/percpu.h necessary?  I know
>   the current ops / defs organization is a bit messy and intend to
>   clean things up but I'm not quite sure those ops belong to
>   asm-generic/percpu.h which I kind of want to remove and move stuff
>   to either linux/percpu-defs.h or linux/percpu.h.

Is it necessary?  If we want to keep the definition of numa_node_id() in
topology.h, where it currently resides, and use the generic percpu
infrastructure, as Christoph suggested, we need to break the circular
dependency:
	  topology.h -> percpu.h [added] -> slab.h -> gfp.h -> topology.h

Willy suggested that I un-inline __alloc_percpu() and free_percpu() for
the !SMP case.  This would allow me to remove the include of slab.h from
percpu.h.  I tried this: in an allyesconfig !SMP build, it resulted in over
700 files failing to compile.  Apparently, they depend on percpu.h to
include slab.h [?!!!].  I can generate patches to fix these, but
I'm wondering whether that's the right approach.
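
To be concrete, the un-inlining would look roughly like the sketch below
(based on the current kzalloc()/kfree() !SMP inlines; the placement of the
out-of-line bodies is illustrative, not an actual patch):

	/* linux/percpu.h, !SMP case: declarations only -- slab.h no longer needed */
	extern void __percpu *__alloc_percpu(size_t size, size_t align);
	extern void free_percpu(void __percpu *p);

	/* bodies moved out of line, e.g. into an mm/ source file */
	void __percpu *__alloc_percpu(size_t size, size_t align)
	{
		/* larger alignment only matters for the SMP allocator */
		WARN_ON_ONCE(align > SMP_CACHE_BYTES);
		return kzalloc(size, GFP_KERNEL);	/* slab.h needed only here */
	}

	void free_percpu(void __percpu *p)
	{
		kfree(p);
	}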

My first generic percpu numa_node_id series followed models I saw for other
{linux|asm|asm-generic}/foo.h stacks.  I wanted to be able to continue to
have percpu.h include slab.h for all the places that assume this [:(], while
giving topology.h access to the generic definitions via the arch-specific
percpu.h.  Of course, I did this by assuming that asm/topology.h would
include asm/percpu.h -- smaller patch :).  I'll fix that.  Then arches won't
need to modify their asm/topology.h to use the generic numa_*_id() defs,
unless they already implement numa_node_id() using percpu variables and
want to back that out to use the generic versions [like x86].

So, if we can sort this issue out -- how to break the circular header
dependency in a manner acceptable to all -- we should be able to use the
generic percpu infrastructure for the numa_*_id() functions, as Christoph
suggested.  The 7th patch in the series [slab use numa_mem_id], which is
my primary reason for working this, may still need work to handle node
hotplug and zonelist rebuild.  I'll address that as a separate series/thread,
if Andi Kleen's and others' slab hotplug work doesn't handle it.

Tejun's suggestion of using a linux/percpu-defs.h for the generic defs
appears to work, enabling me to include the generic definitions in topology.h
w/o pulling in slab.h.  This version of the series takes that approach.
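
To be concrete, with the generic definitions reachable from percpu-defs.h,
topology.h can define numa_node_id() roughly as sketched below (illustrative
only -- the real definitions and config plumbing are in the later patches):

	/* linux/topology.h -- sketch, not the literal patch */
	#include <linux/percpu-defs.h>	/* generic __this_cpu_read(), no slab.h */

	DECLARE_PER_CPU(int, numa_node);

	#ifndef numa_node_id
	/* archs may override this with a cheaper, single-instruction version */
	#define numa_node_id()	__this_cpu_read(numa_node)
	#endif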

>* Also, please separate out the changes to implement the numa stuff
>  from percpu changes.  It's a bit confusing to review.

Patch 1 of this series separates out the percpu.h changes.

>nid-02:
>
>* If you define __this_cpu_read/write() in
>  arch/x86/include/asm/percpu.h, you don't need to define any of
>  __this_cpu_read/write_n() versions, right?

The *_n versions were already there.

In the previous version I asked whether we could at least define the '*_n()'
wrappers in terms of _this_cpu_read/write().  I recall that Christoph
responded [off-list, maybe?] that we need the '*_n' versions as they're defined.
But I agree that the basic x86 implementation seems to handle arguments of
any size.
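
For illustration: the generic header only falls back when the arch has not
already provided the operation, so one arch-wide definition shadows all of
the per-size fallbacks.  Sketch below; the arch macro name is made up:

	/* asm/percpu.h (hypothetical arch): one macro covers any scalar size */
	#define __this_cpu_read(pcp)	my_arch_percpu_read(pcp)

	/*
	 * linux/percpu-defs.h: the fallback below is compiled out because
	 * __this_cpu_read is already defined; otherwise the sizeof() dispatch
	 * selects one of the generic __this_cpu_read_{1,2,4,8} fallbacks.
	 */
	#ifndef __this_cpu_read
	# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
	#endif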

> Also, I think this belongs to a separate patch.

Will do.

>nid-03:
>
>* I think numa_node and numa_mem variables are better defined in
>  page_alloc (or some other file which has more to do with numa aware
>  memory allocation).  Till now, mm/percpu.c only contains the percpu
>  allocator itself, so adding numa stuff there seems a bit strange.

Done.  =>page_alloc.c

>nid-04:
>
>* Isn't #define numa_mem numa_node a bit dangerous?  Someone might use
>  numa_mem as a local variable name.  Why not define it as a inline
>  function or at least a macro which takes argument.

numa_mem and numa_node are the names of the per-cpu variables, referenced
by __this_cpu_read().  So, I suppose we could rename them both to something
like percpu_numa_*.  Would that satisfy your concern?

What do others think?

Currently I've left them as numa_mem and numa_node.
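
For reference, the difference Tejun is pointing at, roughly sketched:

	/*
	 * Option A (what the series does now): bare object-like alias.  Any
	 * other use of the identifier 'numa_mem' is silently rewritten, e.g.
	 * a local 'int numa_mem;' becomes 'int numa_node;'.
	 */
	#define numa_mem	numa_node
	#define numa_mem_id()	__this_cpu_read(numa_mem)

	/*
	 * Option B (Tejun's suggestion, roughly): function-like macros only,
	 * so plain uses of 'numa_mem' as a variable name are left alone.
	 */
	#define numa_mem_id()	__this_cpu_read(numa_node)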

Lee


[PATCH/RFC 1/8] numa: prep:  move generic percpu interface definitions to percpu-defs.h
From: Lee Schermerhorn @ 2010-03-04 17:07 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

PATCH:  numa - prep:  move generic percpu interface definitions to percpu-defs.h

Against:  2.6.33-mmotm-100302-1838

To use the generic percpu infrastructure for the numa_node_id() interface,
defined in linux/topology.h, we need to break the circular header dependency
that results from including <linux/percpu.h> in <linux/topology.h>.  The
circular dependency:

	percpu.h -> slab.h -> gfp.h -> topology.h

percpu.h includes slab.h to obtain the definition of kzalloc()/kfree() for
inlining __alloc_percpu() and free_percpu() in !SMP configurations.  One could
un-inline these functions in the !SMP case, but a large number of files depend
on percpu.h to include slab.h.  Tejun Heo suggested moving the definitions to
percpu-defs.h and requested that this be separated from the remainder of the
generic percpu numa_node_id() preparation patch.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/percpu-defs.h |  455 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/percpu.h      |  454 -------------------------------------------
 2 files changed, 455 insertions(+), 454 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/include/linux/percpu-defs.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/percpu-defs.h
+++ linux-2.6.33-mmotm-100302-1838/include/linux/percpu-defs.h
@@ -151,4 +151,459 @@
 #define EXPORT_PER_CPU_SYMBOL_GPL(var)
 #endif
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access.
+ *
+ * @var can be a percpu variable or a field of it and its size should
+ * equal char, int or long.  percpu_read() evaluates to a lvalue and
+ * all others to void.
+ *
+ * These operations are guaranteed to be atomic w.r.t. preemption.
+ * The generic versions use plain get/put_cpu_var().  Archs are
+ * encouraged to implement single-instruction alternatives which don't
+ * require preemption protection.
+ */
+#ifndef percpu_read
+# define percpu_read(var)						\
+  ({									\
+	typeof(var) *pr_ptr__ = &(var);					\
+	typeof(var) pr_ret__;						\
+	pr_ret__ = get_cpu_var(*pr_ptr__);				\
+	put_cpu_var(*pr_ptr__);						\
+	pr_ret__;							\
+  })
+#endif
+
+#define __percpu_generic_to_op(var, val, op)				\
+do {									\
+	typeof(var) *pgto_ptr__ = &(var);				\
+	get_cpu_var(*pgto_ptr__) op val;				\
+	put_cpu_var(*pgto_ptr__);					\
+} while (0)
+
+#ifndef percpu_write
+# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
+#endif
+
+#ifndef percpu_sub
+# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
+#endif
+
+#ifndef percpu_and
+# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
+#endif
+
+/*
+ * Branching function to split up a function into a set of functions that
+ * are called for different scalar sizes of the objects handled.
+ */
+
+extern void __bad_size_call_parameter(void);
+
+#define __pcpu_size_call_return(stem, variable)				\
+({	typeof(variable) pscr_ret__;					\
+	__verify_pcpu_ptr(&(variable));					\
+	switch(sizeof(variable)) {					\
+	case 1: pscr_ret__ = stem##1(variable);break;			\
+	case 2: pscr_ret__ = stem##2(variable);break;			\
+	case 4: pscr_ret__ = stem##4(variable);break;			\
+	case 8: pscr_ret__ = stem##8(variable);break;			\
+	default:							\
+		__bad_size_call_parameter();break;			\
+	}								\
+	pscr_ret__;							\
+})
+
+#define __pcpu_size_call(stem, variable, ...)				\
+do {									\
+	__verify_pcpu_ptr(&(variable));					\
+	switch(sizeof(variable)) {					\
+		case 1: stem##1(variable, __VA_ARGS__);break;		\
+		case 2: stem##2(variable, __VA_ARGS__);break;		\
+		case 4: stem##4(variable, __VA_ARGS__);break;		\
+		case 8: stem##8(variable, __VA_ARGS__);break;		\
+		default: 						\
+			__bad_size_call_parameter();break;		\
+	}								\
+} while (0)
+
+/*
+ * Optimized manipulation for memory allocated through the per cpu
+ * allocator or for addresses of per cpu variables.
+ *
+ * These operation guarantee exclusivity of access for other operations
+ * on the *same* processor. The assumption is that per cpu data is only
+ * accessed by a single processor instance (the current one).
+ *
+ * The first group is used for accesses that must be done in a
+ * preemption safe way since we know that the context is not preempt
+ * safe. Interrupts may occur. If the interrupt modifies the variable
+ * too then RMW actions will not be reliable.
+ *
+ * The arch code can provide optimized functions in two ways:
+ *
+ * 1. Override the function completely. F.e. define this_cpu_add().
+ *    The arch must then ensure that the various scalar format passed
+ *    are handled correctly.
+ *
+ * 2. Provide functions for certain scalar sizes. F.e. provide
+ *    this_cpu_add_2() to provide per cpu atomic operations for 2 byte
+ *    sized RMW actions. If arch code does not provide operations for
+ *    a scalar size then the fallback in the generic code will be
+ *    used.
+ */
+
+#define _this_cpu_generic_read(pcp)					\
+({	typeof(pcp) ret__;						\
+	preempt_disable();						\
+	ret__ = *this_cpu_ptr(&(pcp));					\
+	preempt_enable();						\
+	ret__;								\
+})
+
+#ifndef this_cpu_read
+# ifndef this_cpu_read_1
+#  define this_cpu_read_1(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_2
+#  define this_cpu_read_2(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_4
+#  define this_cpu_read_4(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_8
+#  define this_cpu_read_8(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# define this_cpu_read(pcp)	__pcpu_size_call_return(this_cpu_read_, (pcp))
+#endif
+
+#define _this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	preempt_disable();						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	preempt_enable();						\
+} while (0)
+
+#ifndef this_cpu_write
+# ifndef this_cpu_write_1
+#  define this_cpu_write_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_2
+#  define this_cpu_write_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_4
+#  define this_cpu_write_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_8
+#  define this_cpu_write_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# define this_cpu_write(pcp, val)	__pcpu_size_call(this_cpu_write_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_add
+# ifndef this_cpu_add_1
+#  define this_cpu_add_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_2
+#  define this_cpu_add_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_4
+#  define this_cpu_add_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_8
+#  define this_cpu_add_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define this_cpu_add(pcp, val)		__pcpu_size_call(this_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_sub
+# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef this_cpu_inc
+# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
+#endif
+
+#ifndef this_cpu_dec
+# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef this_cpu_and
+# ifndef this_cpu_and_1
+#  define this_cpu_and_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_2
+#  define this_cpu_and_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_4
+#  define this_cpu_and_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_8
+#  define this_cpu_and_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define this_cpu_and(pcp, val)		__pcpu_size_call(this_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_or
+# ifndef this_cpu_or_1
+#  define this_cpu_or_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_2
+#  define this_cpu_or_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_4
+#  define this_cpu_or_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_8
+#  define this_cpu_or_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define this_cpu_or(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_xor
+# ifndef this_cpu_xor_1
+#  define this_cpu_xor_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_2
+#  define this_cpu_xor_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_4
+#  define this_cpu_xor_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_8
+#  define this_cpu_xor_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define this_cpu_xor(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
+#endif
+
+/*
+ * Generic percpu operations that do not require preemption handling.
+ * Either we do not care about races or the caller has the
+ * responsibility of handling preemptions issues. Arch code can still
+ * override these instructions since the arch per cpu code may be more
+ * efficient and may actually get race freeness for free (that is the
+ * case for x86 for example).
+ *
+ * If there is no other protection through preempt disable and/or
+ * disabling interupts then one of these RMW operations can show unexpected
+ * behavior because the execution thread was rescheduled on another processor
+ * or an interrupt occurred and the same percpu variable was modified from
+ * the interrupt context.
+ */
+#ifndef __this_cpu_read
+# ifndef __this_cpu_read_1
+#  define __this_cpu_read_1(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_2
+#  define __this_cpu_read_2(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_4
+#  define __this_cpu_read_4(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_8
+#  define __this_cpu_read_8(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
+#endif
+
+#define __this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+} while (0)
+
+#ifndef __this_cpu_write
+# ifndef __this_cpu_write_1
+#  define __this_cpu_write_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_2
+#  define __this_cpu_write_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_4
+#  define __this_cpu_write_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_8
+#  define __this_cpu_write_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# define __this_cpu_write(pcp, val)	__pcpu_size_call(__this_cpu_write_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_add
+# ifndef __this_cpu_add_1
+#  define __this_cpu_add_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_2
+#  define __this_cpu_add_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_4
+#  define __this_cpu_add_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_8
+#  define __this_cpu_add_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define __this_cpu_add(pcp, val)	__pcpu_size_call(__this_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_sub
+# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef __this_cpu_inc
+# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
+#endif
+
+#ifndef __this_cpu_dec
+# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef __this_cpu_and
+# ifndef __this_cpu_and_1
+#  define __this_cpu_and_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_2
+#  define __this_cpu_and_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_4
+#  define __this_cpu_and_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_8
+#  define __this_cpu_and_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define __this_cpu_and(pcp, val)	__pcpu_size_call(__this_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_or
+# ifndef __this_cpu_or_1
+#  define __this_cpu_or_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_2
+#  define __this_cpu_or_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_4
+#  define __this_cpu_or_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_8
+#  define __this_cpu_or_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define __this_cpu_or(pcp, val)	__pcpu_size_call(__this_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_xor
+# ifndef __this_cpu_xor_1
+#  define __this_cpu_xor_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_2
+#  define __this_cpu_xor_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_4
+#  define __this_cpu_xor_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_8
+#  define __this_cpu_xor_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define __this_cpu_xor(pcp, val)	__pcpu_size_call(__this_cpu_xor_, (pcp), (val))
+#endif
+
+/*
+ * IRQ safe versions of the per cpu RMW operations. Note that these operations
+ * are *not* safe against modification of the same variable from another
+ * processors (which one gets when using regular atomic operations)
+ . They are guaranteed to be atomic vs. local interrupts and
+ * preemption only.
+ */
+#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	unsigned long flags;						\
+	local_irq_save(flags);						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	local_irq_restore(flags);					\
+} while (0)
+
+#ifndef irqsafe_cpu_add
+# ifndef irqsafe_cpu_add_1
+#  define irqsafe_cpu_add_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_2
+#  define irqsafe_cpu_add_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_4
+#  define irqsafe_cpu_add_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_8
+#  define irqsafe_cpu_add_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define irqsafe_cpu_add(pcp, val) __pcpu_size_call(irqsafe_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef irqsafe_cpu_sub
+# define irqsafe_cpu_sub(pcp, val)	irqsafe_cpu_add((pcp), -(val))
+#endif
+
+#ifndef irqsafe_cpu_inc
+# define irqsafe_cpu_inc(pcp)	irqsafe_cpu_add((pcp), 1)
+#endif
+
+#ifndef irqsafe_cpu_dec
+# define irqsafe_cpu_dec(pcp)	irqsafe_cpu_sub((pcp), 1)
+#endif
+
+#ifndef irqsafe_cpu_and
+# ifndef irqsafe_cpu_and_1
+#  define irqsafe_cpu_and_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_2
+#  define irqsafe_cpu_and_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_4
+#  define irqsafe_cpu_and_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_8
+#  define irqsafe_cpu_and_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define irqsafe_cpu_and(pcp, val) __pcpu_size_call(irqsafe_cpu_and_, (val))
+#endif
+
+#ifndef irqsafe_cpu_or
+# ifndef irqsafe_cpu_or_1
+#  define irqsafe_cpu_or_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_2
+#  define irqsafe_cpu_or_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_4
+#  define irqsafe_cpu_or_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_8
+#  define irqsafe_cpu_or_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define irqsafe_cpu_or(pcp, val) __pcpu_size_call(irqsafe_cpu_or_, (val))
+#endif
+
+#ifndef irqsafe_cpu_xor
+# ifndef irqsafe_cpu_xor_1
+#  define irqsafe_cpu_xor_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_2
+#  define irqsafe_cpu_xor_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_4
+#  define irqsafe_cpu_xor_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_8
+#  define irqsafe_cpu_xor_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
+#endif
+
 #endif /* _LINUX_PERCPU_DEFS_H */
Index: linux-2.6.33-mmotm-100302-1838/include/linux/percpu.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/percpu.h
+++ linux-2.6.33-mmotm-100302-1838/include/linux/percpu.h
@@ -180,459 +180,5 @@ static inline void *pcpu_lpage_remapped(
 #define alloc_percpu(type)	\
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type), __alignof__(type))
 
-/*
- * Optional methods for optimized non-lvalue per-cpu variable access.
- *
- * @var can be a percpu variable or a field of it and its size should
- * equal char, int or long.  percpu_read() evaluates to a lvalue and
- * all others to void.
- *
- * These operations are guaranteed to be atomic w.r.t. preemption.
- * The generic versions use plain get/put_cpu_var().  Archs are
- * encouraged to implement single-instruction alternatives which don't
- * require preemption protection.
- */
-#ifndef percpu_read
-# define percpu_read(var)						\
-  ({									\
-	typeof(var) *pr_ptr__ = &(var);					\
-	typeof(var) pr_ret__;						\
-	pr_ret__ = get_cpu_var(*pr_ptr__);				\
-	put_cpu_var(*pr_ptr__);						\
-	pr_ret__;							\
-  })
-#endif
-
-#define __percpu_generic_to_op(var, val, op)				\
-do {									\
-	typeof(var) *pgto_ptr__ = &(var);				\
-	get_cpu_var(*pgto_ptr__) op val;				\
-	put_cpu_var(*pgto_ptr__);					\
-} while (0)
-
-#ifndef percpu_write
-# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
-#endif
-
-#ifndef percpu_add
-# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
-#endif
-
-#ifndef percpu_sub
-# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
-#endif
-
-#ifndef percpu_and
-# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
-#endif
-
-#ifndef percpu_or
-# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
-#endif
-
-#ifndef percpu_xor
-# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
-#endif
-
-/*
- * Branching function to split up a function into a set of functions that
- * are called for different scalar sizes of the objects handled.
- */
-
-extern void __bad_size_call_parameter(void);
-
-#define __pcpu_size_call_return(stem, variable)				\
-({	typeof(variable) pscr_ret__;					\
-	__verify_pcpu_ptr(&(variable));					\
-	switch(sizeof(variable)) {					\
-	case 1: pscr_ret__ = stem##1(variable);break;			\
-	case 2: pscr_ret__ = stem##2(variable);break;			\
-	case 4: pscr_ret__ = stem##4(variable);break;			\
-	case 8: pscr_ret__ = stem##8(variable);break;			\
-	default:							\
-		__bad_size_call_parameter();break;			\
-	}								\
-	pscr_ret__;							\
-})
-
-#define __pcpu_size_call(stem, variable, ...)				\
-do {									\
-	__verify_pcpu_ptr(&(variable));					\
-	switch(sizeof(variable)) {					\
-		case 1: stem##1(variable, __VA_ARGS__);break;		\
-		case 2: stem##2(variable, __VA_ARGS__);break;		\
-		case 4: stem##4(variable, __VA_ARGS__);break;		\
-		case 8: stem##8(variable, __VA_ARGS__);break;		\
-		default: 						\
-			__bad_size_call_parameter();break;		\
-	}								\
-} while (0)
-
-/*
- * Optimized manipulation for memory allocated through the per cpu
- * allocator or for addresses of per cpu variables.
- *
- * These operation guarantee exclusivity of access for other operations
- * on the *same* processor. The assumption is that per cpu data is only
- * accessed by a single processor instance (the current one).
- *
- * The first group is used for accesses that must be done in a
- * preemption safe way since we know that the context is not preempt
- * safe. Interrupts may occur. If the interrupt modifies the variable
- * too then RMW actions will not be reliable.
- *
- * The arch code can provide optimized functions in two ways:
- *
- * 1. Override the function completely. F.e. define this_cpu_add().
- *    The arch must then ensure that the various scalar format passed
- *    are handled correctly.
- *
- * 2. Provide functions for certain scalar sizes. F.e. provide
- *    this_cpu_add_2() to provide per cpu atomic operations for 2 byte
- *    sized RMW actions. If arch code does not provide operations for
- *    a scalar size then the fallback in the generic code will be
- *    used.
- */
-
-#define _this_cpu_generic_read(pcp)					\
-({	typeof(pcp) ret__;						\
-	preempt_disable();						\
-	ret__ = *this_cpu_ptr(&(pcp));					\
-	preempt_enable();						\
-	ret__;								\
-})
-
-#ifndef this_cpu_read
-# ifndef this_cpu_read_1
-#  define this_cpu_read_1(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_2
-#  define this_cpu_read_2(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_4
-#  define this_cpu_read_4(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_8
-#  define this_cpu_read_8(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# define this_cpu_read(pcp)	__pcpu_size_call_return(this_cpu_read_, (pcp))
-#endif
-
-#define _this_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	preempt_disable();						\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-	preempt_enable();						\
-} while (0)
-
-#ifndef this_cpu_write
-# ifndef this_cpu_write_1
-#  define this_cpu_write_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_2
-#  define this_cpu_write_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_4
-#  define this_cpu_write_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_8
-#  define this_cpu_write_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# define this_cpu_write(pcp, val)	__pcpu_size_call(this_cpu_write_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_add
-# ifndef this_cpu_add_1
-#  define this_cpu_add_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_2
-#  define this_cpu_add_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_4
-#  define this_cpu_add_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_8
-#  define this_cpu_add_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define this_cpu_add(pcp, val)		__pcpu_size_call(this_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_sub
-# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
-#endif
-
-#ifndef this_cpu_inc
-# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
-#endif
-
-#ifndef this_cpu_dec
-# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
-#endif
-
-#ifndef this_cpu_and
-# ifndef this_cpu_and_1
-#  define this_cpu_and_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_2
-#  define this_cpu_and_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_4
-#  define this_cpu_and_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_8
-#  define this_cpu_and_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define this_cpu_and(pcp, val)		__pcpu_size_call(this_cpu_and_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_or
-# ifndef this_cpu_or_1
-#  define this_cpu_or_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_2
-#  define this_cpu_or_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_4
-#  define this_cpu_or_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_8
-#  define this_cpu_or_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define this_cpu_or(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_xor
-# ifndef this_cpu_xor_1
-#  define this_cpu_xor_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_2
-#  define this_cpu_xor_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_4
-#  define this_cpu_xor_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_8
-#  define this_cpu_xor_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define this_cpu_xor(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
-#endif
-
-/*
- * Generic percpu operations that do not require preemption handling.
- * Either we do not care about races or the caller has the
- * responsibility of handling preemptions issues. Arch code can still
- * override these instructions since the arch per cpu code may be more
- * efficient and may actually get race freeness for free (that is the
- * case for x86 for example).
- *
- * If there is no other protection through preempt disable and/or
- * disabling interupts then one of these RMW operations can show unexpected
- * behavior because the execution thread was rescheduled on another processor
- * or an interrupt occurred and the same percpu variable was modified from
- * the interrupt context.
- */
-#ifndef __this_cpu_read
-# ifndef __this_cpu_read_1
-#  define __this_cpu_read_1(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_2
-#  define __this_cpu_read_2(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_4
-#  define __this_cpu_read_4(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_8
-#  define __this_cpu_read_8(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
-#endif
-
-#define __this_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-} while (0)
-
-#ifndef __this_cpu_write
-# ifndef __this_cpu_write_1
-#  define __this_cpu_write_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_2
-#  define __this_cpu_write_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_4
-#  define __this_cpu_write_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_8
-#  define __this_cpu_write_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# define __this_cpu_write(pcp, val)	__pcpu_size_call(__this_cpu_write_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_add
-# ifndef __this_cpu_add_1
-#  define __this_cpu_add_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_2
-#  define __this_cpu_add_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_4
-#  define __this_cpu_add_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_8
-#  define __this_cpu_add_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define __this_cpu_add(pcp, val)	__pcpu_size_call(__this_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_sub
-# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
-#endif
-
-#ifndef __this_cpu_inc
-# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
-#endif
-
-#ifndef __this_cpu_dec
-# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
-#endif
-
-#ifndef __this_cpu_and
-# ifndef __this_cpu_and_1
-#  define __this_cpu_and_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_2
-#  define __this_cpu_and_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_4
-#  define __this_cpu_and_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_8
-#  define __this_cpu_and_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define __this_cpu_and(pcp, val)	__pcpu_size_call(__this_cpu_and_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_or
-# ifndef __this_cpu_or_1
-#  define __this_cpu_or_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_2
-#  define __this_cpu_or_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_4
-#  define __this_cpu_or_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_8
-#  define __this_cpu_or_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define __this_cpu_or(pcp, val)	__pcpu_size_call(__this_cpu_or_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_xor
-# ifndef __this_cpu_xor_1
-#  define __this_cpu_xor_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_2
-#  define __this_cpu_xor_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_4
-#  define __this_cpu_xor_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_8
-#  define __this_cpu_xor_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define __this_cpu_xor(pcp, val)	__pcpu_size_call(__this_cpu_xor_, (pcp), (val))
-#endif
-
-/*
- * IRQ safe versions of the per cpu RMW operations. Note that these operations
- * are *not* safe against modification of the same variable from another
- * processors (which one gets when using regular atomic operations)
- . They are guaranteed to be atomic vs. local interrupts and
- * preemption only.
- */
-#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	unsigned long flags;						\
-	local_irq_save(flags);						\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-	local_irq_restore(flags);					\
-} while (0)
-
-#ifndef irqsafe_cpu_add
-# ifndef irqsafe_cpu_add_1
-#  define irqsafe_cpu_add_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_2
-#  define irqsafe_cpu_add_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_4
-#  define irqsafe_cpu_add_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_8
-#  define irqsafe_cpu_add_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define irqsafe_cpu_add(pcp, val) __pcpu_size_call(irqsafe_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef irqsafe_cpu_sub
-# define irqsafe_cpu_sub(pcp, val)	irqsafe_cpu_add((pcp), -(val))
-#endif
-
-#ifndef irqsafe_cpu_inc
-# define irqsafe_cpu_inc(pcp)	irqsafe_cpu_add((pcp), 1)
-#endif
-
-#ifndef irqsafe_cpu_dec
-# define irqsafe_cpu_dec(pcp)	irqsafe_cpu_sub((pcp), 1)
-#endif
-
-#ifndef irqsafe_cpu_and
-# ifndef irqsafe_cpu_and_1
-#  define irqsafe_cpu_and_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_2
-#  define irqsafe_cpu_and_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_4
-#  define irqsafe_cpu_and_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_8
-#  define irqsafe_cpu_and_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define irqsafe_cpu_and(pcp, val) __pcpu_size_call(irqsafe_cpu_and_, (val))
-#endif
-
-#ifndef irqsafe_cpu_or
-# ifndef irqsafe_cpu_or_1
-#  define irqsafe_cpu_or_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_2
-#  define irqsafe_cpu_or_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_4
-#  define irqsafe_cpu_or_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_8
-#  define irqsafe_cpu_or_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define irqsafe_cpu_or(pcp, val) __pcpu_size_call(irqsafe_cpu_or_, (val))
-#endif
-
-#ifndef irqsafe_cpu_xor
-# ifndef irqsafe_cpu_xor_1
-#  define irqsafe_cpu_xor_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_2
-#  define irqsafe_cpu_xor_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_4
-#  define irqsafe_cpu_xor_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_8
-#  define irqsafe_cpu_xor_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
-#endif
 
 #endif /* __LINUX_PERCPU_H */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 1/8] numa: prep:  move generic percpu interface definitions to percpu-defs.h
@ 2010-03-04 17:07   ` Lee Schermerhorn
  0 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:07 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

PATCH:  numa - prep:  move generic percpu interface definitions to percpu-defs.h

Against:  2.6.33-mmotm-100302-1838

To use the generic percpu infrastructure for the numa_node_id() interface,
defined in linux/topology.h, we need to break the circular header dependency
that results from including <linux/percpu.h> in <linux/topology.h>.  The
circular dependency:

	percpu.h -> slab.h -> gfp.h -> topology.h

percpu.h includes slab.h to obtain the definition of kzalloc()/kfree() for
inlining __alloc_percpu() and free_percpu() in !SMP configurations.  One could
un-inline these functions in the !SMP case, but a large number of files depend
on percpu.h to include slab.h.  Tejun Heo suggested moving the definitions to
percpu-defs.h and requested that this be separated from the remainder of the
generic percpu numa_node_id() preparation patch.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/percpu-defs.h |  455 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/percpu.h      |  454 -------------------------------------------
 2 files changed, 455 insertions(+), 454 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/include/linux/percpu-defs.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/percpu-defs.h
+++ linux-2.6.33-mmotm-100302-1838/include/linux/percpu-defs.h
@@ -151,4 +151,459 @@
 #define EXPORT_PER_CPU_SYMBOL_GPL(var)
 #endif
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access.
+ *
+ * @var can be a percpu variable or a field of it and its size should
+ * equal char, int or long.  percpu_read() evaluates to a lvalue and
+ * all others to void.
+ *
+ * These operations are guaranteed to be atomic w.r.t. preemption.
+ * The generic versions use plain get/put_cpu_var().  Archs are
+ * encouraged to implement single-instruction alternatives which don't
+ * require preemption protection.
+ */
+#ifndef percpu_read
+# define percpu_read(var)						\
+  ({									\
+	typeof(var) *pr_ptr__ = &(var);					\
+	typeof(var) pr_ret__;						\
+	pr_ret__ = get_cpu_var(*pr_ptr__);				\
+	put_cpu_var(*pr_ptr__);						\
+	pr_ret__;							\
+  })
+#endif
+
+#define __percpu_generic_to_op(var, val, op)				\
+do {									\
+	typeof(var) *pgto_ptr__ = &(var);				\
+	get_cpu_var(*pgto_ptr__) op val;				\
+	put_cpu_var(*pgto_ptr__);					\
+} while (0)
+
+#ifndef percpu_write
+# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
+#endif
+
+#ifndef percpu_sub
+# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
+#endif
+
+#ifndef percpu_and
+# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
+#endif
+
+/*
+ * Branching function to split up a function into a set of functions that
+ * are called for different scalar sizes of the objects handled.
+ */
+
+extern void __bad_size_call_parameter(void);
+
+#define __pcpu_size_call_return(stem, variable)				\
+({	typeof(variable) pscr_ret__;					\
+	__verify_pcpu_ptr(&(variable));					\
+	switch(sizeof(variable)) {					\
+	case 1: pscr_ret__ = stem##1(variable);break;			\
+	case 2: pscr_ret__ = stem##2(variable);break;			\
+	case 4: pscr_ret__ = stem##4(variable);break;			\
+	case 8: pscr_ret__ = stem##8(variable);break;			\
+	default:							\
+		__bad_size_call_parameter();break;			\
+	}								\
+	pscr_ret__;							\
+})
+
+#define __pcpu_size_call(stem, variable, ...)				\
+do {									\
+	__verify_pcpu_ptr(&(variable));					\
+	switch(sizeof(variable)) {					\
+		case 1: stem##1(variable, __VA_ARGS__);break;		\
+		case 2: stem##2(variable, __VA_ARGS__);break;		\
+		case 4: stem##4(variable, __VA_ARGS__);break;		\
+		case 8: stem##8(variable, __VA_ARGS__);break;		\
+		default: 						\
+			__bad_size_call_parameter();break;		\
+	}								\
+} while (0)
+
+/*
+ * Optimized manipulation for memory allocated through the per cpu
+ * allocator or for addresses of per cpu variables.
+ *
+ * These operation guarantee exclusivity of access for other operations
+ * on the *same* processor. The assumption is that per cpu data is only
+ * accessed by a single processor instance (the current one).
+ *
+ * The first group is used for accesses that must be done in a
+ * preemption safe way since we know that the context is not preempt
+ * safe. Interrupts may occur. If the interrupt modifies the variable
+ * too then RMW actions will not be reliable.
+ *
+ * The arch code can provide optimized functions in two ways:
+ *
+ * 1. Override the function completely. F.e. define this_cpu_add().
+ *    The arch must then ensure that the various scalar format passed
+ *    are handled correctly.
+ *
+ * 2. Provide functions for certain scalar sizes. F.e. provide
+ *    this_cpu_add_2() to provide per cpu atomic operations for 2 byte
+ *    sized RMW actions. If arch code does not provide operations for
+ *    a scalar size then the fallback in the generic code will be
+ *    used.
+ */
+
+#define _this_cpu_generic_read(pcp)					\
+({	typeof(pcp) ret__;						\
+	preempt_disable();						\
+	ret__ = *this_cpu_ptr(&(pcp));					\
+	preempt_enable();						\
+	ret__;								\
+})
+
+#ifndef this_cpu_read
+# ifndef this_cpu_read_1
+#  define this_cpu_read_1(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_2
+#  define this_cpu_read_2(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_4
+#  define this_cpu_read_4(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_8
+#  define this_cpu_read_8(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# define this_cpu_read(pcp)	__pcpu_size_call_return(this_cpu_read_, (pcp))
+#endif
+
+#define _this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	preempt_disable();						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	preempt_enable();						\
+} while (0)
+
+#ifndef this_cpu_write
+# ifndef this_cpu_write_1
+#  define this_cpu_write_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_2
+#  define this_cpu_write_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_4
+#  define this_cpu_write_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_8
+#  define this_cpu_write_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# define this_cpu_write(pcp, val)	__pcpu_size_call(this_cpu_write_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_add
+# ifndef this_cpu_add_1
+#  define this_cpu_add_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_2
+#  define this_cpu_add_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_4
+#  define this_cpu_add_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_8
+#  define this_cpu_add_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define this_cpu_add(pcp, val)		__pcpu_size_call(this_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_sub
+# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef this_cpu_inc
+# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
+#endif
+
+#ifndef this_cpu_dec
+# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef this_cpu_and
+# ifndef this_cpu_and_1
+#  define this_cpu_and_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_2
+#  define this_cpu_and_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_4
+#  define this_cpu_and_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_8
+#  define this_cpu_and_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define this_cpu_and(pcp, val)		__pcpu_size_call(this_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_or
+# ifndef this_cpu_or_1
+#  define this_cpu_or_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_2
+#  define this_cpu_or_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_4
+#  define this_cpu_or_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_8
+#  define this_cpu_or_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define this_cpu_or(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_xor
+# ifndef this_cpu_xor_1
+#  define this_cpu_xor_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_2
+#  define this_cpu_xor_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_4
+#  define this_cpu_xor_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_8
+#  define this_cpu_xor_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define this_cpu_xor(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
+#endif
+
+/*
+ * Generic percpu operations that do not require preemption handling.
+ * Either we do not care about races or the caller has the
+ * responsibility of handling preemption issues. Arch code can still
+ * override these instructions since the arch per cpu code may be more
+ * efficient and may actually get race freeness for free (that is the
+ * case for x86 for example).
+ *
+ * If there is no other protection through preempt disable and/or
+ * disabling interrupts then one of these RMW operations can show unexpected
+ * behavior because the execution thread was rescheduled on another processor
+ * or an interrupt occurred and the same percpu variable was modified from
+ * the interrupt context.
+ */
+#ifndef __this_cpu_read
+# ifndef __this_cpu_read_1
+#  define __this_cpu_read_1(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_2
+#  define __this_cpu_read_2(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_4
+#  define __this_cpu_read_4(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_8
+#  define __this_cpu_read_8(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
+#endif
+
+#define __this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+} while (0)
+
+#ifndef __this_cpu_write
+# ifndef __this_cpu_write_1
+#  define __this_cpu_write_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_2
+#  define __this_cpu_write_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_4
+#  define __this_cpu_write_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_8
+#  define __this_cpu_write_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# define __this_cpu_write(pcp, val)	__pcpu_size_call(__this_cpu_write_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_add
+# ifndef __this_cpu_add_1
+#  define __this_cpu_add_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_2
+#  define __this_cpu_add_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_4
+#  define __this_cpu_add_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_8
+#  define __this_cpu_add_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define __this_cpu_add(pcp, val)	__pcpu_size_call(__this_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_sub
+# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef __this_cpu_inc
+# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
+#endif
+
+#ifndef __this_cpu_dec
+# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef __this_cpu_and
+# ifndef __this_cpu_and_1
+#  define __this_cpu_and_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_2
+#  define __this_cpu_and_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_4
+#  define __this_cpu_and_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_8
+#  define __this_cpu_and_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define __this_cpu_and(pcp, val)	__pcpu_size_call(__this_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_or
+# ifndef __this_cpu_or_1
+#  define __this_cpu_or_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_2
+#  define __this_cpu_or_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_4
+#  define __this_cpu_or_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_8
+#  define __this_cpu_or_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define __this_cpu_or(pcp, val)	__pcpu_size_call(__this_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_xor
+# ifndef __this_cpu_xor_1
+#  define __this_cpu_xor_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_2
+#  define __this_cpu_xor_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_4
+#  define __this_cpu_xor_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_8
+#  define __this_cpu_xor_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define __this_cpu_xor(pcp, val)	__pcpu_size_call(__this_cpu_xor_, (pcp), (val))
+#endif
+
+/*
+ * IRQ safe versions of the per cpu RMW operations. Note that these operations
+ * are *not* safe against modification of the same variable from another
+ * processor (which is what one gets when using regular atomic operations).
+ * They are guaranteed to be atomic vs. local interrupts and
+ * preemption only.
+ */
+#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	unsigned long flags;						\
+	local_irq_save(flags);						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	local_irq_restore(flags);					\
+} while (0)
+
+#ifndef irqsafe_cpu_add
+# ifndef irqsafe_cpu_add_1
+#  define irqsafe_cpu_add_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_2
+#  define irqsafe_cpu_add_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_4
+#  define irqsafe_cpu_add_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_8
+#  define irqsafe_cpu_add_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define irqsafe_cpu_add(pcp, val) __pcpu_size_call(irqsafe_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef irqsafe_cpu_sub
+# define irqsafe_cpu_sub(pcp, val)	irqsafe_cpu_add((pcp), -(val))
+#endif
+
+#ifndef irqsafe_cpu_inc
+# define irqsafe_cpu_inc(pcp)	irqsafe_cpu_add((pcp), 1)
+#endif
+
+#ifndef irqsafe_cpu_dec
+# define irqsafe_cpu_dec(pcp)	irqsafe_cpu_sub((pcp), 1)
+#endif
+
+#ifndef irqsafe_cpu_and
+# ifndef irqsafe_cpu_and_1
+#  define irqsafe_cpu_and_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_2
+#  define irqsafe_cpu_and_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_4
+#  define irqsafe_cpu_and_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_8
+#  define irqsafe_cpu_and_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define irqsafe_cpu_and(pcp, val) __pcpu_size_call(irqsafe_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef irqsafe_cpu_or
+# ifndef irqsafe_cpu_or_1
+#  define irqsafe_cpu_or_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_2
+#  define irqsafe_cpu_or_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_4
+#  define irqsafe_cpu_or_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_8
+#  define irqsafe_cpu_or_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define irqsafe_cpu_or(pcp, val) __pcpu_size_call(irqsafe_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef irqsafe_cpu_xor
+# ifndef irqsafe_cpu_xor_1
+#  define irqsafe_cpu_xor_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_2
+#  define irqsafe_cpu_xor_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_4
+#  define irqsafe_cpu_xor_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_8
+#  define irqsafe_cpu_xor_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (pcp), (val))
+#endif
+
 #endif /* _LINUX_PERCPU_DEFS_H */
Index: linux-2.6.33-mmotm-100302-1838/include/linux/percpu.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/percpu.h
+++ linux-2.6.33-mmotm-100302-1838/include/linux/percpu.h
@@ -180,459 +180,5 @@ static inline void *pcpu_lpage_remapped(
 #define alloc_percpu(type)	\
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type), __alignof__(type))
 
-/*
- * Optional methods for optimized non-lvalue per-cpu variable access.
- *
- * @var can be a percpu variable or a field of it and its size should
- * equal char, int or long.  percpu_read() evaluates to a lvalue and
- * all others to void.
- *
- * These operations are guaranteed to be atomic w.r.t. preemption.
- * The generic versions use plain get/put_cpu_var().  Archs are
- * encouraged to implement single-instruction alternatives which don't
- * require preemption protection.
- */
-#ifndef percpu_read
-# define percpu_read(var)						\
-  ({									\
-	typeof(var) *pr_ptr__ = &(var);					\
-	typeof(var) pr_ret__;						\
-	pr_ret__ = get_cpu_var(*pr_ptr__);				\
-	put_cpu_var(*pr_ptr__);						\
-	pr_ret__;							\
-  })
-#endif
-
-#define __percpu_generic_to_op(var, val, op)				\
-do {									\
-	typeof(var) *pgto_ptr__ = &(var);				\
-	get_cpu_var(*pgto_ptr__) op val;				\
-	put_cpu_var(*pgto_ptr__);					\
-} while (0)
-
-#ifndef percpu_write
-# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
-#endif
-
-#ifndef percpu_add
-# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
-#endif
-
-#ifndef percpu_sub
-# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
-#endif
-
-#ifndef percpu_and
-# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
-#endif
-
-#ifndef percpu_or
-# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
-#endif
-
-#ifndef percpu_xor
-# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
-#endif
-
-/*
- * Branching function to split up a function into a set of functions that
- * are called for different scalar sizes of the objects handled.
- */
-
-extern void __bad_size_call_parameter(void);
-
-#define __pcpu_size_call_return(stem, variable)				\
-({	typeof(variable) pscr_ret__;					\
-	__verify_pcpu_ptr(&(variable));					\
-	switch(sizeof(variable)) {					\
-	case 1: pscr_ret__ = stem##1(variable);break;			\
-	case 2: pscr_ret__ = stem##2(variable);break;			\
-	case 4: pscr_ret__ = stem##4(variable);break;			\
-	case 8: pscr_ret__ = stem##8(variable);break;			\
-	default:							\
-		__bad_size_call_parameter();break;			\
-	}								\
-	pscr_ret__;							\
-})
-
-#define __pcpu_size_call(stem, variable, ...)				\
-do {									\
-	__verify_pcpu_ptr(&(variable));					\
-	switch(sizeof(variable)) {					\
-		case 1: stem##1(variable, __VA_ARGS__);break;		\
-		case 2: stem##2(variable, __VA_ARGS__);break;		\
-		case 4: stem##4(variable, __VA_ARGS__);break;		\
-		case 8: stem##8(variable, __VA_ARGS__);break;		\
-		default: 						\
-			__bad_size_call_parameter();break;		\
-	}								\
-} while (0)
-
-/*
- * Optimized manipulation for memory allocated through the per cpu
- * allocator or for addresses of per cpu variables.
- *
- * These operation guarantee exclusivity of access for other operations
- * on the *same* processor. The assumption is that per cpu data is only
- * accessed by a single processor instance (the current one).
- *
- * The first group is used for accesses that must be done in a
- * preemption safe way since we know that the context is not preempt
- * safe. Interrupts may occur. If the interrupt modifies the variable
- * too then RMW actions will not be reliable.
- *
- * The arch code can provide optimized functions in two ways:
- *
- * 1. Override the function completely. F.e. define this_cpu_add().
- *    The arch must then ensure that the various scalar format passed
- *    are handled correctly.
- *
- * 2. Provide functions for certain scalar sizes. F.e. provide
- *    this_cpu_add_2() to provide per cpu atomic operations for 2 byte
- *    sized RMW actions. If arch code does not provide operations for
- *    a scalar size then the fallback in the generic code will be
- *    used.
- */
-
-#define _this_cpu_generic_read(pcp)					\
-({	typeof(pcp) ret__;						\
-	preempt_disable();						\
-	ret__ = *this_cpu_ptr(&(pcp));					\
-	preempt_enable();						\
-	ret__;								\
-})
-
-#ifndef this_cpu_read
-# ifndef this_cpu_read_1
-#  define this_cpu_read_1(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_2
-#  define this_cpu_read_2(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_4
-#  define this_cpu_read_4(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_8
-#  define this_cpu_read_8(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# define this_cpu_read(pcp)	__pcpu_size_call_return(this_cpu_read_, (pcp))
-#endif
-
-#define _this_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	preempt_disable();						\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-	preempt_enable();						\
-} while (0)
-
-#ifndef this_cpu_write
-# ifndef this_cpu_write_1
-#  define this_cpu_write_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_2
-#  define this_cpu_write_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_4
-#  define this_cpu_write_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_8
-#  define this_cpu_write_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# define this_cpu_write(pcp, val)	__pcpu_size_call(this_cpu_write_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_add
-# ifndef this_cpu_add_1
-#  define this_cpu_add_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_2
-#  define this_cpu_add_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_4
-#  define this_cpu_add_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_8
-#  define this_cpu_add_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define this_cpu_add(pcp, val)		__pcpu_size_call(this_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_sub
-# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
-#endif
-
-#ifndef this_cpu_inc
-# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
-#endif
-
-#ifndef this_cpu_dec
-# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
-#endif
-
-#ifndef this_cpu_and
-# ifndef this_cpu_and_1
-#  define this_cpu_and_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_2
-#  define this_cpu_and_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_4
-#  define this_cpu_and_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_8
-#  define this_cpu_and_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define this_cpu_and(pcp, val)		__pcpu_size_call(this_cpu_and_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_or
-# ifndef this_cpu_or_1
-#  define this_cpu_or_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_2
-#  define this_cpu_or_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_4
-#  define this_cpu_or_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_8
-#  define this_cpu_or_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define this_cpu_or(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_xor
-# ifndef this_cpu_xor_1
-#  define this_cpu_xor_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_2
-#  define this_cpu_xor_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_4
-#  define this_cpu_xor_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_8
-#  define this_cpu_xor_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define this_cpu_xor(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
-#endif
-
-/*
- * Generic percpu operations that do not require preemption handling.
- * Either we do not care about races or the caller has the
- * responsibility of handling preemptions issues. Arch code can still
- * override these instructions since the arch per cpu code may be more
- * efficient and may actually get race freeness for free (that is the
- * case for x86 for example).
- *
- * If there is no other protection through preempt disable and/or
- * disabling interupts then one of these RMW operations can show unexpected
- * behavior because the execution thread was rescheduled on another processor
- * or an interrupt occurred and the same percpu variable was modified from
- * the interrupt context.
- */
-#ifndef __this_cpu_read
-# ifndef __this_cpu_read_1
-#  define __this_cpu_read_1(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_2
-#  define __this_cpu_read_2(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_4
-#  define __this_cpu_read_4(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_8
-#  define __this_cpu_read_8(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
-#endif
-
-#define __this_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-} while (0)
-
-#ifndef __this_cpu_write
-# ifndef __this_cpu_write_1
-#  define __this_cpu_write_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_2
-#  define __this_cpu_write_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_4
-#  define __this_cpu_write_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_8
-#  define __this_cpu_write_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# define __this_cpu_write(pcp, val)	__pcpu_size_call(__this_cpu_write_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_add
-# ifndef __this_cpu_add_1
-#  define __this_cpu_add_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_2
-#  define __this_cpu_add_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_4
-#  define __this_cpu_add_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_8
-#  define __this_cpu_add_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define __this_cpu_add(pcp, val)	__pcpu_size_call(__this_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_sub
-# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
-#endif
-
-#ifndef __this_cpu_inc
-# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
-#endif
-
-#ifndef __this_cpu_dec
-# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
-#endif
-
-#ifndef __this_cpu_and
-# ifndef __this_cpu_and_1
-#  define __this_cpu_and_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_2
-#  define __this_cpu_and_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_4
-#  define __this_cpu_and_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_8
-#  define __this_cpu_and_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define __this_cpu_and(pcp, val)	__pcpu_size_call(__this_cpu_and_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_or
-# ifndef __this_cpu_or_1
-#  define __this_cpu_or_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_2
-#  define __this_cpu_or_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_4
-#  define __this_cpu_or_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_8
-#  define __this_cpu_or_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define __this_cpu_or(pcp, val)	__pcpu_size_call(__this_cpu_or_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_xor
-# ifndef __this_cpu_xor_1
-#  define __this_cpu_xor_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_2
-#  define __this_cpu_xor_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_4
-#  define __this_cpu_xor_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_8
-#  define __this_cpu_xor_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define __this_cpu_xor(pcp, val)	__pcpu_size_call(__this_cpu_xor_, (pcp), (val))
-#endif
-
-/*
- * IRQ safe versions of the per cpu RMW operations. Note that these operations
- * are *not* safe against modification of the same variable from another
- * processors (which one gets when using regular atomic operations)
- . They are guaranteed to be atomic vs. local interrupts and
- * preemption only.
- */
-#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	unsigned long flags;						\
-	local_irq_save(flags);						\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-	local_irq_restore(flags);					\
-} while (0)
-
-#ifndef irqsafe_cpu_add
-# ifndef irqsafe_cpu_add_1
-#  define irqsafe_cpu_add_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_2
-#  define irqsafe_cpu_add_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_4
-#  define irqsafe_cpu_add_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_8
-#  define irqsafe_cpu_add_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define irqsafe_cpu_add(pcp, val) __pcpu_size_call(irqsafe_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef irqsafe_cpu_sub
-# define irqsafe_cpu_sub(pcp, val)	irqsafe_cpu_add((pcp), -(val))
-#endif
-
-#ifndef irqsafe_cpu_inc
-# define irqsafe_cpu_inc(pcp)	irqsafe_cpu_add((pcp), 1)
-#endif
-
-#ifndef irqsafe_cpu_dec
-# define irqsafe_cpu_dec(pcp)	irqsafe_cpu_sub((pcp), 1)
-#endif
-
-#ifndef irqsafe_cpu_and
-# ifndef irqsafe_cpu_and_1
-#  define irqsafe_cpu_and_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_2
-#  define irqsafe_cpu_and_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_4
-#  define irqsafe_cpu_and_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_8
-#  define irqsafe_cpu_and_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define irqsafe_cpu_and(pcp, val) __pcpu_size_call(irqsafe_cpu_and_, (val))
-#endif
-
-#ifndef irqsafe_cpu_or
-# ifndef irqsafe_cpu_or_1
-#  define irqsafe_cpu_or_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_2
-#  define irqsafe_cpu_or_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_4
-#  define irqsafe_cpu_or_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_8
-#  define irqsafe_cpu_or_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define irqsafe_cpu_or(pcp, val) __pcpu_size_call(irqsafe_cpu_or_, (val))
-#endif
-
-#ifndef irqsafe_cpu_xor
-# ifndef irqsafe_cpu_xor_1
-#  define irqsafe_cpu_xor_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_2
-#  define irqsafe_cpu_xor_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_4
-#  define irqsafe_cpu_xor_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_8
-#  define irqsafe_cpu_xor_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
-#endif
 
 #endif /* __LINUX_PERCPU_H */
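
Not part of the patch, but for illustration: a minimal usage sketch of the
generic accessors consolidated above.  The per cpu counter and the two
helpers are made-up names, used only to show the intended calling contexts.

	#include <linux/percpu.h>

	/* illustrative only -- not part of this series */
	static DEFINE_PER_CPU(unsigned long, demo_count);

	static void demo_count_event(void)
	{
		/*
		 * this_cpu_inc() is the fully preempt-safe RMW form; the
		 * generic fallback brackets the update with
		 * preempt_disable()/preempt_enable(), so this may be
		 * called from preemptible context.
		 */
		this_cpu_inc(demo_count);
	}

	static unsigned long demo_count_local(void)
	{
		/*
		 * __this_cpu_read() does no preemption handling at all;
		 * the caller must already be pinned to a cpu, e.g. with
		 * preemption or interrupts disabled.
		 */
		return __this_cpu_read(demo_count);
	}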

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 2/8] numa:  add generic percpu var implementation of numa_node_id()
  2010-03-04 17:06 ` Lee Schermerhorn
@ 2010-03-04 17:07   ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:07 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

From: Christoph Lameter <cl@linux-foundation.org>

Against:  2.6.33-mmotm-100302-1838

Rework the generic version of the numa_node_id() function to use the
new generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implementation will default this option
to 'y' when NUMA is configured.  This config option could be removed
if/when all archs switch over to the generic percpu implementation
of numa_node_id().  Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) Defining USE_PERCPU_NUMA_NODE_ID in the arch-dependent Kconfig--e.g.,
     when NUMA is configured.

Subsequent patches will convert x86_64 and ia64 to use this
implementation.
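
As a rough sketch of item 2 above (the function and table names here are
placeholders, not taken from any arch), the per cpu variable just needs to
be seeded on the cpu being brought on-line so that the generic
numa_node_id() returns the right node from then on:

	/*
	 * Runs on the cpu that is being brought on-line; the helper name
	 * and the early node lookup are placeholders.
	 */
	static void __cpuinit arch_seed_numa_node(void)
	{
		int nid = arch_early_cpu_to_node(raw_smp_processor_id());

		set_numa_node(nid);	/* numa_node_id() is valid after this */
	}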


Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

V0:
#  From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
#  Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
#  From: Christoph Lameter <cl@linux-foundation.org>
#  To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
#  Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
#
#  I have a very early form of a draft of a patch here that genericizes
#  numa_node_id(). Uses the new generic this_cpu_xxx stuff.
#
#  Not complete.

V1:
  + split out x86 specific changes to subsequent patch
  + split out "numa_mem_id()" and related changes to separate patch
  + moved generic definitions of __this_cpu_xxx from linux/percpu.h
    to asm-generic/percpu.h where asm/percpu.h and other asm hdrs
    can use them.
  + export new percpu symbol 'numa_node' in mm/percpu.h
  + include <asm/percpu.h> in <linux/topology.h> for use by new
    numa_node_id().

V2:
  + add back the #ifndef/#endif guard around numa_node_id() so that archs
    can override generic definition
  + add generic stub for set_numa_node()
  + use generic percpu numa_node_id() only if enabled by
      CONFIG_USE_PERCPU_NUMA_NODE_ID
   to allow incremental per arch support.  This option could be removed when/if
   all archs that support NUMA support this option.

V3:
  + separated the rework of linux/percpu.h into another [preceding] patch.
  + moved definition of the numa_node percpu variable from mm/percpu.c to
    mm/page_alloc.c
  + moved premature definition of cpu_to_mem() to later patch.

 include/linux/topology.h |   33 ++++++++++++++++++++++++++++-----
 mm/page_alloc.c          |    5 +++++
 2 files changed, 33 insertions(+), 5 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/mm/page_alloc.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/mm/page_alloc.c
+++ linux-2.6.33-mmotm-100302-1838/mm/page_alloc.c
@@ -56,6 +56,11 @@
 #include <asm/div64.h>
 #include "internal.h"
 
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+DEFINE_PER_CPU(int, numa_node);
+EXPORT_PER_CPU_SYMBOL(numa_node);
+#endif
+
 /*
  * Array of node states.
  */
Index: linux-2.6.33-mmotm-100302-1838/include/linux/topology.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/topology.h
+++ linux-2.6.33-mmotm-100302-1838/include/linux/topology.h
@@ -31,6 +31,7 @@
 #include <linux/bitops.h>
 #include <linux/mmzone.h>
 #include <linux/smp.h>
+#include <linux/percpu-defs.h>
 #include <asm/topology.h>
 
 #ifndef node_has_online_mem
@@ -203,8 +204,35 @@ int arch_update_cpu_topology(void);
 #ifndef SD_NODE_INIT
 #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
 #endif
+
 #endif /* CONFIG_NUMA */
 
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+DECLARE_PER_CPU(int, numa_node);
+
+#ifndef numa_node_id
+/* Returns the number of the current Node. */
+#define numa_node_id()		__this_cpu_read(numa_node)
+#endif
+
+#ifndef cpu_to_node
+#define cpu_to_node(__cpu)	per_cpu(numa_node, (__cpu))
+#endif
+
+#ifndef set_numa_node
+#define set_numa_node(__node) percpu_write(numa_node, __node)
+#endif
+
+#else	/* !CONFIG_USE_PERCPU_NUMA_NODE_ID */
+
+/* Returns the number of the current Node. */
+#ifndef numa_node_id
+#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
+
+#endif
+
+#endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
@@ -218,9 +246,4 @@ int arch_update_cpu_topology(void);
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
 
-/* Returns the number of the current Node. */
-#ifndef numa_node_id
-#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
-#endif
-
 #endif /* _LINUX_TOPOLOGY_H */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 3/8] numa:  x86_64:  use generic percpu var for numa_node_id() implementation
  2010-03-04 17:06 ` Lee Schermerhorn
@ 2010-03-04 17:07   ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:07 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

Against:  2.6.33-mmotm-100302-1838

x86 arch-specific changes to use the generic numa_node_id() based on the
generic percpu variable infrastructure.  Back out x86's custom
version of numa_node_id().
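
One ordering detail, spelled out as a sketch (this restates the
setup_percpu.c hunk below rather than adding anything new): with the
generic definitions from patch 2/8, cpu_to_node() itself reads the per cpu
'numa_node' variable, so the early boot code that populates that variable
must take the node from the early x86 map instead:

	/*
	 * cpu_to_node(cpu) is now per_cpu(numa_node, cpu), which still
	 * holds 0 at this point, so the node has to come from the early
	 * x86 map when seeding the per cpu variable:
	 */
	per_cpu(numa_node, boot_cpu_id) = early_cpu_to_node(boot_cpu_id);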

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
[Christoph's signoff here?]

V0: based on:
# From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
# Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
# From: Christoph Lameter <cl@linux-foundation.org>
# To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
# Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
# 
# I have a very early form of a draft of a patch here that genericizes
# numa_node_id(). Uses the new generic this_cpu_xxx stuff.
# 
# Not complete.

V1:
  + split out x86-specific changes from generic.
  + change 'node_number' => 'numa_node' in x86 arch code
  + define __this_cpu_read in x86 asm/percpu.h
  + change x86/kernel/setup_percpu.c to use early_cpu_to_node() to
    setup 'numa_node' as cpu_to_node() now depends on the per cpu var.
    [I think!  What about cpu_to_node() func in x86/mm/numa_64.c ???]

V2:
  + cpu_to_node() => early_cpu_to_node(); incomplete change in V01
  + x86 arch define USE_PERCPU_NUMA_NODE_ID.

 arch/x86/Kconfig                |    4 ++++
 arch/x86/include/asm/percpu.h   |    2 ++
 arch/x86/include/asm/topology.h |   13 +------------
 arch/x86/kernel/cpu/common.c    |    6 +++---
 arch/x86/kernel/setup_percpu.c  |    4 ++--
 arch/x86/mm/numa_64.c           |    5 +----
 6 files changed, 13 insertions(+), 21 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/topology.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/include/asm/topology.h
+++ linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/topology.h
@@ -53,33 +53,22 @@
 extern int cpu_to_node_map[];
 
 /* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
+static inline int early_cpu_to_node(int cpu)
 {
 	return cpu_to_node_map[cpu];
 }
-#define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
 #else /* CONFIG_X86_64 */
 
 /* Mappings between logical cpu number and node number */
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
 
-/* Returns the number of the current Node. */
-DECLARE_PER_CPU(int, node_number);
-#define numa_node_id()		percpu_read(node_number)
-
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
 extern int cpu_to_node(int cpu);
 extern int early_cpu_to_node(int cpu);
 
 #else	/* !CONFIG_DEBUG_PER_CPU_MAPS */
 
-/* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
-{
-	return per_cpu(x86_cpu_to_node_map, cpu);
-}
-
 /* Same function but used if called before per_cpu areas are setup */
 static inline int early_cpu_to_node(int cpu)
 {
Index: linux-2.6.33-mmotm-100302-1838/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/mm/numa_64.c
+++ linux-2.6.33-mmotm-100302-1838/arch/x86/mm/numa_64.c
@@ -33,9 +33,6 @@ int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
-DEFINE_PER_CPU(int, node_number) = 0;
-EXPORT_PER_CPU_SYMBOL(node_number);
-
 /*
  * Map cpu index to node index
  */
@@ -809,7 +806,7 @@ void __cpuinit numa_set_node(int cpu, in
 	per_cpu(x86_cpu_to_node_map, cpu) = node;
 
 	if (node != NUMA_NO_NODE)
-		per_cpu(node_number, cpu) = node;
+		per_cpu(numa_node, cpu) = node;
 }
 
 void __cpuinit numa_clear_node(int cpu)
Index: linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/include/asm/percpu.h
+++ linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
@@ -208,10 +208,12 @@ do {									\
 #define percpu_or(var, val)		percpu_to_op("or", var, val)
 #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
 
+#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 
+#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
 #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
 #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
 #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)
Index: linux-2.6.33-mmotm-100302-1838/arch/x86/kernel/cpu/common.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/kernel/cpu/common.c
+++ linux-2.6.33-mmotm-100302-1838/arch/x86/kernel/cpu/common.c
@@ -1121,9 +1121,9 @@ void __cpuinit cpu_init(void)
 	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
-	if (cpu != 0 && percpu_read(node_number) == 0 &&
-	    cpu_to_node(cpu) != NUMA_NO_NODE)
-		percpu_write(node_number, cpu_to_node(cpu));
+	if (cpu != 0 && percpu_read(numa_node) == 0 &&
+	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
+		set_numa_node(early_cpu_to_node(cpu));
 #endif
 
 	me = current;
Index: linux-2.6.33-mmotm-100302-1838/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/kernel/setup_percpu.c
+++ linux-2.6.33-mmotm-100302-1838/arch/x86/kernel/setup_percpu.c
@@ -265,10 +265,10 @@ void __init setup_per_cpu_areas(void)
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_NUMA)
 	/*
-	 * make sure boot cpu node_number is right, when boot cpu is on the
+	 * make sure boot cpu numa_node is right, when boot cpu is on the
 	 * node that doesn't have mem installed
 	 */
-	per_cpu(node_number, boot_cpu_id) = cpu_to_node(boot_cpu_id);
+	per_cpu(numa_node, boot_cpu_id) = early_cpu_to_node(boot_cpu_id);
 #endif
 
 	/* Setup node to cpumask map */
Index: linux-2.6.33-mmotm-100302-1838/arch/x86/Kconfig
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/Kconfig
+++ linux-2.6.33-mmotm-100302-1838/arch/x86/Kconfig
@@ -1696,6 +1696,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool X86_64
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 4/8] numa:  ia64:  use generic percpu var numa_node_id() implementation
@ 2010-03-04 17:07   ` Lee Schermerhorn
  0 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:07 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

Against:  2.6.33-mmotm-100302-1838

ia64:  Use generic percpu implementation of numa_node_id()
   + initialize per cpu 'numa_node'
   + remove ia64 cpu_to_node() macro;  use generic
   + define CONFIG_USE_PERCPU_NUMA_NODE_ID when NUMA configured

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

New in V2

 arch/ia64/Kconfig                |    4 ++++
 arch/ia64/include/asm/topology.h |    5 -----
 arch/ia64/kernel/smpboot.c       |    6 ++++++
 3 files changed, 10 insertions(+), 5 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/ia64/kernel/smpboot.c
+++ linux-2.6.33-mmotm-100302-1838/arch/ia64/kernel/smpboot.c
@@ -390,6 +390,11 @@ smp_callin (void)
 
 	fix_b0_for_bsp();
 
+	/*
+	 * numa_node_id() works after this.
+	 */
+	set_numa_node(cpu_to_node_map[cpuid]);
+
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);
 	/* Setup the per cpu irq handling data structures */
@@ -632,6 +637,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
 	cpu_set(smp_processor_id(), cpu_online_map);
 	cpu_set(smp_processor_id(), cpu_callin_map);
+	set_numa_node(cpu_to_node_map[smp_processor_id()]);
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	paravirt_post_smp_prepare_boot_cpu();
 }
Index: linux-2.6.33-mmotm-100302-1838/arch/ia64/include/asm/topology.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/ia64/include/asm/topology.h
+++ linux-2.6.33-mmotm-100302-1838/arch/ia64/include/asm/topology.h
@@ -26,11 +26,6 @@
 #define RECLAIM_DISTANCE 15
 
 /*
- * Returns the number of the node containing CPU 'cpu'
- */
-#define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
-
-/*
  * Returns a bitmask of CPUs on Node 'node'.
  */
 #define cpumask_of_node(node) ((node) == -1 ?				\
Index: linux-2.6.33-mmotm-100302-1838/arch/ia64/Kconfig
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/ia64/Kconfig
+++ linux-2.6.33-mmotm-100302-1838/arch/ia64/Kconfig
@@ -498,6 +498,10 @@ config HAVE_ARCH_NODEDATA_EXTENSION
 	def_bool y
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 5/8] numa: Introduce numa_mem_id()- effective local memory node id
  2010-03-04 17:06 ` Lee Schermerhorn
@ 2010-03-04 17:08   ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:08 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

Against:  2.6.33-mmotm-100302-1838

Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "effective local memory node" for archs that support memoryless
nodes.

Define the API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES is
defined; otherwise provide stubs.  Architectures will define
HAVE_MEMORYLESS_NODES if/when they support memoryless nodes.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpu's] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

if they don't want to use the generic version, but want to support
memoryless nodes.
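
The intended use, as a minimal sketch (the helper below is made up; only
numa_mem_id() itself comes from this patch):

	#include <linux/slab.h>
	#include <linux/topology.h>

	/*
	 * Allocate "close to" the current cpu even when that cpu sits on a
	 * memoryless node: numa_node_id() may name a node with no memory,
	 * while numa_mem_id() names the nearest node that does have memory,
	 * which is the more useful hint for kernel allocations.
	 */
	static void *alloc_near_me(size_t size)
	{
		return kmalloc_node(size, GFP_KERNEL, numa_mem_id());
	}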

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist rebuild.
Archs that use this implementation will need to initialize 'numa_mem' for
secondary cpus as they're brought on-line.

Question:  Is it worth adding a generic initialization of per cpu numa_mem?
E.g.,  built only when CONFIG_HAVE_MEMORYLESS_NODES defined?  Or leave it
to the archs?

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

V2:  + split this out of Christoph's incomplete "starter patch"
     + flesh out the definition

 include/asm-generic/topology.h |    3 +++
 include/linux/mmzone.h         |    6 ++++++
 include/linux/topology.h       |   24 ++++++++++++++++++++++++
 mm/page_alloc.c                |   39 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 71 insertions(+), 1 deletion(-)

Index: linux-2.6.33-mmotm-100302-1838/include/linux/topology.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/topology.h	2010-03-03 16:28:53.000000000 -0500
+++ linux-2.6.33-mmotm-100302-1838/include/linux/topology.h	2010-03-03 16:28:55.000000000 -0500
@@ -233,6 +233,30 @@ DECLARE_PER_CPU(int, numa_node);
 
 #endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+
+DECLARE_PER_CPU(int, numa_mem);
+
+#ifndef set_numa_mem
+#define set_numa_mem(__node) percpu_write(numa_mem, __node)
+#endif
+
+#else	/* !CONFIG_HAVE_MEMORYLESS_NODES */
+
+#define numa_mem numa_node
+static inline void set_numa_mem(int node) {}
+
+#endif	/* [!]CONFIG_HAVE_MEMORYLESS_NODES */
+
+#ifndef numa_mem_id
+/* Returns the number of the nearest Node with memory */
+#define numa_mem_id()		__this_cpu_read(numa_mem)
+#endif
+
+#ifndef cpu_to_mem
+#define cpu_to_mem(__cpu)	per_cpu(numa_mem, (__cpu))
+#endif
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
Index: linux-2.6.33-mmotm-100302-1838/mm/page_alloc.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/mm/page_alloc.c	2010-03-03 16:28:53.000000000 -0500
+++ linux-2.6.33-mmotm-100302-1838/mm/page_alloc.c	2010-03-03 16:28:55.000000000 -0500
@@ -61,6 +61,11 @@ DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+DEFINE_PER_CPU(int, numa_mem);		/* Kernel "local memory" node */
+EXPORT_PER_CPU_SYMBOL(numa_mem);
+#endif
+
 /*
  * Array of node states.
  */
@@ -2733,6 +2738,24 @@ static void build_zonelist_cache(pg_data
 		zlc->z_to_n[z - zonelist->_zonerefs] = zonelist_node_idx(z);
 }
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+/*
+ * Return node id of node used for "local" allocations.
+ * I.e., first node id of first zone in arg node's generic zonelist.
+ * Used for initializing percpu 'numa_mem', which is used primarily
+ * for kernel allocations, so use GFP_KERNEL flags to locate zonelist.
+ */
+int local_memory_node(int node)
+{
+	struct zone *zone;
+
+	(void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
+				   gfp_zone(GFP_KERNEL),
+				   NULL,
+				   &zone);
+	return zone->node;
+}
+#endif
 
 #else	/* CONFIG_NUMA */
 
@@ -2832,9 +2855,23 @@ static int __build_all_zonelists(void *d
 	 * needs the percpu allocator in order to allocate its pagesets
 	 * (a chicken-egg dilemma).
 	 */
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+		/*
+		 * We now know the "local memory node" for each node--
+		 * i.e., the node of the first zone in the generic zonelist.
+		 * Set up numa_mem percpu variable for on-line cpus.  During
+		 * boot, only the boot cpu should be on-line;  we'll init the
+		 * secondary cpus' numa_mem as they come on-line.  During
+		 * node/memory hotplug, we'll fixup all on-line cpus.
+		 */
+		if (cpu_online(cpu))
+			cpu_to_mem(cpu) = local_memory_node(cpu_to_node(cpu));
+#endif
+	}
+
 	return 0;
 }
 
Index: linux-2.6.33-mmotm-100302-1838/include/linux/mmzone.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/linux/mmzone.h	2010-03-03 16:28:53.000000000 -0500
+++ linux-2.6.33-mmotm-100302-1838/include/linux/mmzone.h	2010-03-03 16:28:55.000000000 -0500
@@ -661,6 +661,12 @@ void memory_present(int nid, unsigned lo
 static inline void memory_present(int nid, unsigned long start, unsigned long end) {}
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+int local_memory_node(int node_id);
+#else
+static inline int local_memory_node(int node_id) { return node_id; };
+#endif
+
 #ifdef CONFIG_NEED_NODE_MEMMAP_SIZE
 unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 #endif
Index: linux-2.6.33-mmotm-100302-1838/include/asm-generic/topology.h
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/include/asm-generic/topology.h	2010-03-03 16:28:53.000000000 -0500
+++ linux-2.6.33-mmotm-100302-1838/include/asm-generic/topology.h	2010-03-03 16:28:55.000000000 -0500
@@ -34,6 +34,9 @@
 #ifndef cpu_to_node
 #define cpu_to_node(cpu)	((void)(cpu),0)
 #endif
+#ifndef cpu_to_mem
+#define cpu_to_mem(cpu)		((void)(cpu),0)
+#endif
 #ifndef parent_node
 #define parent_node(node)	((void)(node),0)
 #endif

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 6/8] numa: ia64: support numa_mem_id() for memoryless nodes
  2010-03-04 17:06 ` Lee Schermerhorn
@ 2010-03-04 17:08   ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:08 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

PATCH/RFC numa: ia64:  support memoryless nodes

Against:  2.6.33-mmotm-100302-1838

Enable 'HAVE_MEMORYLESS_NODES' by default when NUMA configured
on ia64.  Initialize percpu 'numa_mem' variable when starting
secondary cpus.  Generic initialization will handle the boot
cpu.

Nothing uses 'numa_mem_id()' yet.  A subsequent patch will modify
slab to use this.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

New in V2

 arch/ia64/Kconfig          |    4 ++++
 arch/ia64/kernel/smpboot.c |    1 +
 2 files changed, 5 insertions(+)

Index: linux-2.6.33-mmotm-100302-1838/arch/ia64/Kconfig
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/ia64/Kconfig
+++ linux-2.6.33-mmotm-100302-1838/arch/ia64/Kconfig
@@ -502,6 +502,10 @@ config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA
 
+config HAVE_MEMORYLESS_NODES
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE
Index: linux-2.6.33-mmotm-100302-1838/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/ia64/kernel/smpboot.c
+++ linux-2.6.33-mmotm-100302-1838/arch/ia64/kernel/smpboot.c
@@ -394,6 +394,7 @@ smp_callin (void)
 	 * numa_node_id() works after this.
 	 */
 	set_numa_node(cpu_to_node_map[cpuid]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpuid]));
 
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 7/8] numa: slab:  use numa_mem_id() for slab local memory node
  2010-03-04 17:06 ` Lee Schermerhorn
@ 2010-03-04 17:08   ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:08 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

[PATCH] numa:  Slab handle memoryless nodes

Against:  2.6.33-mmotm-100302-1838

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~ 2.6.19, does not handle memoryless
nodes well.  Specifically, the "fast path"--____cache_alloc()--will
never succeed as slab doesn't cache offnode objects on the per cpu
queues, and for memoryless nodes, all memory will be "off node"
relative to numa_node_id().  This adds significant overhead to all
kmem cache allocations, incurring a significant regression relative
to earlier kernels [from before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to
return the "effective local memory node" for the calling context.
This is the first node in the local node's generic fallback zonelist--
i.e., the same node that "local" mempolicy-based allocations would
use.  This lets slab cache these "local" allocations and avoid
fallback/refill on every allocation.

N.B.:  Slab will need to handle node and memory hotplug events that
could change the value returned by numa_mem_id() for any given
node.  E.g., flush all per cpu slab queues before rebuilding the
zonelists.  Andi Kleen and David Rientjes are currently working on
patch series to improve slab support for memory hotplug.  When that
effort settles down, and if there is general agreement on this
approach, I'll prepare another patch to address possible change in
"local memory node", if still necessary.

Performance impact on "hackbench 400 process 200"

2.6.33+mmotm-100302-1838	       no-patch  this-patch [series]
ia64 no memoryless nodes [avg of 10]: 	 11.853	   11.739 (secs)
ia64 cpus all on memless nodes  [10]: 	264.909	   27.938 ~10x speedup

The slowdown of the patched kernel from ~12 sec to ~28 seconds when
configured with memoryless nodes is the result of all cpus allocating
from a single node's mm pagepool.  The cache lines of the single node
are distributed/interleaved over the memory of the real physical nodes,
but the zone locks of the single node with memory still each live in a
single cache line that is accessed from all processors.

x86_64 [8x6 AMD] [avg of 10]:	   	  3.322	    3.148

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/slab.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/mm/slab.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/mm/slab.c
+++ linux-2.6.33-mmotm-100302-1838/mm/slab.c
@@ -1073,7 +1073,7 @@ static inline int cache_free_alien(struc
 	struct array_cache *alien = NULL;
 	int node;
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/*
 	 * Make sure we are not freeing a object from another node to the array
@@ -1418,7 +1418,7 @@ void __init kmem_cache_init(void)
 	 * 6) Resize the head arrays of the kmalloc caches to their final sizes.
 	 */
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/* 1) create the cache_cache */
 	INIT_LIST_HEAD(&cache_chain);
@@ -2052,7 +2052,7 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
-	cachep->nodelists[numa_node_id()]->next_reap =
+	cachep->nodelists[numa_mem_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
 
@@ -2383,7 +2383,7 @@ static void check_spinlock_acquired(stru
 {
 #ifdef CONFIG_SMP
 	check_irq_off();
-	assert_spin_locked(&cachep->nodelists[numa_node_id()]->list_lock);
+	assert_spin_locked(&cachep->nodelists[numa_mem_id()]->list_lock);
 #endif
 }
 
@@ -2410,7 +2410,7 @@ static void do_drain(void *arg)
 {
 	struct kmem_cache *cachep = arg;
 	struct array_cache *ac;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	check_irq_off();
 	ac = cpu_cache_get(cachep);
@@ -2943,7 +2943,7 @@ static void *cache_alloc_refill(struct k
 
 retry:
 	check_irq_off();
-	node = numa_node_id();
+	node = numa_mem_id();
 	ac = cpu_cache_get(cachep);
 	batchcount = ac->batchcount;
 	if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
@@ -3147,7 +3147,7 @@ static void *alternate_node_alloc(struct
 
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
-	nid_alloc = nid_here = numa_node_id();
+	nid_alloc = nid_here = numa_mem_id();
 	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
@@ -3316,6 +3316,7 @@ __cache_alloc_node(struct kmem_cache *ca
 {
 	unsigned long save_flags;
 	void *ptr;
+	int slab_node = numa_mem_id();
 
 	flags &= gfp_allowed_mask;
 
@@ -3328,7 +3329,7 @@ __cache_alloc_node(struct kmem_cache *ca
 	local_irq_save(save_flags);
 
 	if (nodeid == -1)
-		nodeid = numa_node_id();
+		nodeid = slab_node;
 
 	if (unlikely(!cachep->nodelists[nodeid])) {
 		/* Node not bootstrapped yet */
@@ -3336,7 +3337,7 @@ __cache_alloc_node(struct kmem_cache *ca
 		goto out;
 	}
 
-	if (nodeid == numa_node_id()) {
+	if (nodeid == slab_node) {
 		/*
 		 * Use the locally cached objects if possible.
 		 * However ____cache_alloc does not allow fallback
@@ -3380,8 +3381,8 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
- 	if (!objp)
- 		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp)
+		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
 
   out:
 	return objp;
@@ -3478,7 +3479,7 @@ static void cache_flusharray(struct kmem
 {
 	int batchcount;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	batchcount = ac->batchcount;
 #if DEBUG
@@ -4053,7 +4054,7 @@ static void cache_reap(struct work_struc
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 	struct delayed_work *work = to_delayed_work(w);
 
 	if (!mutex_trylock(&cache_chain_mutex))

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH/RFC 8/8] numa:  in-kernel profiling -- support memoryless nodes
@ 2010-03-04 17:08   ` Lee Schermerhorn
  0 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 17:08 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, Christoph Lameter,
	Nick Piggin, David Rientjes, akpm, eric.whitney

Against:  2.6.33-mmotm-100302-1838

Patch:  in-kernel profiling -- support memoryless nodes.

Another example of using numa_mem_id() to support memoryless
nodes efficiently.  I stumbled across this when trying to profile
the kernel in the memoryless nodes configuration.  A quick look
at other usages of numa_node_id() and cpu_to_node() for explicit
local allocations shows several other places that could be
problematic for systems with memoryless nodes and that could also
be addressed with this simple substitution:

In-kernel profiling requires that we be able to allocate "local"
memory for each cpu.  Use "cpu_to_mem()" instead of "cpu_to_node()"
to support memoryless nodes.

Depends on the "numa_mem_id()" patch.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 kernel/profile.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.33-mmotm-100302-1838/kernel/profile.c
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/kernel/profile.c
+++ linux-2.6.33-mmotm-100302-1838/kernel/profile.c
@@ -363,7 +363,7 @@ static int __cpuinit profile_cpu_callbac
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		node = cpu_to_node(cpu);
+		node = cpu_to_mem(cpu);
 		per_cpu(cpu_profile_flip, cpu) = 0;
 		if (!per_cpu(cpu_profile_hits, cpu)[1]) {
 			page = alloc_pages_exact_node(node,
@@ -565,7 +565,7 @@ static int create_hash_tables(void)
 	int cpu;
 
 	for_each_online_cpu(cpu) {
-		int node = cpu_to_node(cpu);
+		int node = cpu_to_mem(cpu);
 		struct page *page;
 
 		page = alloc_pages_exact_node(node,

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 2/8] numa:  add generic percpu var implementation of numa_node_id()
@ 2010-03-04 18:44     ` Christoph Lameter
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2010-03-04 18:44 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 3/8] numa:  x86_64:  use generic percpu var for numa_node_id() implementation
@ 2010-03-04 18:47     ` Christoph Lameter
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2010-03-04 18:47 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney

On Thu, 4 Mar 2010, Lee Schermerhorn wrote:

> Index: linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
> ===================================================================
> --- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/include/asm/percpu.h
> +++ linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
> @@ -208,10 +208,12 @@ do {									\
>  #define percpu_or(var, val)		percpu_to_op("or", var, val)
>  #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
>
> +#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>
> +#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)


The functions added are already defined in linux/percpu.h and their
definition here is wrong since the u64 case is not handled (percpu.h does
that correctly).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 4/8] numa:  ia64:  use generic percpu var numa_node_id() implementation
@ 2010-03-04 18:48     ` Christoph Lameter
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2010-03-04 18:48 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 5/8] numa: Introduce numa_mem_id()- effective local memory node id
@ 2010-03-04 18:52     ` Christoph Lameter
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2010-03-04 18:52 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney

On Thu, 4 Mar 2010, Lee Schermerhorn wrote:

> numa_mem_id() - returns node number of "local memory" node

Can we call that numa_nearest_node or so? What happens if multiple nodes
are at the same distance? Still feel unsure about what happens if there
are N closest nodes to M cpuless cpus. Will each of the M cpus use the
first of the N closest nodes for allocation?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 5/8] numa: Introduce numa_mem_id()- effective local memory node id
  2010-03-04 18:52     ` Christoph Lameter
@ 2010-03-04 19:28       ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 19:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney

On Thu, 2010-03-04 at 12:52 -0600, Christoph Lameter wrote:
> On Thu, 4 Mar 2010, Lee Schermerhorn wrote:
> 
> > numa_mem_id() - returns node number of "local memory" node
> 
> Can we call that numa_nearest_node or so? 

Or "numa_local_memory_node"?  We think/hope it's the nearest one...

We'll choose something for the next respin.

> What happens if multiple nodes
> are at the same distance? 

This is handled by build_all_zonelists().  It attempts to distribute
the various nodes' zonelists to avoid multiple nodes falling back to the
same node when distances are equal--e.g., with a default SLIT.  However,
if the SLIT distances indicate that a node [A] is closer to 2 or more
other nodes [B...] than any other node, all of nodes B... will fallback
to node A.  

> Still feel unsure about what happens if there
> are N closest nodes to M cpuless cpus. Will each of the M cpus use the
> first of the N closest nodes for allocation?

Each of the M cpus [memless, right?] will use the first node in their
respective node's zonelist.  If the cpu's node has local memory, the cpu
will allocate from there.  If the cpu's node is memoryless, the cpu will
allocate from the node that build_all_zonelists/find_next_best_node
assigned as the first node-with-memory in the cpu's node's zonelist.

I.e., the same logic all "local" mempolicy based allocations will use.

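To make that concrete -- illustration only, not part of the series, and the
function name below is made up -- a trivial loop over the possible cpus shows
the mapping the generic code establishes:

/*
 * Print each possible cpu's node and the "effective local memory node"
 * that local_memory_node() resolves for it.  For a cpu on a memoryless
 * node this is the first node with memory in that node's zonelist, as
 * built by build_all_zonelists().
 */
static void __init show_numa_mem_mapping(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		printk(KERN_INFO "cpu %d: node %d -> local memory node %d\n",
		       cpu, cpu_to_node(cpu),
		       local_memory_node(cpu_to_node(cpu)));
}
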
Lee


> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 3/8] numa:  x86_64:  use generic percpu var for numa_node_id() implementation
  2010-03-04 18:47     ` Christoph Lameter
@ 2010-03-04 20:42       ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-04 20:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney

On Thu, 2010-03-04 at 12:47 -0600, Christoph Lameter wrote: 
> On Thu, 4 Mar 2010, Lee Schermerhorn wrote:
> 
> > Index: linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
> > ===================================================================
> > --- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/include/asm/percpu.h
> > +++ linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
> > @@ -208,10 +208,12 @@ do {									\
> >  #define percpu_or(var, val)		percpu_to_op("or", var, val)
> >  #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
> >
> > +#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >  #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >  #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >  #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >
> > +#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
> >  #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
> >  #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
> >  #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)
> 
> 
> The functions added are already defined in linux/percpu.h and their
> definition here is wrong since the u64 case is not handled (percpu.h does
> that correctly).

Well, in linux/percpu-defs.h after the first patch in this series, but
x86 is overriding it with the percpu_to_op() implementation.  You're
saying that the x86 percpu_to_op() macro doesn't handle 8-byte 'pcp'
operands?  It appears to handle sizes 1, 2, 4 and 8.

But, I just tried the series with the above two definitions removed and
the kernel builds and boots.  And runs the hackbench test even faster.

2.6.33-mmotm-100302-1838 on 8x6 AMD x86_64 -- hackbench 400 process 200
[avg of 10 runs]:
no add'l patches:		3.332
my V3 series:			3.148
V3 + generic __this_cpu_xxx():  3.083  [removed x86 defs of __this_cpu_xxx()]


So, I'll remove those definitions in V4.

Do we want to retain the x86 definitions of __this_cpu_xxx_[124]() or
remove those and let the generic definitions handle them? 

Lee





  
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 3/8] numa:  x86_64:  use generic percpu var for numa_node_id() implementation
  2010-03-04 20:42       ` Lee Schermerhorn
@ 2010-03-04 21:16         ` Christoph Lameter
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2010-03-04 21:16 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Nick Piggin, David Rientjes, akpm, eric.whitney

On Thu, 4 Mar 2010, Lee Schermerhorn wrote:

> Well, in linux/percpu-defs.h after the first patch in this series, but
> x86 is overriding it with the percpu_to_op() implementation.  You're
> saying that the x86 percpu_to_op() macro doesn't handle 8-byte 'pcp'
> operands?  It appears to handle sizes 1, 2, 4 and 8.

8 byte operands are not allowed for 32 bit but work on 64 bit.

> So, I'll remove those definitions in V4.

Ok.

> Do we want to retain the x86 definitions of __this_cpu_xxx_[124]() or
> remove those and let the generic definitions handle them?

Generic definitions would not be as efficient as the use of the segment
register to shift the address to the cpu area.

I have not figured out exactly what you are doing with the percpu
definitions and why yet.  I'll look at that when I have some time.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id()
  2010-03-04 17:06 ` Lee Schermerhorn
@ 2010-03-05  1:19   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-05  1:19 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Christoph Lameter, Nick Piggin, David Rientjes, akpm,
	eric.whitney

On Thu, 04 Mar 2010 12:06:54 -0500
Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:

> >nid-04:
> >
> >* Isn't #define numa_mem numa_node a bit dangerous?  Someone might use
> >  numa_mem as a local variable name.  Why not define it as a inline
> >  function or at least a macro which takes argument.
> 
> numa_mem and numa_node are the names of the per cpu variables, referenced
> by __this_cpu_read().  So, I suppose we can rename them both something like:
> percpu_numa_*.  Would satisfy your concern?
> 
> What do others think?
> 
> Currently I've left them as numa_mem and numa_node.
> 

Could you add some documentation to Documentation/vm/numa ?
about
  numa_node_id()
  numa_mem_id()
  topics on memory-less node
  (cpu-less node)
  

I've seen this kind of topic on the list recently, but I'm not sure whether
I've caught the issues/changes correctly....

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id()
  2010-03-05  1:19   ` KAMEZAWA Hiroyuki
@ 2010-03-05  1:25     ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-05  1:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-arch, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Christoph Lameter, Nick Piggin, David Rientjes, akpm,
	eric.whitney

On Fri, 2010-03-05 at 10:19 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 04 Mar 2010 12:06:54 -0500
> Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:
> 
> > >nid-04:
> > >
> > >* Isn't #define numa_mem numa_node a bit dangerous?  Someone might use
> > >  numa_mem as a local variable name.  Why not define it as a inline
> > >  function or at least a macro which takes argument.
> > 
> > numa_mem and numa_node are the names of the per cpu variables, referenced
> > by __this_cpu_read().  So, I suppose we can rename them both something like:
> > percpu_numa_*.  Would satisfy your concern?
> > 
> > What do others think?
> > 
> > Currently I've left them as numa_mem and numa_node.
> > 
> 
> Could you add some documentation to Documentation/vm/numa about:
>   numa_node_id()
>   numa_mem_id()
>   memory-less nodes
>   (cpu-less nodes)


Hmmm.  Good idea.  I'll see what I can come up with.

Thanks,
Lee

> 
> Recently I've seen these topics come up on the list, but I'm not sure
> I've understood the issues/changes correctly.
> 
> Thanks,
> -Kame
> 
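
As a rough illustration of the kind of example such documentation could
carry -- alloc_local_buffer() is a hypothetical caller, not something from
these patches:

	static void *alloc_local_buffer(size_t size)
	{
		/*
		 * On a memoryless node, numa_node_id() still names the cpu's
		 * own node; numa_mem_id() names the nearest node with memory,
		 * which is the hint an allocator actually wants.
		 */
		return kmalloc_node(size, GFP_KERNEL, numa_mem_id());
	}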

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 1/8] numa: prep:  move generic percpu interface definitions to percpu-defs.h
  2010-03-04 17:07   ` Lee Schermerhorn
@ 2010-03-09  8:46     ` Tejun Heo
  -1 siblings, 0 replies; 42+ messages in thread
From: Tejun Heo @ 2010-03-09  8:46 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Mel Gorman, Andi Kleen,
	Christoph Lameter, Nick Piggin, David Rientjes, akpm,
	eric.whitney

Hello,

On 03/05/2010 02:07 AM, Lee Schermerhorn wrote:
> To use the generic percpu infrastructure for the numa_node_id() interface,
> defined in linux/topology.h, we need to break the circular header dependency
> that results from including <linux/percpu.h> in <linux/topology.h>.  The
> circular dependency:
> 
> 	percpu.h -> slab.h -> gfp.h -> topology.h
> 
> percpu.h includes slab.h to obtain the definition of kzalloc()/kfree() for
> inlining __alloc_percpu() and free_percpu() in !SMP configurations.  One could
> un-inline these functions in the !SMP case, but a large number of files depend
> on percpu.h to include slab.h.  Tejun Heo suggested moving the definitions to
> percpu-defs.h and requested that this be separated from the remainder of the
> generic percpu numa_node_id() preparation patch.

Hmmm... I think uninlining the !SMP case would be much cleaner.  Sorry
that you had to do it twice.  I'll break the dependency in the percpu
devel branch and let you know.
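
A minimal sketch of what that un-inlining could look like -- the
out-of-line home shown here (mm/percpu.c) and the exact signatures are
assumptions, not a statement of where or how it will actually land:

	/* linux/percpu.h, !SMP: declarations only, so slab.h can be dropped */
	extern void *__alloc_percpu(size_t size, size_t align);
	extern void free_percpu(void *ptr);

	/* mm/percpu.c (or wherever the UP stubs end up) */
	#include <linux/slab.h>

	void *__alloc_percpu(size_t size, size_t align)
	{
		/* UP: a single zeroed allocation stands in for the per-cpu area */
		return kzalloc(size, GFP_KERNEL);
	}

	void free_percpu(void *ptr)
	{
		kfree(ptr);
	}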

For other patches, except for what Christoph has already pointed out,
everything looks good to me.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 1/8] numa: prep:  move generic percpu interface definitions to percpu-defs.h
  2010-03-09  8:46     ` Tejun Heo
@ 2010-03-09 14:13       ` Lee Schermerhorn
  -1 siblings, 0 replies; 42+ messages in thread
From: Lee Schermerhorn @ 2010-03-09 14:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-arch, linux-mm, linux-numa, Mel Gorman, Andi Kleen,
	Christoph Lameter, Nick Piggin, David Rientjes, akpm,
	eric.whitney

On Tue, 2010-03-09 at 17:46 +0900, Tejun Heo wrote:
> Hello,
> 
> On 03/05/2010 02:07 AM, Lee Schermerhorn wrote:
> > To use the generic percpu infrastructure for the numa_node_id() interface,
> > defined in linux/topology.h, we need to break the circular header dependency
> > that results from including <linux/percpu.h> in <linux/topology.h>.  The
> > circular dependency:
> > 
> > 	percpu.h -> slab.h -> gfp.h -> topology.h
> > 
> > percpu.h includes slab.h to obtain the definition of kzalloc()/kfree() for
> > inlining __alloc_percpu() and free_percpu() in !SMP configurations.  One could
> > un-inline these functions in the !SMP case, but a large number of files depend
> > on percpu.h to include slab.h.  Tejun Heo suggested moving the definitions to
> > percpu-defs.h and requested that this be separated from the remainder of the
> > generic percpu numa_node_id() preparation patch.
> 
> Hmmm... I think uninlining the !SMP case would be much cleaner.  Sorry
> that you had to do it twice.  I'll break the dependency in the percpu
> devel branch and let you know.

OK, I'll do that for V4.  It'll be one big ugly patch because of all the
dependencies.  But it's really just a mechanical change.

> 
> For other patches, except for what Christoph has already pointed out,
> everything looks good to me.
> 
> Thank you.
> 

Thank you for the review.

Regards,
Lee

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH/RFC 1/8] numa: prep:  move generic percpu interface definitions to percpu-defs.h
  2010-03-09 14:13       ` Lee Schermerhorn
@ 2010-03-10  9:06         ` Tejun Heo
  -1 siblings, 0 replies; 42+ messages in thread
From: Tejun Heo @ 2010-03-10  9:06 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, linux-numa, Mel Gorman, Andi Kleen,
	Christoph Lameter, Nick Piggin, David Rientjes, akpm,
	eric.whitney

Hello,

On 03/09/2010 11:13 PM, Lee Schermerhorn wrote:
>> Hmmm... I think uninlining the !SMP case would be much cleaner.  Sorry
>> that you had to do it twice.  I'll break the dependency in the percpu
>> devel branch and let you know.
> 
> OK, I'll do that for V4.  It'll be one big ugly patch because of all the
> dependencies.  But it's really just a mechanical change.

Just in case it wasn't clear: I'm giving it a shot right now.  I
don't think it will be too ugly, and it's something that should be
done whether ugly or not.  I'll let you know how it turns out.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 42+ messages in thread

Thread overview: 42+ messages
2010-03-04 17:06 [PATCH/RFC 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
2010-03-04 17:06 ` Lee Schermerhorn
2010-03-04 17:07 ` [PATCH/RFC 1/8] numa: prep: move generic percpu interface definitions to percpu-defs.h Lee Schermerhorn
2010-03-04 17:07   ` Lee Schermerhorn
2010-03-09  8:46   ` Tejun Heo
2010-03-09  8:46     ` Tejun Heo
2010-03-09 14:13     ` Lee Schermerhorn
2010-03-09 14:13       ` Lee Schermerhorn
2010-03-10  9:06       ` Tejun Heo
2010-03-10  9:06         ` Tejun Heo
2010-03-04 17:07 ` [PATCH/RFC 2/8] numa: add generic percpu var implementation of numa_node_id() Lee Schermerhorn
2010-03-04 17:07   ` Lee Schermerhorn
2010-03-04 18:44   ` Christoph Lameter
2010-03-04 18:44     ` Christoph Lameter
2010-03-04 17:07 ` [PATCH/RFC 3/8] numa: x86_64: use generic percpu var for numa_node_id() implementation Lee Schermerhorn
2010-03-04 17:07   ` Lee Schermerhorn
2010-03-04 18:47   ` Christoph Lameter
2010-03-04 18:47     ` Christoph Lameter
2010-03-04 20:42     ` Lee Schermerhorn
2010-03-04 20:42       ` Lee Schermerhorn
2010-03-04 21:16       ` Christoph Lameter
2010-03-04 21:16         ` Christoph Lameter
2010-03-04 17:07 ` [PATCH/RFC 4/8] numa: ia64: use generic percpu var " Lee Schermerhorn
2010-03-04 17:07   ` Lee Schermerhorn
2010-03-04 18:48   ` Christoph Lameter
2010-03-04 18:48     ` Christoph Lameter
2010-03-04 17:08 ` [PATCH/RFC 5/8] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
2010-03-04 17:08   ` Lee Schermerhorn
2010-03-04 18:52   ` Christoph Lameter
2010-03-04 18:52     ` Christoph Lameter
2010-03-04 19:28     ` Lee Schermerhorn
2010-03-04 19:28       ` Lee Schermerhorn
2010-03-04 17:08 ` [PATCH/RFC 6/8] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
2010-03-04 17:08   ` Lee Schermerhorn
2010-03-04 17:08 ` [PATCH/RFC 7/8] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
2010-03-04 17:08   ` Lee Schermerhorn
2010-03-04 17:08 ` [PATCH/RFC 8/8] numa: in-kernel profiling -- support memoryless nodes Lee Schermerhorn
2010-03-04 17:08   ` Lee Schermerhorn
2010-03-05  1:19 ` [PATCH/RFC 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() KAMEZAWA Hiroyuki
2010-03-05  1:19   ` KAMEZAWA Hiroyuki
2010-03-05  1:25   ` Lee Schermerhorn
2010-03-05  1:25     ` Lee Schermerhorn
