* [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations
@ 2009-06-17 20:33 cl
  2009-06-17 20:33 ` [this_cpu_xx V2 01/19] Fix handling of pagesets for downed cpus cl
                   ` (18 more replies)
  0 siblings, 19 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Tejun Heo, mingo, rusty, davem

V1->V2:
- Various minor fixes
- Add SLUB conversion
- Add Page allocator conversion
- Patch against the git tree of today

The patchset introduces various operations to allow efficient access
to per cpu variables for the current processor. Currently there is
no way in the core to calculate the address of the instance
of a per cpu variable without a table lookup through

	per_cpu_ptr(x, smp_processor_id())

The patchset introduces a way to calculate the address using the offset
that is available in arch specific ways (register or special memory
locations) using

	this_cpu_ptr(x)

In addition, operations are provided that can operate on per cpu
pointers. This is necessary in order to use the addresses
generated by the new per cpu allocator with per cpu RMW instructions.

The arch provided RMW instructions can be used to avoid having to switch
off preemption and interrupts for per cpu counter updates.
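
For illustration (the structure and field names here are made up), a
typical counter update in a dynamically allocated per cpu structure
changes from

	cpu = get_cpu();
	per_cpu_ptr(p->stats, cpu)->count++;
	put_cpu();

to a single preempt safe RMW operation

	this_cpu_inc(p->stats->count);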

One caveat with this patchset is that it currently does not work on S/390.
Tejun Heo has a patchset that fixes the SHIFT_PERCPU_PTR issues on that
platform. That patch is required before S/390 will work.

The patchset reduces code size and increases the speed of operations on
dynamically allocated per cpu based statistics.

The patches show how this can be done. There are many other places in
the code where these macros could be beneficial.

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 01/19] Fix handling of pagesets for downed cpus
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations cl
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: fixupli --]
[-- Type: text/plain, Size: 1517 bytes --]

If a processor is downed then we need to set the pageset pointer back to the
boot pageset.

Updates of the high water marks should not access the pagesets of unpopulated
zones (those pointers go to the boot pagesets, which would no longer be
functional if their size were increased beyond zero).

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 mm/page_alloc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2009-06-17 14:07:50.000000000 -0500
@@ -3039,7 +3039,7 @@ static inline void free_zone_pagesets(in
 		/* Free per_cpu_pageset if it is slab allocated */
 		if (pset != &boot_pageset[cpu])
 			kfree(pset);
-		zone_pcp(zone, cpu) = NULL;
+		zone_pcp(zone, cpu) = &boot_pageset[cpu];
 	}
 }
 
@@ -4657,7 +4657,7 @@ int percpu_pagelist_fraction_sysctl_hand
 	ret = proc_dointvec_minmax(table, write, file, buffer, length, ppos);
 	if (!write || (ret == -EINVAL))
 		return ret;
-	for_each_zone(zone) {
+	for_each_populated_zone(zone) {
 		for_each_online_cpu(cpu) {
 			unsigned long  high;
 			high = zone->present_pages / percpu_pagelist_fraction;

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
  2009-06-17 20:33 ` [this_cpu_xx V2 01/19] Fix handling of pagesets for downed cpus cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  1:50   ` Tejun Heo
  2009-06-17 20:33 ` [this_cpu_xx V2 03/19] Use this_cpu operations for SNMP statistics cl
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, David Howells, Tejun Heo, Ingo Molnar, Rusty Russell,
	Eric Dumazet, davem

[-- Attachment #1: this_cpu_ptr_intro --]
[-- Type: text/plain, Size: 9370 bytes --]

this_cpu_ptr
------------

this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id()).

The problem with per_cpu_ptr(x, smp_processor_id()) is that it requires
an array lookup to find the offset for the cpu. Processors typically
have the offset for the current cpu area in some kind of (arch dependent)
efficiently accessible register or memory location.

We can use that instead of doing the array lookup to speed up the
determination of the address of the percpu variable. This is particularly
significant because these lookups occur in performance critical paths
of the core kernel.

this_cpu_ptr comes in two flavors. The preemption context matters since we
are referring to the currently executing processor. In many cases we must
ensure that the processor does not change while a code segment is executed.

__this_cpu_ptr 	-> Do not check for preemption context
this_cpu_ptr	-> Check preemption context
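
A rough usage sketch (foo_stats and the variable names are hypothetical):

	struct foo_stats *stats = alloc_percpu(struct foo_stats);
	struct foo_stats *s;

	preempt_disable();
	/* Preemption is off, so the preemption checks can be skipped */
	s = __this_cpu_ptr(stats);
	s->events++;
	preempt_enable();

Where preemption safety is not otherwise guaranteed by the caller,
this_cpu_ptr() should be used so that the preemption checks can catch
misuse.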

Generic this_cpu_* operations
-----------------------------

this_cpu_* operations operate on scalars that are members of structures
allocated with the new per cpu allocator. They can also operate on
static per_cpu variables if the address of those is determined with
per_cpu_var().

Generic functions are provided that are used if an arch does not define
optimized this_cpu operations. The functions also come in the two flavors.
The first parameter is a scalar that is a member of a structure allocated
through allocpercpu or obtained by taking the address of a per cpu variable.

The operations are guaranteed to be atomic vs preemption if they modify
the scalar (unless they are prefixed by __ in which case they do not need
to be). The calculation of the per cpu offset is also guaranteed to be atomic
in the same way.

this_cpu_read(scalar)
this_cpu_write(scalar, value)
this_cpu_add(scalar, value)
this_cpu_sub(scalar, value)
this_cpu_inc(scalar)
this_cpu_dec(scalar)
this_cpu_and(scalar, value)
this_cpu_or(scalar, value)
this_cpu_xor(scalar, value)
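
A minimal usage sketch (struct and field names are hypothetical):

	struct foo_counters {
		unsigned long packets;
		unsigned long bytes;
	};

	struct foo_counters *fc = alloc_percpu(struct foo_counters);

	this_cpu_inc(fc->packets);
	this_cpu_add(fc->bytes, len);

Note that the operations take the scalar itself, not a pointer to it; the
address of the current processor's instance is determined internally.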

Arch code can override the generic functions and provide optimized atomic
per cpu operations. These atomic operations must provide both the relocation
(x86 does it through a segment override) and the operation on the data in a
single instruction. Otherwise preempt needs to be disabled and there is no
gain from providing arch implementations.

A third variant is provided prefixed by irqsafe_. These variants are safe
against hardware interrupts on the *same* processor (all per cpu atomic
primitives are *always* *only* providing safety for code running on the
*same* processor!). The increment needs to be implemented by the hardware
in such a way that it is a single RMW instruction that is either processed
before or after an interrupt.
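
A sketch of the intended use, with the irqsafe_this_cpu_* names as defined
by the generic fallbacks in this patch (the statistics field is made up):

	/* rx_dropped is also incremented from the interrupt handler, so a
	 * plain this_cpu_inc() RMW would not be reliable here. */
	irqsafe_this_cpu_inc(dev_stats->rx_dropped);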

cc: David Howells <dhowells@redhat.com>
cc: Tejun Heo <tj@kernel.org>
cc: Ingo Molnar <mingo@elte.hu>
cc: Rusty Russell <rusty@rustcorp.com.au>
cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 include/asm-generic/percpu.h |    5 +
 include/linux/percpu.h       |  159 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+)

Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h	2009-06-17 11:47:17.000000000 -0500
+++ linux-2.6/include/linux/percpu.h	2009-06-17 13:59:19.000000000 -0500
@@ -181,4 +181,163 @@ do {									\
 # define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
 #endif
 
+
+/*
+ * Optimized manipulation for memory allocated through the per cpu
+ * allocator or for addresses of per cpu variables (which can be determined
+ * using per_cpu_var(xx)).
+ *
+ * These operations guarantee exclusivity of access against other operations
+ * on the *same* processor. The assumption is that per cpu data is only
+ * accessed by a single processor instance (the current one).
+ *
+ * The first group is used for accesses that must be done in a
+ * preemption safe way since the context is not known to be preempt
+ * safe. Interrupts may occur. If the interrupt modifies the variable
+ * too then RMW actions will not be reliable.
+ */
+#ifndef this_cpu_read
+# define this_cpu_read(pcp)						\
+  ({									\
+	*this_cpu_ptr(&(pcp));						\
+  })
+#endif
+
+#define _this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	preempt_disable();						\
+	*__this_cpu_ptr(&pcp) op val;					\
+	preempt_enable_no_resched();					\
+} while (0)
+
+#ifndef this_cpu_write
+# define this_cpu_write(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+#endif
+
+#ifndef this_cpu_add
+# define this_cpu_add(pcp, val)		_this_cpu_generic_to_op((pcp), (val), +=)
+#endif
+
+#ifndef this_cpu_sub
+# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef this_cpu_inc
+# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
+#endif
+
+#ifndef this_cpu_dec
+# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef this_cpu_and
+# define this_cpu_and(pcp, val)		_this_cpu_generic_to_op((pcp), (val), &=)
+#endif
+
+#ifndef this_cpu_or
+# define this_cpu_or(pcp, val)		_this_cpu_generic_to_op((pcp), (val), |=)
+#endif
+
+#ifndef this_cpu_xor
+# define this_cpu_xor(pcp, val)		_this_cpu_generic_to_op((pcp), (val), ^=)
+#endif
+
+
+/*
+ * Generic percpu operations that do not require preemption handling.
+ * Either we do not care about races or the caller has the
+ * responsibility of handling preemption issues.
+ *
+ * If there is no other protection through preempt disable and/or
+ * disabling interrupts then one of these RMW operations can show unexpected
+ * behavior because the execution thread was rescheduled on another processor
+ * or an interrupt occurred and the same percpu variable was modified from
+ * the interrupt context.
+ */
+#ifndef __this_cpu_read
+# define __this_cpu_read(pcp)						\
+  ({									\
+	*__this_cpu_ptr(&(pcp));					\
+  })
+#endif
+
+#define __this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+} while (0)
+
+#ifndef __this_cpu_write
+# define __this_cpu_write(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+#endif
+
+#ifndef __this_cpu_add
+# define __this_cpu_add(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+#endif
+
+#ifndef __this_cpu_sub
+# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef __this_cpu_inc
+# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
+#endif
+
+#ifndef __this_cpu_dec
+# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef __this_cpu_and
+# define __this_cpu_and(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+#endif
+
+#ifndef __this_cpu_or
+# define __this_cpu_or(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+#endif
+
+#ifndef __this_cpu_xor
+# define __this_cpu_xor(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+#endif
+
+/*
+ * IRQ safe versions of the per cpu RMW operations. Note that these operations
+ * are *not* safe against modification of the same variable from another
+ * processors. They are guaranteed to be atomic vs. local interrupts and
+ * preemption.
+ */
+#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	unsigned long flags;						\
+	local_irq_save(flags);						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	local_irq_restore(flags);					\
+} while (0)
+
+#ifndef irqsafe_this_cpu_add
+# define irqsafe_this_cpu_add(pcp, val)	irqsafe_cpu_generic_to_op((pcp), (val), +=)
+#endif
+
+#ifndef irqsafe_this_cpu_sub
+# define irqsafe_this_cpu_sub(pcp, val)	irqsafe_this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef irqsafe_this_cpu_inc
+# define irqsafe_this_cpu_inc(pcp)	irqsafe_this_cpu_add((pcp), 1)
+#endif
+
+#ifndef irqsafe_this_cpu_dec
+# define irqsafe_this_cpu_dec(pcp)	irqsafe_this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef irqsafe_this_cpu_and
+# define irqsafe_this_cpu_and(pcp, val)	irqsafe_cpu_generic_to_op((pcp), (val), &=)
+#endif
+
+#ifndef irqsafe_this_cpu_or
+# define irqsafe_this_cpu_or(pcp, val)	irqsafe_cpu_generic_to_op((pcp), (val), |=)
+#endif
+
+#ifndef irqsafe_this_cpu_xor
+# define irqsafe_this_cpu_xor(pcp, val)	irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+#endif
+
 #endif /* __LINUX_PERCPU_H */
Index: linux-2.6/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-generic/percpu.h	2009-06-17 11:47:17.000000000 -0500
+++ linux-2.6/include/asm-generic/percpu.h	2009-06-17 13:41:40.000000000 -0500
@@ -56,6 +56,9 @@ extern unsigned long __per_cpu_offset[NR
 #define __raw_get_cpu_var(var) \
 	(*SHIFT_PERCPU_PTR(&per_cpu_var(var), __my_cpu_offset))
 
+#define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
+#define __this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
+
 
 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
 extern void setup_per_cpu_areas(void);
@@ -66,6 +69,8 @@ extern void setup_per_cpu_areas(void);
 #define per_cpu(var, cpu)			(*((void)(cpu), &per_cpu_var(var)))
 #define __get_cpu_var(var)			per_cpu_var(var)
 #define __raw_get_cpu_var(var)			per_cpu_var(var)
+#define this_cpu_ptr(ptr) per_cpu_ptr(ptr, 0)
+#define __this_cpu_ptr(ptr) this_cpu_ptr(ptr)
 
 #endif	/* SMP */
 

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 03/19] Use this_cpu operations for SNMP statistics
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
  2009-06-17 20:33 ` [this_cpu_xx V2 01/19] Fix handling of pagesets for downed cpus cl
  2009-06-17 20:33 ` [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  1:55   ` Tejun Heo
  2009-06-17 20:33 ` [this_cpu_xx V2 04/19] Use this_cpu operations for NFS statistics cl
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_snmp --]
[-- Type: text/plain, Size: 2552 bytes --]

SNMP statistics macros can be significantly simplified.
This will also reduce code size if the arch supports these operations
in hardware.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 include/net/snmp.h |   41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

Index: linux-2.6/include/net/snmp.h
===================================================================
--- linux-2.6.orig/include/net/snmp.h	2009-06-15 17:24:31.000000000 -0500
+++ linux-2.6/include/net/snmp.h	2009-06-15 17:53:29.000000000 -0500
@@ -136,35 +136,18 @@ struct linux_xfrm_mib {
 #define SNMP_STAT_BHPTR(name)	(name[0])
 #define SNMP_STAT_USRPTR(name)	(name[1])
 
-#define SNMP_INC_STATS_BH(mib, field) 	\
-	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]++)
-#define SNMP_INC_STATS_USER(mib, field) \
-	do { \
-		per_cpu_ptr(mib[1], get_cpu())->mibs[field]++; \
-		put_cpu(); \
-	} while (0)
-#define SNMP_INC_STATS(mib, field) 	\
-	do { \
-		per_cpu_ptr(mib[!in_softirq()], get_cpu())->mibs[field]++; \
-		put_cpu(); \
-	} while (0)
-#define SNMP_DEC_STATS(mib, field) 	\
-	do { \
-		per_cpu_ptr(mib[!in_softirq()], get_cpu())->mibs[field]--; \
-		put_cpu(); \
-	} while (0)
-#define SNMP_ADD_STATS(mib, field, addend) 	\
-	do { \
-		per_cpu_ptr(mib[!in_softirq()], get_cpu())->mibs[field] += addend; \
-		put_cpu(); \
-	} while (0)
-#define SNMP_ADD_STATS_BH(mib, field, addend) 	\
-	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field] += addend)
-#define SNMP_ADD_STATS_USER(mib, field, addend) 	\
-	do { \
-		per_cpu_ptr(mib[1], get_cpu())->mibs[field] += addend; \
-		put_cpu(); \
-	} while (0)
+#define SNMP_INC_STATS_BH(mib, field)	\
+			__this_cpu_inc(mib[0]->mibs[field])
+#define SNMP_INC_STATS_USER(mib, field)	\
+			this_cpu_inc(mib[1]->mibs[field])
+#define SNMP_INC_STATS(mib, field)	\
+			this_cpu_inc(mib[!in_softirq()]->mibs[field])
+#define SNMP_DEC_STATS(mib, field)	\
+		 	this_cpu_dec(mib[!in_softirq()]->mibs[field])
+#define SNMP_ADD_STATS_BH(mib, field, addend)	\
+			__this_cpu_add(mib[0]->mibs[field], addend)
+#define SNMP_ADD_STATS_USER(mib, field, addend)	\
+			this_cpu_add(mib[1]->mibs[field], addend)
 #define SNMP_UPD_PO_STATS(mib, basefield, addend)	\
 	do { \
 		__typeof__(mib[0]) ptr = per_cpu_ptr(mib[!in_softirq()], get_cpu());\

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 04/19] Use this_cpu operations for NFS statistics
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (2 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 03/19] Use this_cpu operations for SNMP statistics cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  2:03   ` Tejun Heo
  2009-06-17 20:33 ` [this_cpu_xx V2 05/19] use this_cpu ops for network statistics cl
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Trond Myklebust, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_nfs --]
[-- Type: text/plain, Size: 2152 bytes --]

Simplify NFS statistics and allow the use of optimized
arch instructions.

CC: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 fs/nfs/iostat.h |   30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

Index: linux-2.6/fs/nfs/iostat.h
===================================================================
--- linux-2.6.orig/fs/nfs/iostat.h	2009-06-17 09:10:00.000000000 -0500
+++ linux-2.6/fs/nfs/iostat.h	2009-06-17 09:21:02.000000000 -0500
@@ -25,13 +25,7 @@ struct nfs_iostats {
 static inline void nfs_inc_server_stats(const struct nfs_server *server,
 					enum nfs_stat_eventcounters stat)
 {
-	struct nfs_iostats *iostats;
-	int cpu;
-
-	cpu = get_cpu();
-	iostats = per_cpu_ptr(server->io_stats, cpu);
-	iostats->events[stat]++;
-	put_cpu();
+	this_cpu_inc(server->io_stats->events[stat]);
 }
 
 static inline void nfs_inc_stats(const struct inode *inode,
@@ -44,13 +38,13 @@ static inline void nfs_add_server_stats(
 					enum nfs_stat_bytecounters stat,
 					unsigned long addend)
 {
-	struct nfs_iostats *iostats;
-	int cpu;
-
-	cpu = get_cpu();
-	iostats = per_cpu_ptr(server->io_stats, cpu);
-	iostats->bytes[stat] += addend;
-	put_cpu();
+	/*
+	 * bytes is larger than word size on 32 bit platforms.
+	 * Thus we cannot use this_cpu_add() here.
+	 */
+	preempt_disable();
+	*this_cpu_ptr(&server->io_stats->bytes[stat]) +=  addend;
+	preempt_enable_no_resched();
 }
 
 static inline void nfs_add_stats(const struct inode *inode,
@@ -65,13 +59,7 @@ static inline void nfs_add_fscache_stats
 					 enum nfs_stat_fscachecounters stat,
 					 unsigned long addend)
 {
-	struct nfs_iostats *iostats;
-	int cpu;
-
-	cpu = get_cpu();
-	iostats = per_cpu_ptr(NFS_SERVER(inode)->io_stats, cpu);
-	iostats->fscache[stat] += addend;
-	put_cpu();
+	this_cpu_add(NFS_SERVER(inode)->io_stats->fscache[stat], addend);
 }
 #endif
 

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 05/19] use this_cpu ops for network statistics
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (3 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 04/19] Use this_cpu operations for NFS statistics cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 06/19] this_cpu_ptr: Straight transformations cl
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, David Miller, Tejun Heo, mingo, rusty

[-- Attachment #1: this_cpu_net --]
[-- Type: text/plain, Size: 2113 bytes --]

CC: David Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 include/net/neighbour.h              |    7 +------
 include/net/netfilter/nf_conntrack.h |    9 ++-------
 2 files changed, 3 insertions(+), 13 deletions(-)

Index: linux-2.6/include/net/neighbour.h
===================================================================
--- linux-2.6.orig/include/net/neighbour.h	2009-06-15 17:24:59.000000000 -0500
+++ linux-2.6/include/net/neighbour.h	2009-06-15 17:53:35.000000000 -0500
@@ -89,12 +89,7 @@ struct neigh_statistics
 	unsigned long unres_discards;	/* number of unresolved drops */
 };
 
-#define NEIGH_CACHE_STAT_INC(tbl, field)				\
-	do {								\
-		preempt_disable();					\
-		(per_cpu_ptr((tbl)->stats, smp_processor_id())->field)++; \
-		preempt_enable();					\
-	} while (0)
+#define NEIGH_CACHE_STAT_INC(tbl, field) this_cpu_inc((tbl)->stats->field)
 
 struct neighbour
 {
Index: linux-2.6/include/net/netfilter/nf_conntrack.h
===================================================================
--- linux-2.6.orig/include/net/netfilter/nf_conntrack.h	2009-06-15 17:24:59.000000000 -0500
+++ linux-2.6/include/net/netfilter/nf_conntrack.h	2009-06-15 17:53:35.000000000 -0500
@@ -294,14 +294,9 @@ extern int nf_conntrack_set_hashsize(con
 extern unsigned int nf_conntrack_htable_size;
 extern unsigned int nf_conntrack_max;
 
-#define NF_CT_STAT_INC(net, count)	\
-	(per_cpu_ptr((net)->ct.stat, raw_smp_processor_id())->count++)
+#define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count)		\
-do {							\
-	local_bh_disable();				\
-	per_cpu_ptr((net)->ct.stat, raw_smp_processor_id())->count++;	\
-	local_bh_enable();				\
-} while (0)
+	this_cpu_inc((net)->ct.stat->count)
 
 #define MODULE_ALIAS_NFCT_HELPER(helper) \
         MODULE_ALIAS("nfct-helper-" helper)

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 06/19] this_cpu_ptr: Straight transformations
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (4 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 05/19] use this_cpu ops for network statistics cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 07/19] this_cpu_ptr: Eliminate get/put_cpu cl
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, David Howells, Tejun Heo, Ingo Molnar, Rusty Russell,
	Eric Dumazet, davem

[-- Attachment #1: this_cpu_ptr_straight_transforms --]
[-- Type: text/plain, Size: 3786 bytes --]

Use this_cpu_ptr and __this_cpu_ptr in locations where straight
transformations are possible because per_cpu_ptr is used with
either smp_processor_id() or raw_smp_processor_id().
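
That is, purely mechanical substitutions of the form

	per_cpu_ptr(p, smp_processor_id())	-> this_cpu_ptr(p)
	per_cpu_ptr(p, raw_smp_processor_id())	-> __this_cpu_ptr(p)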

cc: David Howells <dhowells@redhat.com>
cc: Tejun Heo <tj@kernel.org>
cc: Ingo Molnar <mingo@elte.hu>
cc: Rusty Russell <rusty@rustcorp.com.au>
cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 drivers/infiniband/hw/ehca/ehca_irq.c |    3 +--
 drivers/net/chelsio/sge.c             |    5 ++---
 drivers/net/loopback.c                |    2 +-
 fs/ext4/mballoc.c                     |    2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/net/chelsio/sge.c
===================================================================
--- linux-2.6.orig/drivers/net/chelsio/sge.c	2009-06-15 17:23:25.000000000 -0500
+++ linux-2.6/drivers/net/chelsio/sge.c	2009-06-15 17:53:41.000000000 -0500
@@ -1378,7 +1378,7 @@ static void sge_rx(struct sge *sge, stru
 	}
 	__skb_pull(skb, sizeof(*p));
 
-	st = per_cpu_ptr(sge->port_stats[p->iff], smp_processor_id());
+	st = this_cpu_ptr(sge->port_stats[p->iff]);
 
 	skb->protocol = eth_type_trans(skb, adapter->port[p->iff].dev);
 	if ((adapter->flags & RX_CSUM_ENABLED) && p->csum == 0xffff &&
@@ -1780,8 +1780,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 {
 	struct adapter *adapter = dev->ml_priv;
 	struct sge *sge = adapter->sge;
-	struct sge_port_stats *st = per_cpu_ptr(sge->port_stats[dev->if_port],
-						smp_processor_id());
+	struct sge_port_stats *st = this_cpu_ptr(sge->port_stats[dev->if_port]);
 	struct cpl_tx_pkt *cpl;
 	struct sk_buff *orig_skb = skb;
 	int ret;
Index: linux-2.6/drivers/net/loopback.c
===================================================================
--- linux-2.6.orig/drivers/net/loopback.c	2009-06-15 17:23:25.000000000 -0500
+++ linux-2.6/drivers/net/loopback.c	2009-06-15 17:54:41.000000000 -0500
@@ -80,7 +80,7 @@ static int loopback_xmit(struct sk_buff 
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */
 	pcpu_lstats = dev->ml_priv;
-	lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
+	lb_stats = this_cpu_ptr(pcpu_lstats);
 
 	len = skb->len;
 	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
Index: linux-2.6/fs/ext4/mballoc.c
===================================================================
--- linux-2.6.orig/fs/ext4/mballoc.c	2009-06-15 17:22:07.000000000 -0500
+++ linux-2.6/fs/ext4/mballoc.c	2009-06-15 17:53:41.000000000 -0500
@@ -4182,7 +4182,7 @@ static void ext4_mb_group_or_file(struct
 	 * per cpu locality group is to reduce the contention between block
 	 * request from multiple CPUs.
 	 */
-	ac->ac_lg = per_cpu_ptr(sbi->s_locality_groups, raw_smp_processor_id());
+	ac->ac_lg = __this_cpu_ptr(sbi->s_locality_groups);
 
 	/* we're going to use group allocation */
 	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;
Index: linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ehca/ehca_irq.c	2009-06-15 17:22:07.000000000 -0500
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c	2009-06-15 17:53:41.000000000 -0500
@@ -826,8 +826,7 @@ static void __cpuinit take_over_work(str
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
 
 		list_del(&cq->entry);
-		__queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks,
-						  smp_processor_id()));
+		__queue_comp_task(cq, this_cpu_ptr(pool->cpu_comp_tasks));
 	}
 
 	spin_unlock_irqrestore(&cct->task_lock, flags_cct);

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 07/19] this_cpu_ptr: Eliminate get/put_cpu
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (5 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 06/19] this_cpu_ptr: Straight transformations cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 08/19] this_cpu_ptr: xfs_icsb_modify_counters does not need "cpu" variable cl
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, Maciej Sosnowski, Dan Williams, Eric Biederman,
	Stephen Hemminger, David L Stevens, Tejun Heo, mingo, rusty,
	davem

[-- Attachment #1: this_cpu_ptr_eliminate_get_put_cpu --]
[-- Type: text/plain, Size: 4676 bytes --]

There are cases where the use of this_cpu_ptr() means that we no
longer need to determine the currently executing cpu.

In those places the get/put_cpu combination is no longer needed and
the local cpu variable can be eliminated.

Preemption still needs to be disabled and enabled since the
modifications of the per cpu variables are not atomic. There may
be multiple per cpu variables modified and those must all
be updated from the same processor.
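
The resulting pattern (as in the dmaengine hunks below) is

	preempt_disable();
	__this_cpu_add(chan->local->bytes_transferred, len);
	__this_cpu_inc(chan->local->memcpy_count);
	preempt_enable();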

Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
cc: Eric Biederman <ebiederm@aristanetworks.com>
cc: Stephen Hemminger <shemminger@vyatta.com>
cc: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 drivers/dma/dmaengine.c |   36 +++++++++++++-----------------------
 drivers/net/veth.c      |    7 +++----
 2 files changed, 16 insertions(+), 27 deletions(-)

Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c	2009-06-04 13:38:15.000000000 -0500
+++ linux-2.6/drivers/dma/dmaengine.c	2009-06-04 14:19:43.000000000 -0500
@@ -326,14 +326,7 @@ arch_initcall(dma_channel_table_init);
  */
 struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
 {
-	struct dma_chan *chan;
-	int cpu;
-
-	cpu = get_cpu();
-	chan = per_cpu_ptr(channel_table[tx_type], cpu)->chan;
-	put_cpu();
-
-	return chan;
+	return this_cpu_read(channel_table[tx_type]->chan);
 }
 EXPORT_SYMBOL(dma_find_channel);
 
@@ -803,7 +796,6 @@ dma_async_memcpy_buf_to_buf(struct dma_c
 	struct dma_async_tx_descriptor *tx;
 	dma_addr_t dma_dest, dma_src;
 	dma_cookie_t cookie;
-	int cpu;
 	unsigned long flags;
 
 	dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
@@ -822,10 +814,10 @@ dma_async_memcpy_buf_to_buf(struct dma_c
 	tx->callback = NULL;
 	cookie = tx->tx_submit(tx);
 
-	cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
-	put_cpu();
+	preempt_disable();
+	__this_cpu_add(chan->local->bytes_transferred, len);
+	__this_cpu_inc(chan->local->memcpy_count);
+	preempt_enable();
 
 	return cookie;
 }
@@ -852,7 +844,6 @@ dma_async_memcpy_buf_to_pg(struct dma_ch
 	struct dma_async_tx_descriptor *tx;
 	dma_addr_t dma_dest, dma_src;
 	dma_cookie_t cookie;
-	int cpu;
 	unsigned long flags;
 
 	dma_src = dma_map_single(dev->dev, kdata, len, DMA_TO_DEVICE);
@@ -869,10 +860,10 @@ dma_async_memcpy_buf_to_pg(struct dma_ch
 	tx->callback = NULL;
 	cookie = tx->tx_submit(tx);
 
-	cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
-	put_cpu();
+	preempt_disable();
+	__this_cpu_add(chan->local->bytes_transferred, len);
+	__this_cpu_inc(chan->local->memcpy_count);
+	preempt_enable();
 
 	return cookie;
 }
@@ -901,7 +892,6 @@ dma_async_memcpy_pg_to_pg(struct dma_cha
 	struct dma_async_tx_descriptor *tx;
 	dma_addr_t dma_dest, dma_src;
 	dma_cookie_t cookie;
-	int cpu;
 	unsigned long flags;
 
 	dma_src = dma_map_page(dev->dev, src_pg, src_off, len, DMA_TO_DEVICE);
@@ -919,10 +909,10 @@ dma_async_memcpy_pg_to_pg(struct dma_cha
 	tx->callback = NULL;
 	cookie = tx->tx_submit(tx);
 
-	cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
-	put_cpu();
+	preempt_disable();
+	__this_cpu_add(chan->local->bytes_transferred, len);
+	__this_cpu_inc(chan->local->memcpy_count);
+	preempt_enable();
 
 	return cookie;
 }
Index: linux-2.6/drivers/net/veth.c
===================================================================
--- linux-2.6.orig/drivers/net/veth.c	2009-06-04 13:38:15.000000000 -0500
+++ linux-2.6/drivers/net/veth.c	2009-06-04 14:18:00.000000000 -0500
@@ -153,7 +153,7 @@ static int veth_xmit(struct sk_buff *skb
 	struct net_device *rcv = NULL;
 	struct veth_priv *priv, *rcv_priv;
 	struct veth_net_stats *stats, *rcv_stats;
-	int length, cpu;
+	int length;
 
 	skb_orphan(skb);
 
@@ -161,9 +161,8 @@ static int veth_xmit(struct sk_buff *skb
 	rcv = priv->peer;
 	rcv_priv = netdev_priv(rcv);
 
-	cpu = smp_processor_id();
-	stats = per_cpu_ptr(priv->stats, cpu);
-	rcv_stats = per_cpu_ptr(rcv_priv->stats, cpu);
+	stats = this_cpu_ptr(priv->stats);
+	rcv_stats = this_cpu_ptr(rcv_priv->stats);
 
 	if (!(rcv->flags & IFF_UP))
 		goto tx_drop;

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 08/19] this_cpu_ptr: xfs_icsb_modify_counters does not need "cpu" variable
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (6 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 07/19] this_cpu_ptr: Eliminate get/put_cpu cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 09/19] Use this_cpu_ptr in crypto subsystem cl
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Olaf Weber, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_ptr_xfs --]
[-- Type: text/plain, Size: 1665 bytes --]

The xfs_icsb_modify_counters() function no longer needs the cpu variable
if we use this_cpu_ptr() and we can get rid of get/put_cpu().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 fs/xfs/xfs_mount.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/xfs/xfs_mount.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_mount.c	2009-06-15 09:08:01.000000000 -0500
+++ linux-2.6/fs/xfs/xfs_mount.c	2009-06-15 14:20:11.000000000 -0500
@@ -2389,12 +2389,12 @@ xfs_icsb_modify_counters(
 {
 	xfs_icsb_cnts_t	*icsbp;
 	long long	lcounter;	/* long counter for 64 bit fields */
-	int		cpu, ret = 0;
+	int		ret = 0;
 
 	might_sleep();
 again:
-	cpu = get_cpu();
-	icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu);
+	preempt_disable();
+	icsbp = this_cpu_ptr(mp->m_sb_cnts);
 
 	/*
 	 * if the counter is disabled, go to slow path
@@ -2438,11 +2438,11 @@ again:
 		break;
 	}
 	xfs_icsb_unlock_cntr(icsbp);
-	put_cpu();
+	preempt_enable();
 	return 0;
 
 slow_path:
-	put_cpu();
+	preempt_enable();
 
 	/*
 	 * serialise with a mutex so we don't burn lots of cpu on
@@ -2490,7 +2490,7 @@ slow_path:
 
 balance_counter:
 	xfs_icsb_unlock_cntr(icsbp);
-	put_cpu();
+	preempt_enable();
 
 	/*
 	 * We may have multiple threads here if multiple per-cpu

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 09/19] Use this_cpu_ptr in crypto subsystem
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (7 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 08/19] this_cpu_ptr: xfs_icsb_modify_counters does not need "cpu" variable cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations cl
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Huang Ying, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_ptr_crypto --]
[-- Type: text/plain, Size: 1138 bytes --]

Just a slight optimization that removes one array lookup.
The processor number is needed for other things as well so the
get/put_cpu cannot be removed.

Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 crypto/cryptd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/crypto/cryptd.c
===================================================================
--- linux-2.6.orig/crypto/cryptd.c	2009-05-27 11:55:20.000000000 -0500
+++ linux-2.6/crypto/cryptd.c	2009-05-27 11:56:55.000000000 -0500
@@ -93,7 +93,7 @@ static int cryptd_enqueue_request(struct
 	struct cryptd_cpu_queue *cpu_queue;
 
 	cpu = get_cpu();
-	cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
+	cpu_queue = this_cpu_ptr(queue->cpu_queue);
 	err = crypto_enqueue_request(&cpu_queue->queue, request);
 	queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
 	put_cpu();

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (8 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 09/19] Use this_cpu_ptr in crypto subsystem cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  3:00   ` Tejun Heo
  2009-06-17 20:33 ` [this_cpu_xx V2 11/19] Use this_cpu ops for VM statistics cl
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_x86_ops --]
[-- Type: text/plain, Size: 2479 bytes --]

Basically the existing percpu ops can be used. However, we do not pass a
reference to a percpu variable in; instead the address of a percpu variable
is provided.

The preempt, the non-preempt and the irqsafe operations all generate the
same code.
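
For example (foo_events is a made up counter), a call site like

	DEFINE_PER_CPU(unsigned long, foo_events);

	this_cpu_inc(per_cpu_var(foo_events));

should now compile to a single segment prefixed add instruction instead of
a preempt_disable()/load/modify/store/preempt_enable() sequence.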

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 arch/x86/include/asm/percpu.h |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Index: linux-2.6/arch/x86/include/asm/percpu.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/percpu.h	2009-06-04 13:38:01.000000000 -0500
+++ linux-2.6/arch/x86/include/asm/percpu.h	2009-06-04 14:21:22.000000000 -0500
@@ -140,6 +140,28 @@ do {							\
 #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
 #define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
 
+#define __this_cpu_read(pcp)		percpu_from_op("mov", pcp)
+#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
+#define __this_cpu_add(pcp, val)	percpu_to_op("add", (pcp), val)
+#define __this_cpu_sub(pcp, val)	percpu_to_op("sub", (pcp), val)
+#define __this_cpu_and(pcp, val)	percpu_to_op("and", (pcp), val)
+#define __this_cpu_or(pcp, val)		percpu_to_op("or", (pcp), val)
+#define __this_cpu_xor(pcp, val)	percpu_to_op("xor", (pcp), val)
+
+#define this_cpu_read(pcp)		percpu_from_op("mov", (pcp))
+#define this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
+#define this_cpu_add(pcp, val)		percpu_to_op("add", (pcp), val)
+#define this_cpu_sub(pcp, val)		percpu_to_op("sub", (pcp), val)
+#define this_cpu_and(pcp, val)		percpu_to_op("and", (pcp), val)
+#define this_cpu_or(pcp, val)		percpu_to_op("or", (pcp), val)
+#define this_cpu_xor(pcp, val)		percpu_to_op("xor", (pcp), val)
+
+#define irqsafe_cpu_add(pcp, val)	percpu_to_op("add", (pcp), val)
+#define irqsafe_cpu_sub(pcp, val)	percpu_to_op("sub", (pcp), val)
+#define irqsafe_cpu_and(pcp, val)	percpu_to_op("and", (pcp), val)
+#define irqsafe_cpu_or(pcp, val)	percpu_to_op("or", (pcp), val)
+#define irqsafe_cpu_xor(pcp, val)	percpu_to_op("xor", (pcp), val)
+
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define x86_test_and_clear_bit_percpu(bit, var)				\
 ({									\

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 11/19] Use this_cpu ops for VM statistics.
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (9 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  3:05   ` Tejun Heo
  2009-06-17 20:33 ` [this_cpu_xx V2 12/19] RCU: Use this_cpu operations cl
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_vmstats --]
[-- Type: text/plain, Size: 1534 bytes --]

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 include/linux/vmstat.h |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h	2009-06-11 10:50:59.000000000 -0500
+++ linux-2.6/include/linux/vmstat.h	2009-06-11 11:10:48.000000000 -0500
@@ -75,24 +75,22 @@ DECLARE_PER_CPU(struct vm_event_state, v
 
 static inline void __count_vm_event(enum vm_event_item item)
 {
-	__get_cpu_var(vm_event_states).event[item]++;
+	__this_cpu_inc(per_cpu_var(vm_event_states).event[item]);
 }
 
 static inline void count_vm_event(enum vm_event_item item)
 {
-	get_cpu_var(vm_event_states).event[item]++;
-	put_cpu();
+	this_cpu_inc(per_cpu_var(vm_event_states).event[item]);
 }
 
 static inline void __count_vm_events(enum vm_event_item item, long delta)
 {
-	__get_cpu_var(vm_event_states).event[item] += delta;
+	__this_cpu_add(per_cpu_var(vm_event_states).event[item], delta);
 }
 
 static inline void count_vm_events(enum vm_event_item item, long delta)
 {
-	get_cpu_var(vm_event_states).event[item] += delta;
-	put_cpu();
+	this_cpu_add(per_cpu_var(vm_event_states).event[item], delta);
 }
 
 extern void all_vm_events(unsigned long *);

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 12/19] RCU: Use this_cpu operations
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (10 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 11/19] Use this_cpu ops for VM statistics cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 13/19] Use this_cpu operations in slub cl
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Paul E. McKenney, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_rcu --]
[-- Type: text/plain, Size: 3107 bytes --]

RCU does not do dynamic allocations but it increments per cpu variables
a lot. These increments currently result in a move to a register and then
back to memory. This patch makes RCU use the inc/dec instructions on x86
that do not need a register.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 kernel/rcupreempt.c |    4 ++--
 kernel/rcutorture.c |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/kernel/rcutorture.c
===================================================================
--- linux-2.6.orig/kernel/rcutorture.c	2009-06-04 14:26:42.000000000 -0500
+++ linux-2.6/kernel/rcutorture.c	2009-06-04 14:38:05.000000000 -0500
@@ -709,13 +709,13 @@ static void rcu_torture_timer(unsigned l
 		/* Should not happen, but... */
 		pipe_count = RCU_TORTURE_PIPE_LEN;
 	}
-	++__get_cpu_var(rcu_torture_count)[pipe_count];
+	__this_cpu_inc(per_cpu_var(rcu_torture_count)[pipe_count]);
 	completed = cur_ops->completed() - completed;
 	if (completed > RCU_TORTURE_PIPE_LEN) {
 		/* Should not happen, but... */
 		completed = RCU_TORTURE_PIPE_LEN;
 	}
-	++__get_cpu_var(rcu_torture_batch)[completed];
+	__this_cpu_inc(per_cpu_var(rcu_torture_batch)[completed]);
 	preempt_enable();
 	cur_ops->readunlock(idx);
 }
@@ -764,13 +764,13 @@ rcu_torture_reader(void *arg)
 			/* Should not happen, but... */
 			pipe_count = RCU_TORTURE_PIPE_LEN;
 		}
-		++__get_cpu_var(rcu_torture_count)[pipe_count];
+		__this_cpu_inc(per_cpu_var(rcu_torture_count)[pipe_count]);
 		completed = cur_ops->completed() - completed;
 		if (completed > RCU_TORTURE_PIPE_LEN) {
 			/* Should not happen, but... */
 			completed = RCU_TORTURE_PIPE_LEN;
 		}
-		++__get_cpu_var(rcu_torture_batch)[completed];
+		__this_cpu_inc(per_cpu_var(rcu_torture_batch)[completed]);
 		preempt_enable();
 		cur_ops->readunlock(idx);
 		schedule();
Index: linux-2.6/kernel/rcupreempt.c
===================================================================
--- linux-2.6.orig/kernel/rcupreempt.c	2009-06-04 14:28:53.000000000 -0500
+++ linux-2.6/kernel/rcupreempt.c	2009-06-04 14:39:35.000000000 -0500
@@ -173,7 +173,7 @@ void rcu_enter_nohz(void)
 	static DEFINE_RATELIMIT_STATE(rs, 10 * HZ, 1);
 
 	smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */
-	__get_cpu_var(rcu_dyntick_sched).dynticks++;
+	__this_cpu_inc(per_cpu_var(rcu_dyntick_sched).dynticks);
 	WARN_ON_RATELIMIT(__get_cpu_var(rcu_dyntick_sched).dynticks & 0x1, &rs);
 }
 
@@ -181,7 +181,7 @@ void rcu_exit_nohz(void)
 {
 	static DEFINE_RATELIMIT_STATE(rs, 10 * HZ, 1);
 
-	__get_cpu_var(rcu_dyntick_sched).dynticks++;
+	__this_cpu_inc(per_cpu_var(rcu_dyntick_sched).dynticks);
 	smp_mb(); /* CPUs seeing ++ must see later RCU read-side crit sects */
 	WARN_ON_RATELIMIT(!(__get_cpu_var(rcu_dyntick_sched).dynticks & 0x1),
 				&rs);

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (11 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 12/19] RCU: Use this_cpu operations cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  6:20   ` Pekka Enberg
  2009-06-17 20:33 ` [this_cpu_xx V2 14/19] this_cpu: Remove slub kmem_cache fields cl
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Pekka Enberg, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_slub_conversion --]
[-- Type: text/plain, Size: 13526 bytes --]

Using per cpu allocations removes the need for the per cpu arrays in the
kmem_cache struct. These could get quite big if we have to support systems
with up to thousands of cpus.

1. The size of kmem_cache for SMP configuration shrinks since we will only
   need 1 pointer instead of NR_CPUS. The same pointer can be used by all
   processors. Reduces cache footprint of the allocator.

2. We can dynamically size kmem_cache according to the actual nodes in the
   system meaning less memory overhead for configurations that may potentially
   support up to 1k NUMA nodes / 4k cpus.

3. We can remove the diddle widdle with allocating and releasing of
   kmem_cache_cpu structures when bringing up and shutting down cpus. The cpu
   alloc logic will do it all for us. Removes some portions of the cpu hotplug
   functionality.

4. Fastpath performance increases.

A particular problem for the dynamic dma kmalloc slab creation is that the
new percpu allocator cannot be called from an atomic context. The solution
adopted here for the atomic context is to track spare elements in the per
cpu kmem_cache array for non dma kmallocs. Use them if necessary for dma
cache creation from an atomic context. Otherwise we just fail the allocation.

If this gets too iffy then we need to just get rid of dynamic dma cache
creation and have a full dma kmalloc array. Or fix up the percpu allocator
so that it supports allocations from an atomic context.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 include/linux/slub_def.h |    6 -
 mm/slub.c                |  205 ++++++++++-------------------------------------
 2 files changed, 48 insertions(+), 163 deletions(-)

Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/include/linux/slub_def.h	2009-06-17 14:11:15.000000000 -0500
@@ -68,6 +68,7 @@ struct kmem_cache_order_objects {
  * Slab cache management.
  */
 struct kmem_cache {
+	struct kmem_cache_cpu *cpu_slab;
 	/* Used for retriving partial slabs etc */
 	unsigned long flags;
 	int size;		/* The size of an object including meta data */
@@ -103,11 +104,6 @@ struct kmem_cache {
 	int remote_node_defrag_ratio;
 	struct kmem_cache_node *node[MAX_NUMNODES];
 #endif
-#ifdef CONFIG_SMP
-	struct kmem_cache_cpu *cpu_slab[NR_CPUS];
-#else
-	struct kmem_cache_cpu cpu_slab;
-#endif
 };
 
 /*
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2009-06-17 14:10:38.000000000 -0500
+++ linux-2.6/mm/slub.c	2009-06-17 14:11:15.000000000 -0500
@@ -242,15 +242,6 @@ static inline struct kmem_cache_node *ge
 #endif
 }
 
-static inline struct kmem_cache_cpu *get_cpu_slab(struct kmem_cache *s, int cpu)
-{
-#ifdef CONFIG_SMP
-	return s->cpu_slab[cpu];
-#else
-	return &s->cpu_slab;
-#endif
-}
-
 /* Verify that a pointer has an address that is valid within a slab page */
 static inline int check_valid_pointer(struct kmem_cache *s,
 				struct page *page, const void *object)
@@ -1106,7 +1097,7 @@ static struct page *allocate_slab(struct
 		if (!page)
 			return NULL;
 
-		stat(get_cpu_slab(s, raw_smp_processor_id()), ORDER_FALLBACK);
+		stat(this_cpu_ptr(s->cpu_slab), ORDER_FALLBACK);
 	}
 
 	if (kmemcheck_enabled
@@ -1405,7 +1396,7 @@ static struct page *get_partial(struct k
 static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
 {
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-	struct kmem_cache_cpu *c = get_cpu_slab(s, smp_processor_id());
+	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
 
 	__ClearPageSlubFrozen(page);
 	if (page->inuse) {
@@ -1437,7 +1428,7 @@ static void unfreeze_slab(struct kmem_ca
 			slab_unlock(page);
 		} else {
 			slab_unlock(page);
-			stat(get_cpu_slab(s, raw_smp_processor_id()), FREE_SLAB);
+			stat(__this_cpu_ptr(s->cpu_slab), FREE_SLAB);
 			discard_slab(s, page);
 		}
 	}
@@ -1490,7 +1481,7 @@ static inline void flush_slab(struct kme
  */
 static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 {
-	struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 
 	if (likely(c && c->page))
 		flush_slab(s, c);
@@ -1604,9 +1595,6 @@ static void *__slab_alloc(struct kmem_ca
 	void **object;
 	struct page *new;
 
-	/* We handle __GFP_ZERO in the caller */
-	gfpflags &= ~__GFP_ZERO;
-
 	if (!c->page)
 		goto new_slab;
 
@@ -1652,7 +1640,7 @@ new_slab:
 		local_irq_disable();
 
 	if (new) {
-		c = get_cpu_slab(s, smp_processor_id());
+		c = __this_cpu_ptr(s->cpu_slab);
 		stat(c, ALLOC_SLAB);
 		if (c->page)
 			flush_slab(s, c);
@@ -1690,7 +1678,6 @@ static __always_inline void *slab_alloc(
 	void **object;
 	struct kmem_cache_cpu *c;
 	unsigned long flags;
-	unsigned int objsize;
 
 	gfpflags &= slab_gfp_mask;
 
@@ -1701,24 +1688,23 @@ static __always_inline void *slab_alloc(
 		return NULL;
 
 	local_irq_save(flags);
-	c = get_cpu_slab(s, smp_processor_id());
-	objsize = c->objsize;
-	if (unlikely(!c->freelist || !node_match(c, node)))
+	c = __this_cpu_ptr(s->cpu_slab);
+	object = c->freelist;
+	if (unlikely(!object || !node_match(c, node)))
 
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 
 	else {
-		object = c->freelist;
 		c->freelist = object[c->offset];
 		stat(c, ALLOC_FASTPATH);
 	}
 	local_irq_restore(flags);
 
 	if (unlikely((gfpflags & __GFP_ZERO) && object))
-		memset(object, 0, objsize);
+		memset(object, 0, s->objsize);
 
 	kmemcheck_slab_alloc(s, gfpflags, object, c->objsize);
-	kmemleak_alloc_recursive(object, objsize, 1, s->flags, gfpflags);
+	kmemleak_alloc_recursive(object, c->objsize, 1, s->flags, gfpflags);
 
 	return object;
 }
@@ -1779,7 +1765,7 @@ static void __slab_free(struct kmem_cach
 	void **object = (void *)x;
 	struct kmem_cache_cpu *c;
 
-	c = get_cpu_slab(s, raw_smp_processor_id());
+	c = __this_cpu_ptr(s->cpu_slab);
 	stat(c, FREE_SLOWPATH);
 	slab_lock(page);
 
@@ -1851,7 +1837,7 @@ static __always_inline void slab_free(st
 
 	kmemleak_free_recursive(x, s->flags);
 	local_irq_save(flags);
-	c = get_cpu_slab(s, smp_processor_id());
+	c = __this_cpu_ptr(s->cpu_slab);
 	kmemcheck_slab_free(s, object, c->objsize);
 	debug_check_no_locks_freed(object, c->objsize);
 	if (!(s->flags & SLAB_DEBUG_OBJECTS))
@@ -2074,130 +2060,28 @@ init_kmem_cache_node(struct kmem_cache_n
 #endif
 }
 
-#ifdef CONFIG_SMP
-/*
- * Per cpu array for per cpu structures.
- *
- * The per cpu array places all kmem_cache_cpu structures from one processor
- * close together meaning that it becomes possible that multiple per cpu
- * structures are contained in one cacheline. This may be particularly
- * beneficial for the kmalloc caches.
- *
- * A desktop system typically has around 60-80 slabs. With 100 here we are
- * likely able to get per cpu structures for all caches from the array defined
- * here. We must be able to cover all kmalloc caches during bootstrap.
- *
- * If the per cpu array is exhausted then fall back to kmalloc
- * of individual cachelines. No sharing is possible then.
- */
-#define NR_KMEM_CACHE_CPU 100
-
-static DEFINE_PER_CPU(struct kmem_cache_cpu,
-				kmem_cache_cpu)[NR_KMEM_CACHE_CPU];
-
-static DEFINE_PER_CPU(struct kmem_cache_cpu *, kmem_cache_cpu_free);
-static DECLARE_BITMAP(kmem_cach_cpu_free_init_once, CONFIG_NR_CPUS);
-
-static struct kmem_cache_cpu *alloc_kmem_cache_cpu(struct kmem_cache *s,
-							int cpu, gfp_t flags)
-{
-	struct kmem_cache_cpu *c = per_cpu(kmem_cache_cpu_free, cpu);
-
-	if (c)
-		per_cpu(kmem_cache_cpu_free, cpu) =
-				(void *)c->freelist;
-	else {
-		/* Table overflow: So allocate ourselves */
-		c = kmalloc_node(
-			ALIGN(sizeof(struct kmem_cache_cpu), cache_line_size()),
-			flags, cpu_to_node(cpu));
-		if (!c)
-			return NULL;
-	}
-
-	init_kmem_cache_cpu(s, c);
-	return c;
-}
-
-static void free_kmem_cache_cpu(struct kmem_cache_cpu *c, int cpu)
-{
-	if (c < per_cpu(kmem_cache_cpu, cpu) ||
-			c >= per_cpu(kmem_cache_cpu, cpu) + NR_KMEM_CACHE_CPU) {
-		kfree(c);
-		return;
-	}
-	c->freelist = (void *)per_cpu(kmem_cache_cpu_free, cpu);
-	per_cpu(kmem_cache_cpu_free, cpu) = c;
-}
-
-static void free_kmem_cache_cpus(struct kmem_cache *s)
-{
-	int cpu;
-
-	for_each_online_cpu(cpu) {
-		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
-		if (c) {
-			s->cpu_slab[cpu] = NULL;
-			free_kmem_cache_cpu(c, cpu);
-		}
-	}
-}
+static DEFINE_PER_CPU(struct kmem_cache_cpu, kmalloc_percpu[SLUB_PAGE_SHIFT]);
 
 static int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
 {
 	int cpu;
 
-	for_each_online_cpu(cpu) {
-		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
-		if (c)
-			continue;
-
-		c = alloc_kmem_cache_cpu(s, cpu, flags);
-		if (!c) {
-			free_kmem_cache_cpus(s);
-			return 0;
-		}
-		s->cpu_slab[cpu] = c;
-	}
-	return 1;
-}
-
-/*
- * Initialize the per cpu array.
- */
-static void init_alloc_cpu_cpu(int cpu)
-{
-	int i;
-
-	if (cpumask_test_cpu(cpu, to_cpumask(kmem_cach_cpu_free_init_once)))
-		return;
-
-	for (i = NR_KMEM_CACHE_CPU - 1; i >= 0; i--)
-		free_kmem_cache_cpu(&per_cpu(kmem_cache_cpu, cpu)[i], cpu);
-
-	cpumask_set_cpu(cpu, to_cpumask(kmem_cach_cpu_free_init_once));
-}
-
-static void __init init_alloc_cpu(void)
-{
-	int cpu;
-
-	for_each_online_cpu(cpu)
-		init_alloc_cpu_cpu(cpu);
-  }
+	if (s < kmalloc_caches + SLUB_PAGE_SHIFT && s >= kmalloc_caches)
+		/*
+		 * Boot time creation of the kmalloc array. Use static per cpu data
+		 * since the per cpu allocator is not available yet.
+		 */
+		s->cpu_slab = per_cpu_var(kmalloc_percpu) + (s - kmalloc_caches);
+	else
+		s->cpu_slab =  alloc_percpu(struct kmem_cache_cpu);
 
-#else
-static inline void free_kmem_cache_cpus(struct kmem_cache *s) {}
-static inline void init_alloc_cpu(void) {}
+	if (!s->cpu_slab)
+		return 0;
 
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
-{
-	init_kmem_cache_cpu(s, &s->cpu_slab);
+	for_each_possible_cpu(cpu)
+		init_kmem_cache_cpu(s, per_cpu_ptr(s->cpu_slab, cpu));
 	return 1;
 }
-#endif
 
 #ifdef CONFIG_NUMA
 /*
@@ -2575,9 +2459,8 @@ static inline int kmem_cache_close(struc
 	int node;
 
 	flush_all(s);
-
+	free_percpu(s->cpu_slab);
 	/* Attempt to free all objects */
-	free_kmem_cache_cpus(s);
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n = get_node(s, node);
 
@@ -2724,7 +2607,19 @@ static noinline struct kmem_cache *dma_k
 	realsize = kmalloc_caches[index].objsize;
 	text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d",
 			 (unsigned int)realsize);
-	s = kmalloc(kmem_size, flags & ~SLUB_DMA);
+
+	if (flags & __GFP_WAIT)
+		s = kmalloc(kmem_size, flags & ~SLUB_DMA);
+	else {
+		int i;
+
+		s = NULL;
+		for (i = 0; i < SLUB_PAGE_SHIFT; i++)
+			if (kmalloc_caches[i].size) {
+				s = kmalloc_caches + i;
+				break;
+			}
+	}
 
 	/*
 	 * Must defer sysfs creation to a workqueue because we don't know
@@ -3132,8 +3027,6 @@ void __init kmem_cache_init(void)
 	int i;
 	int caches = 0;
 
-	init_alloc_cpu();
-
 #ifdef CONFIG_NUMA
 	/*
 	 * Must first have the slab cache available for the allocations of the
@@ -3204,8 +3097,10 @@ void __init kmem_cache_init(void)
 
 #ifdef CONFIG_SMP
 	register_cpu_notifier(&slab_notifier);
-	kmem_size = offsetof(struct kmem_cache, cpu_slab) +
-				nr_cpu_ids * sizeof(struct kmem_cache_cpu *);
+#endif
+#ifdef CONFIG_NUMA
+	kmem_size = offsetof(struct kmem_cache, node) +
+				nr_node_ids * sizeof(struct kmem_cache_node *);
 #else
 	kmem_size = sizeof(struct kmem_cache);
 #endif
@@ -3309,7 +3204,7 @@ struct kmem_cache *kmem_cache_create(con
 		 * per cpu structures
 		 */
 		for_each_online_cpu(cpu)
-			get_cpu_slab(s, cpu)->objsize = s->objsize;
+			per_cpu_ptr(s->cpu_slab, cpu)->objsize = s->objsize;
 
 		s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
 		up_write(&slub_lock);
@@ -3366,11 +3261,9 @@ static int __cpuinit slab_cpuup_callback
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		init_alloc_cpu_cpu(cpu);
 		down_read(&slub_lock);
 		list_for_each_entry(s, &slab_caches, list)
-			s->cpu_slab[cpu] = alloc_kmem_cache_cpu(s, cpu,
-							GFP_KERNEL);
+			init_kmem_cache_cpu(s, per_cpu_ptr(s->cpu_slab, cpu));
 		up_read(&slub_lock);
 		break;
 
@@ -3380,13 +3273,9 @@ static int __cpuinit slab_cpuup_callback
 	case CPU_DEAD_FROZEN:
 		down_read(&slub_lock);
 		list_for_each_entry(s, &slab_caches, list) {
-			struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
 			local_irq_save(flags);
 			__flush_cpu_slab(s, cpu);
 			local_irq_restore(flags);
-			free_kmem_cache_cpu(c, cpu);
-			s->cpu_slab[cpu] = NULL;
 		}
 		up_read(&slub_lock);
 		break;
@@ -3872,7 +3761,7 @@ static ssize_t show_slab_objects(struct 
 		int cpu;
 
 		for_each_possible_cpu(cpu) {
-			struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 
 			if (!c || c->node < 0)
 				continue;
@@ -4297,7 +4186,7 @@ static int show_stat(struct kmem_cache *
 		return -ENOMEM;
 
 	for_each_online_cpu(cpu) {
-		unsigned x = get_cpu_slab(s, cpu)->stat[si];
+		unsigned x = per_cpu_ptr(s->cpu_slab, cpu)->stat[si];
 
 		data[cpu] = x;
 		sum += x;

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 14/19] this_cpu: Remove slub kmem_cache fields
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (12 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 13/19] Use this_cpu operations in slub cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 15/19] Make slub statistics use this_cpu_inc cl
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Pekka Enberg, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_slub_remove_fields --]
[-- Type: text/plain, Size: 7997 bytes --]

Remove the fields in kmem_cache_cpu that were used to cache data from
kmem_cache when the two structures lived in different cachelines. The cacheline
that holds the per cpu array pointer now also holds these values. This cuts
the size of struct kmem_cache_cpu almost in half.

The get_freepointer() and set_freepointer() functions, which used to be
intended only for the slow path, are now also useful for the hot path since
accessing the field no longer touches an additional cacheline. As a result,
the freepointer of an object is set in a consistent way throughout SLUB.

We also initialize all possible kmem_cache_cpu structures when a slab cache is
created, so there is no need to initialize them when a processor or node comes
online.
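
For illustration only, here is a minimal sketch (not part of the patch) of how
the free pointer of an object is reached once the offset comes straight from
struct kmem_cache. The helpers match the ones in the diff below; the trailing
fastpath lines are an assumed simplification of the slab_alloc() hot path:

        /* The free pointer lives at s->offset inside each free object. */
        static inline void *get_freepointer(struct kmem_cache *s, void *object)
        {
                return *(void **)(object + s->offset);
        }

        static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
        {
                *(void **)(object + s->offset) = fp;
        }

        /* Allocation fastpath: pop the head of the per cpu freelist. */
        object = c->freelist;
        c->freelist = get_freepointer(s, object);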

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
 include/linux/slub_def.h |    2 -
 mm/slub.c                |   81 +++++++++++++----------------------------------
 2 files changed, 24 insertions(+), 59 deletions(-)

Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2009-06-17 14:11:15.000000000 -0500
+++ linux-2.6/include/linux/slub_def.h	2009-06-17 14:11:20.000000000 -0500
@@ -37,8 +37,6 @@ struct kmem_cache_cpu {
 	void **freelist;	/* Pointer to first free per cpu object */
 	struct page *page;	/* The slab from which we are allocating */
 	int node;		/* The node of the page (or -1 for debug) */
-	unsigned int offset;	/* Freepointer offset (in word units) */
-	unsigned int objsize;	/* Size of an object (from kmem_cache) */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2009-06-17 14:11:15.000000000 -0500
+++ linux-2.6/mm/slub.c	2009-06-17 14:11:20.000000000 -0500
@@ -260,13 +260,6 @@ static inline int check_valid_pointer(st
 	return 1;
 }
 
-/*
- * Slow version of get and set free pointer.
- *
- * This version requires touching the cache lines of kmem_cache which
- * we avoid to do in the fast alloc free paths. There we obtain the offset
- * from the page struct.
- */
 static inline void *get_freepointer(struct kmem_cache *s, void *object)
 {
 	return *(void **)(object + s->offset);
@@ -1456,10 +1449,10 @@ static void deactivate_slab(struct kmem_
 
 		/* Retrieve object from cpu_freelist */
 		object = c->freelist;
-		c->freelist = c->freelist[c->offset];
+		c->freelist = get_freepointer(s, c->freelist);
 
 		/* And put onto the regular freelist */
-		object[c->offset] = page->freelist;
+		set_freepointer(s, object, page->freelist);
 		page->freelist = object;
 		page->inuse--;
 	}
@@ -1611,7 +1604,7 @@ load_freelist:
 	if (unlikely(SLABDEBUG && PageSlubDebug(c->page)))
 		goto debug;
 
-	c->freelist = object[c->offset];
+	c->freelist = get_freepointer(s, object);
 	c->page->inuse = c->page->objects;
 	c->page->freelist = NULL;
 	c->node = page_to_nid(c->page);
@@ -1657,7 +1650,7 @@ debug:
 		goto another_slab;
 
 	c->page->inuse++;
-	c->page->freelist = object[c->offset];
+	c->page->freelist = get_freepointer(s, object);
 	c->node = -1;
 	goto unlock_out;
 }
@@ -1695,7 +1688,7 @@ static __always_inline void *slab_alloc(
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 
 	else {
-		c->freelist = object[c->offset];
+		c->freelist = get_freepointer(s, object);
 		stat(c, ALLOC_FASTPATH);
 	}
 	local_irq_restore(flags);
@@ -1703,8 +1696,8 @@ static __always_inline void *slab_alloc(
 	if (unlikely((gfpflags & __GFP_ZERO) && object))
 		memset(object, 0, s->objsize);
 
-	kmemcheck_slab_alloc(s, gfpflags, object, c->objsize);
-	kmemleak_alloc_recursive(object, c->objsize, 1, s->flags, gfpflags);
+	kmemcheck_slab_alloc(s, gfpflags, object, s->objsize);
+	kmemleak_alloc_recursive(object, s->objsize, 1, s->flags, gfpflags);
 
 	return object;
 }
@@ -1759,7 +1752,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_notr
  * handling required then we can return immediately.
  */
 static void __slab_free(struct kmem_cache *s, struct page *page,
-			void *x, unsigned long addr, unsigned int offset)
+			void *x, unsigned long addr)
 {
 	void *prior;
 	void **object = (void *)x;
@@ -1773,7 +1766,8 @@ static void __slab_free(struct kmem_cach
 		goto debug;
 
 checks_ok:
-	prior = object[offset] = page->freelist;
+	prior = page->freelist;
+	set_freepointer(s, object, prior);
 	page->freelist = object;
 	page->inuse--;
 
@@ -1838,16 +1832,16 @@ static __always_inline void slab_free(st
 	kmemleak_free_recursive(x, s->flags);
 	local_irq_save(flags);
 	c = __this_cpu_ptr(s->cpu_slab);
-	kmemcheck_slab_free(s, object, c->objsize);
-	debug_check_no_locks_freed(object, c->objsize);
+	kmemcheck_slab_free(s, object, s->objsize);
+	debug_check_no_locks_freed(object, s->objsize);
 	if (!(s->flags & SLAB_DEBUG_OBJECTS))
-		debug_check_no_obj_freed(object, c->objsize);
+		debug_check_no_obj_freed(object, s->objsize);
 	if (likely(page == c->page && c->node >= 0)) {
-		object[c->offset] = c->freelist;
+		set_freepointer(s, object, c->freelist);
 		c->freelist = object;
 		stat(c, FREE_FASTPATH);
 	} else
-		__slab_free(s, page, x, addr, c->offset);
+		__slab_free(s, page, x, addr);
 
 	local_irq_restore(flags);
 }
@@ -2034,19 +2028,6 @@ static unsigned long calculate_alignment
 	return ALIGN(align, sizeof(void *));
 }
 
-static void init_kmem_cache_cpu(struct kmem_cache *s,
-			struct kmem_cache_cpu *c)
-{
-	c->page = NULL;
-	c->freelist = NULL;
-	c->node = 0;
-	c->offset = s->offset / sizeof(void *);
-	c->objsize = s->objsize;
-#ifdef CONFIG_SLUB_STATS
-	memset(c->stat, 0, NR_SLUB_STAT_ITEMS * sizeof(unsigned));
-#endif
-}
-
 static void
 init_kmem_cache_node(struct kmem_cache_node *n, struct kmem_cache *s)
 {
@@ -2064,8 +2045,6 @@ static DEFINE_PER_CPU(struct kmem_cache_
 
 static int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
 {
-	int cpu;
-
 	if (s < kmalloc_caches + SLUB_PAGE_SHIFT && s >= kmalloc_caches)
 		/*
 		 * Boot time creation of the kmalloc array. Use static per cpu data
@@ -2078,8 +2057,6 @@ static int alloc_kmem_cache_cpus(struct 
 	if (!s->cpu_slab)
 		return 0;
 
-	for_each_possible_cpu(cpu)
-		init_kmem_cache_cpu(s, per_cpu_ptr(s->cpu_slab, cpu));
 	return 1;
 }
 
@@ -2350,8 +2327,16 @@ static int kmem_cache_open(struct kmem_c
 	if (!init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
 		goto error;
 
-	if (alloc_kmem_cache_cpus(s, gfpflags & ~SLUB_DMA))
+	if (!alloc_kmem_cache_cpus(s, gfpflags & ~SLUB_DMA))
+		return 0;
+
+	/*
+	 * gfp_flags would be flags & ~SLUB_DMA but the per cpu
+	 * allocator does not support it.
+	 */
+	if (s->cpu_slab)
 		return 1;
+
 	free_kmem_cache_nodes(s);
 error:
 	if (flags & SLAB_PANIC)
@@ -3190,22 +3175,12 @@ struct kmem_cache *kmem_cache_create(con
 	down_write(&slub_lock);
 	s = find_mergeable(size, align, flags, name, ctor);
 	if (s) {
-		int cpu;
-
 		s->refcount++;
 		/*
 		 * Adjust the object sizes so that we clear
 		 * the complete object on kzalloc.
 		 */
 		s->objsize = max(s->objsize, (int)size);
-
-		/*
-		 * And then we need to update the object size in the
-		 * per cpu structures
-		 */
-		for_each_online_cpu(cpu)
-			per_cpu_ptr(s->cpu_slab, cpu)->objsize = s->objsize;
-
 		s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
 		up_write(&slub_lock);
 
@@ -3259,14 +3234,6 @@ static int __cpuinit slab_cpuup_callback
 	unsigned long flags;
 
 	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		down_read(&slub_lock);
-		list_for_each_entry(s, &slab_caches, list)
-			init_kmem_cache_cpu(s, per_cpu_ptr(s->cpu_slab, cpu));
-		up_read(&slub_lock);
-		break;
-
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 15/19] Make slub statistics use this_cpu_inc
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (13 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 14/19] this_cpu: Remove slub kmem_cache fields cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths cl
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Pekka Enberg, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_slub_cleanup_stat --]
[-- Type: text/plain, Size: 5362 bytes --]

this_cpu_inc() translates into a single instruction on x86 and does not
need any register, so use it in stat(). We also want to avoid calculating
the per cpu kmem_cache_cpu structure pointer, so pass a kmem_cache pointer
instead of a kmem_cache_cpu pointer.
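
A minimal sketch (illustrative only, not part of the patch) of what the change
amounts to for a single statistics event:

        /* Before: the per cpu structure pointer has to be computed first */
        struct kmem_cache_cpu *c = __this_cpu_ptr(s->cpu_slab);
        c->stat[si]++;

        /* After: a single segment-prefixed increment, no pointer needed */
        __this_cpu_inc(s->cpu_slab->stat[si]);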

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 mm/slub.c |   43 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 23 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2009-06-17 14:11:20.000000000 -0500
+++ linux-2.6/mm/slub.c	2009-06-17 14:11:24.000000000 -0500
@@ -217,10 +217,10 @@ static inline void sysfs_slab_remove(str
 
 #endif
 
-static inline void stat(struct kmem_cache_cpu *c, enum stat_item si)
+static inline void stat(struct kmem_cache *s, enum stat_item si)
 {
 #ifdef CONFIG_SLUB_STATS
-	c->stat[si]++;
+	__this_cpu_inc(s->cpu_slab->stat[si]);
 #endif
 }
 
@@ -1090,7 +1090,7 @@ static struct page *allocate_slab(struct
 		if (!page)
 			return NULL;
 
-		stat(this_cpu_ptr(s->cpu_slab), ORDER_FALLBACK);
+		stat(s, ORDER_FALLBACK);
 	}
 
 	if (kmemcheck_enabled
@@ -1389,23 +1389,22 @@ static struct page *get_partial(struct k
 static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
 {
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
 
 	__ClearPageSlubFrozen(page);
 	if (page->inuse) {
 
 		if (page->freelist) {
 			add_partial(n, page, tail);
-			stat(c, tail ? DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD);
+			stat(s, tail ? DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD);
 		} else {
-			stat(c, DEACTIVATE_FULL);
+			stat(s, DEACTIVATE_FULL);
 			if (SLABDEBUG && PageSlubDebug(page) &&
 						(s->flags & SLAB_STORE_USER))
 				add_full(n, page);
 		}
 		slab_unlock(page);
 	} else {
-		stat(c, DEACTIVATE_EMPTY);
+		stat(s, DEACTIVATE_EMPTY);
 		if (n->nr_partial < s->min_partial) {
 			/*
 			 * Adding an empty slab to the partial slabs in order
@@ -1421,7 +1420,7 @@ static void unfreeze_slab(struct kmem_ca
 			slab_unlock(page);
 		} else {
 			slab_unlock(page);
-			stat(__this_cpu_ptr(s->cpu_slab), FREE_SLAB);
+			stat(s, FREE_SLAB);
 			discard_slab(s, page);
 		}
 	}
@@ -1436,7 +1435,7 @@ static void deactivate_slab(struct kmem_
 	int tail = 1;
 
 	if (page->freelist)
-		stat(c, DEACTIVATE_REMOTE_FREES);
+		stat(s, DEACTIVATE_REMOTE_FREES);
 	/*
 	 * Merge cpu freelist into slab freelist. Typically we get here
 	 * because both freelists are empty. So this is unlikely
@@ -1462,7 +1461,7 @@ static void deactivate_slab(struct kmem_
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
-	stat(c, CPUSLAB_FLUSH);
+	stat(s, CPUSLAB_FLUSH);
 	slab_lock(c->page);
 	deactivate_slab(s, c);
 }
@@ -1595,7 +1594,7 @@ static void *__slab_alloc(struct kmem_ca
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 
-	stat(c, ALLOC_REFILL);
+	stat(s, ALLOC_REFILL);
 
 load_freelist:
 	object = c->page->freelist;
@@ -1610,7 +1609,7 @@ load_freelist:
 	c->node = page_to_nid(c->page);
 unlock_out:
 	slab_unlock(c->page);
-	stat(c, ALLOC_SLOWPATH);
+	stat(s, ALLOC_SLOWPATH);
 	return object;
 
 another_slab:
@@ -1620,7 +1619,7 @@ new_slab:
 	new = get_partial(s, gfpflags, node);
 	if (new) {
 		c->page = new;
-		stat(c, ALLOC_FROM_PARTIAL);
+		stat(s, ALLOC_FROM_PARTIAL);
 		goto load_freelist;
 	}
 
@@ -1634,7 +1633,7 @@ new_slab:
 
 	if (new) {
 		c = __this_cpu_ptr(s->cpu_slab);
-		stat(c, ALLOC_SLAB);
+		stat(s, ALLOC_SLAB);
 		if (c->page)
 			flush_slab(s, c);
 		slab_lock(new);
@@ -1689,7 +1688,7 @@ static __always_inline void *slab_alloc(
 
 	else {
 		c->freelist = get_freepointer(s, object);
-		stat(c, ALLOC_FASTPATH);
+		stat(s, ALLOC_FASTPATH);
 	}
 	local_irq_restore(flags);
 
@@ -1756,10 +1755,8 @@ static void __slab_free(struct kmem_cach
 {
 	void *prior;
 	void **object = (void *)x;
-	struct kmem_cache_cpu *c;
 
-	c = __this_cpu_ptr(s->cpu_slab);
-	stat(c, FREE_SLOWPATH);
+	stat(s, FREE_SLOWPATH);
 	slab_lock(page);
 
 	if (unlikely(SLABDEBUG && PageSlubDebug(page)))
@@ -1772,7 +1769,7 @@ checks_ok:
 	page->inuse--;
 
 	if (unlikely(PageSlubFrozen(page))) {
-		stat(c, FREE_FROZEN);
+		stat(s, FREE_FROZEN);
 		goto out_unlock;
 	}
 
@@ -1785,7 +1782,7 @@ checks_ok:
 	 */
 	if (unlikely(!prior)) {
 		add_partial(get_node(s, page_to_nid(page)), page, 1);
-		stat(c, FREE_ADD_PARTIAL);
+		stat(s, FREE_ADD_PARTIAL);
 	}
 
 out_unlock:
@@ -1798,10 +1795,10 @@ slab_empty:
 		 * Slab still on the partial list.
 		 */
 		remove_partial(s, page);
-		stat(c, FREE_REMOVE_PARTIAL);
+		stat(s, FREE_REMOVE_PARTIAL);
 	}
 	slab_unlock(page);
-	stat(c, FREE_SLAB);
+	stat(s, FREE_SLAB);
 	discard_slab(s, page);
 	return;
 
@@ -1839,7 +1836,7 @@ static __always_inline void slab_free(st
 	if (likely(page == c->page && c->node >= 0)) {
 		set_freepointer(s, object, c->freelist);
 		c->freelist = object;
-		stat(c, FREE_FASTPATH);
+		stat(s, FREE_FASTPATH);
 	} else
 		__slab_free(s, page, x, addr);
 

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (14 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 15/19] Make slub statistics use this_cpu_inc cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  6:33   ` Pekka Enberg
  2009-06-17 20:33 ` [this_cpu_xx V2 17/19] Move early initialization of pagesets out of zone_wait_table_init() cl
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, Mathieu Desnoyers, Pekka Enberg, Tejun Heo, mingo,
	rusty, davem

[-- Attachment #1: this_cpu_slub_aggressive_cpu_ops --]
[-- Type: text/plain, Size: 5876 bytes --]

Use this_cpu_* operations in the hotpath to avoid calculations of
kmem_cache_cpu pointer addresses.

It is not clear if this is always an advantage.

On x86 there is a tradeoff: multiple uses of segment prefixes versus a single
address calculation and more register pressure.

On the other hand, the use of prefixes is necessary if we want to use
Mathieu's scheme for fastpaths that do not require disabling interrupts.
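
As a rough sketch (illustrative only) of the two code shapes being traded off
in the allocation fastpath:

        /* Compute the kmem_cache_cpu pointer once, then use plain accesses */
        struct kmem_cache_cpu *c = __this_cpu_ptr(s->cpu_slab);
        object = c->freelist;
        c->freelist = get_freepointer(s, object);

        /* Versus a segment prefix on every access, with no pointer at all */
        object = __this_cpu_read(s->cpu_slab->freelist);
        __this_cpu_write(s->cpu_slab->freelist, get_freepointer(s, object));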

Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 mm/slub.c |   76 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 39 insertions(+), 37 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2009-06-17 14:11:24.000000000 -0500
+++ linux-2.6/mm/slub.c	2009-06-17 14:11:27.000000000 -0500
@@ -1495,10 +1495,10 @@ static void flush_all(struct kmem_cache 
  * Check if the objects in a per cpu structure fit numa
  * locality expectations.
  */
-static inline int node_match(struct kmem_cache_cpu *c, int node)
+static inline int node_match(struct kmem_cache *s, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != -1 && c->node != node)
+	if (node != -1 && __this_cpu_read(s->cpu_slab->node) != node)
 		return 0;
 #endif
 	return 1;
@@ -1582,43 +1582,46 @@ slab_out_of_memory(struct kmem_cache *s,
  * a call to the page allocator and the setup of a new slab.
  */
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			  unsigned long addr, struct kmem_cache_cpu *c)
+			  unsigned long addr)
 {
 	void **object;
 	struct page *new;
+	struct kmem_cache_cpu *c = s->cpu_slab;
+	struct page *page = __this_cpu_read(c->page);
 
-	if (!c->page)
+	if (!page)
 		goto new_slab;
 
-	slab_lock(c->page);
-	if (unlikely(!node_match(c, node)))
+	slab_lock(page);
+	if (unlikely(!node_match(s, node)))
 		goto another_slab;
 
 	stat(s, ALLOC_REFILL);
 
 load_freelist:
-	object = c->page->freelist;
+	object = page->freelist;
 	if (unlikely(!object))
 		goto another_slab;
-	if (unlikely(SLABDEBUG && PageSlubDebug(c->page)))
+	if (unlikely(SLABDEBUG && PageSlubDebug(page)))
 		goto debug;
 
-	c->freelist = get_freepointer(s, object);
-	c->page->inuse = c->page->objects;
-	c->page->freelist = NULL;
-	c->node = page_to_nid(c->page);
+	__this_cpu_write(c->freelist, get_freepointer(s, object));
+	page->inuse = page->objects;
+	page->freelist = NULL;
+	__this_cpu_write(c->node, page_to_nid(page));
 unlock_out:
-	slab_unlock(c->page);
+	slab_unlock(page);
 	stat(s, ALLOC_SLOWPATH);
 	return object;
 
 another_slab:
-	deactivate_slab(s, c);
+	deactivate_slab(s, __this_cpu_ptr(c));
 
 new_slab:
 	new = get_partial(s, gfpflags, node);
 	if (new) {
-		c->page = new;
+		page = new;
+		__this_cpu_write(c->page, page);
 		stat(s, ALLOC_FROM_PARTIAL);
 		goto load_freelist;
 	}
@@ -1626,19 +1629,18 @@ new_slab:
 	if (gfpflags & __GFP_WAIT)
 		local_irq_enable();
 
-	new = new_slab(s, gfpflags, node);
+	page = new_slab(s, gfpflags, node);
 
 	if (gfpflags & __GFP_WAIT)
 		local_irq_disable();
 
-	if (new) {
-		c = __this_cpu_ptr(s->cpu_slab);
+	if (page) {
 		stat(s, ALLOC_SLAB);
-		if (c->page)
-			flush_slab(s, c);
-		slab_lock(new);
-		__SetPageSlubFrozen(new);
-		c->page = new;
+		if (__this_cpu_read(c->page))
+			flush_slab(s, __this_cpu_ptr(c));
+		slab_lock(page);
+		__SetPageSlubFrozen(page);
+		__this_cpu_write(c->page, page);
 		goto load_freelist;
 	}
 	if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
@@ -1648,9 +1650,9 @@ debug:
 	if (!alloc_debug_processing(s, c->page, object, addr))
 		goto another_slab;
 
-	c->page->inuse++;
-	c->page->freelist = get_freepointer(s, object);
-	c->node = -1;
+	page->inuse++;
+	page->freelist = get_freepointer(s, object);
+	__this_cpu_write(c->node, -1);
 	goto unlock_out;
 }
 
@@ -1668,8 +1670,8 @@ static __always_inline void *slab_alloc(
 		gfp_t gfpflags, int node, unsigned long addr)
 {
 	void **object;
-	struct kmem_cache_cpu *c;
 	unsigned long flags;
+	struct kmem_cache_cpu *c = s->cpu_slab;
 
 	gfpflags &= slab_gfp_mask;
 
@@ -1680,14 +1682,13 @@ static __always_inline void *slab_alloc(
 		return NULL;
 
 	local_irq_save(flags);
-	c = __this_cpu_ptr(s->cpu_slab);
-	object = c->freelist;
-	if (unlikely(!object || !node_match(c, node)))
+	object = __this_cpu_read(c->freelist);
+	if (unlikely(!object || !node_match(s, node)))
 
-		object = __slab_alloc(s, gfpflags, node, addr, c);
+		object = __slab_alloc(s, gfpflags, node, addr);
 
 	else {
-		c->freelist = get_freepointer(s, object);
+		__this_cpu_write(c->freelist, get_freepointer(s, object));
 		stat(s, ALLOC_FASTPATH);
 	}
 	local_irq_restore(flags);
@@ -1823,19 +1824,20 @@ static __always_inline void slab_free(st
 			struct page *page, void *x, unsigned long addr)
 {
 	void **object = (void *)x;
-	struct kmem_cache_cpu *c;
+	struct kmem_cache_cpu *c = s->cpu_slab;
 	unsigned long flags;
 
 	kmemleak_free_recursive(x, s->flags);
 	local_irq_save(flags);
-	c = __this_cpu_ptr(s->cpu_slab);
 	kmemcheck_slab_free(s, object, s->objsize);
 	debug_check_no_locks_freed(object, s->objsize);
 	if (!(s->flags & SLAB_DEBUG_OBJECTS))
 		debug_check_no_obj_freed(object, s->objsize);
-	if (likely(page == c->page && c->node >= 0)) {
-		set_freepointer(s, object, c->freelist);
-		c->freelist = object;
+
+	if (likely(page == __this_cpu_read(c->page) &&
+			__this_cpu_read(c->node) >= 0)) {
+		set_freepointer(s, object, __this_cpu_read(c->freelist));
+		__this_cpu_write(c->freelist, object);
 		stat(s, FREE_FASTPATH);
 	} else
 		__slab_free(s, page, x, addr);

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 17/19] Move early initialization of pagesets out of zone_wait_table_init()
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (15 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths cl
@ 2009-06-17 20:33 ` cl
  2009-06-18  3:13   ` Tejun Heo
  2009-06-17 20:33 ` [this_cpu_xx V2 18/19] this_cpu_ops: page allocator conversion cl
  2009-06-17 20:33 ` [this_cpu_xx V2 19/19] this_cpu ops: Remove pageset_notifier cl
  18 siblings, 1 reply; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Mel Gorman, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_move_initialization --]
[-- Type: text/plain, Size: 5686 bytes --]

Explicitly initialize the pagesets after the per cpu areas have been
initialized. This is necessary in order to be able to use per cpu
operations in later patches.
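
Roughly, the intended boot ordering (an assumed sketch for illustration; the
exact call sites are in the diff below and in the following patches):

        /*
         * setup_per_cpu_areas()        static per cpu areas become usable
         *     setup_pagesets()         early pagesets for every zone
         * ...
         * setup_per_cpu_pageset()      later in boot: full per cpu pageset
         *                              setup once the allocators are up
         */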

Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 arch/ia64/kernel/setup.c       |    1 +
 arch/powerpc/kernel/setup_64.c |    1 +
 arch/sparc/kernel/smp_64.c     |    1 +
 arch/x86/kernel/setup_percpu.c |    2 ++
 include/linux/mm.h             |    1 +
 init/main.c                    |    1 +
 mm/page_alloc.c                |   40 +++++++++++++++++++++++++++++-----------
 7 files changed, 36 insertions(+), 11 deletions(-)

Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/init/main.c	2009-06-17 14:11:30.000000000 -0500
@@ -398,6 +398,7 @@ static void __init setup_per_cpu_areas(v
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 		ptr += size;
 	}
+	setup_pagesets();
 }
 #endif /* CONFIG_HAVE_SETUP_PER_CPU_AREA */
 
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2009-06-17 14:11:04.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2009-06-17 14:11:30.000000000 -0500
@@ -3129,23 +3129,42 @@ int zone_wait_table_init(struct zone *zo
 	return 0;
 }
 
-static __meminit void zone_pcp_init(struct zone *zone)
+/*
+ * Early setup of pagesets.
+ *
+ * In the NUMA case the pageset setup simply results in all zones pcp
+ * pointer being directed at a per cpu pageset with zero batchsize.
+ *
+ * This means that every free and every allocation occurs directly from
+ * the buddy allocator tables.
+ *
+ * The pageset never queues pages during early boot and is therefore usable
+ * for every type of zone.
+ */
+__meminit void setup_pagesets(void)
 {
 	int cpu;
-	unsigned long batch = zone_batchsize(zone);
+	struct zone *zone;
 
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+	for_each_zone(zone) {
 #ifdef CONFIG_NUMA
-		/* Early boot. Slab allocator not functional yet */
-		zone_pcp(zone, cpu) = &boot_pageset[cpu];
-		setup_pageset(&boot_pageset[cpu],0);
+		unsigned long batch = 0;
+
+		for (cpu = 0; cpu < NR_CPUS; cpu++) {
+			/* Early boot. Slab allocator not functional yet */
+			zone_pcp(zone, cpu) = &boot_pageset[cpu];
+		}
 #else
-		setup_pageset(zone_pcp(zone,cpu), batch);
+		unsigned long batch = zone_batchsize(zone);
 #endif
+
+		for_each_possible_cpu(cpu)
+			setup_pageset(zone_pcp(zone, cpu), batch);
+
+		if (zone->present_pages)
+			printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%lu\n",
+				zone->name, zone->present_pages, batch);
 	}
-	if (zone->present_pages)
-		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%lu\n",
-			zone->name, zone->present_pages, batch);
 }
 
 __meminit int init_currently_empty_zone(struct zone *zone,
@@ -3700,7 +3719,6 @@ static void __paginginit free_area_init_
 
 		zone->prev_priority = DEF_PRIORITY;
 
-		zone_pcp_init(zone);
 		for_each_lru(l) {
 			INIT_LIST_HEAD(&zone->lru[l].list);
 			zone->lru[l].nr_saved_scan = 0;
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/include/linux/mm.h	2009-06-17 14:11:30.000000000 -0500
@@ -1060,6 +1060,7 @@ extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
 extern int after_bootmem;
+extern void setup_pagesets(void);
 
 #ifdef CONFIG_NUMA
 extern void setup_per_cpu_pageset(void);
Index: linux-2.6/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/setup.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/setup.c	2009-06-17 14:11:30.000000000 -0500
@@ -859,6 +859,7 @@ void __init
 setup_per_cpu_areas (void)
 {
 	/* start_kernel() requires this... */
+	setup_pagesets();
 }
 
 /*
Index: linux-2.6/arch/powerpc/kernel/setup_64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/setup_64.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/arch/powerpc/kernel/setup_64.c	2009-06-17 14:11:30.000000000 -0500
@@ -588,6 +588,7 @@ void __init setup_per_cpu_areas(void)
 		paca[i].data_offset = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 	}
+	setup_pagesets();
 }
 #endif
 
Index: linux-2.6/arch/sparc/kernel/smp_64.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/smp_64.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/arch/sparc/kernel/smp_64.c	2009-06-17 14:11:30.000000000 -0500
@@ -1543,4 +1543,5 @@ void __init setup_per_cpu_areas(void)
 	of_fill_in_cpu_data();
 	if (tlb_type == hypervisor)
 		mdesc_fill_in_cpu_data(cpu_all_mask);
+	setup_pagesets();
 }
Index: linux-2.6/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_percpu.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/arch/x86/kernel/setup_percpu.c	2009-06-17 14:11:30.000000000 -0500
@@ -438,4 +438,6 @@ void __init setup_per_cpu_areas(void)
 
 	/* Setup cpu initialized, callin, callout masks */
 	setup_cpu_local_masks();
+
+	setup_pagesets();
 }

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 18/19] this_cpu_ops: page allocator conversion
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (16 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 17/19] Move early initialization of pagesets out of zone_wait_table_init() cl
@ 2009-06-17 20:33 ` cl
  2009-06-17 20:33 ` [this_cpu_xx V2 19/19] this_cpu ops: Remove pageset_notifier cl
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Mel Gorman, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_page_allocator --]
[-- Type: text/plain, Size: 12731 bytes --]

Use the per cpu allocator functionality to avoid per cpu arrays in struct zone.

This drastically reduces the size of struct zone for systems with large
numbers of processors and allows critical variables of struct zone to be
placed in one cacheline even on very large systems.

Another effect is that the pagesets of one processor are placed near one
another. If multiple pagesets from different zones fit into one cacheline
then additional cacheline fetches can be avoided on the hot paths when
allocating memory from multiple zones.

Bootstrap becomes simpler if we use the same scheme for UP, SMP, NUMA. #ifdefs
are reduced and we can drop the zone_pcp macro.

Hotplug handling is also simplified since the per cpu allocator can bring up
and shut down the cpu area for a specific cpu as a whole, so there is no need
to allocate or free individual pagesets.
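
For illustration (not part of the patch), the change in how a zone's pageset
is reached:

        /* Before: a per-zone NR_CPUS array (pointers on NUMA, structs otherwise) */
        pset = zone_pcp(zone, cpu);

        /* After: a single percpu allocation per zone */
        pset = per_cpu_ptr(zone->pageset, cpu);  /* pageset of a given cpu */
        pset = this_cpu_ptr(zone->pageset);      /* pageset of the local cpu */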

Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 include/linux/mm.h     |    4 -
 include/linux/mmzone.h |   12 ---
 mm/page_alloc.c        |  154 ++++++++++++++-----------------------------------
 mm/vmstat.c            |   14 ++--
 4 files changed, 54 insertions(+), 130 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2009-06-17 14:11:30.000000000 -0500
+++ linux-2.6/include/linux/mm.h	2009-06-17 14:14:37.000000000 -0500
@@ -1062,11 +1062,7 @@ extern void si_meminfo_node(struct sysin
 extern int after_bootmem;
 extern void setup_pagesets(void);
 
-#ifdef CONFIG_NUMA
 extern void setup_per_cpu_pageset(void);
-#else
-static inline void setup_per_cpu_pageset(void) {}
-#endif
 
 /* nommu.c */
 extern atomic_long_t mmap_pages_allocated;
Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/include/linux/mmzone.h	2009-06-17 14:14:37.000000000 -0500
@@ -177,13 +177,7 @@ struct per_cpu_pageset {
 	s8 stat_threshold;
 	s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
 #endif
-} ____cacheline_aligned_in_smp;
-
-#ifdef CONFIG_NUMA
-#define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
-#else
-#define zone_pcp(__z, __cpu) (&(__z)->pageset[(__cpu)])
-#endif
+};
 
 #endif /* !__GENERATING_BOUNDS.H */
 
@@ -294,10 +288,8 @@ struct zone {
 	 */
 	unsigned long		min_unmapped_pages;
 	unsigned long		min_slab_pages;
-	struct per_cpu_pageset	*pageset[NR_CPUS];
-#else
-	struct per_cpu_pageset	pageset[NR_CPUS];
 #endif
+	struct per_cpu_pageset	*pageset;
 	/*
 	 * free areas of different sizes
 	 */
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2009-06-17 14:11:30.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2009-06-17 14:52:56.000000000 -0500
@@ -951,7 +951,7 @@ static void drain_pages(unsigned int cpu
 		struct per_cpu_pageset *pset;
 		struct per_cpu_pages *pcp;
 
-		pset = zone_pcp(zone, cpu);
+		pset = per_cpu_ptr(zone->pageset, cpu);
 
 		pcp = &pset->pcp;
 		local_irq_save(flags);
@@ -1037,7 +1037,7 @@ static void free_hot_cold_page(struct pa
 	arch_free_page(page, 0);
 	kernel_map_pages(page, 1, 0);
 
-	pcp = &zone_pcp(zone, get_cpu())->pcp;
+	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	set_page_private(page, get_pageblock_migratetype(page));
 	local_irq_save(flags);
 	if (unlikely(clearMlocked))
@@ -1054,7 +1054,6 @@ static void free_hot_cold_page(struct pa
 		pcp->count -= pcp->batch;
 	}
 	local_irq_restore(flags);
-	put_cpu();
 }
 
 void free_hot_page(struct page *page)
@@ -1108,14 +1107,12 @@ struct page *buffered_rmqueue(struct zon
 	unsigned long flags;
 	struct page *page;
 	int cold = !!(gfp_flags & __GFP_COLD);
-	int cpu;
 
 again:
-	cpu  = get_cpu();
 	if (likely(order == 0)) {
 		struct per_cpu_pages *pcp;
 
-		pcp = &zone_pcp(zone, cpu)->pcp;
+		pcp = &this_cpu_ptr(zone->pageset)->pcp;
 		local_irq_save(flags);
 		if (!pcp->count) {
 			pcp->count = rmqueue_bulk(zone, 0,
@@ -1169,7 +1166,6 @@ again:
 	__count_zone_vm_events(PGALLOC, zone, 1 << order);
 	zone_statistics(preferred_zone, zone);
 	local_irq_restore(flags);
-	put_cpu();
 
 	VM_BUG_ON(bad_range(zone, page));
 	if (prep_new_page(page, order, gfp_flags))
@@ -1178,7 +1174,6 @@ again:
 
 failed:
 	local_irq_restore(flags);
-	put_cpu();
 	return NULL;
 }
 
@@ -2105,7 +2100,7 @@ void show_free_areas(void)
 		for_each_online_cpu(cpu) {
 			struct per_cpu_pageset *pageset;
 
-			pageset = zone_pcp(zone, cpu);
+			pageset = per_cpu_ptr(zone->pageset, cpu);
 
 			printk("CPU %4d: hi:%5d, btch:%4d usd:%4d\n",
 			       cpu, pageset->pcp.high,
@@ -2972,7 +2967,6 @@ static void setup_pagelist_highmark(stru
 }
 
 
-#ifdef CONFIG_NUMA
 /*
  * Boot pageset table. One per cpu which is going to be used for all
  * zones and all nodes. The parameters will be set in such a way
@@ -2980,112 +2974,67 @@ static void setup_pagelist_highmark(stru
  * the buddy list. This is safe since pageset manipulation is done
  * with interrupts disabled.
  *
- * Some NUMA counter updates may also be caught by the boot pagesets.
- *
- * The boot_pagesets must be kept even after bootup is complete for
- * unused processors and/or zones. They do play a role for bootstrapping
- * hotplugged processors.
+ * Some counter updates may also be caught by the boot pagesets.
  *
  * zoneinfo_show() and maybe other functions do
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static struct per_cpu_pageset boot_pageset[NR_CPUS];
-
-/*
- * Dynamically allocate memory for the
- * per cpu pageset array in struct zone.
- */
-static int __cpuinit process_zones(int cpu)
-{
-	struct zone *zone, *dzone;
-	int node = cpu_to_node(cpu);
-
-	node_set_state(node, N_CPU);	/* this node has a cpu */
-
-	for_each_populated_zone(zone) {
-		zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
-					 GFP_KERNEL, node);
-		if (!zone_pcp(zone, cpu))
-			goto bad;
-
-		setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
-
-		if (percpu_pagelist_fraction)
-			setup_pagelist_highmark(zone_pcp(zone, cpu),
-			 	(zone->present_pages / percpu_pagelist_fraction));
-	}
-
-	return 0;
-bad:
-	for_each_zone(dzone) {
-		if (!populated_zone(dzone))
-			continue;
-		if (dzone == zone)
-			break;
-		kfree(zone_pcp(dzone, cpu));
-		zone_pcp(dzone, cpu) = NULL;
-	}
-	return -ENOMEM;
-}
-
-static inline void free_zone_pagesets(int cpu)
-{
-	struct zone *zone;
-
-	for_each_zone(zone) {
-		struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
-
-		/* Free per_cpu_pageset if it is slab allocated */
-		if (pset != &boot_pageset[cpu])
-			kfree(pset);
-		zone_pcp(zone, cpu) = &boot_pageset[cpu];
-	}
-}
+static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 
 static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
 		unsigned long action,
 		void *hcpu)
 {
 	int cpu = (long)hcpu;
-	int ret = NOTIFY_OK;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		if (process_zones(cpu))
-			ret = NOTIFY_BAD;
-		break;
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		free_zone_pagesets(cpu);
+		node_set_state(cpu_to_node(cpu), N_CPU);
 		break;
 	default:
 		break;
 	}
-	return ret;
+	return NOTIFY_OK;
 }
 
 static struct notifier_block __cpuinitdata pageset_notifier =
 	{ &pageset_cpuup_callback, NULL, 0 };
 
+/*
+ * Allocate per cpu pagesets and initialize them.
+ * Before this call only boot pagesets were available.
+ * Boot pagesets will no longer be used after this call is complete.
+ */
 void __init setup_per_cpu_pageset(void)
 {
-	int err;
+	struct zone *zone;
+	int cpu;
+
+	for_each_populated_zone(zone) {
+		zone->pageset = alloc_percpu(struct per_cpu_pageset);
 
-	/* Initialize per_cpu_pageset for cpu 0.
-	 * A cpuup callback will do this for every cpu
-	 * as it comes online
+		for_each_possible_cpu(cpu) {
+			struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+
+			setup_pageset(pcp, zone_batchsize(zone));
+
+			if (percpu_pagelist_fraction)
+				setup_pagelist_highmark(pcp,
+					(zone->present_pages /
+						percpu_pagelist_fraction));
+		}
+	}
+
+	/*
+	 * The boot cpu is always the first active.
+	 * The boot node has a processor
 	 */
-	err = process_zones(smp_processor_id());
-	BUG_ON(err);
+	node_set_state(cpu_to_node(smp_processor_id()), N_CPU);
 	register_cpu_notifier(&pageset_notifier);
 }
 
-#endif
-
 static noinline __init_refok
 int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
 {
@@ -3131,15 +3080,7 @@ int zone_wait_table_init(struct zone *zo
 
 /*
  * Early setup of pagesets.
- *
- * In the NUMA case the pageset setup simply results in all zones pcp
- * pointer being directed at a per cpu pageset with zero batchsize.
- *
- * This means that every free and every allocation occurs directly from
- * the buddy allocator tables.
- *
- * The pageset never queues pages during early boot and is therefore usable
- * for every type of zone.
+ * At this point various allocators are not operational yet.
  */
 __meminit void setup_pagesets(void)
 {
@@ -3147,23 +3088,15 @@ __meminit void setup_pagesets(void)
 	struct zone *zone;
 
 	for_each_zone(zone) {
-#ifdef CONFIG_NUMA
-		unsigned long batch = 0;
-
-		for (cpu = 0; cpu < NR_CPUS; cpu++) {
-			/* Early boot. Slab allocator not functional yet */
-			zone_pcp(zone, cpu) = &boot_pageset[cpu];
-		}
-#else
-		unsigned long batch = zone_batchsize(zone);
-#endif
+		zone->pageset = &per_cpu_var(boot_pageset);
 
+		/*
+		 * Special pagesets with zero elements so that frees
+		 * and allocations are not buffered at all.
+		 */
 		for_each_possible_cpu(cpu)
-			setup_pageset(zone_pcp(zone, cpu), batch);
+			setup_pageset(per_cpu_ptr(zone->pageset, cpu), 0);
 
-		if (zone->present_pages)
-			printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%lu\n",
-				zone->name, zone->present_pages, batch);
 	}
 }
 
@@ -4676,10 +4609,11 @@ int percpu_pagelist_fraction_sysctl_hand
 	if (!write || (ret == -EINVAL))
 		return ret;
 	for_each_populated_zone(zone) {
-		for_each_online_cpu(cpu) {
+		for_each_possible_cpu(cpu) {
 			unsigned long  high;
 			high = zone->present_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(zone_pcp(zone, cpu), high);
+			setup_pagelist_highmark(
+				per_cpu_ptr(zone->pageset, cpu), high);
 		}
 	}
 	return 0;
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c	2009-06-17 14:06:22.000000000 -0500
+++ linux-2.6/mm/vmstat.c	2009-06-17 14:14:37.000000000 -0500
@@ -139,7 +139,8 @@ static void refresh_zone_stat_thresholds
 		threshold = calculate_threshold(zone);
 
 		for_each_online_cpu(cpu)
-			zone_pcp(zone, cpu)->stat_threshold = threshold;
+			per_cpu_ptr(zone->pageset, cpu)->stat_threshold
+							= threshold;
 	}
 }
 
@@ -149,7 +150,8 @@ static void refresh_zone_stat_thresholds
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
 {
-	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+	struct per_cpu_pageset *pcp = this_cpu_ptr(zone->pageset);
+
 	s8 *p = pcp->vm_stat_diff + item;
 	long x;
 
@@ -202,7 +204,7 @@ EXPORT_SYMBOL(mod_zone_page_state);
  */
 void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 {
-	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+	struct per_cpu_pageset *pcp = this_cpu_ptr(zone->pageset);
 	s8 *p = pcp->vm_stat_diff + item;
 
 	(*p)++;
@@ -223,7 +225,7 @@ EXPORT_SYMBOL(__inc_zone_page_state);
 
 void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
-	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+	struct per_cpu_pageset *pcp = this_cpu_ptr(zone->pageset);
 	s8 *p = pcp->vm_stat_diff + item;
 
 	(*p)--;
@@ -300,7 +302,7 @@ void refresh_cpu_vm_stats(int cpu)
 	for_each_populated_zone(zone) {
 		struct per_cpu_pageset *p;
 
-		p = zone_pcp(zone, cpu);
+		p = per_cpu_ptr(zone->pageset, cpu);
 
 		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 			if (p->vm_stat_diff[i]) {
@@ -735,7 +737,7 @@ static void zoneinfo_show_print(struct s
 	for_each_online_cpu(i) {
 		struct per_cpu_pageset *pageset;
 
-		pageset = zone_pcp(zone, i);
+		pageset = per_cpu_ptr(zone->pageset, i);
 		seq_printf(m,
 			   "\n    cpu: %i"
 			   "\n              count: %i"

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [this_cpu_xx V2 19/19] this_cpu ops: Remove pageset_notifier
  2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
                   ` (17 preceding siblings ...)
  2009-06-17 20:33 ` [this_cpu_xx V2 18/19] this_cpu_ops: page allocator conversion cl
@ 2009-06-17 20:33 ` cl
  18 siblings, 0 replies; 48+ messages in thread
From: cl @ 2009-06-17 20:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: this_cpu_remove_pageset_notifier --]
[-- Type: text/plain, Size: 2270 bytes --]

Remove the pageset notifier since it only marks that a processor
exists on a specific node. Move that code into the vmstat notifier.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 mm/page_alloc.c |   27 ---------------------------
 mm/vmstat.c     |    1 +
 2 files changed, 1 insertion(+), 27 deletions(-)

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c	2009-06-17 15:02:35.000000000 -0500
+++ linux-2.6/mm/vmstat.c	2009-06-17 15:10:43.000000000 -0500
@@ -903,6 +903,7 @@ static int __cpuinit vmstat_cpuup_callba
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
 		start_cpu_timer(cpu);
+		node_set_state(cpu_to_node(cpu), N_CPU);
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2009-06-17 15:10:52.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2009-06-17 15:12:37.000000000 -0500
@@ -2982,26 +2982,6 @@ static void setup_pagelist_highmark(stru
  */
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 
-static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
-		unsigned long action,
-		void *hcpu)
-{
-	int cpu = (long)hcpu;
-
-	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		node_set_state(cpu_to_node(cpu), N_CPU);
-		break;
-	default:
-		break;
-	}
-	return NOTIFY_OK;
-}
-
-static struct notifier_block __cpuinitdata pageset_notifier =
-	{ &pageset_cpuup_callback, NULL, 0 };
-
 /*
  * Allocate per cpu pagesets and initialize them.
  * Before this call only boot pagesets were available.
@@ -3026,13 +3006,6 @@ void __init setup_per_cpu_pageset(void)
 						percpu_pagelist_fraction));
 		}
 	}
-
-	/*
-	 * The boot cpu is always the first active.
-	 * The boot node has a processor
-	 */
-	node_set_state(cpu_to_node(smp_processor_id()), N_CPU);
-	register_cpu_notifier(&pageset_notifier);
 }
 
 static noinline __init_refok

-- 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations
  2009-06-17 20:33 ` [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations cl
@ 2009-06-18  1:50   ` Tejun Heo
  2009-06-18  2:29     ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  1:50 UTC (permalink / raw)
  To: cl
  Cc: akpm, linux-mm, David Howells, Ingo Molnar, Rusty Russell,
	Eric Dumazet, davem

cl@linux-foundation.org wrote:
> this_cpu_ptr
> ------------
> 
> this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id).
> 
> The problem with per_cpu_ptr(x, smp_processor_id) is that it requires
> an array lookup to find the offset for the cpu. Processors typically
> have the offset for the current cpu area in some kind of (arch dependent)
> efficiently accessible register or memory location.
...
> cc: David Howells <dhowells@redhat.com>
> cc: Tejun Heo <tj@kernel.org>
> cc: Ingo Molnar <mingo@elte.hu>
> cc: Rusty Russell <rusty@rustcorp.com.au>
> cc: Eric Dumazet <dada1@cosmosbay.com>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

Acked-by: Tejun Heo <tj@kernel.org>

-- 
tejun


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 03/19] Use this_cpu operations for SNMP statistics
  2009-06-17 20:33 ` [this_cpu_xx V2 03/19] Use this_cpu operations for SNMP statistics cl
@ 2009-06-18  1:55   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  1:55 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, mingo, rusty, davem

cl@linux-foundation.org wrote:
> SNMP statistic macros can be significantly simplified.
> This will also reduce code size if the arch supports these operations
> in hardware.
> 
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

Acked-by: Tejun Heo <tj@kernel.org>

-- 
tejun


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 04/19] Use this_cpu operations for NFS statistics
  2009-06-17 20:33 ` [this_cpu_xx V2 04/19] Use this_cpu operations for NFS statistics cl
@ 2009-06-18  2:03   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  2:03 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, Trond Myklebust, mingo, rusty, davem

cl@linux-foundation.org wrote:
> Simplify NFS statistics and allow the use of optimized
> arch instructions.
> 
> CC: Trond Myklebust <trond.myklebust@fys.uio.no>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

For 04-09

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations
  2009-06-18  1:50   ` Tejun Heo
@ 2009-06-18  2:29     ` Tejun Heo
  2009-06-18 13:54       ` Christoph Lameter
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  2:29 UTC (permalink / raw)
  To: cl
  Cc: akpm, linux-mm, David Howells, Ingo Molnar, Rusty Russell,
	Eric Dumazet, davem

Tejun Heo wrote:
> cl@linux-foundation.org wrote:
>> this_cpu_ptr
>> ------------
>>
>> this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id).
>>
>> The problem with per_cpu_ptr(x, smp_processor_id) is that it requires
>> an array lookup to find the offset for the cpu. Processors typically
>> have the offset for the current cpu area in some kind of (arch dependent)
>> efficiently accessible register or memory location.
> ...
>> cc: David Howells <dhowells@redhat.com>
>> cc: Tejun Heo <tj@kernel.org>
>> cc: Ingo Molnar <mingo@elte.hu>
>> cc: Rusty Russell <rusty@rustcorp.com.au>
>> cc: Eric Dumazet <dada1@cosmosbay.com>
>> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 

Oops, one problem.  this_cpu_ptr() should evaluate to the same pointer
type as its input, but it currently evaluates to unsigned long.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-17 20:33 ` [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations cl
@ 2009-06-18  3:00   ` Tejun Heo
  2009-06-18 14:07     ` Christoph Lameter
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  3:00 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, mingo, rusty, davem

cl@linux-foundation.org wrote:
> Basically the existing percpu ops can be used. However, we do not pass a
> reference to a percpu variable in. Instead an address of a percpu variable
> is provided.
> 
> Both preempt, the non preempt and the irqsafe operations generate the same code.
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

I'm a bit confused why this patch is in the middle of patches which
convert macro users?  Wouldn't it be better to put this one right
after the patch which introduces this_cpu_*()?

> ---
>  arch/x86/include/asm/percpu.h |   22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> Index: linux-2.6/arch/x86/include/asm/percpu.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/percpu.h	2009-06-04 13:38:01.000000000 -0500
> +++ linux-2.6/arch/x86/include/asm/percpu.h	2009-06-04 14:21:22.000000000 -0500
> @@ -140,6 +140,28 @@ do {							\
>  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
>  #define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
>  
> +#define __this_cpu_read(pcp)		percpu_from_op("mov", pcp)
                                                             ^^^^
					              missing parentheses
and maybe adding () around val is a good idea too?

Also, I'm not quite sure these macros would operate on the correct
address.  Checking... yeap, the following function,

 DEFINE_PER_CPU(int, my_pcpu_cnt);
 void my_func(void)
 {
	 int *ptr = &per_cpu__my_pcpu_cnt;

	 *(int *)this_cpu_ptr(ptr) = 0;
	 this_cpu_add(ptr, 1);
	 percpu_add(my_pcpu_cnt, 1);
 }

ends up being assembled into the following.

 mov    $0xdd48,%rax		# save offset of my_pcpu_cnt
 mov    %rax,-0x10(%rbp)		# into local var ptr
 mov    %gs:0xb800,%rdx		# fetch this_cpu_off
 movl   $0x0,(%rax,%rdx,1)	# 0 -> *(this_cpu_off + my_pcpu_cnt)
 addq   $0x1,%gs:-0x10(%rbp)	# add 1 to %gs:ptr !!!
 addl   $0x1,%gs:0xdd48		# add 1 to %gs:my_pcpu_cnt

So, this_cpu_add(ptr, 1) ends up accessing the wrong address.  Also,
please note the use of 'addq' instead of 'addl' as the pointer
variable is being modified.

> +#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
> +#define __this_cpu_add(pcp, val)	percpu_to_op("add", (pcp), val)
> +#define __this_cpu_sub(pcp, val)	percpu_to_op("sub", (pcp), val)
> +#define __this_cpu_and(pcp, val)	percpu_to_op("and", (pcp), val)
> +#define __this_cpu_or(pcp, val)		percpu_to_op("or", (pcp), val)
> +#define __this_cpu_xor(pcp, val)	percpu_to_op("xor", (pcp), val)
> +
> +#define this_cpu_read(pcp)		percpu_from_op("mov", (pcp))
> +#define this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
> +#define this_cpu_add(pcp, val)		percpu_to_op("add", (pcp), val)
> +#define this_cpu_sub(pcp, val)		percpu_to_op("sub", (pcp), val)
> +#define this_cpu_and(pcp, val)		percpu_to_op("and", (pcp), val)
> +#define this_cpu_or(pcp, val)		percpu_to_op("or", (pcp), val)
> +#define this_cpu_xor(pcp, val)		percpu_to_op("xor", (pcp), val)
>
> +#define irqsafe_cpu_add(pcp, val)	percpu_to_op("add", (pcp), val)
> +#define irqsafe_cpu_sub(pcp, val)	percpu_to_op("sub", (pcp), val)
> +#define irqsafe_cpu_and(pcp, val)	percpu_to_op("and", (pcp), val)
> +#define irqsafe_cpu_or(pcp, val)	percpu_to_op("or", (pcp), val)
> +#define irqsafe_cpu_xor(pcp, val)	percpu_to_op("xor", (pcp), val)

Wouldn't it be clearer / easier to define preempt and irqsafe versions
as aliases of __ prefixed ones?
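
E.g. (only a sketch, reusing the __ prefixed definitions from the patch):

 #define this_cpu_add(pcp, val)		__this_cpu_add((pcp), (val))
 #define irqsafe_cpu_add(pcp, val)	__this_cpu_add((pcp), (val))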

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 11/19] Use this_cpu ops for VM statistics.
  2009-06-17 20:33 ` [this_cpu_xx V2 11/19] Use this_cpu ops for VM statistics cl
@ 2009-06-18  3:05   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  3:05 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, mingo, rusty, davem

cl@linux-foundation.org wrote:
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

For 11-12,

Acked-by: Tejun Heo <tj@kernel.org>

For 13, I have no idea.  :-)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 17/19] Move early initialization of pagesets out of zone_wait_table_init()
  2009-06-17 20:33 ` [this_cpu_xx V2 17/19] Move early initialization of pagesets out of zone_wait_table_init() cl
@ 2009-06-18  3:13   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  3:13 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, Mel Gorman, mingo, rusty, davem

cl@linux-foundation.org wrote:
> Explicitly initialize the pagesets after the per cpu areas have been
> initialized. This is necessary in order to be able to use per cpu
> operations in later patches.
> 
> Cc: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

For 17-19, not being too familiar with the code, I can't ack them but
at a glance they look really nice to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-17 20:33 ` [this_cpu_xx V2 13/19] Use this_cpu operations in slub cl
@ 2009-06-18  6:20   ` Pekka Enberg
  2009-06-18  6:25     ` Pekka Enberg
                       ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Pekka Enberg @ 2009-06-18  6:20 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, Tejun Heo, mingo, rusty, davem

Hi Christoph,

On Wed, Jun 17, 2009 at 11:33 PM, <cl@linux-foundation.org> wrote:
> @@ -1604,9 +1595,6 @@ static void *__slab_alloc(struct kmem_ca
>        void **object;
>        struct page *new;
>
> -       /* We handle __GFP_ZERO in the caller */
> -       gfpflags &= ~__GFP_ZERO;
> -

This should probably not be here.

> @@ -2724,7 +2607,19 @@ static noinline struct kmem_cache *dma_k
>        realsize = kmalloc_caches[index].objsize;
>        text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d",
>                         (unsigned int)realsize);
> -       s = kmalloc(kmem_size, flags & ~SLUB_DMA);
> +
> +       if (flags & __GFP_WAIT)
> +               s = kmalloc(kmem_size, flags & ~SLUB_DMA);
> +       else {
> +               int i;
> +
> +               s = NULL;
> +               for (i = 0; i < SLUB_PAGE_SHIFT; i++)
> +                       if (kmalloc_caches[i].size) {
> +                               s = kmalloc_caches + i;
> +                               break;
> +                       }
> +       }

[snip]

> A particular problem for the dynamic dma kmalloc slab creation is that the
> new percpu allocator cannot be called from an atomic context. The solution
> adopted here for the atomic context is to track spare elements in the per
> cpu kmem_cache array for non dma kmallocs. Use them if necessary for dma
> cache creation from an atomic context. Otherwise we just fail the allocation.

OK, I am confused. Isn't the whole point of separating DMA caches that
we don't mix regular and DMA allocations in the same slab, using up
precious DMA memory on some archs?

So I don't think the above hunk is a good solution to this at all. We
certainly can remove the lazy DMA slab creation (why did we add it in
the first place?) but how hard is it to fix the per-cpu allocator to
work in atomic contexts?

                                          Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18  6:20   ` Pekka Enberg
@ 2009-06-18  6:25     ` Pekka Enberg
  2009-06-18 13:59       ` Christoph Lameter
  2009-06-18  6:49     ` Tejun Heo
  2009-06-18 13:59     ` Christoph Lameter
  2 siblings, 1 reply; 48+ messages in thread
From: Pekka Enberg @ 2009-06-18  6:25 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, Tejun Heo, mingo, rusty, davem

On Thu, Jun 18, 2009 at 9:20 AM, Pekka Enberg<penberg@cs.helsinki.fi> wrote:
> Hi Christoph,
>
> On Wed, Jun 17, 2009 at 11:33 PM, <cl@linux-foundation.org> wrote:
>> @@ -1604,9 +1595,6 @@ static void *__slab_alloc(struct kmem_ca
>>        void **object;
>>        struct page *new;
>>
>> -       /* We handle __GFP_ZERO in the caller */
>> -       gfpflags &= ~__GFP_ZERO;
>> -
>
> This should probably not be here.
>
>> @@ -2724,7 +2607,19 @@ static noinline struct kmem_cache *dma_k
>>        realsize = kmalloc_caches[index].objsize;
>>        text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d",
>>                         (unsigned int)realsize);
>> -       s = kmalloc(kmem_size, flags & ~SLUB_DMA);
>> +
>> +       if (flags & __GFP_WAIT)
>> +               s = kmalloc(kmem_size, flags & ~SLUB_DMA);
>> +       else {
>> +               int i;
>> +
>> +               s = NULL;
>> +               for (i = 0; i < SLUB_PAGE_SHIFT; i++)
>> +                       if (kmalloc_caches[i].size) {
>> +                               s = kmalloc_caches + i;
>> +                               break;
>> +                       }
>> +       }
>
> [snip]
>
>> A particular problem for the dynamic dma kmalloc slab creation is that the
>> new percpu allocator cannot be called from an atomic context. The solution
>> adopted here for the atomic context is to track spare elements in the per
>> cpu kmem_cache array for non dma kmallocs. Use them if necessary for dma
>> cache creation from an atomic context. Otherwise we just fail the allocation.
>
> OK, I am confused. Isn't the whole point in separating DMA caches that
> we don't mix regular and DMA allocations in the same slab and using up
> precious DMA memory on some archs?
>
> So I don't think the above hunk is a good solution to this at all. We
> certainly can remove the lazy DMA slab creation (why did we add it in
> the first place?) but how hard is it to fix the per-cpu allocator to
> work in atomic contexts?

Oh, and how does this work with the early boot slab code? We're
creating all the kmalloc caches with interrupts disabled and doing
per-cpu allocations, no?

                                                        Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths
  2009-06-17 20:33 ` [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths cl
@ 2009-06-18  6:33   ` Pekka Enberg
  2009-06-18 11:59     ` Mathieu Desnoyers
  2009-06-18 14:00     ` Christoph Lameter
  0 siblings, 2 replies; 48+ messages in thread
From: Pekka Enberg @ 2009-06-18  6:33 UTC (permalink / raw)
  To: cl; +Cc: akpm, linux-mm, Mathieu Desnoyers, Tejun Heo, mingo, rusty, davem

On Wed, 2009-06-17 at 16:33 -0400, cl@linux-foundation.org wrote:
> Use this_cpu_* operations in the hotpath to avoid calculations of
> kmem_cache_cpu pointer addresses.
> 
> It is not clear if this is always an advantage.
> 
> On x86 there is a tradeoff: multiple uses of segment prefixes against an
> address calculation and more register pressure.
> 
> On the other hand the use of prefixes is necessary if we want to use
> Mathieu's scheme for fastpaths that do not require interrupt disable.

On an unrelated note, it sure would be nice if the SLUB allocator didn't
have to disable interrupts because then we could just get rid of the gfp
masking there completely.

			Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18  6:20   ` Pekka Enberg
  2009-06-18  6:25     ` Pekka Enberg
@ 2009-06-18  6:49     ` Tejun Heo
  2009-06-18  7:35       ` Pekka Enberg
  2009-06-18 13:59     ` Christoph Lameter
  2 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2009-06-18  6:49 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: cl, akpm, linux-mm, mingo, rusty, davem

Hello,

Pekka Enberg wrote:
> So I don't think the above hunk is a good solution to this at all. We
> certainly can remove the lazy DMA slab creation (why did we add it in
> the first place?) but how hard is it to fix the per-cpu allocator to
> work in atomic contexts?

Should be possible but I wanna avoid that as long as possible.  Atomic
allocations suck anyway...  :-(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18  6:49     ` Tejun Heo
@ 2009-06-18  7:35       ` Pekka Enberg
  0 siblings, 0 replies; 48+ messages in thread
From: Pekka Enberg @ 2009-06-18  7:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cl, akpm, linux-mm, mingo, rusty, davem, Linus Torvalds,
	Benjamin Herrenschmidt, Nick Piggin

Hi Tejun,

On Thu, Jun 18, 2009 at 9:49 AM, Tejun Heo<tj@kernel.org> wrote:
>> So I don't think the above hunk is a good solution to this at all. We
>> certainly can remove the lazy DMA slab creation (why did we add it in
>> the first place?) but how hard is it to fix the per-cpu allocator to
>> work in atomic contexts?
>
> Should be possible but I wanna avoid that as long as possible.  Atomic
> allocations suck anyway...  :-(

OK, but I suspect this could turn into an issue as people start using
kmalloc() earlier in the boot sequence (where interrupts are disabled)
and will likely expect other allocators to work there too.

                                         Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths
  2009-06-18  6:33   ` Pekka Enberg
@ 2009-06-18 11:59     ` Mathieu Desnoyers
  2009-06-18 14:00     ` Christoph Lameter
  1 sibling, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2009-06-18 11:59 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: cl, akpm, linux-mm, Tejun Heo, mingo, rusty, davem

* Pekka Enberg (penberg@cs.helsinki.fi) wrote:
> On Wed, 2009-06-17 at 16:33 -0400, cl@linux-foundation.org wrote:
> > Use this_cpu_* operations in the hotpath to avoid calculations of
> > kmem_cache_cpu pointer addresses.
> > 
> > It is not clear if this is always an advantage.
> > 
> > On x86 there is a tradeoff: multiple uses of segment prefixes against an
> > address calculation and more register pressure.
> > 
> > On the other hand the use of prefixes is necessary if we want to use
> > Mathieu's scheme for fastpaths that do not require interrupt disable.
> 
> On an unrelated note, it sure would be nice if the SLUB allocator didn't
> have to disable interrupts because then we could just get rid of the gfp
> masking there completely.
> 

The solution I had just gets rid of the irqoff for the fast path. The
slow path still needs to disable interrupts.

Mathieu

> 			Pekka
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations
  2009-06-18  2:29     ` Tejun Heo
@ 2009-06-18 13:54       ` Christoph Lameter
  2009-06-18 14:49         ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 13:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: akpm, linux-mm, David Howells, Ingo Molnar, Rusty Russell,
	Eric Dumazet, davem

On Thu, 18 Jun 2009, Tejun Heo wrote:

> Oops, one problem.  this_cpu_ptr() should evaluate to the same type
> pointer as input but it currently evaluates to unsigned long.

this_cpu_ptr uses SHIFT_PERCPU_PTR, which preserves the type.
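
Roughly (just a sketch; my_cpu_offset is the arch provided per cpu offset):

	#define this_cpu_ptr(ptr)	SHIFT_PERCPU_PTR((ptr), my_cpu_offset)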



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18  6:20   ` Pekka Enberg
  2009-06-18  6:25     ` Pekka Enberg
  2009-06-18  6:49     ` Tejun Heo
@ 2009-06-18 13:59     ` Christoph Lameter
  2009-06-25  7:11       ` Pekka Enberg
  2 siblings, 1 reply; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 13:59 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: akpm, linux-mm, Tejun Heo, mingo, rusty, davem

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1564 bytes --]

On Thu, 18 Jun 2009, Pekka Enberg wrote:

> Hi Christoph,
>
> On Wed, Jun 17, 2009 at 11:33 PM, <cl@linux-foundation.org> wrote:
> > @@ -1604,9 +1595,6 @@ static void *__slab_alloc(struct kmem_ca
> >        void **object;
> >        struct page *new;
> >
> > -       /* We handle __GFP_ZERO in the caller */
> > -       gfpflags &= ~__GFP_ZERO;
> > -
>
> This should probably not be here.

Yes, how did this get in there? Useless code somehow leaked in.

> > A particular problem for the dynamic dma kmalloc slab creation is that the
> > new percpu allocator cannot be called from an atomic context. The solution
> > adopted here for the atomic context is to track spare elements in the per
> > cpu kmem_cache array for non dma kmallocs. Use them if necessary for dma
> > cache creation from an atomic context. Otherwise we just fail the allocation.
>
> OK, I am confused. Isn't the whole point in separating DMA caches that
> we don't mix regular and DMA allocations in the same slab and using up
> precious DMA memory on some archs?

DMA caches exist to allocate memory in a range that can be reached by
legacy devices. E.g. some SCSI controllers can only DMA below 16MB.
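
Such allocations come in as, roughly,

	kmalloc(size, GFP_KERNEL | GFP_DMA);

or with GFP_ATOMIC | GFP_DMA from atomic context.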

> So I don't think the above hunk is a good solution to this at all. We
> certainly can remove the lazy DMA slab creation (why did we add it in
> the first place?) but how hard is it to fix the per-cpu allocator to
> work in atomic contexts?

Lazy DMA creation was added to avoid having to duplicate the kmalloc array
for a few rare uses of DMA caches.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18  6:25     ` Pekka Enberg
@ 2009-06-18 13:59       ` Christoph Lameter
  2009-06-25  7:12         ` Pekka Enberg
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 13:59 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: akpm, linux-mm, Tejun Heo, mingo, rusty, davem

On Thu, 18 Jun 2009, Pekka Enberg wrote:

> Oh, and how does this work with the early boot slab code? We're
> creating all the kmalloc caches with interrupts disabled and doing
> per-cpu allocations, no?

DMA slabs are not used during early bootup.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths
  2009-06-18  6:33   ` Pekka Enberg
  2009-06-18 11:59     ` Mathieu Desnoyers
@ 2009-06-18 14:00     ` Christoph Lameter
  1 sibling, 0 replies; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 14:00 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: akpm, linux-mm, Mathieu Desnoyers, Tejun Heo, mingo, rusty, davem



> On an unrelated note, it sure would be nice if the SLUB allocator didn't
> have to disable interrupts because then we could just get rid of the gfp
> masking there completely.

Right, there are a number of good effects.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18  3:00   ` Tejun Heo
@ 2009-06-18 14:07     ` Christoph Lameter
  2009-06-18 14:48       ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 14:07 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, mingo, rusty, davem

On Thu, 18 Jun 2009, Tejun Heo wrote:

> > +#define __this_cpu_read(pcp)		percpu_from_op("mov", pcp)
>                                                              ^^^^
> 					              missing parentheses
> and maybe adding () around val is a good idea too?

() around pcp, right.

> Also, I'm not quite sure these macros would operate on the correct
> address.  Checking... yeap, the following function,
>
>  DEFINE_PER_CPU(int, my_pcpu_cnt);
>  void my_func(void)
>  {
> 	 int *ptr = &per_cpu__my_pcpu_cnt;
>
> 	 *(int *)this_cpu_ptr(ptr) = 0;
> 	 this_cpu_add(ptr, 1);

Needs to be this_cpu_add(*ptr, 1). this_cpu_add does not take a pointer
to an int but an lvalue. The typical use case is with a struct. I.e.

struct {
	int x;
} *ptr = &per_cpu_var(m_cpu_pnt);

then do

this_cpu_add(ptr->x, 1);


> 	 percpu_add(my_pcpu_cnt, 1);
>  }
>
> So, this_cpu_add(ptr, 1) ends up accessing the wrong address.  Also,
> please note the use of 'addq' instead of 'addl' as the pointer
> variable is being modified.

You incremented the pointer instead of the value pointed to. Look at the
patches that use this_cpu_add(): you pass the object to be incremented, not
a pointer. If the convention were different then the address of these
objects would have to be taken everywhere.
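
For the example above that means roughly:

	this_cpu_add(per_cpu__my_pcpu_cnt, 1);	/* the per cpu variable itself */
	this_cpu_add(*ptr, 1);			/* via the pointer: dereference it */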

> > +#define irqsafe_cpu_add(pcp, val)	percpu_to_op("add", (pcp), val)
> > +#define irqsafe_cpu_sub(pcp, val)	percpu_to_op("sub", (pcp), val)
> > +#define irqsafe_cpu_and(pcp, val)	percpu_to_op("and", (pcp), val)
> > +#define irqsafe_cpu_or(pcp, val)	percpu_to_op("or", (pcp), val)
> > +#define irqsafe_cpu_xor(pcp, val)	percpu_to_op("xor", (pcp), val)
>
> Wouldn't it be clearer / easier to define preempt and irqsafe versions
> as aliases of __ prefixed ones?

Ok will do that.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18 14:07     ` Christoph Lameter
@ 2009-06-18 14:48       ` Tejun Heo
  2009-06-18 15:39         ` Christoph Lameter
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2009-06-18 14:48 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, mingo, rusty, davem

Hello,

Christoph Lameter wrote:
>>  DEFINE_PER_CPU(int, my_pcpu_cnt);
>>  void my_func(void)
>>  {
>> 	 int *ptr = &per_cpu__my_pcpu_cnt;
>>
>> 	 *(int *)this_cpu_ptr(ptr) = 0;
>> 	 this_cpu_add(ptr, 1);
> 
> Needs to be this_cpu_add(*ptr, 1). this_cpu_add does not take a pointer
> to an int but a lvalue. The typical use case is with a struct. I.e.
> 
> struct {
> 	int x;
> } * ptr = &per_cpu_var(m_cpu_pnt);
> 
> then do
> 
> this_cpu_add(ptr->x, 1)
> 
> 
>> 	 percpu_add(my_pcpu_cnt, 1);
>>  }
>>
>> So, this_cpu_add(ptr, 1) ends up accessing the wrong address.  Also,
>> please note the use of 'addq' instead of 'addl' as the pointer
>> variable is being modified.
> 
> You incremented the pointer instead of the value pointed to. Look at the
> patches that use this_cpu_add(). You pass the object to be incremented not
> a pointer. If the convention would be different then the address would
> have to be taken of these objects everywhere.

Ah... okay, so it's supposed to take an lvalue.  I think it would be
better to make it take a pointer.  An lvalue parameter is just weird when
dynamic percpu variables are involved.  The old percpu accessors
taking lvalues have more to do with the way percpu variables were
defined in the beginning than anything else and are inconsistent with
other similar accessors in the kernel.  As the new accessors are gonna
replace the old ones eventually and maybe leave only the most often
used ones as wrappers around pointer based ones, I think it would be
better to make the transition while introducing the new accessors.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations
  2009-06-18 13:54       ` Christoph Lameter
@ 2009-06-18 14:49         ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18 14:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, David Howells, Ingo Molnar, Rusty Russell,
	Eric Dumazet, davem

Christoph Lameter wrote:
> On Thu, 18 Jun 2009, Tejun Heo wrote:
> 
>> Oops, one problem.  this_cpu_ptr() should evaluate to the same type
>> pointer as input but it currently evaluates to unsigned long.
> 
> this_cpu_ptr is uses SHIFT_PERCPU_PTR which preserves the type.

Yeap, I was passing in the pointer variable instead of the lvalue and was
getting type mismatch warnings.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18 14:48       ` Tejun Heo
@ 2009-06-18 15:39         ` Christoph Lameter
  2009-06-18 16:06           ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 15:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, mingo, rusty, davem

On Thu, 18 Jun 2009, Tejun Heo wrote:

> Ah... okay, so it's supposed to take a lvalue.  I think it would be
> better to make it take pointer.  lvalue parameter is just weird when
> dynamic percpu variables are involved.  The old percpu accessors
> taking lvalue has more to do with the way percpu variables were
> defined in the beginning than anything else and are inconsistent with
> other similar accessors in the kernel.  As the new accessors are gonna
> replace the old ones eventually and maybe leave only the most often
> used ones as wrapper around pointer based ones, I think it would be
> better to make the transition while introducing new accessors.

The main purpose of these operations is to increment counters. Passing a
pointer would mean adding the & operator in all locations. Is there any
benefit to using the & operator?

lvalues of struct fields in the form my_struct->field are a natural way of
referring to scalars.

The operation occurs on the object, not on the pointer.

The special feature is that the address of the object is taken and
relocated so that the current processor's instance of the object
is used.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18 15:39         ` Christoph Lameter
@ 2009-06-18 16:06           ` Tejun Heo
  2009-06-18 16:15             ` Tejun Heo
                               ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18 16:06 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, mingo, rusty, davem

Christoph Lameter wrote:
> On Thu, 18 Jun 2009, Tejun Heo wrote:
> 
>> Ah... okay, so it's supposed to take a lvalue.  I think it would be
>> better to make it take pointer.  lvalue parameter is just weird when
>> dynamic percpu variables are involved.  The old percpu accessors
>> taking lvalue has more to do with the way percpu variables were
>> defined in the beginning than anything else and are inconsistent with
>> other similar accessors in the kernel.  As the new accessors are gonna
>> replace the old ones eventually and maybe leave only the most often
>> used ones as wrapper around pointer based ones, I think it would be
>> better to make the transition while introducing new accessors.
> 
> The main purpose of these operations is to increment counters. Passing a
> pointer would mean adding the & operator in all locations. Is there any
> benefit through the use of the & operator?
> 
> lvalues of structs in the form of my_struct->field is a natural form of
> referring to scalars.
> 
> The operation occurs on the object not on the pointer.
> 
> The special feature is that the address of the object is taken and its
> address is relocated so that the current processors instance of the object
> is used.

Functionally, there's no practical difference but it's just weird to
use a scalar as an input/output parameter.  All the atomic and bitops
operations take pointers.  In fact, there are only very few
which take an lvalue input and modify it, so I think it would be much
better to take pointers like normal C functions and macros do, for the
sake of consistency.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18 16:06           ` Tejun Heo
@ 2009-06-18 16:15             ` Tejun Heo
  2009-06-18 17:05             ` Christoph Lameter
  2009-06-19  5:41             ` Rusty Russell
  2 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2009-06-18 16:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, mingo, rusty, davem

Tejun Heo wrote:
> Functionally, there's no practical difference but it's just weird to
> use scalar as input/output parameter.  All the atomic and bitops
> operations are taking pointers.  In fact, there are only very few
> which take lvalue input and modify it, so I think it would be much
> better to take pointers like normal C functions and macros for the
> sake of consistency.

One notable exception tho is the get/put_user() macros.  The fact that
these percpu ops behave differently depending on input parameter size
might put them closer to get/put_user() than to other atomic / bitops
operations.  Eh... I'm not so sure anymore.  It's purely an interface
decision.  Ingo, Rusty, what do you guys think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18 16:06           ` Tejun Heo
  2009-06-18 16:15             ` Tejun Heo
@ 2009-06-18 17:05             ` Christoph Lameter
  2009-06-19  5:41             ` Rusty Russell
  2 siblings, 0 replies; 48+ messages in thread
From: Christoph Lameter @ 2009-06-18 17:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, mingo, rusty, davem

On Fri, 19 Jun 2009, Tejun Heo wrote:

> Functionally, there's no practical difference but it's just weird to
> use scalar as input/output parameter.  All the atomic and bitops
> operations are taking pointers.  In fact, there are only very few
> which take lvalue input and modify it, so I think it would be much
> better to take pointers like normal C functions and macros for the
> sake of consistency.

The atomic operators take an atomic_t or similar type as a parameter. Those
operations actually can be handled like functions.

this_cpu and per cpu operations can operate on an arbitrary type and
dynamically generate code adequate for the size of the variable involved.

Taking the address of a per cpu static or dynamically allocated variable
is not meaningful. The address must be relocated using the per cpu offset
for the desired processor in order to point to an instance.
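
To illustrate the size dependent code generation: the generic fallback does
something along these lines (just a sketch; the per size helpers are named
here only for illustration):

	#define this_cpu_add(pcp, val)					\
	do {								\
		switch (sizeof(pcp)) {					\
		case 1: this_cpu_add_1((pcp), (val)); break;		\
		case 2: this_cpu_add_2((pcp), (val)); break;		\
		case 4: this_cpu_add_4((pcp), (val)); break;		\
		case 8: this_cpu_add_8((pcp), (val)); break;		\
		default: __bad_size_call_parameter();			\
		}							\
	} while (0)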

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-18 16:06           ` Tejun Heo
  2009-06-18 16:15             ` Tejun Heo
  2009-06-18 17:05             ` Christoph Lameter
@ 2009-06-19  5:41             ` Rusty Russell
  2009-06-23 18:00               ` Christoph Lameter
  2 siblings, 1 reply; 48+ messages in thread
From: Rusty Russell @ 2009-06-19  5:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Christoph Lameter, akpm, linux-mm, mingo, davem

On Fri, 19 Jun 2009 01:36:15 am Tejun Heo wrote:
> Christoph Lameter wrote:
> > On Thu, 18 Jun 2009, Tejun Heo wrote:
> >> Ah... okay, so it's supposed to take a lvalue.  I think it would be
> >> better to make it take pointer.  lvalue parameter is just weird when
> >> dynamic percpu variables are involved.  The old percpu accessors
> >> taking lvalue has more to do with the way percpu variables were
> >> defined in the beginning than anything else and are inconsistent with
> >> other similar accessors in the kernel.  As the new accessors are gonna
> >> replace the old ones eventually and maybe leave only the most often
> >> used ones as wrapper around pointer based ones, I think it would be
> >> better to make the transition while introducing new accessors.
> >
> > The main purpose of these operations is to increment counters. Passing a
> > pointer would mean adding the & operator in all locations. Is there any
> > benefit through the use of the & operator?
> >
> > lvalues of structs in the form of my_struct->field is a natural form of
> > referring to scalars.
> >
> > The operation occurs on the object not on the pointer.
> >
> > The special feature is that the address of the object is taken and its
> > address is relocated so that the current processors instance of the
> > object is used.
>
> Functionally, there's no practical difference but it's just weird to
> use scalar as input/output parameter.  All the atomic and bitops
> operations are taking pointers.  In fact, there are only very few
> which take lvalue input and modify it, so I think it would be much
> better to take pointers like normal C functions and macros for the
> sake of consistency.

Absolutely agreed here; C is pass by value and any use of macros to violate 
that is abhorrent.  Let's not spread the horror of cpus_* or local_irq_save()!

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations
  2009-06-19  5:41             ` Rusty Russell
@ 2009-06-23 18:00               ` Christoph Lameter
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Lameter @ 2009-06-23 18:00 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Tejun Heo, akpm, linux-mm, mingo, davem


> > Functionally, there's no practical difference but it's just weird to
> > use scalar as input/output parameter.  All the atomic and bitops
> > operations are taking pointers.  In fact, there are only very few
> > which take lvalue input and modify it, so I think it would be much
> > better to take pointers like normal C functions and macros for the
> > sake of consistency.
>
> Absolutely agreed here; C is pass by value and any use of macros to violate
> that is abhorrent.  Let's not spread the horror of cpus_* or local_irq_save()!

All of those take a definite type. this_cpu takes an arbitrary type like
the existing percpu operations.

What you are suggesting would lead to strange results.

now

int a;

a = this_cpu_read(struct->a)

Both the result and argument are of the same type.

you suggest

a = this_cpu_read(&struct->a)

also

this_cpu_write(struct->a, a)

vs.

this_cpu_write(&struct->a, a)

Precedent for an lvalue exists in the percpu operations that work exactly
like this. Why would this_cpu deviate from that? The first parameter must
always be an assignable variable, either statically or dynamically
allocated.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18 13:59     ` Christoph Lameter
@ 2009-06-25  7:11       ` Pekka Enberg
  0 siblings, 0 replies; 48+ messages in thread
From: Pekka Enberg @ 2009-06-25  7:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, Tejun Heo, mingo, rusty, davem

On Thu, 18 Jun 2009, Pekka Enberg wrote:
> > On Wed, Jun 17, 2009 at 11:33 PM, <cl@linux-foundation.org> wrote:
> > > @@ -1604,9 +1595,6 @@ static void *__slab_alloc(struct kmem_ca
> > >        void **object;
> > >        struct page *new;
> > >
> > > -       /* We handle __GFP_ZERO in the caller */
> > > -       gfpflags &= ~__GFP_ZERO;
> > > -
> >
> > This should probably not be here.

On Thu, 2009-06-18 at 09:59 -0400, Christoph Lameter wrote:
> Yes how did this get in there? Useless code somehow leaked in.

Hmm, you know as well as I do that Linus added it after a flame fest :)

The change has nothing to do with this series so let's keep it out of the
patch, OK?

			Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [this_cpu_xx V2 13/19] Use this_cpu operations in slub
  2009-06-18 13:59       ` Christoph Lameter
@ 2009-06-25  7:12         ` Pekka Enberg
  0 siblings, 0 replies; 48+ messages in thread
From: Pekka Enberg @ 2009-06-25  7:12 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, Tejun Heo, mingo, rusty, davem, heiko.carstens

On Thu, 18 Jun 2009, Pekka Enberg wrote:
> > Oh, and how does this work with the early boot slab code? We're
> > creating all the kmalloc caches with interrupts disabled and doing
> > per-cpu allocations, no?

On Thu, 2009-06-18 at 09:59 -0400, Christoph Lameter wrote:
> DMA slabs are not used during early bootup.

Actually, I think s390 is starting to use them during early boot:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=964cf35c88f93b4927dbc4e950dfa4d880c7f9d1

			Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread


Thread overview: 48+ messages
2009-06-17 20:33 [this_cpu_xx V2 00/19] Introduce this_cpu_xx operations cl
2009-06-17 20:33 ` [this_cpu_xx V2 01/19] Fix handling of pagesets for downed cpus cl
2009-06-17 20:33 ` [this_cpu_xx V2 02/19] Introduce this_cpu_ptr() and generic this_cpu_* operations cl
2009-06-18  1:50   ` Tejun Heo
2009-06-18  2:29     ` Tejun Heo
2009-06-18 13:54       ` Christoph Lameter
2009-06-18 14:49         ` Tejun Heo
2009-06-17 20:33 ` [this_cpu_xx V2 03/19] Use this_cpu operations for SNMP statistics cl
2009-06-18  1:55   ` Tejun Heo
2009-06-17 20:33 ` [this_cpu_xx V2 04/19] Use this_cpu operations for NFS statistics cl
2009-06-18  2:03   ` Tejun Heo
2009-06-17 20:33 ` [this_cpu_xx V2 05/19] use this_cpu ops for network statistics cl
2009-06-17 20:33 ` [this_cpu_xx V2 06/19] this_cpu_ptr: Straight transformations cl
2009-06-17 20:33 ` [this_cpu_xx V2 07/19] this_cpu_ptr: Elimninate get/put_cpu cl
2009-06-17 20:33 ` [this_cpu_xx V2 08/19] this_cpu_ptr: xfs_icsb_modify_counters does not need "cpu" variable cl
2009-06-17 20:33 ` [this_cpu_xx V2 09/19] Use this_cpu_ptr in crypto subsystem cl
2009-06-17 20:33 ` [this_cpu_xx V2 10/19] this_cpu: X86 optimized this_cpu operations cl
2009-06-18  3:00   ` Tejun Heo
2009-06-18 14:07     ` Christoph Lameter
2009-06-18 14:48       ` Tejun Heo
2009-06-18 15:39         ` Christoph Lameter
2009-06-18 16:06           ` Tejun Heo
2009-06-18 16:15             ` Tejun Heo
2009-06-18 17:05             ` Christoph Lameter
2009-06-19  5:41             ` Rusty Russell
2009-06-23 18:00               ` Christoph Lameter
2009-06-17 20:33 ` [this_cpu_xx V2 11/19] Use this_cpu ops for VM statistics cl
2009-06-18  3:05   ` Tejun Heo
2009-06-17 20:33 ` [this_cpu_xx V2 12/19] RCU: Use this_cpu operations cl
2009-06-17 20:33 ` [this_cpu_xx V2 13/19] Use this_cpu operations in slub cl
2009-06-18  6:20   ` Pekka Enberg
2009-06-18  6:25     ` Pekka Enberg
2009-06-18 13:59       ` Christoph Lameter
2009-06-25  7:12         ` Pekka Enberg
2009-06-18  6:49     ` Tejun Heo
2009-06-18  7:35       ` Pekka Enberg
2009-06-18 13:59     ` Christoph Lameter
2009-06-25  7:11       ` Pekka Enberg
2009-06-17 20:33 ` [this_cpu_xx V2 14/19] this_cpu: Remove slub kmem_cache fields cl
2009-06-17 20:33 ` [this_cpu_xx V2 15/19] Make slub statistics use this_cpu_inc cl
2009-06-17 20:33 ` [this_cpu_xx V2 16/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths cl
2009-06-18  6:33   ` Pekka Enberg
2009-06-18 11:59     ` Mathieu Desnoyers
2009-06-18 14:00     ` Christoph Lameter
2009-06-17 20:33 ` [this_cpu_xx V2 17/19] Move early initialization of pagesets out of zone_wait_table_init() cl
2009-06-18  3:13   ` Tejun Heo
2009-06-17 20:33 ` [this_cpu_xx V2 18/19] this_cpu_ops: page allocator conversion cl
2009-06-17 20:33 ` [this_cpu_xx V2 19/19] this_cpu ops: Remove pageset_notifier cl
