linux-kernel.vger.kernel.org archive mirror
* [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects
@ 2007-11-16 23:09 Christoph Lameter
  2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
                   ` (29 more replies)
  0 siblings, 30 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[Note to arch maintainers: Some configuration variables in arch/*/Kconfig are
needed for large users of per cpu space (large NUMA systems mostly, or lots of
processors) and in order to make optimal use of cpu_alloc.]

V1->V2:
- Split off the virtualization patch. That patch has some instructions on
  how to configure an arch for cpu_alloc.
- The uiuc patch is upstream, so it is left out.
- There was an article on LWN.net on cpu_alloc.
- Add a sparc64 config
- Against current git that merged the Kconfigs for x86_64 and i386.

In various places the kernel maintains arrays of pointers indexed by
processor number. These are used to locate objects that need to be used
when executing on a specific processor. Both the slab allocator
and the page allocator use these arrays, and there the arrays are used in
performance critical code. The allocpercpu functionality is a simple
allocator that provides these arrays. However, there are certain drawbacks
to using such arrays:

1. The arrays become huge for large systems and may be very sparsely
   populated (if they are dimensioned for NR_CPUS) on an architecture
   like IA64 that allows up to 4k cpus, if the kernel is then booted on a
   machine that only supports 8 processors. We could use nr_cpu_ids there
   but we would still have to allocate for all possible processors up to
   the number of processor ids. cpu_alloc can deal with sparse cpu maps.

2. The arrays cause surrounding variables to no longer fit into a single
   cacheline. The layout of core data structures is typically optimized so
   that variables frequently used together are placed in the same cacheline.
   Arrays of pointers move these variables far apart and destroy this effect.

3. A processor frequently follows only one pointer for its own use. Thus
   the cacheline with that pointer has to be kept in the cache. The neighboring
   pointers all belong to other processors and are rarely used. So a whole
   cacheline of 128 bytes may be consumed while only 8 bytes of information
   are in constant use. It would be better to be able to place more useful
   information in this cacheline.

4. The lookup of the per cpu object is expensive and requires multiple
   memory accesses to:

   A) smp_processor_id()
   B) pointer to the base of the per cpu pointer array
   C) pointer to the per cpu object in the pointer array
   D) the per cpu object itself.

5. Each use of allocpercpu requires its own per cpu array. On large
   systems large arrays have to be allocated again and again.

6. Processor hotplug cannot effectively track the per cpu objects
   since the VM cannot find all memory that was allocated for
   a specific cpu. It is impossible to add or remove objects in
   a consistent way. Although the allocpercpu subsystem was extended
   to add that capability, it is not used since using it would require
   adding cpu hotplug callbacks to each and every user of allocpercpu in
   the kernel.

The patchset here provides a cpu allocator that arranges data differently.
Objects are placed tightly in linear areas reserved for each processor.
The areas are of a fixed size so that address calculation can be used
instead of a table lookup. This means that

6. The VM knows where all the per cpu variables are and it could remove
   or add cpu areas as cpus come online or go offline.

5. There is no need for per cpu pointer arrays.

4. The lookup of a per cpu object is easy and requires memory access to:

   A) smp_processor_id()
   B) cpu pointer to the object
   C) the per cpu object itself.

3. The one access to the unfriendly cacheline that only contains a single
   useful pointer is avoided. The cache footprint is reduced.

2. Surrounding variables can be placed in the same cacheline.
   This allows, f.e., SLUB to avoid caching values in its per cpu
   structures since the kmem_cache structure is now available without
   the need to access a cache cold cacheline.

1. A single pointer can be used regardless of the number of processors
   in the system.

The cpu allocator manages data beginning at CPU_AREA_BASE. The pointer to
access item DATA on processor X can then be calculated using

POINTER = CPU_AREA_BASE + DATA + (X << CPU_AREA_ORDER)

This makes the allocator rely on a fixed address of the cpu area and on
a fixed size of memory for each processor (similar to the S/390 way
of addressing per cpu variables).
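
In terms of the macros introduced in patch 01 this is (sketch only; note that
CONFIG_CPU_AREA_ORDER is given in page order, so the actual shift also
includes PAGE_SHIFT):

	#define CPU_OFFSET(cpu) \
		((unsigned long)(cpu) << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))

	#define CPU_PTR(ptr, cpu) \
		((__typeof__(ptr))((void *)(ptr) + CPU_OFFSET(cpu)))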

The allocator can be configured in two ways:

1. Static configuration

	The cpu areas are directly mapped memory addresses. Thus
	the memory in the cpu areas is fixed and is allocated
	as a static variable.

	The default configuration of the cpu allocator (if no arch code
	changes the settings) is to reserve a 32k area for each processor.
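
	With 4k pages this corresponds to CONFIG_CPU_AREA_ORDER=3 (the
	default in the mm/Kconfig hunk of patch 01): PAGE_SIZE << 3 = 32k
	per processor, i.e. a static reservation of NR_CPUS * 32k in bss.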

2. Virtual configuration

	The cpu areas are virtualized. Memory in cpu areas is allocated
	on demand. The MMU is used to map memory allocated into the
	cpu areas (in same way that the virtual memmap functionality does it).

	The maximum size of the cpu areas only depends on the amount
	of virtual memory available. The virtualization can use large
	mappings (PMDs f.e.) in order to avoid TLB pressure that could occur
	on systems that only have small pages when heavy use of the cpu areas
	is made.


This patchset increases the speed of the SLUB fastpath and it is likely that
similar results can be obtained for other kernel subsystems:


Allocation of 10000 objects of each size. Measurement of the cycles
for each action:

Size  SLUBmm	cpu alloc
-------------------------
   8  45	38
  16  49	43
  32  61	53
  64  82	75
 128  188	176
 256  207	204
 512  260	250
1024  398	391
2048  530	511
4096  342	376

Allocation and then immediate freeing of an object. Measured in cycles
for each alloc/free action:

alloc/free test
    SLUBmm	cpu alloc
    68-72	56-58

The cpu allocator also removes the difference in handling SMP, UP and NUMA in
the slab and page allocators and simplifies code. It is advantageous even for UP
to place per cpu data from different zones or different slabs in the same
cacheline. Cpu alloc makes uniform handling of cpu data possible on all three
types of configuration.

The cpu allocator also decreases the memory needs for per cpu storage.

On a classic configuration with SLAB, 32 processors and the allocation of a 4 byte
counter via allocpercpu one needs the following on a 64 bit platform:

32 * 8		256	Array indexed by processor
32 * 32		1024	32 objects. The minimum allocation size of SLAB is 32.
------------------------------------------------------------------------------
Total		1280 bytes

cpu alloc needs

32 * 4		128 bytes

This is one tenth of the storage. Granted, this is the worst case scenario for a
32 processor system, but it shows the savings that can be had. cpu alloc can
allocate 10 counters in the same cacheline for the price of one with
allocpercpu. The allocpercpu counters are likely dispersed over all of
memory. So multiple cachelines (in the worst case 10) need to be kept in
the cache if those counters need constant updating. cpu alloc will keep the
10 counters in a single cacheline. cpu alloc can keep up to 16 counters
in the same cacheline if the machine has a 64 byte cacheline size.

The use of the cpu area is usually pretty minimal. 32 bit SMP systems typically
use about 8k of cpu area space after bootup, 64 bit SMP around 16k. Small NUMA
systems (8p 4node) use about 64k. Large NUMA systems may need a megabyte of
cpu area.

The usage of the per cpu areas typically increases due to:

1. New slabs being created (needs about 12 bytes per slab on 32 bit, 20 on 64 bit)
2. New devices being mounted that need cpu data for statistics
3. Network devices statistics
4. Special network features (Dave needs to run 100000 IP tunnels)

The current use of the cpu area can be seen in the field

	cpu_bytes

in /proc/vmstat
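
For example (illustrative output; the value depends on the system):

	$ grep cpu_bytes /proc/vmstat
	cpu_bytes 16384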

Drawbacks:

1. The per cpu area size is fixed

   If we use a virtually mapped area then this is not a problem if there
   is sufficient virtual space. The 100000 IP tunnels are only realistic
   with a virtually mapped cpu area.

2. The cpu allocator cannot control the allocation of individual objects the
   way allocpercpu can. This is in actuality never needed except in net/iucv/iucv.c
   where we have a single case of a per cpu allocation being used to allocate
   GFP_DMA structures(!). A patch is provided that replaces the use of
   allocpercpu with explicit calls to allocators for each object in iucv.c.

TODO:
- Currently only i386, ia64 and x86_64 arch definitions are provided.
  Other arches fall back to 64k static configurations.
- Cpu hotplug support. Currently we simply allocate for all possible processors.
  We could reduce this to only online processors if we could allocate the
  cpu area for a new processor before the callbacks are run and if we could
  free the cpu areas of a processor going down after all the callbacks for
  that processor have run.

The patchset implements cpu alloc and then gradually replaces all uses of
allocpercpu in the kernel. The last patch removes the allocpercpu support.
If the last patch is not applied then allocpercpu can coexist with cpu alloc.

The patchset is available also via

git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git cpu_alloc


The following patches are based on the linux-2.6 git tree +

git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git performance

(which is the mm version of SLUB)

-- 


* [patch 01/30] cpu alloc: Simple version of the allocator (static allocations)
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 02/30] cpu alloc: Use in SLUB Christoph Lameter
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc --]
[-- Type: text/plain, Size: 11842 bytes --]

The core portion of the cpu allocator.

The per cpu allocator allows dynamic allocation of memory on all
processors simultaneously. A bitmap is used to track used areas.
The allocator implements tight packing to reduce the cache footprint
and increase speed since cacheline contention is typically not a concern
for memory mainly used by a single cpu. Small objects will fill up gaps
left by larger allocations that required alignment.

This is a limited version of the cpu allocator that only performs a
static allocation of a single page for each processor. This is enough
for the use of the cpu allocator in the slab and page allocators for most
of the common configurations. The configuration will be useful for
embedded systems to reduce memory requirements. However, there is a hard limit
on the size of the per cpu structures, so the default configuration of an
order 0 allocation can only support up to 150 slab caches (most systems I have
use about 70) and probably not more than 16 or so NUMA nodes. The size of the
statically configured area can be changed via make menuconfig etc.

The cpu allocator virtualization patch is needed in order to support
dynamically extendable per cpu areas.
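
For orientation, a minimal usage sketch of the interface added here (not part
of the patch; the struct and variable names are made up):

	struct stats { unsigned long count; };
	struct stats *stats;
	unsigned long total = 0;
	int cpu;

	stats = CPU_ALLOC(struct stats, GFP_KERNEL | __GFP_ZERO);
	if (!stats)
		return -ENOMEM;

	/* Update the instance of the executing processor */
	preempt_disable();
	THIS_CPU(stats)->count++;
	preempt_enable();

	/* Sum the counter over all processors */
	for_each_possible_cpu(cpu)
		total += CPU_PTR(stats, cpu)->count;

	CPU_FREE(stats);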

V1->V2:
- Split off the dynamically extendable cpu area feature to make it clear that it exists.
- Remove useless variables.
- Add boot_cpu_alloc for boot time cpu area reservations (allows the folding in of
  per cpu areas and other arch specific per cpu stuff during boot).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/percpu.h |   59 +++++++++++++++
 include/linux/vmstat.h |    2 
 mm/Kconfig             |    7 +
 mm/Makefile            |    2 
 mm/cpu_alloc.c         |  184 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/vmstat.c            |    1 
 6 files changed, 253 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/cpu_alloc.h
 create mode 100644 mm/cpu_alloc.c

Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h	2007-11-16 14:51:29.326681524 -0800
+++ linux-2.6/include/linux/vmstat.h	2007-11-16 14:51:47.569430596 -0800
@@ -36,7 +36,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 		FOR_ALL_ZONES(PGSCAN_KSWAPD),
 		FOR_ALL_ZONES(PGSCAN_DIRECT),
 		PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
-		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+		PAGEOUTRUN, ALLOCSTALL, PGROTATED, CPU_BYTES,
 		NR_VM_EVENT_ITEMS
 };
 
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2007-11-16 14:51:29.338681182 -0800
+++ linux-2.6/mm/Kconfig	2007-11-16 14:53:31.614680913 -0800
@@ -194,3 +194,10 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config CPU_AREA_ORDER
+	int "Maximum size (order) of CPU area"
+	default "3"
+	help
+	  Sets the maximum amount of memory that can be allocated via cpu_alloc
+	  The size is set in page order, so 0 = PAGE_SIZE, 1 = PAGE_SIZE << 1 etc.
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile	2007-11-16 14:51:29.346681415 -0800
+++ linux-2.6/mm/Makefile	2007-11-16 14:51:47.569430596 -0800
@@ -11,7 +11,7 @@ obj-y			:= bootmem.o filemap.o mempool.o
 			   page_alloc.o page-writeback.o pdflush.o \
 			   readahead.o swap.o truncate.o vmscan.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
-			   page_isolation.o $(mmu-y)
+			   page_isolation.o cpu_alloc.o $(mmu-y)
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/cpu_alloc.c	2007-11-16 14:51:47.573430967 -0800
@@ -0,0 +1,184 @@
+/*
+ * Cpu allocator - Manage objects allocated for each processor
+ *
+ * (C) 2007 SGI, Christoph Lameter <clameter@sgi.com>
+ * 	Basic implementation with allocation and free from a dedicated per
+ * 	cpu area.
+ *
+ * The per cpu allocator allows dynamic allocation of memory on all
+ * processor simultaneously. A bitmap is used to track used areas.
+ * The allocator implements tight packing to reduce the cache footprint
+ * and increase speed since cacheline contention is typically not a concern
+ * for memory mainly used by a single cpu. Small objects will fill up gaps
+ * left by larger allocations that required alignments.
+ */
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/bitmap.h>
+
+/*
+ * Basic allocation unit. A bit map is created to track the use of each
+ * UNIT_SIZE element in the cpu area.
+ */
+
+#define UNIT_SIZE sizeof(int)
+#define UNITS (ALLOC_SIZE / UNIT_SIZE)
+
+/*
+ * How many units are needed for an object of a given size
+ */
+static int size_to_units(unsigned long size)
+{
+	return DIV_ROUND_UP(size, UNIT_SIZE);
+}
+
+/*
+ * Lock to protect the bitmap and the meta data for the cpu allocator.
+ */
+static DEFINE_SPINLOCK(cpu_alloc_map_lock);
+static unsigned long units_reserved;	/* Units reserved by boot allocations */
+
+/*
+ * Static configuration. The cpu areas are of a fixed size and
+ * cannot be extended. Such configurations are mainly useful on
+ * machines that do not have MMU support. Note that we have to use
+ * bss space for the static declarations. The combination of a large number
+ * of processors and a large cpu area may cause problems with the size
+ * of the bss segment.
+ */
+#define ALLOC_SIZE (1UL << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))
+
+static u8 cpu_area[NR_CPUS * ALLOC_SIZE];
+static DECLARE_BITMAP(cpu_alloc_map, UNITS);
+
+void * __init boot_cpu_alloc(unsigned long size)
+{
+	unsigned long x = units_reserved;
+
+	units_reserved += size_to_units(size);
+	BUG_ON(units_reserved > UNITS);
+	return cpu_area + x * UNIT_SIZE;
+}
+
+static int first_free;		/* First known free unit */
+
+/*
+ * Mark an object as used in the cpu_alloc_map
+ *
+ * Must hold cpu_alloc_map_lock
+ */
+static void set_map(int start, int length)
+{
+	while (length-- > 0)
+		__set_bit(start++, cpu_alloc_map);
+}
+
+/*
+ * Mark an area as freed.
+ *
+ * Must hold cpu_alloc_map_lock
+ */
+static void clear_map(int start, int length)
+{
+	while (length-- > 0)
+		__clear_bit(start++, cpu_alloc_map);
+}
+
+/*
+ * Allocate an object of a certain size
+ *
+ * Returns a special pointer that can be used with CPU_PTR to find the
+ * address of the object for a certain cpu.
+ */
+void *cpu_alloc(unsigned long size, gfp_t gfpflags, unsigned long align)
+{
+	unsigned long start;
+	int units = size_to_units(size);
+	void *ptr;
+	int first;
+	unsigned long flags;
+
+	BUG_ON(gfpflags & ~(GFP_RECLAIM_MASK | __GFP_ZERO));
+
+	spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+
+	first = 1;
+	start = first_free;
+
+	for ( ; ; ) {
+
+		start = find_next_zero_bit(cpu_alloc_map, ALLOC_SIZE, start);
+		if (start >= UNITS - units_reserved)
+			goto out_of_memory;
+
+		if (first)
+			first_free = start;
+
+		/*
+		 * Check alignment and that there is enough space after
+		 * the starting unit.
+		 */
+		if ((start + units_reserved) % (align / UNIT_SIZE) == 0 &&
+			find_next_bit(cpu_alloc_map, ALLOC_SIZE, start + 1)
+							>= start + units)
+				break;
+		start++;
+		first = 0;
+	}
+
+	if (first)
+		first_free = start + units;
+
+	if (start + units > UNITS - units_reserved)
+		goto out_of_memory;
+
+	set_map(start, units);
+	__count_vm_events(CPU_BYTES, units * UNIT_SIZE);
+
+	spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+
+	ptr = cpu_area + (start + units_reserved) * UNIT_SIZE;
+
+	if (gfpflags & __GFP_ZERO) {
+		int cpu;
+
+		for_each_possible_cpu(cpu)
+			memset(CPU_PTR(ptr, cpu), 0, size);
+	}
+
+	return ptr;
+
+out_of_memory:
+	spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+	return NULL;
+}
+EXPORT_SYMBOL(cpu_alloc);
+
+/*
+ * Free an object. The pointer must be a cpu pointer allocated
+ * via cpu_alloc.
+ */
+void cpu_free(void *start, unsigned long size)
+{
+	int units = size_to_units(size);
+	int index;
+	u8 *p = start;
+	unsigned long flags;
+
+	BUG_ON(p < (cpu_area + units_reserved * UNIT_SIZE));
+	index = (p - cpu_area) / UNIT_SIZE - units_reserved;
+	BUG_ON(!test_bit(index, cpu_alloc_map) ||
+			index >= UNITS - units_reserved);
+
+	spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+
+	clear_map(index, units);
+	__count_vm_events(CPU_BYTES, -units * UNIT_SIZE);
+	if (index < first_free)
+		first_free = index;
+
+	spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+}
+EXPORT_SYMBOL(cpu_free);
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c	2007-11-16 14:51:43.522430963 -0800
+++ linux-2.6/mm/vmstat.c	2007-11-16 14:51:47.578181134 -0800
@@ -639,6 +639,7 @@ static const char * const vmstat_text[] 
 	"allocstall",
 
 	"pgrotated",
+	"cpu_bytes",
 #endif
 };
 
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h	2007-11-16 14:51:29.334681520 -0800
+++ linux-2.6/include/linux/percpu.h	2007-11-16 14:51:47.578181134 -0800
@@ -110,4 +110,63 @@ static inline void percpu_free(void *__p
 #define free_percpu(ptr)	percpu_free((ptr))
 #define per_cpu_ptr(ptr, cpu)	percpu_ptr((ptr), (cpu))
 
+
+/*
+ * cpu allocator definitions
+ *
+ * The cpu allocator allows allocating an array of objects on all processors.
+ * A single pointer can then be used to access the instance of the object
+ * on a particular processor.
+ *
+ * Cpu objects are typically small. The allocator packs them tightly
+ * to increase the chance on each access that a per cpu object is already
+ * cached. Alignments may be specified but the intent is to align the data
+ * properly due to cpu alignment constraints and not to avoid cacheline
+ * contention. Any holes left by aligning objects are filled up with smaller
+ * objects that are allocated later.
+ *
+ * Cpu data can be allocated using CPU_ALLOC. The resulting pointer is
+ * pointing to the instance of the variable on cpu 0. It is generally an
+ * error to use the pointer directly unless we are running on cpu 0. So
+ * the use is valid during boot for example.
+ *
+ * The GFP flags have their usual function: __GFP_ZERO zeroes the object
+ * and other flags may be used to control reclaim behavior if the cpu
+ * areas have to be extended. However, zones cannot be selected nor
+ * can locality constraint flags be used.
+ *
+ * CPU_PTR() may be used to calculate the pointer for a specific processor.
+ * CPU_PTR is highly scalable since it simply adds the shifted value of
+ * smp_processor_id() to the base.
+ *
+ * Note: Synchronization is up to caller. If preemption is disabled then
+ * it is generally safe to access cpu variables (unless they are also
+ * handled from an interrupt context).
+ */
+
+#define CPU_OFFSET(__cpu) \
+	((unsigned long)(__cpu) << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))
+
+#define CPU_PTR(__p, __cpu) ((__typeof__(__p))((void *)(__p) + \
+							CPU_OFFSET(__cpu)))
+
+#define CPU_ALLOC(type, flags)	cpu_alloc(sizeof(type), flags, \
+					__alignof__(type))
+#define CPU_FREE(pointer)	cpu_free(pointer, sizeof(*(pointer)))
+
+#define THIS_CPU(__p)	CPU_PTR(__p, smp_processor_id())
+#define __THIS_CPU(__p)	CPU_PTR(__p, raw_smp_processor_id())
+
+/*
+ * Raw calls
+ */
+void *cpu_alloc(unsigned long size, gfp_t gfp, unsigned long align);
+void cpu_free(void *cpu_pointer, unsigned long size);
+
+/*
+ * Early boot allocator for per_cpu variables and special per cpu areas.
+ * Allocations are not tracked and cannot be freed.
+ */
+void *boot_cpu_alloc(unsigned long size);
+
 #endif /* __LINUX_PERCPU_H */

-- 


* [patch 02/30] cpu alloc: Use in SLUB
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
  2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 03/30] cpu alloc: Remove SLUB fields Christoph Lameter
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_slub_conversion --]
[-- Type: text/plain, Size: 11021 bytes --]

Using cpu alloc removes the need for the per cpu arrays in the kmem_cache struct.
These could get quite big if we have to support systems with up to thousands of cpus.
The use of cpu alloc means that:

1. The size of kmem_cache for SMP configurations shrinks since we will only
   need 1 pointer instead of NR_CPUS pointers. The same pointer can be used by all
   processors. This reduces the cache footprint of the allocator.

2. We can dynamically size kmem_cache according to the actual nodes in the
   system meaning less memory overhead for configurations that may potentially
   support up to 1k NUMA nodes.

3. We can remove the diddle widdle with allocating and releasing kmem_cache_cpu
   structures when bringing up and shutting down cpus. The cpu alloc logic
   will do it all for us. This removes some portions of the cpu hotplug
   functionality.

4. Fastpath performance increases by another 20% vs. the earlier improvements.
   Instead of a fastpath of 45-50 cycles it is now possible to get
   below 40.
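
The core of the conversion (illustrative before/after based on the hunks
below, SMP case):

	/* Before: per cpu pointer array in struct kmem_cache */
	struct kmem_cache_cpu *c = s->cpu_slab[smp_processor_id()];

	/* After: one cpu pointer, instances found by address calculation */
	struct kmem_cache_cpu *c = THIS_CPU(s->cpu_slab);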

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slub_def.h |    6 -
 mm/slub.c                |  182 ++++++-----------------------------------------
 2 files changed, 25 insertions(+), 163 deletions(-)

Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2007-11-15 21:24:53.494154465 -0800
+++ linux-2.6/include/linux/slub_def.h	2007-11-15 21:25:07.622904866 -0800
@@ -34,6 +34,7 @@ struct kmem_cache_node {
  * Slab cache management.
  */
 struct kmem_cache {
+	struct kmem_cache_cpu *cpu_slab;
 	/* Used for retriving partial slabs etc */
 	unsigned long flags;
 	int size;		/* The size of an object including meta data */
@@ -63,11 +64,6 @@ struct kmem_cache {
 	int defrag_ratio;
 	struct kmem_cache_node *node[MAX_NUMNODES];
 #endif
-#ifdef CONFIG_SMP
-	struct kmem_cache_cpu *cpu_slab[NR_CPUS];
-#else
-	struct kmem_cache_cpu cpu_slab;
-#endif
 };
 
 /*
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-11-15 21:24:53.502154325 -0800
+++ linux-2.6/mm/slub.c	2007-11-15 21:25:07.622904866 -0800
@@ -239,15 +239,6 @@ static inline struct kmem_cache_node *ge
 #endif
 }
 
-static inline struct kmem_cache_cpu *get_cpu_slab(struct kmem_cache *s, int cpu)
-{
-#ifdef CONFIG_SMP
-	return s->cpu_slab[cpu];
-#else
-	return &s->cpu_slab;
-#endif
-}
-
 /*
  * The end pointer in a slab is special. It points to the first object in the
  * slab but has bit 0 set to mark it.
@@ -1472,7 +1463,7 @@ static inline void flush_slab(struct kme
  */
 static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 {
-	struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+	struct kmem_cache_cpu *c = CPU_PTR(s->cpu_slab, cpu);
 
 	if (likely(c && c->page))
 		flush_slab(s, c);
@@ -1487,15 +1478,7 @@ static void flush_cpu_slab(void *d)
 
 static void flush_all(struct kmem_cache *s)
 {
-#ifdef CONFIG_SMP
 	on_each_cpu(flush_cpu_slab, s, 1, 1);
-#else
-	unsigned long flags;
-
-	local_irq_save(flags);
-	flush_cpu_slab(s);
-	local_irq_restore(flags);
-#endif
 }
 
 /*
@@ -1529,7 +1512,7 @@ static noinline unsigned long get_new_sl
 	if (!page)
 		return 0;
 
-	*pc = c = get_cpu_slab(s, smp_processor_id());
+	*pc = c = THIS_CPU(s->cpu_slab);
 	if (c->page)
 		flush_slab(s, c);
 	c->page = page;
@@ -1641,25 +1624,26 @@ static void __always_inline *slab_alloc(
 	struct kmem_cache_cpu *c;
 
 #ifdef CONFIG_FAST_CMPXCHG_LOCAL
-	c = get_cpu_slab(s, get_cpu());
+	preempt_disable();
+	c = THIS_CPU(s->cpu_slab);
 	do {
 		object = c->freelist;
 		if (unlikely(is_end(object) || !node_match(c, node))) {
 			object = __slab_alloc(s, gfpflags, node, addr, c);
 			if (unlikely(!object)) {
-				put_cpu();
+				preempt_enable();
 				goto out;
 			}
 			break;
 		}
 	} while (cmpxchg_local(&c->freelist, object, object[c->offset])
 								!= object);
-	put_cpu();
+	preempt_enable();
 #else
 	unsigned long flags;
 
 	local_irq_save(flags);
-	c = get_cpu_slab(s, smp_processor_id());
+	c = THIS_CPU(s->cpu_slab);
 	if (unlikely((is_end(c->freelist)) || !node_match(c, node))) {
 
 		object = __slab_alloc(s, gfpflags, node, addr, c);
@@ -1784,7 +1768,8 @@ static void __always_inline slab_free(st
 #ifdef CONFIG_FAST_CMPXCHG_LOCAL
 	void **freelist;
 
-	c = get_cpu_slab(s, get_cpu());
+	preempt_disable();
+	c = THIS_CPU(s->cpu_slab);
 	debug_check_no_locks_freed(object, s->objsize);
 	do {
 		freelist = c->freelist;
@@ -1806,13 +1791,13 @@ static void __always_inline slab_free(st
 		}
 		object[c->offset] = freelist;
 	} while (cmpxchg_local(&c->freelist, freelist, object) != freelist);
-	put_cpu();
+	preempt_enable();
 #else
 	unsigned long flags;
 
 	local_irq_save(flags);
 	debug_check_no_locks_freed(object, s->objsize);
-	c = get_cpu_slab(s, smp_processor_id());
+	c = THIS_CPU(s->cpu_slab);
 	if (likely(page == c->page && c->node >= 0)) {
 		object[c->offset] = c->freelist;
 		c->freelist = object;
@@ -2015,130 +2000,19 @@ static void init_kmem_cache_node(struct 
 #endif
 }
 
-#ifdef CONFIG_SMP
-/*
- * Per cpu array for per cpu structures.
- *
- * The per cpu array places all kmem_cache_cpu structures from one processor
- * close together meaning that it becomes possible that multiple per cpu
- * structures are contained in one cacheline. This may be particularly
- * beneficial for the kmalloc caches.
- *
- * A desktop system typically has around 60-80 slabs. With 100 here we are
- * likely able to get per cpu structures for all caches from the array defined
- * here. We must be able to cover all kmalloc caches during bootstrap.
- *
- * If the per cpu array is exhausted then fall back to kmalloc
- * of individual cachelines. No sharing is possible then.
- */
-#define NR_KMEM_CACHE_CPU 100
-
-static DEFINE_PER_CPU(struct kmem_cache_cpu,
-				kmem_cache_cpu)[NR_KMEM_CACHE_CPU];
-
-static DEFINE_PER_CPU(struct kmem_cache_cpu *, kmem_cache_cpu_free);
-static cpumask_t kmem_cach_cpu_free_init_once = CPU_MASK_NONE;
-
-static struct kmem_cache_cpu *alloc_kmem_cache_cpu(struct kmem_cache *s,
-							int cpu, gfp_t flags)
-{
-	struct kmem_cache_cpu *c = per_cpu(kmem_cache_cpu_free, cpu);
-
-	if (c)
-		per_cpu(kmem_cache_cpu_free, cpu) =
-				(void *)c->freelist;
-	else {
-		/* Table overflow: So allocate ourselves */
-		c = kmalloc_node(
-			ALIGN(sizeof(struct kmem_cache_cpu), cache_line_size()),
-			flags, cpu_to_node(cpu));
-		if (!c)
-			return NULL;
-	}
-
-	init_kmem_cache_cpu(s, c);
-	return c;
-}
-
-static void free_kmem_cache_cpu(struct kmem_cache_cpu *c, int cpu)
-{
-	if (c < per_cpu(kmem_cache_cpu, cpu) ||
-			c > per_cpu(kmem_cache_cpu, cpu) + NR_KMEM_CACHE_CPU) {
-		kfree(c);
-		return;
-	}
-	c->freelist = (void *)per_cpu(kmem_cache_cpu_free, cpu);
-	per_cpu(kmem_cache_cpu_free, cpu) = c;
-}
-
-static void free_kmem_cache_cpus(struct kmem_cache *s)
-{
-	int cpu;
-
-	for_each_online_cpu(cpu) {
-		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
-		if (c) {
-			s->cpu_slab[cpu] = NULL;
-			free_kmem_cache_cpu(c, cpu);
-		}
-	}
-}
-
 static int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
 {
 	int cpu;
 
-	for_each_online_cpu(cpu) {
-		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
-		if (c)
-			continue;
-
-		c = alloc_kmem_cache_cpu(s, cpu, flags);
-		if (!c) {
-			free_kmem_cache_cpus(s);
-			return 0;
-		}
-		s->cpu_slab[cpu] = c;
-	}
-	return 1;
-}
-
-/*
- * Initialize the per cpu array.
- */
-static void init_alloc_cpu_cpu(int cpu)
-{
-	int i;
+	s->cpu_slab = CPU_ALLOC(struct kmem_cache_cpu, flags);
 
-	if (cpu_isset(cpu, kmem_cach_cpu_free_init_once))
-		return;
-
-	for (i = NR_KMEM_CACHE_CPU - 1; i >= 0; i--)
-		free_kmem_cache_cpu(&per_cpu(kmem_cache_cpu, cpu)[i], cpu);
-
-	cpu_set(cpu, kmem_cach_cpu_free_init_once);
-}
-
-static void __init init_alloc_cpu(void)
-{
-	int cpu;
+	if (!s->cpu_slab)
+		return 0;
 
 	for_each_online_cpu(cpu)
-		init_alloc_cpu_cpu(cpu);
-  }
-
-#else
-static inline void free_kmem_cache_cpus(struct kmem_cache *s) {}
-static inline void init_alloc_cpu(void) {}
-
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
-{
-	init_kmem_cache_cpu(s, &s->cpu_slab);
+		init_kmem_cache_cpu(s, CPU_PTR(s->cpu_slab, cpu));
 	return 1;
 }
-#endif
 
 #ifdef CONFIG_NUMA
 /*
@@ -2452,9 +2326,8 @@ static inline int kmem_cache_close(struc
 	int node;
 
 	flush_all(s);
-
+	CPU_FREE(s->cpu_slab);
 	/* Attempt to free all objects */
-	free_kmem_cache_cpus(s);
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n = get_node(s, node);
 
@@ -2958,8 +2831,6 @@ void __init kmem_cache_init(void)
 	int i;
 	int caches = 0;
 
-	init_alloc_cpu();
-
 #ifdef CONFIG_NUMA
 	/*
 	 * Must first have the slab cache available for the allocations of the
@@ -3019,11 +2890,12 @@ void __init kmem_cache_init(void)
 	for (i = KMALLOC_SHIFT_LOW; i < PAGE_SHIFT; i++)
 		kmalloc_caches[i]. name =
 			kasprintf(GFP_KERNEL, "kmalloc-%d", 1 << i);
-
 #ifdef CONFIG_SMP
 	register_cpu_notifier(&slab_notifier);
-	kmem_size = offsetof(struct kmem_cache, cpu_slab) +
-				nr_cpu_ids * sizeof(struct kmem_cache_cpu *);
+#endif
+#ifdef CONFIG_NUMA
+	kmem_size = offsetof(struct kmem_cache, node) +
+				nr_node_ids * sizeof(struct kmem_cache_node *);
 #else
 	kmem_size = sizeof(struct kmem_cache);
 #endif
@@ -3120,7 +2992,7 @@ struct kmem_cache *kmem_cache_create(con
 		 * per cpu structures
 		 */
 		for_each_online_cpu(cpu)
-			get_cpu_slab(s, cpu)->objsize = s->objsize;
+			CPU_PTR(s->cpu_slab, cpu)->objsize = s->objsize;
 		s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
 		up_write(&slub_lock);
 		if (sysfs_slab_alias(s, name))
@@ -3165,11 +3037,9 @@ static int __cpuinit slab_cpuup_callback
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		init_alloc_cpu_cpu(cpu);
 		down_read(&slub_lock);
 		list_for_each_entry(s, &slab_caches, list)
-			s->cpu_slab[cpu] = alloc_kmem_cache_cpu(s, cpu,
-							GFP_KERNEL);
+			init_kmem_cache_cpu(s, CPU_PTR(s->cpu_slab, cpu));
 		up_read(&slub_lock);
 		break;
 
@@ -3179,13 +3049,9 @@ static int __cpuinit slab_cpuup_callback
 	case CPU_DEAD_FROZEN:
 		down_read(&slub_lock);
 		list_for_each_entry(s, &slab_caches, list) {
-			struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
 			local_irq_save(flags);
 			__flush_cpu_slab(s, cpu);
 			local_irq_restore(flags);
-			free_kmem_cache_cpu(c, cpu);
-			s->cpu_slab[cpu] = NULL;
 		}
 		up_read(&slub_lock);
 		break;
@@ -3657,7 +3523,7 @@ static unsigned long slab_objects(struct
 	for_each_possible_cpu(cpu) {
 		struct page *page;
 		int node;
-		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+		struct kmem_cache_cpu *c = CPU_PTR(s->cpu_slab, cpu);
 
 		if (!c)
 			continue;
@@ -3724,7 +3590,7 @@ static int any_slab_objects(struct kmem_
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+		struct kmem_cache_cpu *c = CPU_PTR(s->cpu_slab, cpu);
 
 		if (c && c->page)
 			return 1;

-- 


* [patch 03/30] cpu alloc: Remove SLUB fields
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
  2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
  2007-11-16 23:09 ` [patch 02/30] cpu alloc: Use in SLUB Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 04/30] cpu alloc: page allocator conversion Christoph Lameter
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_remove_slub_fields --]
[-- Type: text/plain, Size: 5883 bytes --]

Remove the fields in kmem_cache_cpu that were used to cache data from
kmem_cache when the two were in different cachelines. The cacheline that holds
the per cpu pointer now also holds these values. We can cut down the
kmem_cache_cpu size to almost half.

The get_freepointer() and set_freepointer() functions that used to be
intended only for the slow path are now also useful for the hot path since
access to the offset field no longer requires an additional cacheline. This
results in consistent use of the freepointer helpers for objects throughout SLUB.
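
For reference, the two helpers boil down to the following (get_freepointer()
as shown in the hunk below; set_freepointer() sketched here as the symmetric
store, it is not visible in this diff):

	static inline void *get_freepointer(struct kmem_cache *s, void *object)
	{
		/* The free pointer lives at offset s->offset within the object */
		return *(void **)(object + s->offset);
	}

	static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
	{
		*(void **)(object + s->offset) = fp;
	}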

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slub_def.h |    3 --
 mm/slub.c                |   50 +++++++++++++++--------------------------------
 2 files changed, 17 insertions(+), 36 deletions(-)

Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2007-11-15 21:25:07.622904866 -0800
+++ linux-2.6/include/linux/slub_def.h	2007-11-15 21:25:10.335154196 -0800
@@ -15,9 +15,6 @@ struct kmem_cache_cpu {
 	void **freelist;
 	struct page *page;
 	int node;
-	unsigned int offset;
-	unsigned int objsize;
-	unsigned int objects;
 };
 
 struct kmem_cache_node {
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-11-15 21:25:07.622904866 -0800
+++ linux-2.6/mm/slub.c	2007-11-15 21:25:10.339154532 -0800
@@ -273,13 +273,6 @@ static inline int check_valid_pointer(st
 	return 1;
 }
 
-/*
- * Slow version of get and set free pointer.
- *
- * This version requires touching the cache lines of kmem_cache which
- * we avoid to do in the fast alloc free paths. There we obtain the offset
- * from the page struct.
- */
 static inline void *get_freepointer(struct kmem_cache *s, void *object)
 {
 	return *(void **)(object + s->offset);
@@ -1438,10 +1431,10 @@ static void deactivate_slab(struct kmem_
 
 		/* Retrieve object from cpu_freelist */
 		object = c->freelist;
-		c->freelist = c->freelist[c->offset];
+		c->freelist = get_freepointer(s, c->freelist);
 
 		/* And put onto the regular freelist */
-		object[c->offset] = page->freelist;
+		set_freepointer(s, object, page->freelist);
 		page->freelist = object;
 		page->inuse--;
 	}
@@ -1573,8 +1566,8 @@ load_freelist:
 		goto debug;
 
 	object = c->page->freelist;
-	c->freelist = object[c->offset];
-	c->page->inuse = c->objects;
+	c->freelist = get_freepointer(s, object);
+	c->page->inuse = s->objects;
 	c->page->freelist = c->page->end;
 	c->node = page_to_nid(c->page);
 unlock_out:
@@ -1602,7 +1595,7 @@ debug:
 		goto another_slab;
 
 	c->page->inuse++;
-	c->page->freelist = object[c->offset];
+	c->page->freelist = get_freepointer(s, object);
 	c->node = -1;
 	goto unlock_out;
 }
@@ -1636,8 +1629,8 @@ static void __always_inline *slab_alloc(
 			}
 			break;
 		}
-	} while (cmpxchg_local(&c->freelist, object, object[c->offset])
-								!= object);
+	} while (cmpxchg_local(&c->freelist, object,
+			get_freepointer(s, object)) != object);
 	preempt_enable();
 #else
 	unsigned long flags;
@@ -1653,13 +1646,13 @@ static void __always_inline *slab_alloc(
 		}
 	} else {
 		object = c->freelist;
-		c->freelist = object[c->offset];
+		c->freelist = get_freepointer(s, object);
 	}
 	local_irq_restore(flags);
 #endif
 
 	if (unlikely((gfpflags & __GFP_ZERO)))
-		memset(object, 0, c->objsize);
+		memset(object, 0, s->objsize);
 out:
 	return object;
 }
@@ -1687,7 +1680,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
  * handling required then we can return immediately.
  */
 static void __slab_free(struct kmem_cache *s, struct page *page,
-				void *x, void *addr, unsigned int offset)
+				void *x, void *addr)
 {
 	void *prior;
 	void **object = (void *)x;
@@ -1703,7 +1696,8 @@ static void __slab_free(struct kmem_cach
 	if (unlikely(state & SLABDEBUG))
 		goto debug;
 checks_ok:
-	prior = object[offset] = page->freelist;
+	prior = page->freelist;
+	set_freepointer(s, object, prior);
 	page->freelist = object;
 	page->inuse--;
 
@@ -1786,10 +1780,10 @@ static void __always_inline slab_free(st
 		 * since the freelist pointers are unique per slab.
 		 */
 		if (unlikely(page != c->page || c->node < 0)) {
-			__slab_free(s, page, x, addr, c->offset);
+			__slab_free(s, page, x, addr);
 			break;
 		}
-		object[c->offset] = freelist;
+		set_freepointer(s, object, freelist);
 	} while (cmpxchg_local(&c->freelist, freelist, object) != freelist);
 	preempt_enable();
 #else
@@ -1799,10 +1793,10 @@ static void __always_inline slab_free(st
 	debug_check_no_locks_freed(object, s->objsize);
 	c = THIS_CPU(s->cpu_slab);
 	if (likely(page == c->page && c->node >= 0)) {
-		object[c->offset] = c->freelist;
+		set_freepointer(s, object, c->freelist);
 		c->freelist = object;
 	} else
-		__slab_free(s, page, x, addr, c->offset);
+		__slab_free(s, page, x, addr);
 
 	local_irq_restore(flags);
 #endif
@@ -1984,9 +1978,6 @@ static void init_kmem_cache_cpu(struct k
 	c->page = NULL;
 	c->freelist = (void *)PAGE_MAPPING_ANON;
 	c->node = 0;
-	c->offset = s->offset / sizeof(void *);
-	c->objsize = s->objsize;
-	c->objects = s->objects;
 }
 
 static void init_kmem_cache_node(struct kmem_cache_node *n)
@@ -2978,21 +2969,14 @@ struct kmem_cache *kmem_cache_create(con
 	down_write(&slub_lock);
 	s = find_mergeable(size, align, flags, name, ctor);
 	if (s) {
-		int cpu;
-
 		s->refcount++;
+
 		/*
 		 * Adjust the object sizes so that we clear
 		 * the complete object on kzalloc.
 		 */
 		s->objsize = max(s->objsize, (int)size);
 
-		/*
-		 * And then we need to update the object size in the
-		 * per cpu structures
-		 */
-		for_each_online_cpu(cpu)
-			CPU_PTR(s->cpu_slab, cpu)->objsize = s->objsize;
 		s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
 		up_write(&slub_lock);
 		if (sysfs_slab_alias(s, name))

-- 


* [patch 04/30] cpu alloc: page allocator conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-11-16 23:09 ` [patch 03/30] cpu alloc: Remove SLUB fields Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas Christoph Lameter
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_page_allocator_conversion --]
[-- Type: text/plain, Size: 12930 bytes --]

Use the new cpu_alloc functionality to avoid per cpu arrays in struct zone.
This drastically reduces the size of struct zone for systems with a large
number of processors and allows placement of critical variables of struct
zone in one cacheline even on very large systems.

Another effect is that the pagesets of one processor are placed near one
another. If multiple pagesets from different zones fit into one cacheline
then additional cacheline fetches can be avoided on the hot paths when
allocating memory from multiple zones.

Surprisingly this clears up much of the painful NUMA bringup. Bootstrap
becomes simpler if we use the same scheme for UP, SMP, NUMA. #ifdefs are
reduced and we can drop the zone_pcp macro.

Hotplug handling is also simplified since cpu alloc can bring up and
shut down cpu areas for a specific cpu as a whole. So there is no need to
allocate or free individual pagesets.
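
The shape of the conversion (illustrative, matching the hunks below):

	/* Before: zone_pcp() hides an NR_CPUS sized array (pointer array on NUMA) */
	pset = zone_pcp(zone, cpu);

	/* After: a single cpu pointer in struct zone */
	pset = CPU_PTR(zone->pageset, cpu);	/* a specific cpu */
	pcp = &THIS_CPU(zone->pageset)->pcp;	/* the executing cpu */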

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/mm.h     |    4 -
 include/linux/mmzone.h |   12 ---
 mm/page_alloc.c        |  161 ++++++++++++++++++-------------------------------
 mm/vmstat.c            |   14 ++--
 4 files changed, 72 insertions(+), 119 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2007-11-15 21:24:47.238154208 -0800
+++ linux-2.6/include/linux/mm.h	2007-11-15 21:25:12.735154250 -0800
@@ -931,11 +931,7 @@ extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
 
-#ifdef CONFIG_NUMA
 extern void setup_per_cpu_pageset(void);
-#else
-static inline void setup_per_cpu_pageset(void) {}
-#endif
 
 /* prio_tree.c */
 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h	2007-11-15 21:24:49.970154448 -0800
+++ linux-2.6/include/linux/mmzone.h	2007-11-15 21:25:12.735154250 -0800
@@ -121,13 +121,7 @@ struct per_cpu_pageset {
 	s8 stat_threshold;
 	s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
 #endif
-} ____cacheline_aligned_in_smp;
-
-#ifdef CONFIG_NUMA
-#define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
-#else
-#define zone_pcp(__z, __cpu) (&(__z)->pageset[(__cpu)])
-#endif
+};
 
 enum zone_type {
 #ifdef CONFIG_ZONE_DMA
@@ -231,10 +225,8 @@ struct zone {
 	 */
 	unsigned long		min_unmapped_pages;
 	unsigned long		min_slab_pages;
-	struct per_cpu_pageset	*pageset[NR_CPUS];
-#else
-	struct per_cpu_pageset	pageset[NR_CPUS];
 #endif
+	struct per_cpu_pageset	*pageset;
 	/*
 	 * free areas of different sizes
 	 */
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-11-15 21:24:50.330904214 -0800
+++ linux-2.6/mm/page_alloc.c	2007-11-15 21:25:12.739154691 -0800
@@ -892,7 +892,7 @@ static void __drain_pages(unsigned int c
 		if (!populated_zone(zone))
 			continue;
 
-		pset = zone_pcp(zone, cpu);
+		pset = CPU_PTR(zone->pageset, cpu);
 
 		pcp = &pset->pcp;
 		local_irq_save(flags);
@@ -988,8 +988,8 @@ static void fastcall free_hot_cold_page(
 	arch_free_page(page, 0);
 	kernel_map_pages(page, 1, 0);
 
-	pcp = &zone_pcp(zone, get_cpu())->pcp;
 	local_irq_save(flags);
+	pcp = &THIS_CPU(zone->pageset)->pcp;
 	__count_vm_event(PGFREE);
 	if (cold)
 		list_add_tail(&page->lru, &pcp->list);
@@ -1002,7 +1002,6 @@ static void fastcall free_hot_cold_page(
 		pcp->count -= pcp->batch;
 	}
 	local_irq_restore(flags);
-	put_cpu();
 }
 
 void fastcall free_hot_page(struct page *page)
@@ -1044,16 +1043,14 @@ static struct page *buffered_rmqueue(str
 	unsigned long flags;
 	struct page *page;
 	int cold = !!(gfp_flags & __GFP_COLD);
-	int cpu;
 	int migratetype = allocflags_to_migratetype(gfp_flags);
 
 again:
-	cpu  = get_cpu();
 	if (likely(order == 0)) {
 		struct per_cpu_pages *pcp;
 
-		pcp = &zone_pcp(zone, cpu)->pcp;
 		local_irq_save(flags);
+		pcp = &THIS_CPU(zone->pageset)->pcp;
 		if (!pcp->count) {
 			pcp->count = rmqueue_bulk(zone, 0,
 					pcp->batch, &pcp->list, migratetype);
@@ -1092,7 +1089,6 @@ again:
 	__count_zone_vm_events(PGALLOC, zone, 1 << order);
 	zone_statistics(zonelist, zone);
 	local_irq_restore(flags);
-	put_cpu();
 
 	VM_BUG_ON(bad_range(zone, page));
 	if (prep_new_page(page, order, gfp_flags))
@@ -1101,7 +1097,6 @@ again:
 
 failed:
 	local_irq_restore(flags);
-	put_cpu();
 	return NULL;
 }
 
@@ -1795,7 +1790,7 @@ void show_free_areas(void)
 		for_each_online_cpu(cpu) {
 			struct per_cpu_pageset *pageset;
 
-			pageset = zone_pcp(zone, cpu);
+			pageset = CPU_PTR(zone->pageset, cpu);
 
 			printk("CPU %4d: hi:%5d, btch:%4d usd:%4d\n",
 			       cpu, pageset->pcp.high,
@@ -2621,82 +2616,33 @@ static void setup_pagelist_highmark(stru
 		pcp->batch = PAGE_SHIFT * 8;
 }
 
-
-#ifdef CONFIG_NUMA
 /*
- * Boot pageset table. One per cpu which is going to be used for all
- * zones and all nodes. The parameters will be set in such a way
- * that an item put on a list will immediately be handed over to
- * the buddy list. This is safe since pageset manipulation is done
- * with interrupts disabled.
- *
- * Some NUMA counter updates may also be caught by the boot pagesets.
- *
- * The boot_pagesets must be kept even after bootup is complete for
- * unused processors and/or zones. They do play a role for bootstrapping
- * hotplugged processors.
- *
- * zoneinfo_show() and maybe other functions do
- * not check if the processor is online before following the pageset pointer.
- * Other parts of the kernel may not check if the zone is available.
+ * Dynamically allocate memory for the per cpu pageset array in struct zone.
  */
-static struct per_cpu_pageset boot_pageset[NR_CPUS];
-
-/*
- * Dynamically allocate memory for the
- * per cpu pageset array in struct zone.
- */
-static int __cpuinit process_zones(int cpu)
+static void __cpuinit process_zones(int cpu)
 {
-	struct zone *zone, *dzone;
+	struct zone *zone;
 	int node = cpu_to_node(cpu);
 
 	node_set_state(node, N_CPU);	/* this node has a cpu */
 
 	for_each_zone(zone) {
+		struct per_cpu_pageset *pcp =
+				CPU_PTR(zone->pageset, cpu);
 
 		if (!populated_zone(zone))
 			continue;
 
-		zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
-					 GFP_KERNEL, node);
-		if (!zone_pcp(zone, cpu))
-			goto bad;
-
-		setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
+		setup_pageset(pcp, zone_batchsize(zone));
 
 		if (percpu_pagelist_fraction)
-			setup_pagelist_highmark(zone_pcp(zone, cpu),
-			 	(zone->present_pages / percpu_pagelist_fraction));
-	}
-
-	return 0;
-bad:
-	for_each_zone(dzone) {
-		if (!populated_zone(dzone))
-			continue;
-		if (dzone == zone)
-			break;
-		kfree(zone_pcp(dzone, cpu));
-		zone_pcp(dzone, cpu) = NULL;
-	}
-	return -ENOMEM;
-}
+			setup_pagelist_highmark(pcp, zone->present_pages /
+						percpu_pagelist_fraction);
 
-static inline void free_zone_pagesets(int cpu)
-{
-	struct zone *zone;
-
-	for_each_zone(zone) {
-		struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
-
-		/* Free per_cpu_pageset if it is slab allocated */
-		if (pset != &boot_pageset[cpu])
-			kfree(pset);
-		zone_pcp(zone, cpu) = NULL;
 	}
 }
 
+#ifdef CONFIG_SMP
 static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
 		unsigned long action,
 		void *hcpu)
@@ -2707,14 +2653,7 @@ static int __cpuinit pageset_cpuup_callb
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		if (process_zones(cpu))
-			ret = NOTIFY_BAD;
-		break;
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		free_zone_pagesets(cpu);
+		process_zones(cpu);
 		break;
 	default:
 		break;
@@ -2724,21 +2663,34 @@ static int __cpuinit pageset_cpuup_callb
 
 static struct notifier_block __cpuinitdata pageset_notifier =
 	{ &pageset_cpuup_callback, NULL, 0 };
+#endif
 
 void __init setup_per_cpu_pageset(void)
 {
-	int err;
-
-	/* Initialize per_cpu_pageset for cpu 0.
+	/*
+	 * Initialize per_cpu settings for the boot cpu.
 	 * A cpuup callback will do this for every cpu
-	 * as it comes online
+	 * as it comes online.
+	 *
+	 * This is also initializing the cpu areas for the
+	 * pagesets.
 	 */
-	err = process_zones(smp_processor_id());
-	BUG_ON(err);
-	register_cpu_notifier(&pageset_notifier);
-}
+	struct zone *zone;
 
+	for_each_zone(zone) {
+
+		if (!populated_zone(zone))
+			continue;
+
+		zone->pageset = CPU_ALLOC(struct per_cpu_pageset,
+					GFP_KERNEL|__GFP_ZERO);
+		BUG_ON(!zone->pageset);
+	}
+	process_zones(smp_processor_id());
+#ifdef CONFIG_SMP
+	register_cpu_notifier(&pageset_notifier);
 #endif
+}
 
 static noinline __init_refok
 int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
@@ -2785,21 +2737,30 @@ int zone_wait_table_init(struct zone *zo
 
 static __meminit void zone_pcp_init(struct zone *zone)
 {
-	int cpu;
-	unsigned long batch = zone_batchsize(zone);
+	static struct per_cpu_pageset boot_pageset;
 
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-#ifdef CONFIG_NUMA
-		/* Early boot. Slab allocator not functional yet */
-		zone_pcp(zone, cpu) = &boot_pageset[cpu];
-		setup_pageset(&boot_pageset[cpu],0);
-#else
-		setup_pageset(zone_pcp(zone,cpu), batch);
-#endif
-	}
+	/*
+	 * Fake a cpu_alloc pointer that can take the required
+	 * offset to get to the boot pageset. This is only
+	 * needed for the boot pageset while bootstrapping
+	 * the new zone. In the course of zone bootstrap
+	 * setup_cpu_pagesets() will do the proper CPU_ALLOC and
+	 * set things up the right way.
+	 *
+	 * Deferral allows CPU_ALLOC() to use the boot pageset
+	 * to allocate the initial memory to get going and then provide
+	 * the proper memory when called from setup_cpu_pagesets() to
+	 * install the proper pagesets.
+	 *
+	 * Deferral also allows slab allocators to perform their
+	 * initialization without resorting to bootmem.
+	 */
+	zone->pageset = &boot_pageset - CPU_OFFSET(smp_processor_id());
+	setup_pageset(&boot_pageset, 0);
 	if (zone->present_pages)
-		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%lu\n",
-			zone->name, zone->present_pages, batch);
+		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%u\n",
+			zone->name, zone->present_pages,
+			zone_batchsize(zone));
 }
 
 __meminit int init_currently_empty_zone(struct zone *zone,
@@ -4214,11 +4175,13 @@ int percpu_pagelist_fraction_sysctl_hand
 	ret = proc_dointvec_minmax(table, write, file, buffer, length, ppos);
 	if (!write || (ret == -EINVAL))
 		return ret;
-	for_each_zone(zone) {
-		for_each_online_cpu(cpu) {
+	for_each_online_cpu(cpu) {
+		for_each_zone(zone) {
 			unsigned long  high;
+
 			high = zone->present_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(zone_pcp(zone, cpu), high);
+			setup_pagelist_highmark(CPU_PTR(zone->pageset, cpu),
+									high);
 		}
 	}
 	return 0;
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c	2007-11-15 21:24:50.730654207 -0800
+++ linux-2.6/mm/vmstat.c	2007-11-15 21:25:12.739154691 -0800
@@ -147,7 +147,8 @@ static void refresh_zone_stat_thresholds
 		threshold = calculate_threshold(zone);
 
 		for_each_online_cpu(cpu)
-			zone_pcp(zone, cpu)->stat_threshold = threshold;
+			CPU_PTR(zone->pageset, cpu)->stat_threshold
+							= threshold;
 	}
 }
 
@@ -157,7 +158,8 @@ static void refresh_zone_stat_thresholds
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
 {
-	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+	struct per_cpu_pageset *pcp = THIS_CPU(zone->pageset);
+
 	s8 *p = pcp->vm_stat_diff + item;
 	long x;
 
@@ -210,7 +212,7 @@ EXPORT_SYMBOL(mod_zone_page_state);
  */
 void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 {
-	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+	struct per_cpu_pageset *pcp = THIS_CPU(zone->pageset);
 	s8 *p = pcp->vm_stat_diff + item;
 
 	(*p)++;
@@ -231,7 +233,7 @@ EXPORT_SYMBOL(__inc_zone_page_state);
 
 void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
-	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+	struct per_cpu_pageset *pcp = THIS_CPU(zone->pageset);
 	s8 *p = pcp->vm_stat_diff + item;
 
 	(*p)--;
@@ -307,7 +309,7 @@ void refresh_cpu_vm_stats(int cpu)
 		if (!populated_zone(zone))
 			continue;
 
-		p = zone_pcp(zone, cpu);
+		p = CPU_PTR(zone->pageset, cpu);
 
 		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 			if (p->vm_stat_diff[i]) {
@@ -680,7 +682,7 @@ static void zoneinfo_show_print(struct s
 	for_each_online_cpu(i) {
 		struct per_cpu_pageset *pageset;
 
-		pageset = zone_pcp(zone, i);
+		pageset = CPU_PTR(zone->pageset, i);
 		seq_printf(m,
 			   "\n    cpu: %i"
 			   "\n              count: %i"

-- 


* [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-11-16 23:09 ` [patch 04/30] cpu alloc: page allocator conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 06/30] cpu alloc: x86 support Christoph Lameter
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_virtual --]
[-- Type: text/plain, Size: 13552 bytes --]

Virtually map the cpu areas. This allows bigger maximum sizes and populating
the virtual mappings only on demand.

In order to use the virtual mapping capability the arch must setup some
configuration variables in arch/xxx/Kconfig:

CONFIG_CPU_AREA_VIRTUAL to y

CONFIG_CPU_AREA_ORDER
	to the largest allowed size that the per cpu area can grow to.

CONFIG_CPU_AREA_ALLOC_ORDER
	to the allocation size when the cpu area needs to grow. Use 0
	here to guarantee order 0 allocations.

The address to use must be defined in CPU_AREA_BASE. This is typically done
in include/asm-xxx/pgtable.h. 

The maximum space used by the cpu areas is

	NR_CPUS * (PAGE_SIZE << CONFIG_CPU_AREA_ORDER)
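
For example (hypothetical numbers, 4k pages): with CONFIG_CPU_AREA_ORDER=16
each processor may grow its area up to PAGE_SIZE << 16 = 256MB, and with
NR_CPUS=4096 the total virtual reservation is 4096 * 256MB = 1TB of address
space. No physical memory is consumed until mappings are actually populated.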

An arch may provide its own population function for the virtual mappings
(in order to exploit huge page mappings and other frills of the MMU of an
architecture). The default populate function uses single page mappings.


int cpu_area_populate(void *start, unsigned long size, gfp_t flags, int node)

The list of cpu_area_xx functions exported in include/linux/mm.h may be used
as helpers to generate the mapping that the arch needs.

In the simplest form the arch code calls:

	cpu_area_populate_basepages(start, size, flags, node);

The arch code must call

	cpu_area_alloc_block(unsigned long size, gfp_t flags, int node)

for all its memory needs during the construction of the custom page table.
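
As a sketch of the simplest possible arch glue (illustration only; an
architecture that is happy with single page mappings can also just rely on
the weak default), an override boils down to:

	int cpu_area_populate(void *start, unsigned long size,
						gfp_t flags, int node)
	{
		return cpu_area_populate_basepages(start, size, flags, node);
	}

The x86_64 patch later in this series shows a real override that installs
2M PMD mappings for the NUMA case instead.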

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/mm.h |   13 ++
 mm/Kconfig         |   10 +
 mm/cpu_alloc.c     |  287 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 299 insertions(+), 11 deletions(-)

Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- linux-2.6.orig/mm/cpu_alloc.c	2007-11-16 14:54:29.890430938 -0800
+++ linux-2.6/mm/cpu_alloc.c	2007-11-16 14:54:37.106404761 -0800
@@ -17,6 +17,12 @@
 #include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/bitmap.h>
+#include <linux/vmalloc.h>
+#include <linux/bootmem.h>
+#include <linux/sched.h>	/* i386 definition of init_mm */
+#include <linux/highmem.h>	/* i386 dependency on highmem config */
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
 
 /*
  * Basic allocation unit. A bit map is created to track the use of each
@@ -24,7 +30,7 @@
  */
 
 #define UNIT_SIZE sizeof(int)
-#define UNITS (ALLOC_SIZE / UNIT_SIZE)
+#define UNITS_PER_BLOCK (ALLOC_SIZE / UNIT_SIZE)
 
 /*
  * How many units are needed for an object of a given size
@@ -40,6 +46,249 @@ static int size_to_units(unsigned long s
 static DEFINE_SPINLOCK(cpu_alloc_map_lock);
 static unsigned long units_reserved;	/* Units reserved by boot allocations */
 
+#ifdef CONFIG_CPU_AREA_VIRTUAL
+
+/*
+ * Virtualized cpu area. The cpu area can be extended if more space is needed.
+ */
+
+#define cpu_area ((u8 *)(CPU_AREA_BASE))
+#define ALLOC_SIZE (1UL << (CONFIG_CPU_AREA_ALLOC_ORDER + PAGE_SHIFT))
+#define BOOT_ALLOC (1 << __GFP_BITS_SHIFT)
+
+
+/*
+ * The maximum number of blocks is the maximum size of the
+ * cpu area for one processor divided by the size of an allocation
+ * block.
+ */
+#define MAX_BLOCKS (1UL << (CONFIG_CPU_AREA_ORDER - \
+				CONFIG_CPU_AREA_ALLOC_ORDER))
+
+
+static unsigned long *cpu_alloc_map = NULL;
+static int cpu_alloc_map_order = -1;	/* Size of the bitmap in page order */
+static unsigned long active_blocks;	/* Number of blocks allocated on each cpu */
+static unsigned long units_total;	/* Total units that are managed */
+/*
+ * Allocate a block of memory to be used to provide cpu area memory
+ * or to extend the bitmap for the cpu map.
+ */
+void *cpu_area_alloc_block(unsigned long size, gfp_t flags, int node)
+{
+	if (!(flags & BOOT_ALLOC)) {
+		struct page *page = alloc_pages_node(node,
+			flags, get_order(size));
+
+		if (page)
+			return page_address(page);
+		return NULL;
+	} else
+		return __alloc_bootmem_node(NODE_DATA(node), size, size,
+				__pa(MAX_DMA_ADDRESS));
+}
+
+pte_t *cpu_area_pte_populate(pmd_t *pmd, unsigned long addr,
+						gfp_t flags, int node)
+{
+	pte_t *pte = pte_offset_kernel(pmd, addr);
+	if (pte_none(*pte)) {
+		pte_t entry;
+		void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+		if (!p)
+			return 0;
+		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+		set_pte_at(&init_mm, addr, pte, entry);
+	}
+	return pte;
+}
+
+pmd_t *cpu_area_pmd_populate(pud_t *pud, unsigned long addr,
+						gfp_t flags, int node)
+{
+	pmd_t *pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd)) {
+		void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+		if (!p)
+			return 0;
+		pmd_populate_kernel(&init_mm, pmd, p);
+	}
+	return pmd;
+}
+
+pud_t *cpu_area_pud_populate(pgd_t *pgd, unsigned long addr,
+						gfp_t flags, int node)
+{
+	pud_t *pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+		if (!p)
+			return 0;
+		pud_populate(&init_mm, pud, p);
+	}
+	return pud;
+}
+
+pgd_t *cpu_area_pgd_populate(unsigned long addr, gfp_t flags, int node)
+{
+	pgd_t *pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd)) {
+		void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+		if (!p)
+			return 0;
+		pgd_populate(&init_mm, pgd, p);
+	}
+	return pgd;
+}
+
+int cpu_area_populate_basepages(void *start, unsigned long size,
+						gfp_t flags, int node)
+{
+	unsigned long addr = (unsigned long)start;
+	unsigned long end = addr + size;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	for (; addr < end; addr += PAGE_SIZE) {
+		pgd = cpu_area_pgd_populate(addr, flags, node);
+		if (!pgd)
+			return -ENOMEM;
+		pud = cpu_area_pud_populate(pgd, addr, flags, node);
+		if (!pud)
+			return -ENOMEM;
+		pmd = cpu_area_pmd_populate(pud, addr, flags, node);
+		if (!pmd)
+			return -ENOMEM;
+		pte = cpu_area_pte_populate(pmd, addr, flags, node);
+		if (!pte)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/*
+ * If no other population function is defined then this function will stand
+ * in and provide the capability to map PAGE_SIZE pages into the cpu area.
+ */
+int __attribute__((weak)) cpu_area_populate(void *start, unsigned long size,
+					gfp_t flags, int node)
+{
+	return cpu_area_populate_basepages(start, size, flags, node);
+}
+
+/*
+ * Extend the areas on all processors. This function may be called repeatedly
+ * until we have enough space to accommodate a newly allocated object.
+ *
+ * Must hold the cpu_alloc_map_lock on entry. Will drop the lock and then
+ * regain it.
+ */
+static int expand_cpu_area(gfp_t flags)
+{
+	unsigned long blocks = active_blocks;
+	unsigned long bits;
+	int cpu;
+	int err = -ENOMEM;
+	int map_order;
+	unsigned long *new_map = NULL;
+	void *start;
+
+	if (active_blocks == MAX_BLOCKS)
+		goto out;
+
+	spin_unlock(&cpu_alloc_map_lock);
+	if (flags & __GFP_WAIT)
+		local_irq_enable();
+
+	/*
+	 * Determine the size of the bit map needed
+	 */
+	bits = (blocks + 1) * UNITS_PER_BLOCK - units_reserved;
+
+	map_order = get_order(DIV_ROUND_UP(bits, 8));
+	BUG_ON(map_order >= MAX_ORDER);
+	start = cpu_area + \
+		(blocks << (PAGE_SHIFT + CONFIG_CPU_AREA_ALLOC_ORDER));
+
+	for_each_possible_cpu(cpu) {
+		err = cpu_area_populate(CPU_PTR(start, cpu), ALLOC_SIZE,
+			flags, cpu_to_node(cpu));
+
+		if (err) {
+			spin_lock(&cpu_alloc_map_lock);
+			goto out;
+		}
+	}
+
+	if (map_order > cpu_alloc_map_order) {
+		new_map = cpu_area_alloc_block(PAGE_SIZE << map_order,
+						flags | __GFP_ZERO, 0);
+		if (!new_map)
+			goto out;
+	}
+
+	if (flags & __GFP_WAIT)
+		local_irq_disable();
+	spin_lock(&cpu_alloc_map_lock);
+
+	/*
+	 * We dropped the lock. Another processor may have already extended
+	 * the cpu area size as needed.
+	 */
+	if (blocks != active_blocks) {
+		if (new_map)
+			free_pages((unsigned long)new_map,
+						map_order);
+		err = 0;
+		goto out;
+	}
+
+	if (new_map) {
+		/*
+		 * Need to extend the bitmap
+		 */
+		if (cpu_alloc_map)
+			memcpy(new_map, cpu_alloc_map,
+				PAGE_SIZE << cpu_alloc_map_order);
+		cpu_alloc_map = new_map;
+		cpu_alloc_map_order = map_order;
+	}
+
+	active_blocks++;
+	units_total += UNITS_PER_BLOCK;
+	err = 0;
+out:
+	return err;
+}
+
+void * __init boot_cpu_alloc(unsigned long size)
+{
+	unsigned long flags;
+	unsigned long x = units_reserved;
+	unsigned long units = size_to_units(size);
+
+	/*
+	 * Locking is really not necessary during boot
+	 * but expand_cpu_area() unlocks and relocks.
+	 * If we do not perform locking here then
+	 *
+	 * 1. The cpu_alloc_map_lock is locked when
+	 *    we exit boot causing a hang on the next cpu_alloc().
+	 * 2. lockdep will get upset if we do not consistently
+	 *    handle things.
+	 */
+	spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+	while (units_reserved + units > units_total)
+		expand_cpu_area(BOOT_ALLOC);
+	units_reserved += units;
+	spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+	return cpu_area + x * UNIT_SIZE;
+}
+#else
+
 /*
  * Static configuration. The cpu areas are of a fixed size and
  * cannot be extended. Such configurations are mainly useful on
@@ -51,16 +300,24 @@ static unsigned long units_reserved;	/* 
 #define ALLOC_SIZE (1UL << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))
 
 static u8 cpu_area[NR_CPUS * ALLOC_SIZE];
-static DECLARE_BITMAP(cpu_alloc_map, UNITS);
+static DECLARE_BITMAP(cpu_alloc_map, UNITS_PER_BLOCK);
+#define cpu_alloc_map_order CONFIG_CPU_AREA_ORDER
+#define units_total UNITS_PER_BLOCK
+
+static inline int expand_cpu_area(gfp_t flags)
+{
+	return -ENOSYS;
+}
 
 void * __init boot_cpu_alloc(unsigned long size)
 {
 	unsigned long x = units_reserved;
 
 	units_reserved += size_to_units(size);
-	BUG_ON(units_reserved > UNITS);
+	BUG_ON(units_reserved > units_total);
 	return cpu_area + x * UNIT_SIZE;
 }
+#endif
 
 static int first_free;		/* First known free unit */
 
@@ -98,20 +355,30 @@ void *cpu_alloc(unsigned long size, gfp_
 	int units = size_to_units(size);
 	void *ptr;
 	int first;
+	unsigned long map_size;
 	unsigned long flags;
 
 	BUG_ON(gfpflags & ~(GFP_RECLAIM_MASK | __GFP_ZERO));
 
 	spin_lock_irqsave(&cpu_alloc_map_lock, flags);
 
+restart:
+	if (cpu_alloc_map_order >= 0)
+		map_size = PAGE_SIZE << cpu_alloc_map_order;
+	else
+		map_size = 0;
+
 	first = 1;
 	start = first_free;
 
 	for ( ; ; ) {
 
-		start = find_next_zero_bit(cpu_alloc_map, ALLOC_SIZE, start);
-		if (start >= UNITS - units_reserved)
+		start = find_next_zero_bit(cpu_alloc_map, map_size, start);
+		if (start >= units_total - units_reserved) {
+			if (!expand_cpu_area(gfpflags))
+				goto restart;
 			goto out_of_memory;
+		}
 
 		if (first)
 			first_free = start;
@@ -121,7 +388,7 @@ void *cpu_alloc(unsigned long size, gfp_
 		 * the starting unit.
 		 */
 		if ((start + units_reserved) % (align / UNIT_SIZE) == 0 &&
-			find_next_bit(cpu_alloc_map, ALLOC_SIZE, start + 1)
+			find_next_bit(cpu_alloc_map, map_size, start + 1)
 							>= start + units)
 				break;
 		start++;
@@ -131,8 +398,10 @@ void *cpu_alloc(unsigned long size, gfp_
 	if (first)
 		first_free = start + units;
 
-	if (start + units > UNITS - units_reserved)
-		goto out_of_memory;
+	while (start + units > units_total - units_reserved) {
+		if (expand_cpu_area(gfpflags))
+			goto out_of_memory;
+	}
 
 	set_map(start, units);
 	__count_vm_events(CPU_BYTES, units * UNIT_SIZE);
@@ -170,7 +439,7 @@ void cpu_free(void *start, unsigned long
 	BUG_ON(p < (cpu_area + units_reserved * UNIT_SIZE));
 	index = (p - cpu_area) / UNIT_SIZE - units_reserved;
 	BUG_ON(!test_bit(index, cpu_alloc_map) ||
-			index >= UNITS - units_reserved);
+			index >= units_total - units_reserved);
 
 	spin_lock_irqsave(&cpu_alloc_map_lock, flags);
 
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2007-11-16 14:54:33.186431271 -0800
+++ linux-2.6/include/linux/mm.h	2007-11-16 14:54:37.106404761 -0800
@@ -1137,5 +1137,18 @@ int vmemmap_populate_basepages(struct pa
 						unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 
+pgd_t *cpu_area_pgd_populate(unsigned long addr, gfp_t flags, int node);
+pud_t *cpu_area_pud_populate(pgd_t *pgd, unsigned long addr,
+						gfp_t flags, int node);
+pmd_t *cpu_area_pmd_populate(pud_t *pud, unsigned long addr,
+						gfp_t flags, int node);
+pte_t *cpu_area_pte_populate(pmd_t *pmd, unsigned long addr,
+						gfp_t flags, int node);
+void *cpu_area_alloc_block(unsigned long size, gfp_t flags, int node);
+int cpu_area_populate_basepages(void *start, unsigned long size,
+						gfp_t flags, int node);
+int cpu_area_populate(void *start, unsigned long size,
+						gfp_t flags, int node);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2007-11-16 14:54:29.890430938 -0800
+++ linux-2.6/mm/Kconfig	2007-11-16 14:55:07.364981597 -0800
@@ -197,7 +197,13 @@ config VIRT_TO_BUS
 
 config CPU_AREA_ORDER
 	int "Maximum size (order) of CPU area"
-	default "3"
+	default "10" if CPU_AREA_VIRTUAL
+	default "3" if !CPU_AREA_VIRTUAL
 	help
 	  Sets the maximum amount of memory that can be allocated via cpu_alloc
-	  The size is set in page order, so 0 = PAGE_SIZE, 1 = PAGE_SIZE << 1 etc.
+	  The size is set in page order. The size set (times the maximum
+	  number of processors) determines the amount of virtual memory that
+	  is set aside for the cpu areas in the virtualized case, or the
+	  amount of memory allocated in the bss segment in the non
+	  virtualized case.
+

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 06/30] cpu alloc: x86 support
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-11-16 23:09 ` [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_x86_support --]
[-- Type: text/plain, Size: 6345 bytes --]

64 bit:

Set up a cpu area that allows the use of up to 16MB for each processor.

Cpu memory use can grow a bit. For example, if we assume that a pageset
occupies 64 bytes of memory, that there are 3 zones in each of 1024 nodes
and that there are 16k processors, then we need 3 * 1k * 16k = ~50 million
pagesets, or 3072 pagesets per processor. That adds up to about 3.2 GB of
pageset storage. Each cpu needs around 200k of cpu storage for the page
allocator alone. So it is worth it to use a 2M huge mapping here.

For the UP and SMP case map the area using 4k ptes. Typical use of per cpu
data is around 16k for UP and SMP configurations. It goes up to 45k when the
per cpu area is managed by cpu_alloc (see special x86_64 patchset).
Allocating in 2M segments would be overkill.

For NUMA map the area using 2M PMDs. A large NUMA system may use
lots of cpu data for the page allocator data alone. We typically
have large amounts of memory around on systems of that size. Using a 2M
page size reduces TLB pressure for that case.

Some numbers for envisioned maximum configurations of NUMA systems:

4k cpu configurations with 1k nodes:

	4096 * 16MB = 64GB of virtual space.

Maximum theoretical configuration of 16384 processors and 1k nodes:

	16384 * 16MB = 256GB of virtual space.

Both fit within the established limits.

32 bit:

Set up a 256 kB area for each cpu below the FIXADDR area.

The use of the cpu alloc area is pretty minimal on i386. An 8p system
with no extras uses only ~8kb. So 256kb should be plenty. A configuration
that supports up to 8 processors takes up 2MB of the scarce
virtual address space.
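
(The 2 MB follows from the X86_32 defaults in this patch: CONFIG_CPU_AREA_ORDER=6
gives PAGE_SIZE << 6 = 256 kB per cpu, and 8 possible processors therefore
reserve 8 * 256 kB = 2 MB below the fixmap/pkmap area.)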

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 arch/x86/Kconfig             |   15 +++++++++++++++
 arch/x86/mm/init_32.c        |    3 +++
 arch/x86/mm/init_64.c        |   38 ++++++++++++++++++++++++++++++++++++++
 include/asm-x86/pgtable_32.h |    7 +++++--
 include/asm-x86/pgtable_64.h |    1 +
 5 files changed, 62 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c	2007-11-15 21:24:47.059904335 -0800
+++ linux-2.6/arch/x86/mm/init_64.c	2007-11-15 21:25:18.578584246 -0800
@@ -781,3 +781,41 @@ int __meminit vmemmap_populate(struct pa
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_NUMA
+int __meminit cpu_area_populate(void *start, unsigned long size,
+						gfp_t flags, int node)
+{
+	unsigned long addr = (unsigned long)start;
+	unsigned long end = addr + size;
+	unsigned long next;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	for (; addr < end; addr = next) {
+		next = pmd_addr_end(addr, end);
+
+		pgd = cpu_area_pgd_populate(addr, flags, node);
+		if (!pgd)
+			return -ENOMEM;
+		pud = cpu_area_pud_populate(pgd, addr, flags, node);
+		if (!pud)
+			return -ENOMEM;
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd)) {
+			pte_t entry;
+			void *p = cpu_area_alloc_block(PMD_SIZE, flags, node);
+			if (!p)
+				return -ENOMEM;
+
+			entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+			mk_pte_huge(entry);
+			set_pmd(pmd, __pmd(pte_val(entry)));
+		}
+	}
+
+	return 0;
+}
+#endif
Index: linux-2.6/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_64.h	2007-11-15 21:24:47.079904686 -0800
+++ linux-2.6/include/asm-x86/pgtable_64.h	2007-11-15 21:25:18.578584246 -0800
@@ -138,6 +138,7 @@ static inline pte_t ptep_get_and_clear_f
 #define VMALLOC_START    _AC(0xffffc20000000000, UL)
 #define VMALLOC_END      _AC(0xffffe1ffffffffff, UL)
 #define VMEMMAP_START	 _AC(0xffffe20000000000, UL)
+#define CPU_AREA_BASE	 _AC(0xfffff20000000000, UL)
 #define MODULES_VADDR    _AC(0xffffffff88000000, UL)
 #define MODULES_END      _AC(0xfffffffffff00000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig	2007-11-15 21:24:47.075904383 -0800
+++ linux-2.6/arch/x86/Kconfig	2007-11-15 21:25:18.578584246 -0800
@@ -163,6 +163,21 @@ config X86_TRAMPOLINE
 
 config KTIME_SCALAR
 	def_bool X86_32
+
+config CPU_AREA_VIRTUAL
+	bool
+	default y
+
+config CPU_AREA_ORDER
+	int
+	default "16" if X86_64
+	default "6" if X86_32
+
+config CPU_AREA_ALLOC_ORDER
+	int
+	default "9" if NUMA && X86_64
+	default "0" if !NUMA || X86_32
+
 source "init/Kconfig"
 
 menu "Processor type and features"
Index: linux-2.6/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_32.c	2007-11-15 21:24:47.067904108 -0800
+++ linux-2.6/arch/x86/mm/init_32.c	2007-11-15 21:25:18.578584246 -0800
@@ -674,6 +674,7 @@ void __init mem_init(void)
 #if 1 /* double-sanity-check paranoia */
 	printk("virtual kernel memory layout:\n"
 	       "    fixmap  : 0x%08lx - 0x%08lx   (%4ld kB)\n"
+	       "    cpu area: 0x%08lx - 0x%08lx   (%4ld kb)\n"
 #ifdef CONFIG_HIGHMEM
 	       "    pkmap   : 0x%08lx - 0x%08lx   (%4ld kB)\n"
 #endif
@@ -684,6 +685,8 @@ void __init mem_init(void)
 	       "      .text : 0x%08lx - 0x%08lx   (%4ld kB)\n",
 	       FIXADDR_START, FIXADDR_TOP,
 	       (FIXADDR_TOP - FIXADDR_START) >> 10,
+	       CPU_AREA_BASE, FIXADDR_START,
+	       (FIXADDR_START - CPU_AREA_BASE) >> 10,
 
 #ifdef CONFIG_HIGHMEM
 	       PKMAP_BASE, PKMAP_BASE+LAST_PKMAP*PAGE_SIZE,
Index: linux-2.6/include/asm-x86/pgtable_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_32.h	2007-11-15 21:24:47.087904440 -0800
+++ linux-2.6/include/asm-x86/pgtable_32.h	2007-11-15 21:25:18.578584246 -0800
@@ -79,11 +79,14 @@ void paging_init(void);
 #define VMALLOC_START	(((unsigned long) high_memory + \
 			2*VMALLOC_OFFSET-1) & ~(VMALLOC_OFFSET-1))
 #ifdef CONFIG_HIGHMEM
-# define VMALLOC_END	(PKMAP_BASE-2*PAGE_SIZE)
+# define CPU_AREA_BASE	(PKMAP_BASE - NR_CPUS * \
+				(1 << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT)))
 #else
-# define VMALLOC_END	(FIXADDR_START-2*PAGE_SIZE)
+# define CPU_AREA_BASE	(FIXADDR_START - NR_CPUS * \
+				(1 << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT)))
 #endif
 
+#define VMALLOC_END	(CPU_AREA_BASE - 2 * PAGE_SIZE)
 /*
  * _PAGE_PSE set in the page directory entry just means that
  * the page directory entry points directly to a 4MB-aligned block of

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 07/30] cpu alloc: IA64 support
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-11-16 23:09 ` [patch 06/30] cpu alloc: x86 support Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:32   ` Luck, Tony
  2007-11-16 23:09 ` [patch 08/30] cpu_alloc: Sparc64 support Christoph Lameter
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_ia64_support --]
[-- Type: text/plain, Size: 3008 bytes --]

Typical use of per cpu memory for a small system (8G, 8p, 4 nodes) is less than
64k per cpu. This increases rapidly for larger systems, where we can
get up to 512k or 1M of memory used for cpu storage.

The maximum allowed size of the cpu area is 128MB.

The cpu area is placed in region 5 with the kernel, vmemmap and vmalloc areas.
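
As a quick consistency check (assuming the 16 kB page case, PAGE_SHIFT = 14):
CPU_AREA_BASE = RGN_BASE(RGN_GATE) + (3UL << (4 * 14 - 11)) =
0xa000000000000000 + (3UL << 45) = 0xa000600000000000, a 32 TB window
extending to 0xa000800000000000, which matches the layout comment added to
pgtable.h below.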

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 arch/ia64/Kconfig          |   13 +++++++++++++
 include/asm-ia64/pgtable.h |   32 ++++++++++++++++++++++++++------
 2 files changed, 39 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig	2007-11-15 21:24:46.991154957 -0800
+++ linux-2.6/arch/ia64/Kconfig	2007-11-16 14:43:17.277214329 -0800
@@ -99,6 +99,19 @@ config AUDIT_ARCH
 	bool
 	default y
 
+config CPU_AREA_VIRTUAL
+	bool
+	default y
+
+# Maximum of 128 MB cpu_alloc space per cpu
+config CPU_AREA_ORDER
+	int
+	default "13"
+
+config CPU_AREA_ALLOC_ORDER
+	int
+	default "0"
+
 choice
 	prompt "System type"
 	default IA64_GENERIC
Index: linux-2.6/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/pgtable.h	2007-11-15 21:24:47.003154534 -0800
+++ linux-2.6/include/asm-ia64/pgtable.h	2007-11-16 14:42:57.629964336 -0800
@@ -224,21 +224,41 @@ ia64_phys_addr_valid (unsigned long addr
  */
 
 
+/*
+ * Layout of RGN_GATE
+ *
+ * 47 bits wide (16kb pages)
+ *
+ * 0xa000000000000000-0xa000000200000000	8G	Kernel data area
+ * 0xa000000200000000-0xa000400000000000	64T	vmalloc
+ * 0xa000400000000000-0xa000600000000000	32T	vmemmap
+ * 0xa000600000000000-0xa000800000000000	32T	cpu area
+ *
+ * 55 bits wide (64kb pages)
+ *
+ * 0xa000000000000000-0xa000000200000000	8G	Kernel data area
+ * 0xa000000200000000-0xa040000000000000	16P	vmalloc
+ * 0xa040000000000000-0xa060000000000000	8P	vmemmap
+ * 0xa060000000000000-0xa080000000000000	8P	cpu area
+ */
+
 #define VMALLOC_START		(RGN_BASE(RGN_GATE) + 0x200000000UL)
+#define VMALLOC_END_INIT	(RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
+
 #ifdef CONFIG_VIRTUAL_MEM_MAP
-# define VMALLOC_END_INIT	(RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
 # define VMALLOC_END		vmalloc_end
   extern unsigned long vmalloc_end;
 #else
+# define VMALLOC_END VMALLOC_END_INIT
+#endif
+
 #if defined(CONFIG_SPARSEMEM) && defined(CONFIG_SPARSEMEM_VMEMMAP)
 /* SPARSEMEM_VMEMMAP uses half of vmalloc... */
-# define VMALLOC_END		(RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
-# define vmemmap		((struct page *)VMALLOC_END)
-#else
-# define VMALLOC_END		(RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
-#endif
+# define vmemmap		((struct page *)VMALLOC_END_INIT)
 #endif
 
+#define CPU_AREA_BASE		(RGN_BASE(RGN_GATE) + (3UL << (4*PAGE_SHIFT - 11)))
+
 /* fs/proc/kcore.c */
 #define	kc_vaddr_to_offset(v) ((v) - RGN_BASE(RGN_GATE))
 #define	kc_offset_to_vaddr(o) ((o) + RGN_BASE(RGN_GATE))

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 08/30] cpu_alloc: Sparc64 support
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (6 preceding siblings ...)
  2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 09/30] cpu alloc: percpu_counter conversion Christoph Lameter
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: cpu_alloc_sparc64 --]
[-- Type: text/plain, Size: 1596 bytes --]

Enable a simple virtual configuration with 32MB available per cpu so that
we do not use a static area on sparc64.
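
(With these defaults each cpu gets PAGE_SIZE << CPU_AREA_ORDER of virtual
space, e.g. 64 kB << 9 = 32 MB for the 64 kB page size configuration.)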

[Not tested. I have no sparc64]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/sparc64/Kconfig          |   15 +++++++++++++++
 include/asm-sparc64/pgtable.h |    1 +
 2 files changed, 16 insertions(+)

Index: linux-2.6/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.orig/arch/sparc64/Kconfig	2007-11-15 21:24:46.942815842 -0800
+++ linux-2.6/arch/sparc64/Kconfig	2007-11-15 21:25:27.790904212 -0800
@@ -103,6 +103,21 @@ config SPARC64_PAGE_SIZE_4MB
 
 endchoice
 
+config CPU_AREA_VIRTUAL
+	bool
+	default y
+
+config CPU_AREA_ORDER
+	int
+	default "11" if SPARC64_PAGE_SIZE_8KB
+	default "9" if SPARC64_PAGE_SIZE_64KB
+	default "6" if SPARC64_PAGE_SIZE_512KB
+	default "3" if SPARC64_PAGE_SIZE_4MB
+
+config CPU_AREA_ALLOC_ORDER
+	int
+	default "0"
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
Index: linux-2.6/include/asm-sparc64/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/pgtable.h	2007-11-15 21:24:46.950904404 -0800
+++ linux-2.6/include/asm-sparc64/pgtable.h	2007-11-15 21:25:27.794904145 -0800
@@ -43,6 +43,7 @@
 #define VMALLOC_START		_AC(0x0000000100000000,UL)
 #define VMALLOC_END		_AC(0x0000000200000000,UL)
 #define VMEMMAP_BASE		_AC(0x0000000200000000,UL)
+#define CPU_AREA_BASE		_AC(0x0000000300000000,UL)
 
 #define vmemmap			((struct page *)VMEMMAP_BASE)
 

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 09/30] cpu alloc: percpu_counter conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (7 preceding siblings ...)
  2007-11-16 23:09 ` [patch 08/30] cpu_alloc: Sparc64 support Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 10/30] cpu alloc: crash_notes conversion Christoph Lameter
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0019-cpu-alloc-percpu_counter-conversion.patch --]
[-- Type: text/plain, Size: 2041 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 lib/percpu_counter.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2007-11-15 21:24:46.878154362 -0800
+++ linux-2.6/lib/percpu_counter.c	2007-11-15 21:25:28.963154085 -0800
@@ -20,7 +20,7 @@ void percpu_counter_set(struct percpu_co
 
 	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+		s32 *pcount = CPU_PTR(fbc->counters, cpu);
 		*pcount = 0;
 	}
 	fbc->count = amount;
@@ -34,7 +34,7 @@ void __percpu_counter_add(struct percpu_
 	s32 *pcount;
 	int cpu = get_cpu();
 
-	pcount = per_cpu_ptr(fbc->counters, cpu);
+	pcount = CPU_PTR(fbc->counters, cpu);
 	count = *pcount + amount;
 	if (count >= batch || count <= -batch) {
 		spin_lock(&fbc->lock);
@@ -60,7 +60,7 @@ s64 __percpu_counter_sum(struct percpu_c
 	spin_lock(&fbc->lock);
 	ret = fbc->count;
 	for_each_online_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+		s32 *pcount = CPU_PTR(fbc->counters, cpu);
 		ret += *pcount;
 	}
 	spin_unlock(&fbc->lock);
@@ -74,7 +74,7 @@ int percpu_counter_init(struct percpu_co
 {
 	spin_lock_init(&fbc->lock);
 	fbc->count = amount;
-	fbc->counters = alloc_percpu(s32);
+	fbc->counters = CPU_ALLOC(s32, GFP_KERNEL|__GFP_ZERO);
 	if (!fbc->counters)
 		return -ENOMEM;
 #ifdef CONFIG_HOTPLUG_CPU
@@ -101,7 +101,7 @@ void percpu_counter_destroy(struct percp
 	if (!fbc->counters)
 		return;
 
-	free_percpu(fbc->counters);
+	CPU_FREE(fbc->counters);
 #ifdef CONFIG_HOTPLUG_CPU
 	mutex_lock(&percpu_counters_lock);
 	list_del(&fbc->list);
@@ -127,7 +127,7 @@ static int __cpuinit percpu_counter_hotc
 		unsigned long flags;
 
 		spin_lock_irqsave(&fbc->lock, flags);
-		pcount = per_cpu_ptr(fbc->counters, cpu);
+		pcount = CPU_PTR(fbc->counters, cpu);
 		fbc->count += *pcount;
 		*pcount = 0;
 		spin_unlock_irqrestore(&fbc->lock, flags);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 10/30] cpu alloc: crash_notes conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (8 preceding siblings ...)
  2007-11-16 23:09 ` [patch 09/30] cpu alloc: percpu_counter conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 11/30] cpu alloc: workqueue conversion Christoph Lameter
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0020-cpu-alloc-crash_notes-conversion.patch --]
[-- Type: text/plain, Size: 2331 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 arch/ia64/kernel/crash.c |    2 +-
 drivers/base/cpu.c       |    2 +-
 kernel/kexec.c           |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/ia64/kernel/crash.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/crash.c	2007-11-15 21:18:10.647904573 -0800
+++ linux-2.6/arch/ia64/kernel/crash.c	2007-11-15 21:25:29.423155123 -0800
@@ -71,7 +71,7 @@ crash_save_this_cpu(void)
 	dst[46] = (unsigned long)ia64_rse_skip_regs((unsigned long *)dst[46],
 			sof - sol);
 
-	buf = (u64 *) per_cpu_ptr(crash_notes, cpu);
+	buf = (u64 *) CPU_PTR(crash_notes, cpu);
 	if (!buf)
 		return;
 	buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS, prstatus,
Index: linux-2.6/drivers/base/cpu.c
===================================================================
--- linux-2.6.orig/drivers/base/cpu.c	2007-11-15 21:18:10.655904442 -0800
+++ linux-2.6/drivers/base/cpu.c	2007-11-15 21:25:29.423155123 -0800
@@ -95,7 +95,7 @@ static ssize_t show_crash_notes(struct s
 	 * boot up and this data does not change there after. Hence this
 	 * operation should be safe. No locking required.
 	 */
-	addr = __pa(per_cpu_ptr(crash_notes, cpunum));
+	addr = __pa(CPU_PTR(crash_notes, cpunum));
 	rc = sprintf(buf, "%Lx\n", addr);
 	return rc;
 }
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c	2007-11-15 21:18:10.663904549 -0800
+++ linux-2.6/kernel/kexec.c	2007-11-15 21:25:29.423155123 -0800
@@ -1122,7 +1122,7 @@ void crash_save_cpu(struct pt_regs *regs
 	 * squirrelled away.  ELF notes happen to provide
 	 * all of that, so there is no need to invent something new.
 	 */
-	buf = (u32*)per_cpu_ptr(crash_notes, cpu);
+	buf = (u32*)CPU_PTR(crash_notes, cpu);
 	if (!buf)
 		return;
 	memset(&prstatus, 0, sizeof(prstatus));
@@ -1136,7 +1136,7 @@ void crash_save_cpu(struct pt_regs *regs
 static int __init crash_notes_memory_init(void)
 {
 	/* Allocate memory for saving cpu registers. */
-	crash_notes = alloc_percpu(note_buf_t);
+	crash_notes = CPU_ALLOC(note_buf_t, GFP_KERNEL|__GFP_ZERO);
 	if (!crash_notes) {
 		printk("Kexec: Memory allocation for saving cpu register"
 		" states failed\n");

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 11/30] cpu alloc: workqueue conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (9 preceding siblings ...)
  2007-11-16 23:09 ` [patch 10/30] cpu alloc: crash_notes conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 12/30] cpu alloc: ACPI cstate handling conversion Christoph Lameter
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0021-cpu-alloc-workqueue-conversion.patch --]
[-- Type: text/plain, Size: 3414 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 kernel/workqueue.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c	2007-11-15 21:18:11.726153923 -0800
+++ linux-2.6/kernel/workqueue.c	2007-11-15 21:25:29.966154099 -0800
@@ -100,7 +100,7 @@ struct cpu_workqueue_struct *wq_per_cpu(
 {
 	if (unlikely(is_single_threaded(wq)))
 		cpu = singlethread_cpu;
-	return per_cpu_ptr(wq->cpu_wq, cpu);
+	return CPU_PTR(wq->cpu_wq, cpu);
 }
 
 /*
@@ -398,7 +398,7 @@ void fastcall flush_workqueue(struct wor
 	lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
 	lock_release(&wq->lockdep_map, 1, _THIS_IP_);
 	for_each_cpu_mask(cpu, *cpu_map)
-		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+		flush_cpu_workqueue(CPU_PTR(wq->cpu_wq, cpu));
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
 
@@ -478,7 +478,7 @@ static void wait_on_work(struct work_str
 	cpu_map = wq_cpu_map(wq);
 
 	for_each_cpu_mask(cpu, *cpu_map)
-		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
+		wait_on_cpu_work(CPU_PTR(wq->cpu_wq, cpu), work);
 }
 
 static int __cancel_work_timer(struct work_struct *work,
@@ -601,21 +601,21 @@ int schedule_on_each_cpu(work_func_t fun
 	int cpu;
 	struct work_struct *works;
 
-	works = alloc_percpu(struct work_struct);
+	works = CPU_ALLOC(struct work_struct, GFP_KERNEL);
 	if (!works)
 		return -ENOMEM;
 
 	preempt_disable();		/* CPU hotplug */
 	for_each_online_cpu(cpu) {
-		struct work_struct *work = per_cpu_ptr(works, cpu);
+		struct work_struct *work = CPU_PTR(works, cpu);
 
 		INIT_WORK(work, func);
 		set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
-		__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work);
+		__queue_work(CPU_PTR(keventd_wq->cpu_wq, cpu), work);
 	}
 	preempt_enable();
 	flush_workqueue(keventd_wq);
-	free_percpu(works);
+	CPU_FREE(works);
 	return 0;
 }
 
@@ -664,7 +664,7 @@ int current_is_keventd(void)
 
 	BUG_ON(!keventd_wq);
 
-	cwq = per_cpu_ptr(keventd_wq->cpu_wq, cpu);
+	cwq = CPU_PTR(keventd_wq->cpu_wq, cpu);
 	if (current == cwq->thread)
 		ret = 1;
 
@@ -675,7 +675,7 @@ int current_is_keventd(void)
 static struct cpu_workqueue_struct *
 init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
 {
-	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct cpu_workqueue_struct *cwq = CPU_PTR(wq->cpu_wq, cpu);
 
 	cwq->wq = wq;
 	spin_lock_init(&cwq->lock);
@@ -732,7 +732,8 @@ struct workqueue_struct *__create_workqu
 	if (!wq)
 		return NULL;
 
-	wq->cpu_wq = alloc_percpu(struct cpu_workqueue_struct);
+	wq->cpu_wq = CPU_ALLOC(struct cpu_workqueue_struct,
+					GFP_KERNEL|__GFP_ZERO);
 	if (!wq->cpu_wq) {
 		kfree(wq);
 		return NULL;
@@ -814,11 +815,11 @@ void destroy_workqueue(struct workqueue_
 	mutex_unlock(&workqueue_mutex);
 
 	for_each_cpu_mask(cpu, *cpu_map) {
-		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+		cwq = CPU_PTR(wq->cpu_wq, cpu);
 		cleanup_workqueue_thread(cwq, cpu);
 	}
 
-	free_percpu(wq->cpu_wq);
+	CPU_FREE(wq->cpu_wq);
 	kfree(wq);
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);
@@ -847,7 +848,7 @@ static int __devinit workqueue_cpu_callb
 	}
 
 	list_for_each_entry(wq, &workqueues, list) {
-		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+		cwq = CPU_PTR(wq->cpu_wq, cpu);
 
 		switch (action) {
 		case CPU_UP_PREPARE:

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 12/30] cpu alloc: ACPI cstate handling conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (10 preceding siblings ...)
  2007-11-16 23:09 ` [patch 11/30] cpu alloc: workqueue conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 13/30] cpu alloc: genhd statistics conversion Christoph Lameter
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0022-cpu-alloc-ACPI-cstate-handling-conversion.patch --]
[-- Type: text/plain, Size: 3543 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 arch/x86/kernel/acpi/cstate.c              |    9 +++++----
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    7 ++++---
 drivers/acpi/processor_perflib.c           |    4 ++--
 3 files changed, 11 insertions(+), 9 deletions(-)

Index: linux-2.6/arch/x86/kernel/acpi/cstate.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/cstate.c	2007-11-15 21:18:09.238904115 -0800
+++ linux-2.6/arch/x86/kernel/acpi/cstate.c	2007-11-15 21:25:30.499154221 -0800
@@ -87,7 +87,7 @@ int acpi_processor_ffh_cstate_probe(unsi
 	if (reg->bit_offset != NATIVE_CSTATE_BEYOND_HALT)
 		return -1;
 
-	percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+	percpu_entry = CPU_PTR(cpu_cstate_entry, cpu);
 	percpu_entry->states[cx->index].eax = 0;
 	percpu_entry->states[cx->index].ecx = 0;
 
@@ -138,7 +138,7 @@ void acpi_processor_ffh_cstate_enter(str
 	unsigned int cpu = smp_processor_id();
 	struct cstate_entry *percpu_entry;
 
-	percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+	percpu_entry = CPU_PTR(cpu_cstate_entry, cpu);
 	mwait_idle_with_hints(percpu_entry->states[cx->index].eax,
 	                      percpu_entry->states[cx->index].ecx);
 }
@@ -150,13 +150,14 @@ static int __init ffh_cstate_init(void)
 	if (c->x86_vendor != X86_VENDOR_INTEL)
 		return -1;
 
-	cpu_cstate_entry = alloc_percpu(struct cstate_entry);
+	cpu_cstate_entry = CPU_ALLOC(struct cstate_entry,
+					GFP_KERNEL|__GFP_ZERO);
 	return 0;
 }
 
 static void __exit ffh_cstate_exit(void)
 {
-	free_percpu(cpu_cstate_entry);
+	CPU_FREE(cpu_cstate_entry);
 	cpu_cstate_entry = NULL;
 }
 
Index: linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c	2007-11-15 21:18:09.246904080 -0800
+++ linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c	2007-11-15 21:25:30.499154221 -0800
@@ -513,7 +513,8 @@ static int __init acpi_cpufreq_early_ini
 {
 	dprintk("acpi_cpufreq_early_init\n");
 
-	acpi_perf_data = alloc_percpu(struct acpi_processor_performance);
+	acpi_perf_data = CPU_ALLOC(struct acpi_processor_performance,
+						GFP_KERNEL|__GFP_ZERO);
 	if (!acpi_perf_data) {
 		dprintk("Memory allocation error for acpi_perf_data.\n");
 		return -ENOMEM;
@@ -569,7 +570,7 @@ static int acpi_cpufreq_cpu_init(struct 
 	if (!data)
 		return -ENOMEM;
 
-	data->acpi_data = percpu_ptr(acpi_perf_data, cpu);
+	data->acpi_data = CPU_PTR(acpi_perf_data, cpu);
 	drv_data[cpu] = data;
 
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
@@ -782,7 +783,7 @@ static void __exit acpi_cpufreq_exit(voi
 
 	cpufreq_unregister_driver(&acpi_cpufreq_driver);
 
-	free_percpu(acpi_perf_data);
+	CPU_FREE(acpi_perf_data);
 
 	return;
 }
Index: linux-2.6/drivers/acpi/processor_perflib.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_perflib.c	2007-11-15 21:18:09.254904773 -0800
+++ linux-2.6/drivers/acpi/processor_perflib.c	2007-11-15 21:25:30.499154221 -0800
@@ -567,12 +567,12 @@ int acpi_processor_preregister_performan
 			continue;
 		}
 
-		if (!performance || !percpu_ptr(performance, i)) {
+		if (!performance || !CPU_PTR(performance, i)) {
 			retval = -EINVAL;
 			continue;
 		}
 
-		pr->performance = percpu_ptr(performance, i);
+		pr->performance = CPU_PTR(performance, i);
 		cpu_set(i, pr->performance->shared_cpu_map);
 		if (acpi_processor_get_psd(pr)) {
 			retval = -EINVAL;

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 13/30] cpu alloc: genhd statistics conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (11 preceding siblings ...)
  2007-11-16 23:09 ` [patch 12/30] cpu alloc: ACPI cstate handling conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 14/30] cpu alloc: blktrace conversion Christoph Lameter
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0023-cpu-alloc-genhd-statistics-conversion.patch --]
[-- Type: text/plain, Size: 1772 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/genhd.h |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/genhd.h
===================================================================
--- linux-2.6.orig/include/linux/genhd.h	2007-11-15 21:18:07.967654575 -0800
+++ linux-2.6/include/linux/genhd.h	2007-11-15 21:25:31.066904143 -0800
@@ -158,21 +158,21 @@ struct disk_attribute {
  */
 #ifdef	CONFIG_SMP
 #define __disk_stat_add(gendiskp, field, addnd) 	\
-	(per_cpu_ptr(gendiskp->dkstats, smp_processor_id())->field += addnd)
+	(THIS_CPU(gendiskp->dkstats)->field += addnd)
 
 #define disk_stat_read(gendiskp, field)					\
 ({									\
 	typeof(gendiskp->dkstats->field) res = 0;			\
 	int i;								\
 	for_each_possible_cpu(i)					\
-		res += per_cpu_ptr(gendiskp->dkstats, i)->field;	\
+		res += CPU_PTR(gendiskp->dkstats, i)->field;	\
 	res;								\
 })
 
 static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)	{
 	int i;
 	for_each_possible_cpu(i)
-		memset(per_cpu_ptr(gendiskp->dkstats, i), value,
+		memset(CPU_PTR(gendiskp->dkstats, i), value,
 				sizeof (struct disk_stats));
 }		
 				
@@ -209,7 +209,7 @@ static inline void disk_stat_set_all(str
 #ifdef  CONFIG_SMP
 static inline int init_disk_stats(struct gendisk *disk)
 {
-	disk->dkstats = alloc_percpu(struct disk_stats);
+	disk->dkstats = CPU_ALLOC(struct disk_stats, GFP_KERNEL | __GFP_ZERO);
 	if (!disk->dkstats)
 		return 0;
 	return 1;
@@ -217,7 +217,7 @@ static inline int init_disk_stats(struct
 
 static inline void free_disk_stats(struct gendisk *disk)
 {
-	free_percpu(disk->dkstats);
+	CPU_FREE(disk->dkstats);
 }
 #else	/* CONFIG_SMP */
 static inline int init_disk_stats(struct gendisk *disk)

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 14/30] cpu alloc: blktrace conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (12 preceding siblings ...)
  2007-11-16 23:09 ` [patch 13/30] cpu alloc: genhd statistics conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 15/30] cpu alloc: SRCU Christoph Lameter
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0024-cpu-alloc-blktrace-conversion.patch --]
[-- Type: text/plain, Size: 1398 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 block/blktrace.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/block/blktrace.c
===================================================================
--- linux-2.6.orig/block/blktrace.c	2007-11-15 21:17:24.586154116 -0800
+++ linux-2.6/block/blktrace.c	2007-11-15 21:25:31.591154091 -0800
@@ -155,7 +155,7 @@ void __blk_add_trace(struct blk_trace *b
 	t = relay_reserve(bt->rchan, sizeof(*t) + pdu_len);
 	if (t) {
 		cpu = smp_processor_id();
-		sequence = per_cpu_ptr(bt->sequence, cpu);
+		sequence = CPU_PTR(bt->sequence, cpu);
 
 		t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
 		t->sequence = ++(*sequence);
@@ -227,7 +227,7 @@ static void blk_trace_cleanup(struct blk
 	relay_close(bt->rchan);
 	debugfs_remove(bt->dropped_file);
 	blk_remove_tree(bt->dir);
-	free_percpu(bt->sequence);
+	CPU_FREE(bt->sequence);
 	kfree(bt);
 }
 
@@ -338,7 +338,7 @@ int do_blk_trace_setup(struct request_qu
 	if (!bt)
 		goto err;
 
-	bt->sequence = alloc_percpu(unsigned long);
+	bt->sequence = CPU_ALLOC(unsigned long, GFP_KERNEL | __GFP_ZERO);
 	if (!bt->sequence)
 		goto err;
 
@@ -387,7 +387,7 @@ err:
 	if (bt) {
 		if (bt->dropped_file)
 			debugfs_remove(bt->dropped_file);
-		free_percpu(bt->sequence);
+		CPU_FREE(bt->sequence);
 		if (bt->rchan)
 			relay_close(bt->rchan);
 		kfree(bt);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 15/30] cpu alloc: SRCU
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (13 preceding siblings ...)
  2007-11-16 23:09 ` [patch 14/30] cpu alloc: blktrace conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0025-cpu-alloc-SRCU.patch --]
[-- Type: text/plain, Size: 2573 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 kernel/rcutorture.c |    4 ++--
 kernel/srcu.c       |   11 ++++++-----
 2 files changed, 8 insertions(+), 7 deletions(-)

Index: linux-2.6/kernel/rcutorture.c
===================================================================
--- linux-2.6.orig/kernel/rcutorture.c	2007-11-15 21:17:24.515654132 -0800
+++ linux-2.6/kernel/rcutorture.c	2007-11-15 21:25:32.102406141 -0800
@@ -441,8 +441,8 @@ static int srcu_torture_stats(char *page
 		       torture_type, TORTURE_FLAG, idx);
 	for_each_possible_cpu(cpu) {
 		cnt += sprintf(&page[cnt], " %d(%d,%d)", cpu,
-			       per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[!idx],
-			       per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[idx]);
+			       CPU_PTR(srcu_ctl.per_cpu_ref, cpu)->c[!idx],
+			       CPU_PTR(srcu_ctl.per_cpu_ref, cpu)->c[idx]);
 	}
 	cnt += sprintf(&page[cnt], "\n");
 	return cnt;
Index: linux-2.6/kernel/srcu.c
===================================================================
--- linux-2.6.orig/kernel/srcu.c	2007-11-15 21:17:24.523654368 -0800
+++ linux-2.6/kernel/srcu.c	2007-11-15 21:25:32.102406141 -0800
@@ -46,7 +46,8 @@ int init_srcu_struct(struct srcu_struct 
 {
 	sp->completed = 0;
 	mutex_init(&sp->mutex);
-	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+	sp->per_cpu_ref = CPU_ALLOC(struct srcu_struct_array,
+						GFP_KERNEL|__GFP_ZERO);
 	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
 
@@ -62,7 +63,7 @@ static int srcu_readers_active_idx(struc
 
 	sum = 0;
 	for_each_possible_cpu(cpu)
-		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+		sum += CPU_PTR(sp->per_cpu_ref, cpu)->c[idx];
 	return sum;
 }
 
@@ -94,7 +95,7 @@ void cleanup_srcu_struct(struct srcu_str
 	WARN_ON(sum);  /* Leakage unless caller handles error. */
 	if (sum != 0)
 		return;
-	free_percpu(sp->per_cpu_ref);
+	CPU_FREE(sp->per_cpu_ref);
 	sp->per_cpu_ref = NULL;
 }
 
@@ -113,7 +114,7 @@ int srcu_read_lock(struct srcu_struct *s
 	preempt_disable();
 	idx = sp->completed & 0x1;
 	barrier();  /* ensure compiler looks -once- at sp->completed. */
-	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
+	THIS_CPU(sp->per_cpu_ref)->c[idx]++;
 	srcu_barrier();  /* ensure compiler won't misorder critical section. */
 	preempt_enable();
 	return idx;
@@ -133,7 +134,7 @@ void srcu_read_unlock(struct srcu_struct
 {
 	preempt_disable();
 	srcu_barrier();  /* ensure compiler won't misorder critical section. */
-	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
+	THIS_CPU(sp->per_cpu_ref)->c[idx]--;
 	preempt_enable();
 }
 

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 16/30] cpu alloc: XFS counters
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (14 preceding siblings ...)
  2007-11-16 23:09 ` [patch 15/30] cpu alloc: SRCU Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-19 12:58   ` Christoph Hellwig
  2007-11-16 23:09 ` [patch 17/30] cpu alloc: NFS statistics Christoph Lameter
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0026-cpu-alloc-XFS-counters.patch --]
[-- Type: text/plain, Size: 3014 bytes --]

Also remove the useless zeroing after allocation. Allocpercpu already
zeroed the objects.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/xfs/xfs_mount.c |   24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

Index: linux-2.6/fs/xfs/xfs_mount.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_mount.c	2007-11-15 21:17:24.467654585 -0800
+++ linux-2.6/fs/xfs/xfs_mount.c	2007-11-15 21:25:32.643904117 -0800
@@ -1924,7 +1924,7 @@ xfs_icsb_cpu_notify(
 
 	mp = (xfs_mount_t *)container_of(nfb, xfs_mount_t, m_icsb_notifier);
 	cntp = (xfs_icsb_cnts_t *)
-			per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu);
+			CPU_PTR(mp->m_sb_cnts, (unsigned long)hcpu);
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
@@ -1976,10 +1976,7 @@ int
 xfs_icsb_init_counters(
 	xfs_mount_t	*mp)
 {
-	xfs_icsb_cnts_t *cntp;
-	int		i;
-
-	mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t);
+	mp->m_sb_cnts = CPU_ALLOC(xfs_icsb_cnts_t, GFP_KERNEL | __GFP_ZERO);
 	if (mp->m_sb_cnts == NULL)
 		return -ENOMEM;
 
@@ -1989,11 +1986,6 @@ xfs_icsb_init_counters(
 	register_hotcpu_notifier(&mp->m_icsb_notifier);
 #endif /* CONFIG_HOTPLUG_CPU */
 
-	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
-		memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
-	}
-
 	mutex_init(&mp->m_icsb_mutex);
 
 	/*
@@ -2026,7 +2018,7 @@ xfs_icsb_destroy_counters(
 {
 	if (mp->m_sb_cnts) {
 		unregister_hotcpu_notifier(&mp->m_icsb_notifier);
-		free_percpu(mp->m_sb_cnts);
+		CPU_FREE(mp->m_sb_cnts);
 	}
 	mutex_destroy(&mp->m_icsb_mutex);
 }
@@ -2056,7 +2048,7 @@ xfs_icsb_lock_all_counters(
 	int		i;
 
 	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
+		cntp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, i);
 		xfs_icsb_lock_cntr(cntp);
 	}
 }
@@ -2069,7 +2061,7 @@ xfs_icsb_unlock_all_counters(
 	int		i;
 
 	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
+		cntp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, i);
 		xfs_icsb_unlock_cntr(cntp);
 	}
 }
@@ -2089,7 +2081,7 @@ xfs_icsb_count(
 		xfs_icsb_lock_all_counters(mp);
 
 	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
+		cntp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, i);
 		cnt->icsb_icount += cntp->icsb_icount;
 		cnt->icsb_ifree += cntp->icsb_ifree;
 		cnt->icsb_fdblocks += cntp->icsb_fdblocks;
@@ -2167,7 +2159,7 @@ xfs_icsb_enable_counter(
 
 	xfs_icsb_lock_all_counters(mp);
 	for_each_online_cpu(i) {
-		cntp = per_cpu_ptr(mp->m_sb_cnts, i);
+		cntp = CPU_PTR(mp->m_sb_cnts, i);
 		switch (field) {
 		case XFS_SBS_ICOUNT:
 			cntp->icsb_icount = count + resid;
@@ -2307,7 +2299,7 @@ xfs_icsb_modify_counters(
 	might_sleep();
 again:
 	cpu = get_cpu();
-	icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu);
+	icsbp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, cpu);
 
 	/*
 	 * if the counter is disabled, go to slow path

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 17/30] cpu alloc: NFS statistics
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (15 preceding siblings ...)
  2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 18/30] cpu alloc: neighbour statistics Christoph Lameter
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0027-cpu-alloc-NFS-statistics.patch --]
[-- Type: text/plain, Size: 1804 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/nfs/iostat.h |    8 ++++----
 fs/nfs/super.c  |    2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6/fs/nfs/iostat.h
===================================================================
--- linux-2.6.orig/fs/nfs/iostat.h	2007-11-15 21:17:24.391404458 -0800
+++ linux-2.6/fs/nfs/iostat.h	2007-11-15 21:25:33.167654066 -0800
@@ -123,7 +123,7 @@ static inline void nfs_inc_server_stats(
 	int cpu;
 
 	cpu = get_cpu();
-	iostats = per_cpu_ptr(server->io_stats, cpu);
+	iostats = CPU_PTR(server->io_stats, cpu);
 	iostats->events[stat] ++;
 	put_cpu_no_resched();
 }
@@ -139,7 +139,7 @@ static inline void nfs_add_server_stats(
 	int cpu;
 
 	cpu = get_cpu();
-	iostats = per_cpu_ptr(server->io_stats, cpu);
+	iostats = CPU_PTR(server->io_stats, cpu);
 	iostats->bytes[stat] += addend;
 	put_cpu_no_resched();
 }
@@ -151,13 +151,13 @@ static inline void nfs_add_stats(struct 
 
 static inline struct nfs_iostats *nfs_alloc_iostats(void)
 {
-	return alloc_percpu(struct nfs_iostats);
+	return CPU_ALLOC(struct nfs_iostats, GFP_KERNEL | __GFP_ZERO);
 }
 
 static inline void nfs_free_iostats(struct nfs_iostats *stats)
 {
 	if (stats != NULL)
-		free_percpu(stats);
+		CPU_FREE(stats);
 }
 
 #endif
Index: linux-2.6/fs/nfs/super.c
===================================================================
--- linux-2.6.orig/fs/nfs/super.c	2007-11-15 21:17:24.399404478 -0800
+++ linux-2.6/fs/nfs/super.c	2007-11-15 21:25:33.171654143 -0800
@@ -529,7 +529,7 @@ static int nfs_show_stats(struct seq_fil
 		struct nfs_iostats *stats;
 
 		preempt_disable();
-		stats = per_cpu_ptr(nfss->io_stats, cpu);
+		stats = CPU_PTR(nfss->io_stats, cpu);
 
 		for (i = 0; i < __NFSIOS_COUNTSMAX; i++)
 			totals.events[i] += stats->events[i];

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 18/30] cpu alloc: neighbour statistics
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (16 preceding siblings ...)
  2007-11-16 23:09 ` [patch 17/30] cpu alloc: NFS statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 19/30] cpu alloc: tcp statistics Christoph Lameter
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0028-cpu-alloc-neigbour-statistics.patch --]
[-- Type: text/plain, Size: 2364 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/net/neighbour.h |    2 +-
 net/core/neighbour.c    |   11 ++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6/include/net/neighbour.h
===================================================================
--- linux-2.6.orig/include/net/neighbour.h	2007-11-15 21:17:24.319654300 -0800
+++ linux-2.6/include/net/neighbour.h	2007-11-15 21:25:33.678404221 -0800
@@ -83,7 +83,7 @@ struct neigh_statistics
 #define NEIGH_CACHE_STAT_INC(tbl, field)				\
 	do {								\
 		preempt_disable();					\
-		(per_cpu_ptr((tbl)->stats, smp_processor_id())->field)++; \
+		(THIS_CPU((tbl)->stats)->field)++; \
 		preempt_enable();					\
 	} while (0)
 
Index: linux-2.6/net/core/neighbour.c
===================================================================
--- linux-2.6.orig/net/core/neighbour.c	2007-11-15 21:17:24.327654639 -0800
+++ linux-2.6/net/core/neighbour.c	2007-11-15 21:25:33.678404221 -0800
@@ -1348,7 +1348,8 @@ void neigh_table_init_no_netlink(struct 
 			kmem_cache_create(tbl->id, tbl->entry_size, 0,
 					  SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 					  NULL);
-	tbl->stats = alloc_percpu(struct neigh_statistics);
+	tbl->stats = CPU_ALLOC(struct neigh_statistics,
+					GFP_KERNEL | __GFP_ZERO);
 	if (!tbl->stats)
 		panic("cannot create neighbour cache statistics");
 
@@ -1437,7 +1438,7 @@ int neigh_table_clear(struct neigh_table
 
 	remove_proc_entry(tbl->id, init_net.proc_net_stat);
 
-	free_percpu(tbl->stats);
+	CPU_FREE(tbl->stats);
 	tbl->stats = NULL;
 
 	kmem_cache_destroy(tbl->kmem_cachep);
@@ -1694,7 +1695,7 @@ static int neightbl_fill_info(struct sk_
 		for_each_possible_cpu(cpu) {
 			struct neigh_statistics	*st;
 
-			st = per_cpu_ptr(tbl->stats, cpu);
+			st = CPU_PTR(tbl->stats, cpu);
 			ndst.ndts_allocs		+= st->allocs;
 			ndst.ndts_destroys		+= st->destroys;
 			ndst.ndts_hash_grows		+= st->hash_grows;
@@ -2343,7 +2344,7 @@ static void *neigh_stat_seq_start(struct
 		if (!cpu_possible(cpu))
 			continue;
 		*pos = cpu+1;
-		return per_cpu_ptr(tbl->stats, cpu);
+		return CPU_PTR(tbl->stats, cpu);
 	}
 	return NULL;
 }
@@ -2358,7 +2359,7 @@ static void *neigh_stat_seq_next(struct 
 		if (!cpu_possible(cpu))
 			continue;
 		*pos = cpu+1;
-		return per_cpu_ptr(tbl->stats, cpu);
+		return CPU_PTR(tbl->stats, cpu);
 	}
 	return NULL;
 }

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 19/30] cpu alloc: tcp statistics
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (17 preceding siblings ...)
  2007-11-16 23:09 ` [patch 18/30] cpu alloc: neighbour statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 20/30] cpu alloc: convert scratches Christoph Lameter
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0029-cpu-alloc-tcp-statistics.patch --]
[-- Type: text/plain, Size: 1629 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 net/ipv4/tcp.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6/net/ipv4/tcp.c
===================================================================
--- linux-2.6.orig/net/ipv4/tcp.c	2007-11-15 21:17:24.267654551 -0800
+++ linux-2.6/net/ipv4/tcp.c	2007-11-15 21:25:34.214404334 -0800
@@ -2273,7 +2273,7 @@ static void __tcp_free_md5sig_pool(struc
 {
 	int cpu;
 	for_each_possible_cpu(cpu) {
-		struct tcp_md5sig_pool *p = *per_cpu_ptr(pool, cpu);
+		struct tcp_md5sig_pool *p = *CPU_PTR(pool, cpu);
 		if (p) {
 			if (p->md5_desc.tfm)
 				crypto_free_hash(p->md5_desc.tfm);
@@ -2281,7 +2281,7 @@ static void __tcp_free_md5sig_pool(struc
 			p = NULL;
 		}
 	}
-	free_percpu(pool);
+	CPU_FREE(pool);
 }
 
 void tcp_free_md5sig_pool(void)
@@ -2305,7 +2305,7 @@ static struct tcp_md5sig_pool **__tcp_al
 	int cpu;
 	struct tcp_md5sig_pool **pool;
 
-	pool = alloc_percpu(struct tcp_md5sig_pool *);
+	pool = CPU_ALLOC(struct tcp_md5sig_pool *, GFP_KERNEL);
 	if (!pool)
 		return NULL;
 
@@ -2316,7 +2316,7 @@ static struct tcp_md5sig_pool **__tcp_al
 		p = kzalloc(sizeof(*p), GFP_KERNEL);
 		if (!p)
 			goto out_free;
-		*per_cpu_ptr(pool, cpu) = p;
+		*CPU_PTR(pool, cpu) = p;
 
 		hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
 		if (!hash || IS_ERR(hash))
@@ -2381,7 +2381,7 @@ struct tcp_md5sig_pool *__tcp_get_md5sig
 	if (p)
 		tcp_md5sig_users++;
 	spin_unlock_bh(&tcp_md5sig_pool_lock);
-	return (p ? *per_cpu_ptr(p, cpu) : NULL);
+	return (p ? *CPU_PTR(p, cpu) : NULL);
 }
 
 EXPORT_SYMBOL(__tcp_get_md5sig_pool);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 20/30] cpu alloc: convert scratches
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (18 preceding siblings ...)
  2007-11-16 23:09 ` [patch 19/30] cpu alloc: tcp statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 21/30] cpu alloc: dmaengine conversion Christoph Lameter
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0030-cpu-alloc-convert-scatches.patch --]
[-- Type: text/plain, Size: 6122 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 net/ipv4/ipcomp.c  |   26 +++++++++++++-------------
 net/ipv6/ipcomp6.c |   26 +++++++++++++-------------
 2 files changed, 26 insertions(+), 26 deletions(-)

Index: linux-2.6/net/ipv4/ipcomp.c
===================================================================
--- linux-2.6.orig/net/ipv4/ipcomp.c	2007-11-15 21:17:24.199404507 -0800
+++ linux-2.6/net/ipv4/ipcomp.c	2007-11-15 21:25:34.771154012 -0800
@@ -48,8 +48,8 @@ static int ipcomp_decompress(struct xfrm
 	int dlen = IPCOMP_SCRATCH_SIZE;
 	const u8 *start = skb->data;
 	const int cpu = get_cpu();
-	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
-	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	u8 *scratch = *CPU_PTR(ipcomp_scratches, cpu);
+	struct crypto_comp *tfm = *CPU_PTR(ipcd->tfms, cpu);
 	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
 
 	if (err)
@@ -103,8 +103,8 @@ static int ipcomp_compress(struct xfrm_s
 	int dlen = IPCOMP_SCRATCH_SIZE;
 	u8 *start = skb->data;
 	const int cpu = get_cpu();
-	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
-	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	u8 *scratch = *CPU_PTR(ipcomp_scratches, cpu);
+	struct crypto_comp *tfm = *CPU_PTR(ipcd->tfms, cpu);
 	int err = crypto_comp_compress(tfm, start, plen, scratch, &dlen);
 
 	if (err)
@@ -252,9 +252,9 @@ static void ipcomp_free_scratches(void)
 		return;
 
 	for_each_possible_cpu(i)
-		vfree(*per_cpu_ptr(scratches, i));
+		vfree(*CPU_PTR(scratches, i));
 
-	free_percpu(scratches);
+	CPU_FREE(scratches);
 }
 
 static void **ipcomp_alloc_scratches(void)
@@ -265,7 +265,7 @@ static void **ipcomp_alloc_scratches(voi
 	if (ipcomp_scratch_users++)
 		return ipcomp_scratches;
 
-	scratches = alloc_percpu(void *);
+	scratches = CPU_ALLOC(void *, GFP_KERNEL);
 	if (!scratches)
 		return NULL;
 
@@ -275,7 +275,7 @@ static void **ipcomp_alloc_scratches(voi
 		void *scratch = vmalloc(IPCOMP_SCRATCH_SIZE);
 		if (!scratch)
 			return NULL;
-		*per_cpu_ptr(scratches, i) = scratch;
+		*CPU_PTR(scratches, i) = scratch;
 	}
 
 	return scratches;
@@ -303,10 +303,10 @@ static void ipcomp_free_tfms(struct cryp
 		return;
 
 	for_each_possible_cpu(cpu) {
-		struct crypto_comp *tfm = *per_cpu_ptr(tfms, cpu);
+		struct crypto_comp *tfm = *CPU_PTR(tfms, cpu);
 		crypto_free_comp(tfm);
 	}
-	free_percpu(tfms);
+	CPU_FREE(tfms);
 }
 
 static struct crypto_comp **ipcomp_alloc_tfms(const char *alg_name)
@@ -322,7 +322,7 @@ static struct crypto_comp **ipcomp_alloc
 		struct crypto_comp *tfm;
 
 		tfms = pos->tfms;
-		tfm = *per_cpu_ptr(tfms, cpu);
+		tfm = *CPU_PTR(tfms, cpu);
 
 		if (!strcmp(crypto_comp_name(tfm), alg_name)) {
 			pos->users++;
@@ -338,7 +338,7 @@ static struct crypto_comp **ipcomp_alloc
 	INIT_LIST_HEAD(&pos->list);
 	list_add(&pos->list, &ipcomp_tfms_list);
 
-	pos->tfms = tfms = alloc_percpu(struct crypto_comp *);
+	pos->tfms = tfms = CPU_ALLOC(struct crypto_comp *, GFP_KERNEL);
 	if (!tfms)
 		goto error;
 
@@ -347,7 +347,7 @@ static struct crypto_comp **ipcomp_alloc
 							    CRYPTO_ALG_ASYNC);
 		if (IS_ERR(tfm))
 			goto error;
-		*per_cpu_ptr(tfms, cpu) = tfm;
+		*CPU_PTR(tfms, cpu) = tfm;
 	}
 
 	return tfms;
Index: linux-2.6/net/ipv6/ipcomp6.c
===================================================================
--- linux-2.6.orig/net/ipv6/ipcomp6.c	2007-11-15 21:17:24.207404544 -0800
+++ linux-2.6/net/ipv6/ipcomp6.c	2007-11-15 21:25:34.774656957 -0800
@@ -88,8 +88,8 @@ static int ipcomp6_input(struct xfrm_sta
 	start = skb->data;
 
 	cpu = get_cpu();
-	scratch = *per_cpu_ptr(ipcomp6_scratches, cpu);
-	tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	scratch = *CPU_PTR(ipcomp6_scratches, cpu);
+	tfm = *CPU_PTR(ipcd->tfms, cpu);
 
 	err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
 	if (err)
@@ -140,8 +140,8 @@ static int ipcomp6_output(struct xfrm_st
 	start = skb->data;
 
 	cpu = get_cpu();
-	scratch = *per_cpu_ptr(ipcomp6_scratches, cpu);
-	tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	scratch = *CPU_PTR(ipcomp6_scratches, cpu);
+	tfm = *CPU_PTR(ipcd->tfms, cpu);
 
 	err = crypto_comp_compress(tfm, start, plen, scratch, &dlen);
 	if (err || (dlen + sizeof(*ipch)) >= plen) {
@@ -263,12 +263,12 @@ static void ipcomp6_free_scratches(void)
 		return;
 
 	for_each_possible_cpu(i) {
-		void *scratch = *per_cpu_ptr(scratches, i);
+		void *scratch = *CPU_PTR(scratches, i);
 
 		vfree(scratch);
 	}
 
-	free_percpu(scratches);
+	CPU_FREE(scratches);
 }
 
 static void **ipcomp6_alloc_scratches(void)
@@ -279,7 +279,7 @@ static void **ipcomp6_alloc_scratches(vo
 	if (ipcomp6_scratch_users++)
 		return ipcomp6_scratches;
 
-	scratches = alloc_percpu(void *);
+	scratches = CPU_ALLOC(void *, GFP_KERNEL);
 	if (!scratches)
 		return NULL;
 
@@ -289,7 +289,7 @@ static void **ipcomp6_alloc_scratches(vo
 		void *scratch = vmalloc(IPCOMP_SCRATCH_SIZE);
 		if (!scratch)
 			return NULL;
-		*per_cpu_ptr(scratches, i) = scratch;
+		*CPU_PTR(scratches, i) = scratch;
 	}
 
 	return scratches;
@@ -317,10 +317,10 @@ static void ipcomp6_free_tfms(struct cry
 		return;
 
 	for_each_possible_cpu(cpu) {
-		struct crypto_comp *tfm = *per_cpu_ptr(tfms, cpu);
+		struct crypto_comp *tfm = *CPU_PTR(tfms, cpu);
 		crypto_free_comp(tfm);
 	}
-	free_percpu(tfms);
+	CPU_FREE(tfms);
 }
 
 static struct crypto_comp **ipcomp6_alloc_tfms(const char *alg_name)
@@ -336,7 +336,7 @@ static struct crypto_comp **ipcomp6_allo
 		struct crypto_comp *tfm;
 
 		tfms = pos->tfms;
-		tfm = *per_cpu_ptr(tfms, cpu);
+		tfm = *CPU_PTR(tfms, cpu);
 
 		if (!strcmp(crypto_comp_name(tfm), alg_name)) {
 			pos->users++;
@@ -352,7 +352,7 @@ static struct crypto_comp **ipcomp6_allo
 	INIT_LIST_HEAD(&pos->list);
 	list_add(&pos->list, &ipcomp6_tfms_list);
 
-	pos->tfms = tfms = alloc_percpu(struct crypto_comp *);
+	pos->tfms = tfms = CPU_ALLOC(struct crypto_comp *, GFP_KERNEL);
 	if (!tfms)
 		goto error;
 
@@ -361,7 +361,7 @@ static struct crypto_comp **ipcomp6_allo
 							    CRYPTO_ALG_ASYNC);
 		if (IS_ERR(tfm))
 			goto error;
-		*per_cpu_ptr(tfms, cpu) = tfm;
+		*CPU_PTR(tfms, cpu) = tfm;
 	}
 
 	return tfms;

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 21/30] cpu alloc: dmaengine conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (19 preceding siblings ...)
  2007-11-16 23:09 ` [patch 20/30] cpu alloc: convert scratches Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 22/30] cpu alloc: convert loopback statistics Christoph Lameter
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0031-cpu-alloc-dmaengine-conversion.patch --]
[-- Type: text/plain, Size: 4319 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 drivers/dma/dmaengine.c   |   27 ++++++++++++++-------------
 include/linux/dmaengine.h |    4 ++--
 2 files changed, 16 insertions(+), 15 deletions(-)

Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c	2007-11-15 21:17:24.127154620 -0800
+++ linux-2.6/drivers/dma/dmaengine.c	2007-11-15 21:25:35.354654191 -0800
@@ -84,7 +84,7 @@ static ssize_t show_memcpy_count(struct 
 	int i;
 
 	for_each_possible_cpu(i)
-		count += per_cpu_ptr(chan->local, i)->memcpy_count;
+		count += CPU_PTR(chan->local, i)->memcpy_count;
 
 	return sprintf(buf, "%lu\n", count);
 }
@@ -96,7 +96,7 @@ static ssize_t show_bytes_transferred(st
 	int i;
 
 	for_each_possible_cpu(i)
-		count += per_cpu_ptr(chan->local, i)->bytes_transferred;
+		count += CPU_PTR(chan->local, i)->bytes_transferred;
 
 	return sprintf(buf, "%lu\n", count);
 }
@@ -110,7 +110,7 @@ static ssize_t show_in_use(struct class_
 		atomic_read(&chan->refcount.refcount) > 1)
 		in_use = 1;
 	else {
-		if (local_read(&(per_cpu_ptr(chan->local,
+		if (local_read(&(CPU_PTR(chan->local,
 			get_cpu())->refcount)) > 0)
 			in_use = 1;
 		put_cpu();
@@ -226,7 +226,7 @@ static void dma_chan_free_rcu(struct rcu
 	int bias = 0x7FFFFFFF;
 	int i;
 	for_each_possible_cpu(i)
-		bias -= local_read(&per_cpu_ptr(chan->local, i)->refcount);
+		bias -= local_read(&CPU_PTR(chan->local, i)->refcount);
 	atomic_sub(bias, &chan->refcount.refcount);
 	kref_put(&chan->refcount, dma_chan_cleanup);
 }
@@ -372,7 +372,8 @@ int dma_async_device_register(struct dma
 
 	/* represent channels in sysfs. Probably want devs too */
 	list_for_each_entry(chan, &device->channels, device_node) {
-		chan->local = alloc_percpu(typeof(*chan->local));
+		chan->local = CPU_ALLOC(typeof(*chan->local),
+					GFP_KERNEL | __GFP_ZERO);
 		if (chan->local == NULL)
 			continue;
 
@@ -385,7 +386,7 @@ int dma_async_device_register(struct dma
 		rc = class_device_register(&chan->class_dev);
 		if (rc) {
 			chancnt--;
-			free_percpu(chan->local);
+			CPU_FREE(chan->local);
 			chan->local = NULL;
 			goto err_out;
 		}
@@ -413,7 +414,7 @@ err_out:
 		kref_put(&device->refcount, dma_async_device_cleanup);
 		class_device_unregister(&chan->class_dev);
 		chancnt--;
-		free_percpu(chan->local);
+		CPU_FREE(chan->local);
 	}
 	return rc;
 }
@@ -489,8 +490,8 @@ dma_async_memcpy_buf_to_buf(struct dma_c
 	cookie = tx->tx_submit(tx);
 
 	cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	CPU_PTR(chan->local, cpu)->bytes_transferred += len;
+	CPU_PTR(chan->local, cpu)->memcpy_count++;
 	put_cpu();
 
 	return cookie;
@@ -533,8 +534,8 @@ dma_async_memcpy_buf_to_pg(struct dma_ch
 	cookie = tx->tx_submit(tx);
 
 	cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	CPU_PTR(chan->local, cpu)->bytes_transferred += len;
+	CPU_PTR(chan->local, cpu)->memcpy_count++;
 	put_cpu();
 
 	return cookie;
@@ -579,8 +580,8 @@ dma_async_memcpy_pg_to_pg(struct dma_cha
 	cookie = tx->tx_submit(tx);
 
 	cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	CPU_PTR(chan->local, cpu)->bytes_transferred += len;
+	CPU_PTR(chan->local, cpu)->memcpy_count++;
 	put_cpu();
 
 	return cookie;
Index: linux-2.6/include/linux/dmaengine.h
===================================================================
--- linux-2.6.orig/include/linux/dmaengine.h	2007-11-15 21:17:24.135154570 -0800
+++ linux-2.6/include/linux/dmaengine.h	2007-11-15 21:25:35.358654166 -0800
@@ -150,7 +150,7 @@ static inline void dma_chan_get(struct d
 	if (unlikely(chan->slow_ref))
 		kref_get(&chan->refcount);
 	else {
-		local_inc(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+		local_inc(&CPU_PTR(chan->local, get_cpu())->refcount);
 		put_cpu();
 	}
 }
@@ -160,7 +160,7 @@ static inline void dma_chan_put(struct d
 	if (unlikely(chan->slow_ref))
 		kref_put(&chan->refcount, dma_chan_cleanup);
 	else {
-		local_dec(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+		local_dec(&CPU_PTR(chan->local, get_cpu())->refcount);
 		put_cpu();
 	}
 }

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 22/30] cpu alloc: convert loopback statistics
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (20 preceding siblings ...)
  2007-11-16 23:09 ` [patch 21/30] cpu alloc: dmaengine conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 23/30] cpu alloc: veth conversion Christoph Lameter
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0032-cpu-alloc-convert-loopback-statistics.patch --]
[-- Type: text/plain, Size: 1423 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 drivers/net/loopback.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/net/loopback.c
===================================================================
--- linux-2.6.orig/drivers/net/loopback.c	2007-11-15 21:17:24.067154382 -0800
+++ linux-2.6/drivers/net/loopback.c	2007-11-15 21:25:36.006154068 -0800
@@ -156,7 +156,7 @@ static int loopback_xmit(struct sk_buff 
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */
 	pcpu_lstats = netdev_priv(dev);
-	lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
+	lb_stats = THIS_CPU(pcpu_lstats);
 	lb_stats->bytes += skb->len;
 	lb_stats->packets++;
 
@@ -177,7 +177,7 @@ static struct net_device_stats *get_stat
 	for_each_possible_cpu(i) {
 		const struct pcpu_lstats *lb_stats;
 
-		lb_stats = per_cpu_ptr(pcpu_lstats, i);
+		lb_stats = CPU_PTR(pcpu_lstats, i);
 		bytes   += lb_stats->bytes;
 		packets += lb_stats->packets;
 	}
@@ -205,7 +205,7 @@ static int loopback_dev_init(struct net_
 {
 	struct pcpu_lstats *lstats;
 
-	lstats = alloc_percpu(struct pcpu_lstats);
+	lstats = CPU_ALLOC(struct pcpu_lstats, GFP_KERNEL | __GFP_ZERO);
 	if (!lstats)
 		return -ENOMEM;
 
@@ -217,7 +217,7 @@ static void loopback_dev_free(struct net
 {
 	struct pcpu_lstats *lstats = netdev_priv(dev);
 
-	free_percpu(lstats);
+	CPU_FREE(lstats);
 	free_netdev(dev);
 }
 

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 23/30] cpu alloc: veth conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (21 preceding siblings ...)
  2007-11-16 23:09 ` [patch 22/30] cpu alloc: convert loopback statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 24/30] cpu alloc: Chelsio statistics conversion Christoph Lameter
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0033-cpu-alloc-veth-conversion.patch --]
[-- Type: text/plain, Size: 1666 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 drivers/net/veth.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/net/veth.c
===================================================================
--- linux-2.6.orig/drivers/net/veth.c	2007-11-15 21:17:24.010404318 -0800
+++ linux-2.6/drivers/net/veth.c	2007-11-15 21:25:36.483154219 -0800
@@ -162,7 +162,7 @@ static int veth_xmit(struct sk_buff *skb
 	rcv_priv = netdev_priv(rcv);
 
 	cpu = smp_processor_id();
-	stats = per_cpu_ptr(priv->stats, cpu);
+	stats = CPU_PTR(priv->stats, cpu);
 
 	if (!(rcv->flags & IFF_UP))
 		goto outf;
@@ -183,7 +183,7 @@ static int veth_xmit(struct sk_buff *skb
 	stats->tx_bytes += length;
 	stats->tx_packets++;
 
-	stats = per_cpu_ptr(rcv_priv->stats, cpu);
+	stats = CPU_PTR(rcv_priv->stats, cpu);
 	stats->rx_bytes += length;
 	stats->rx_packets++;
 
@@ -217,7 +217,7 @@ static struct net_device_stats *veth_get
 	dev_stats->tx_dropped = 0;
 
 	for_each_online_cpu(cpu) {
-		stats = per_cpu_ptr(priv->stats, cpu);
+		stats = CPU_PTR(priv->stats, cpu);
 
 		dev_stats->rx_packets += stats->rx_packets;
 		dev_stats->tx_packets += stats->tx_packets;
@@ -261,7 +261,7 @@ static int veth_dev_init(struct net_devi
 	struct veth_net_stats *stats;
 	struct veth_priv *priv;
 
-	stats = alloc_percpu(struct veth_net_stats);
+	stats = CPU_ALLOC(struct veth_net_stats, GFP_KERNEL | __GFP_ZERO);
 	if (stats == NULL)
 		return -ENOMEM;
 
@@ -275,7 +275,7 @@ static void veth_dev_free(struct net_dev
 	struct veth_priv *priv;
 
 	priv = netdev_priv(dev);
-	free_percpu(priv->stats);
+	CPU_FREE(priv->stats);
 	free_netdev(dev);
 }
 

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 24/30] cpu alloc: Chelsio statistics conversion
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (22 preceding siblings ...)
  2007-11-16 23:09 ` [patch 23/30] cpu alloc: veth conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 25/30] cpu alloc: convert mib handling to cpu alloc Christoph Lameter
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0034-cpu-alloc-Chelsio-statistics-conversion.patch --]
[-- Type: text/plain, Size: 2209 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 drivers/net/chelsio/sge.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/net/chelsio/sge.c
===================================================================
--- linux-2.6.orig/drivers/net/chelsio/sge.c	2007-11-15 21:17:23.927654318 -0800
+++ linux-2.6/drivers/net/chelsio/sge.c	2007-11-15 21:25:37.015154316 -0800
@@ -805,7 +805,7 @@ void t1_sge_destroy(struct sge *sge)
 	int i;
 
 	for_each_port(sge->adapter, i)
-		free_percpu(sge->port_stats[i]);
+		CPU_FREE(sge->port_stats[i]);
 
 	kfree(sge->tx_sched);
 	free_tx_resources(sge);
@@ -984,7 +984,7 @@ void t1_sge_get_port_stats(const struct 
 
 	memset(ss, 0, sizeof(*ss));
 	for_each_possible_cpu(cpu) {
-		struct sge_port_stats *st = per_cpu_ptr(sge->port_stats[port], cpu);
+		struct sge_port_stats *st = CPU_PTR(sge->port_stats[port], cpu);
 
 		ss->rx_packets += st->rx_packets;
 		ss->rx_cso_good += st->rx_cso_good;
@@ -1379,7 +1379,7 @@ static void sge_rx(struct sge *sge, stru
 	}
 	__skb_pull(skb, sizeof(*p));
 
-	st = per_cpu_ptr(sge->port_stats[p->iff], smp_processor_id());
+	st = THIS_CPU(sge->port_stats[p->iff]);
 	st->rx_packets++;
 
 	skb->protocol = eth_type_trans(skb, adapter->port[p->iff].dev);
@@ -1848,7 +1848,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 {
 	struct adapter *adapter = dev->priv;
 	struct sge *sge = adapter->sge;
-	struct sge_port_stats *st = per_cpu_ptr(sge->port_stats[dev->if_port], smp_processor_id());
+	struct sge_port_stats *st = THIS_CPU(sge->port_stats[dev->if_port]);
 	struct cpl_tx_pkt *cpl;
 	struct sk_buff *orig_skb = skb;
 	int ret;
@@ -2165,7 +2165,8 @@ struct sge * __devinit t1_sge_create(str
 	sge->jumbo_fl = t1_is_T1B(adapter) ? 1 : 0;
 
 	for_each_port(adapter, i) {
-		sge->port_stats[i] = alloc_percpu(struct sge_port_stats);
+		sge->port_stats[i] = CPU_ALLOC(struct sge_port_stats,
+					GFP_KERNEL | __GFP_ZERO);
 		if (!sge->port_stats[i])
 			goto nomem_port;
 	}
@@ -2209,7 +2210,7 @@ struct sge * __devinit t1_sge_create(str
 	return sge;
 nomem_port:
 	while (i >= 0) {
-		free_percpu(sge->port_stats[i]);
+		CPU_FREE(sge->port_stats[i]);
 		--i;
 	}
 	kfree(sge);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 25/30] cpu alloc: convert mib handling to cpu alloc
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (23 preceding siblings ...)
  2007-11-16 23:09 ` [patch 24/30] cpu alloc: Chelsio statistics conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 26/30] cpu_alloc: convert network sockets Christoph Lameter
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0035-cpu-alloc-convert-mib-handling-to-cpu-alloc.patch --]
[-- Type: text/plain, Size: 10677 bytes --]

Use the cpu alloc functions for the mib handling functions in the net
layer. The API of snmp_mib_free() gains a size parameter because
cpu_free() needs the size of the object being freed.
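
A minimal usage sketch (illustration only, not part of the diff below) of
the resulting pairing, using the udp_statistics mib and struct udp_mib that
the conversion touches; the alignment argument is a typical choice, and the
size passed to snmp_mib_free() must match the size handed to snmp_mib_init():

	if (snmp_mib_init((void **)udp_statistics, sizeof(struct udp_mib),
			  __alignof__(struct udp_mib)) < 0)
		return -ENOMEM;
	...
	/* The explicit size lets cpu_free() return the per cpu space */
	snmp_mib_free((void **)udp_statistics, sizeof(struct udp_mib));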

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/net/ip.h    |    2 +-
 include/net/snmp.h  |   14 +++++++-------
 net/dccp/proto.c    |   12 +++++++-----
 net/ipv4/af_inet.c  |   31 +++++++++++++++++--------------
 net/ipv6/addrconf.c |   10 +++++-----
 net/ipv6/af_inet6.c |   18 +++++++++---------
 net/sctp/proc.c     |    4 ++--
 net/sctp/protocol.c |   12 +++++++-----
 8 files changed, 55 insertions(+), 48 deletions(-)

Index: linux-2.6/include/net/ip.h
===================================================================
--- linux-2.6.orig/include/net/ip.h	2007-11-15 21:17:23.831654180 -0800
+++ linux-2.6/include/net/ip.h	2007-11-15 21:25:37.575154222 -0800
@@ -170,7 +170,7 @@ DECLARE_SNMP_STAT(struct linux_mib, net_
 
 extern unsigned long snmp_fold_field(void *mib[], int offt);
 extern int snmp_mib_init(void *ptr[2], size_t mibsize, size_t mibalign);
-extern void snmp_mib_free(void *ptr[2]);
+extern void snmp_mib_free(void *ptr[2], size_t mibsize);
 
 extern void inet_get_local_port_range(int *low, int *high);
 
Index: linux-2.6/include/net/snmp.h
===================================================================
--- linux-2.6.orig/include/net/snmp.h	2007-11-15 21:17:23.839654350 -0800
+++ linux-2.6/include/net/snmp.h	2007-11-15 21:25:37.575154222 -0800
@@ -133,18 +133,18 @@ struct linux_mib {
 #define SNMP_STAT_USRPTR(name)	(name[1])
 
 #define SNMP_INC_STATS_BH(mib, field) 	\
-	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]++)
+	(__THIS_CPU(mib[0])->mibs[field]++)
 #define SNMP_INC_STATS_OFFSET_BH(mib, field, offset)	\
-	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field + (offset)]++)
+	(__THIS_CPU(mib[0])->mibs[field + (offset)]++)
 #define SNMP_INC_STATS_USER(mib, field) \
-	(per_cpu_ptr(mib[1], raw_smp_processor_id())->mibs[field]++)
+	(__THIS_CPU(mib[1])->mibs[field]++)
 #define SNMP_INC_STATS(mib, field) 	\
-	(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]++)
+	(__THIS_CPU(mib[!in_softirq()])->mibs[field]++)
 #define SNMP_DEC_STATS(mib, field) 	\
-	(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]--)
+	(__THIS_CPU(mib[!in_softirq()])->mibs[field]--)
 #define SNMP_ADD_STATS_BH(mib, field, addend) 	\
-	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field] += addend)
+	(__THIS_CPU(mib[0])->mibs[field] += addend)
 #define SNMP_ADD_STATS_USER(mib, field, addend) 	\
-	(per_cpu_ptr(mib[1], raw_smp_processor_id())->mibs[field] += addend)
+	(__THIS_CPU(mib[1])->mibs[field] += addend)
 
 #endif
Index: linux-2.6/net/dccp/proto.c
===================================================================
--- linux-2.6.orig/net/dccp/proto.c	2007-11-15 21:17:23.847654486 -0800
+++ linux-2.6/net/dccp/proto.c	2007-11-15 21:25:37.575154222 -0800
@@ -990,11 +990,13 @@ static int __init dccp_mib_init(void)
 {
 	int rc = -ENOMEM;
 
-	dccp_statistics[0] = alloc_percpu(struct dccp_mib);
+	dccp_statistics[0] = CPU_ALLOC(struct dccp_mib,
+					GFP_KERNEL | __GFP_ZERO);
 	if (dccp_statistics[0] == NULL)
 		goto out;
 
-	dccp_statistics[1] = alloc_percpu(struct dccp_mib);
+	dccp_statistics[1] = CPU_ALLOC(struct dccp_mib,
+					GFP_KERNEL | __GFP_ZERO);
 	if (dccp_statistics[1] == NULL)
 		goto out_free_one;
 
@@ -1002,7 +1004,7 @@ static int __init dccp_mib_init(void)
 out:
 	return rc;
 out_free_one:
-	free_percpu(dccp_statistics[0]);
+	CPU_FREE(dccp_statistics[0]);
 	dccp_statistics[0] = NULL;
 	goto out;
 
@@ -1010,8 +1012,8 @@ out_free_one:
 
 static void dccp_mib_exit(void)
 {
-	free_percpu(dccp_statistics[0]);
-	free_percpu(dccp_statistics[1]);
+	CPU_FREE(dccp_statistics[0]);
+	CPU_FREE(dccp_statistics[1]);
 	dccp_statistics[0] = dccp_statistics[1] = NULL;
 }
 
Index: linux-2.6/net/ipv4/af_inet.c
===================================================================
--- linux-2.6.orig/net/ipv4/af_inet.c	2007-11-15 21:17:23.855654347 -0800
+++ linux-2.6/net/ipv4/af_inet.c	2007-11-15 21:25:37.575154222 -0800
@@ -1230,8 +1230,8 @@ unsigned long snmp_fold_field(void *mib[
 	int i;
 
 	for_each_possible_cpu(i) {
-		res += *(((unsigned long *) per_cpu_ptr(mib[0], i)) + offt);
-		res += *(((unsigned long *) per_cpu_ptr(mib[1], i)) + offt);
+		res += *(((unsigned long *) CPU_PTR(mib[0], i)) + offt);
+		res += *(((unsigned long *) CPU_PTR(mib[1], i)) + offt);
 	}
 	return res;
 }
@@ -1240,26 +1240,28 @@ EXPORT_SYMBOL_GPL(snmp_fold_field);
 int snmp_mib_init(void *ptr[2], size_t mibsize, size_t mibalign)
 {
 	BUG_ON(ptr == NULL);
-	ptr[0] = __alloc_percpu(mibsize);
+	ptr[0] = cpu_alloc(mibsize, GFP_KERNEL | __GFP_ZERO,
+					mibalign);
 	if (!ptr[0])
 		goto err0;
-	ptr[1] = __alloc_percpu(mibsize);
+	ptr[1] = cpu_alloc(mibsize, GFP_KERNEL | __GFP_ZERO,
+					mibalign);
 	if (!ptr[1])
 		goto err1;
 	return 0;
 err1:
-	free_percpu(ptr[0]);
+	cpu_free(ptr[0], mibsize);
 	ptr[0] = NULL;
 err0:
 	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(snmp_mib_init);
 
-void snmp_mib_free(void *ptr[2])
+void snmp_mib_free(void *ptr[2], size_t mibsize)
 {
 	BUG_ON(ptr == NULL);
-	free_percpu(ptr[0]);
-	free_percpu(ptr[1]);
+	cpu_free(ptr[0], mibsize);
+	cpu_free(ptr[1], mibsize);
 	ptr[0] = ptr[1] = NULL;
 }
 EXPORT_SYMBOL_GPL(snmp_mib_free);
@@ -1324,17 +1326,18 @@ static int __init init_ipv4_mibs(void)
 	return 0;
 
 err_udplite_mib:
-	snmp_mib_free((void **)udp_statistics);
+	snmp_mib_free((void **)udp_statistics, sizeof(struct udp_mib));
 err_udp_mib:
-	snmp_mib_free((void **)tcp_statistics);
+	snmp_mib_free((void **)tcp_statistics, sizeof(struct tcp_mib));
 err_tcp_mib:
-	snmp_mib_free((void **)icmpmsg_statistics);
+	snmp_mib_free((void **)icmpmsg_statistics,
+					sizeof(struct icmpmsg_mib));
 err_icmpmsg_mib:
-	snmp_mib_free((void **)icmp_statistics);
+	snmp_mib_free((void **)icmp_statistics, sizeof(struct icmp_mib));
 err_icmp_mib:
-	snmp_mib_free((void **)ip_statistics);
+	snmp_mib_free((void **)ip_statistics, sizeof(struct ipstats_mib));
 err_ip_mib:
-	snmp_mib_free((void **)net_statistics);
+	snmp_mib_free((void **)net_statistics, sizeof(struct linux_mib));
 err_net_mib:
 	return -ENOMEM;
 }
Index: linux-2.6/net/ipv6/addrconf.c
===================================================================
--- linux-2.6.orig/net/ipv6/addrconf.c	2007-11-15 21:17:23.859654454 -0800
+++ linux-2.6/net/ipv6/addrconf.c	2007-11-15 21:25:37.579154173 -0800
@@ -271,18 +271,18 @@ static int snmp6_alloc_dev(struct inet6_
 	return 0;
 
 err_icmpmsg:
-	snmp_mib_free((void **)idev->stats.icmpv6);
+	snmp_mib_free((void **)idev->stats.icmpv6, sizeof(struct icmpv6_mib));
 err_icmp:
-	snmp_mib_free((void **)idev->stats.ipv6);
+	snmp_mib_free((void **)idev->stats.ipv6, sizeof(struct ipstats_mib));
 err_ip:
 	return -ENOMEM;
 }
 
 static void snmp6_free_dev(struct inet6_dev *idev)
 {
-	snmp_mib_free((void **)idev->stats.icmpv6msg);
-	snmp_mib_free((void **)idev->stats.icmpv6);
-	snmp_mib_free((void **)idev->stats.ipv6);
+	snmp_mib_free((void **)idev->stats.icmpv6msg, sizeof(struct icmpv6_mib));
+	snmp_mib_free((void **)idev->stats.icmpv6, sizeof(struct icmpv6_mib));
+	snmp_mib_free((void **)idev->stats.ipv6, sizeof(struct ipstats_mib));
 }
 
 /* Nobody refers to this device, we may destroy it. */
Index: linux-2.6/net/ipv6/af_inet6.c
===================================================================
--- linux-2.6.orig/net/ipv6/af_inet6.c	2007-11-15 21:17:23.867654431 -0800
+++ linux-2.6/net/ipv6/af_inet6.c	2007-11-15 21:25:37.579154173 -0800
@@ -731,13 +731,13 @@ static int __init init_ipv6_mibs(void)
 	return 0;
 
 err_udplite_mib:
-	snmp_mib_free((void **)udp_stats_in6);
+	snmp_mib_free((void **)udp_stats_in6, sizeof(struct udp_mib));
 err_udp_mib:
-	snmp_mib_free((void **)icmpv6msg_statistics);
+	snmp_mib_free((void **)icmpv6msg_statistics, sizeof(struct icmpv6_mib));
 err_icmpmsg_mib:
-	snmp_mib_free((void **)icmpv6_statistics);
+	snmp_mib_free((void **)icmpv6_statistics, sizeof(struct icmpv6_mib));
 err_icmp_mib:
-	snmp_mib_free((void **)ipv6_statistics);
+	snmp_mib_free((void **)ipv6_statistics, sizeof(struct ipstats_mib));
 err_ip_mib:
 	return -ENOMEM;
 
@@ -745,11 +745,11 @@ err_ip_mib:
 
 static void cleanup_ipv6_mibs(void)
 {
-	snmp_mib_free((void **)ipv6_statistics);
-	snmp_mib_free((void **)icmpv6_statistics);
-	snmp_mib_free((void **)icmpv6msg_statistics);
-	snmp_mib_free((void **)udp_stats_in6);
-	snmp_mib_free((void **)udplite_stats_in6);
+	snmp_mib_free((void **)ipv6_statistics, sizeof(struct ipstats_mib));
+	snmp_mib_free((void **)icmpv6_statistics, sizeof(struct icmpv6_mib));
+	snmp_mib_free((void **)icmpv6msg_statistics, sizeof(struct icmpv6_mib));
+	snmp_mib_free((void **)udp_stats_in6, sizeof(struct udp_mib));
+	snmp_mib_free((void **)udplite_stats_in6, sizeof(struct udp_mib));
 }
 
 static int __init inet6_init(void)
Index: linux-2.6/net/sctp/proc.c
===================================================================
--- linux-2.6.orig/net/sctp/proc.c	2007-11-15 21:17:23.875654189 -0800
+++ linux-2.6/net/sctp/proc.c	2007-11-15 21:25:37.579154173 -0800
@@ -86,10 +86,10 @@ fold_field(void *mib[], int nr)
 
 	for_each_possible_cpu(i) {
 		res +=
-		    *((unsigned long *) (((void *) per_cpu_ptr(mib[0], i)) +
+		    *((unsigned long *) (((void *)CPU_PTR(mib[0], i)) +
 					 sizeof (unsigned long) * nr));
 		res +=
-		    *((unsigned long *) (((void *) per_cpu_ptr(mib[1], i)) +
+		    *((unsigned long *) (((void *)CPU_PTR(mib[1], i)) +
 					 sizeof (unsigned long) * nr));
 	}
 	return res;
Index: linux-2.6/net/sctp/protocol.c
===================================================================
--- linux-2.6.orig/net/sctp/protocol.c	2007-11-15 21:17:23.883654344 -0800
+++ linux-2.6/net/sctp/protocol.c	2007-11-15 21:25:37.579154173 -0800
@@ -970,12 +970,14 @@ int sctp_register_pf(struct sctp_pf *pf,
 
 static int __init init_sctp_mibs(void)
 {
-	sctp_statistics[0] = alloc_percpu(struct sctp_mib);
+	sctp_statistics[0] = CPU_ALLOC(struct sctp_mib,
+					GFP_KERNEL | __GFP_ZERO);
 	if (!sctp_statistics[0])
 		return -ENOMEM;
-	sctp_statistics[1] = alloc_percpu(struct sctp_mib);
+	sctp_statistics[1] = CPU_ALLOC(struct sctp_mib,
+					GFP_KERNEL | __GFP_ZERO);
 	if (!sctp_statistics[1]) {
-		free_percpu(sctp_statistics[0]);
+		CPU_FREE(sctp_statistics[0]);
 		return -ENOMEM;
 	}
 	return 0;
@@ -984,8 +986,8 @@ static int __init init_sctp_mibs(void)
 
 static void cleanup_sctp_mibs(void)
 {
-	free_percpu(sctp_statistics[0]);
-	free_percpu(sctp_statistics[1]);
+	CPU_FREE(sctp_statistics[0]);
+	CPU_FREE(sctp_statistics[1]);
 }
 
 /* Initialize the universe into something sensible.  */

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 26/30] cpu_alloc: convert network sockets
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (24 preceding siblings ...)
  2007-11-16 23:09 ` [patch 25/30] cpu alloc: convert mib handling to cpu alloc Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv Christoph Lameter
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0036-cpu_alloc-convert-network-sockets.patch --]
[-- Type: text/plain, Size: 1342 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 net/core/sock.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/net/core/sock.c
===================================================================
--- linux-2.6.orig/net/core/sock.c	2007-11-15 21:17:23.775404482 -0800
+++ linux-2.6/net/core/sock.c	2007-11-15 21:25:38.183201940 -0800
@@ -1809,21 +1809,21 @@ static LIST_HEAD(proto_list);
  */
 static void inuse_add(struct proto *prot, int inc)
 {
-	per_cpu_ptr(prot->inuse_ptr, smp_processor_id())[0] += inc;
+	THIS_CPU(prot->inuse_ptr)[0] += inc;
 }
 
 static int inuse_get(const struct proto *prot)
 {
 	int res = 0, cpu;
 	for_each_possible_cpu(cpu)
-		res += per_cpu_ptr(prot->inuse_ptr, cpu)[0];
+		res += CPU_PTR(prot->inuse_ptr, cpu)[0];
 	return res;
 }
 
 static int inuse_init(struct proto *prot)
 {
 	if (!prot->inuse_getval || !prot->inuse_add) {
-		prot->inuse_ptr = alloc_percpu(int);
+		prot->inuse_ptr = CPU_ALLOC(int, GFP_KERNEL);
 		if (prot->inuse_ptr == NULL)
 			return -ENOBUFS;
 
@@ -1836,7 +1836,7 @@ static int inuse_init(struct proto *prot
 static void inuse_fini(struct proto *prot)
 {
 	if (prot->inuse_ptr != NULL) {
-		free_percpu(prot->inuse_ptr);
+		CPU_FREE(prot->inuse_ptr);
 		prot->inuse_ptr = NULL;
 		prot->inuse_getval = NULL;
 		prot->inuse_add = NULL;

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (25 preceding siblings ...)
  2007-11-16 23:09 ` [patch 26/30] cpu_alloc: convert network sockets Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 28/30] cpu alloc: Use for infiniband Christoph Lameter
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm
  Cc: linux-arch, Martin Schwidefsky, linux-kernel, David Miller,
	Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0037-cpu-alloc-Explicitly-code-allocpercpu-calls-in-iucv.patch --]
[-- Type: text/plain, Size: 10015 bytes --]

The iucv code is the only user of the allocpercpu populate/depopulate
functions that bring per cpu objects up and down together with the cpus.
It is the only allocpercpu user that does I/O on per cpu objects (which
is difficult with virtually mapped memory), and the only one that needs
GFP_DMA allocations.

Remove the allocpercpu calls from iucv and code the allocation and freeing
manually.
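
In condensed form (a sketch assembled from the hunks below; error unwinding
and the hotplug notifier are omitted), the converted code keeps plain
NR_CPUS sized pointer arrays and indexes them directly:

	static union iucv_param *iucv_param[NR_CPUS];

	for_each_online_cpu(cpu) {
		/* GFP_DMA keeps the parameter block below 2G */
		iucv_param[cpu] = kmalloc_node(sizeof(union iucv_param),
				  GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
		if (!iucv_param[cpu])
			goto out_free;
	}

	/* The per cpu lookup becomes a plain array index */
	parm = iucv_param[smp_processor_id()];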

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 net/iucv/iucv.c |  107 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 61 insertions(+), 46 deletions(-)

Index: linux-2.6/net/iucv/iucv.c
===================================================================
--- linux-2.6.orig/net/iucv/iucv.c	2007-11-15 21:17:23.719404553 -0800
+++ linux-2.6/net/iucv/iucv.c	2007-11-15 21:25:38.758654051 -0800
@@ -97,7 +97,7 @@ struct iucv_irq_list {
 	struct iucv_irq_data data;
 };
 
-static struct iucv_irq_data *iucv_irq_data;
+static struct iucv_irq_data *iucv_irq_data[NR_CPUS];
 static cpumask_t iucv_buffer_cpumask = CPU_MASK_NONE;
 static cpumask_t iucv_irq_cpumask = CPU_MASK_NONE;
 
@@ -277,7 +277,7 @@ union iucv_param {
 /*
  * Anchor for per-cpu IUCV command parameter block.
  */
-static union iucv_param *iucv_param;
+static union iucv_param *iucv_param[NR_CPUS];
 
 /**
  * iucv_call_b2f0
@@ -356,7 +356,7 @@ static void iucv_allow_cpu(void *data)
 	 *	0x10 - Flag to allow priority message completion interrupts
 	 *	0x08 - Flag to allow IUCV control interrupts
 	 */
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[cpu];
 	memset(parm, 0, sizeof(union iucv_param));
 	parm->set_mask.ipmask = 0xf8;
 	iucv_call_b2f0(IUCV_SETMASK, parm);
@@ -377,7 +377,7 @@ static void iucv_block_cpu(void *data)
 	union iucv_param *parm;
 
 	/* Disable all iucv interrupts. */
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[cpu];
 	memset(parm, 0, sizeof(union iucv_param));
 	iucv_call_b2f0(IUCV_SETMASK, parm);
 
@@ -401,9 +401,9 @@ static void iucv_declare_cpu(void *data)
 		return;
 
 	/* Declare interrupt buffer. */
-	parm = percpu_ptr(iucv_param, cpu);
+	parm = iucv_param[cpu];
 	memset(parm, 0, sizeof(union iucv_param));
-	parm->db.ipbfadr1 = virt_to_phys(percpu_ptr(iucv_irq_data, cpu));
+	parm->db.ipbfadr1 = virt_to_phys(iucv_irq_data[cpu]);
 	rc = iucv_call_b2f0(IUCV_DECLARE_BUFFER, parm);
 	if (rc) {
 		char *err = "Unknown";
@@ -458,7 +458,7 @@ static void iucv_retrieve_cpu(void *data
 	iucv_block_cpu(NULL);
 
 	/* Retrieve interrupt buffer. */
-	parm = percpu_ptr(iucv_param, cpu);
+	parm = iucv_param[cpu];
 	iucv_call_b2f0(IUCV_RETRIEVE_BUFFER, parm);
 
 	/* Clear indication that an iucv buffer exists for this cpu. */
@@ -558,22 +558,23 @@ static int __cpuinit iucv_cpu_notify(str
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		if (!percpu_populate(iucv_irq_data,
-				     sizeof(struct iucv_irq_data),
-				     GFP_KERNEL|GFP_DMA, cpu))
+		iucv_irq_data[cpu] = kmalloc_node(sizeof(struct iucv_irq_data),
+					GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+		if (!iucv_irq_data[cpu])
 			return NOTIFY_BAD;
-		if (!percpu_populate(iucv_param, sizeof(union iucv_param),
-				     GFP_KERNEL|GFP_DMA, cpu)) {
-			percpu_depopulate(iucv_irq_data, cpu);
+		iucv_param[cpu] = kmalloc_node(sizeof(union iucv_param),
+				     GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+		if (!iucv_param[cpu])
 			return NOTIFY_BAD;
-		}
 		break;
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
-		percpu_depopulate(iucv_param, cpu);
-		percpu_depopulate(iucv_irq_data, cpu);
+		kfree(iucv_param[cpu]);
+		iucv_param[cpu] = NULL;
+		kfree(iucv_irq_data[cpu]);
+		iucv_irq_data[cpu] = NULL;
 		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
@@ -612,7 +613,7 @@ static int iucv_sever_pathid(u16 pathid,
 {
 	union iucv_param *parm;
 
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	if (userdata)
 		memcpy(parm->ctrl.ipuser, userdata, sizeof(parm->ctrl.ipuser));
@@ -755,7 +756,7 @@ int iucv_path_accept(struct iucv_path *p
 
 	local_bh_disable();
 	/* Prepare parameter block. */
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	parm->ctrl.ippathid = path->pathid;
 	parm->ctrl.ipmsglim = path->msglim;
@@ -799,7 +800,7 @@ int iucv_path_connect(struct iucv_path *
 	BUG_ON(in_atomic());
 	spin_lock_bh(&iucv_table_lock);
 	iucv_cleanup_queue();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	parm->ctrl.ipmsglim = path->msglim;
 	parm->ctrl.ipflags1 = path->flags;
@@ -854,7 +855,7 @@ int iucv_path_quiesce(struct iucv_path *
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	if (userdata)
 		memcpy(parm->ctrl.ipuser, userdata, sizeof(parm->ctrl.ipuser));
@@ -881,7 +882,7 @@ int iucv_path_resume(struct iucv_path *p
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	if (userdata)
 		memcpy(parm->ctrl.ipuser, userdata, sizeof(parm->ctrl.ipuser));
@@ -936,7 +937,7 @@ int iucv_message_purge(struct iucv_path 
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	parm->purge.ippathid = path->pathid;
 	parm->purge.ipmsgid = msg->id;
@@ -1003,7 +1004,7 @@ int iucv_message_receive(struct iucv_pat
 	}
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	parm->db.ipbfadr1 = (u32)(addr_t) buffer;
 	parm->db.ipbfln1f = (u32) size;
@@ -1040,7 +1041,7 @@ int iucv_message_reject(struct iucv_path
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	parm->db.ippathid = path->pathid;
 	parm->db.ipmsgid = msg->id;
@@ -1074,7 +1075,7 @@ int iucv_message_reply(struct iucv_path 
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	if (flags & IUCV_IPRMDATA) {
 		parm->dpl.ippathid = path->pathid;
@@ -1118,7 +1119,7 @@ int iucv_message_send(struct iucv_path *
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	if (flags & IUCV_IPRMDATA) {
 		/* Message of 8 bytes can be placed into the parameter list. */
@@ -1172,7 +1173,7 @@ int iucv_message_send2way(struct iucv_pa
 	int rc;
 
 	local_bh_disable();
-	parm = percpu_ptr(iucv_param, smp_processor_id());
+	parm = iucv_param[smp_processor_id()];
 	memset(parm, 0, sizeof(union iucv_param));
 	if (flags & IUCV_IPRMDATA) {
 		parm->dpl.ippathid = path->pathid;
@@ -1559,7 +1560,7 @@ static void iucv_external_interrupt(u16 
 	struct iucv_irq_data *p;
 	struct iucv_irq_list *work;
 
-	p = percpu_ptr(iucv_irq_data, smp_processor_id());
+	p = iucv_irq_data[smp_processor_id()];
 	if (p->ippathid >= iucv_max_pathid) {
 		printk(KERN_WARNING "iucv_do_int: Got interrupt with "
 		       "pathid %d > max_connections (%ld)\n",
@@ -1598,6 +1599,7 @@ static void iucv_external_interrupt(u16 
 static int __init iucv_init(void)
 {
 	int rc;
+	int cpu;
 
 	if (!MACHINE_IS_VM) {
 		rc = -EPROTONOSUPPORT;
@@ -1617,19 +1619,23 @@ static int __init iucv_init(void)
 		rc = PTR_ERR(iucv_root);
 		goto out_bus;
 	}
-	/* Note: GFP_DMA used to get memory below 2G */
-	iucv_irq_data = percpu_alloc(sizeof(struct iucv_irq_data),
-				     GFP_KERNEL|GFP_DMA);
-	if (!iucv_irq_data) {
-		rc = -ENOMEM;
-		goto out_root;
-	}
-	/* Allocate parameter blocks. */
-	iucv_param = percpu_alloc(sizeof(union iucv_param),
-				  GFP_KERNEL|GFP_DMA);
-	if (!iucv_param) {
-		rc = -ENOMEM;
-		goto out_extint;
+
+	for_each_online_cpu(cpu) {
+		/* Note: GFP_DMA used to get memory below 2G */
+		iucv_irq_data[cpu] = kmalloc_node(sizeof(struct iucv_irq_data),
+				     GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+		if (!iucv_irq_data[cpu]) {
+			rc = -ENOMEM;
+			goto out_free;
+		}
+
+		/* Allocate parameter blocks. */
+		iucv_param[cpu] = kmalloc_node(sizeof(union iucv_param),
+				  GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+		if (!iucv_param[cpu]) {
+			rc = -ENOMEM;
+			goto out_free;
+		}
 	}
 	register_hotcpu_notifier(&iucv_cpu_notifier);
 	ASCEBC(iucv_error_no_listener, 16);
@@ -1638,9 +1644,13 @@ static int __init iucv_init(void)
 	iucv_available = 1;
 	return 0;
 
-out_extint:
-	percpu_free(iucv_irq_data);
-out_root:
+out_free:
+	for_each_possible_cpu(cpu) {
+		kfree(iucv_param[cpu]);
+		iucv_param[cpu] = NULL;
+		kfree(iucv_irq_data[cpu]);
+		iucv_irq_data[cpu] = NULL;
+	}
 	s390_root_dev_unregister(iucv_root);
 out_bus:
 	bus_unregister(&iucv_bus);
@@ -1658,6 +1668,7 @@ out:
 static void __exit iucv_exit(void)
 {
 	struct iucv_irq_list *p, *n;
+	int cpu;
 
 	spin_lock_irq(&iucv_queue_lock);
 	list_for_each_entry_safe(p, n, &iucv_task_queue, list)
@@ -1666,8 +1677,12 @@ static void __exit iucv_exit(void)
 		kfree(p);
 	spin_unlock_irq(&iucv_queue_lock);
 	unregister_hotcpu_notifier(&iucv_cpu_notifier);
-	percpu_free(iucv_param);
-	percpu_free(iucv_irq_data);
+	for_each_possible_cpu(cpu) {
+		kfree(iucv_param[cpu]);
+		iucv_param[cpu] = NULL;
+		kfree(iucv_irq_data[cpu]);
+		iucv_irq_data[cpu] = NULL;
+	}
 	s390_root_dev_unregister(iucv_root);
 	bus_unregister(&iucv_bus);
 	unregister_external_interrupt(0x4000, iucv_external_interrupt);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 28/30] cpu alloc: Use for infiniband
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (26 preceding siblings ...)
  2007-11-16 23:09 ` [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 29/30] cpu alloc: Use in the crypto subsystem Christoph Lameter
  2007-11-16 23:09 ` [patch 30/30] cpu alloc: Remove the allocpercpu functionality Christoph Lameter
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0038-cpu-alloc-Use-for-infiniband.patch --]
[-- Type: text/plain, Size: 3555 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 drivers/infiniband/hw/ehca/ehca_irq.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ehca/ehca_irq.c	2007-11-15 21:17:23.663404239 -0800
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c	2007-11-15 21:25:39.310404188 -0800
@@ -646,7 +646,7 @@ static void queue_comp_task(struct ehca_
 	cpu_id = find_next_online_cpu(pool);
 	BUG_ON(!cpu_online(cpu_id));
 
-	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+	cct = CPU_PTR(pool->cpu_comp_tasks, cpu_id);
 	BUG_ON(!cct);
 
 	spin_lock_irqsave(&cct->task_lock, flags);
@@ -654,7 +654,7 @@ static void queue_comp_task(struct ehca_
 	spin_unlock_irqrestore(&cct->task_lock, flags);
 	if (cq_jobs > 0) {
 		cpu_id = find_next_online_cpu(pool);
-		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+		cct = CPU_PTR(pool->cpu_comp_tasks, cpu_id);
 		BUG_ON(!cct);
 	}
 
@@ -727,7 +727,7 @@ static struct task_struct *create_comp_t
 {
 	struct ehca_cpu_comp_task *cct;
 
-	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+	cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
 	spin_lock_init(&cct->task_lock);
 	INIT_LIST_HEAD(&cct->cq_list);
 	init_waitqueue_head(&cct->wait_queue);
@@ -743,7 +743,7 @@ static void destroy_comp_task(struct ehc
 	struct task_struct *task;
 	unsigned long flags_cct;
 
-	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+	cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
 
 	spin_lock_irqsave(&cct->task_lock, flags_cct);
 
@@ -759,7 +759,7 @@ static void destroy_comp_task(struct ehc
 
 static void __cpuinit take_over_work(struct ehca_comp_pool *pool, int cpu)
 {
-	struct ehca_cpu_comp_task *cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+	struct ehca_cpu_comp_task *cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
 	LIST_HEAD(list);
 	struct ehca_cq *cq;
 	unsigned long flags_cct;
@@ -772,8 +772,7 @@ static void __cpuinit take_over_work(str
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
 
 		list_del(&cq->entry);
-		__queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks,
-						  smp_processor_id()));
+		__queue_comp_task(cq, THIS_CPU(pool->cpu_comp_tasks));
 	}
 
 	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
@@ -799,14 +798,14 @@ static int __cpuinit comp_pool_callback(
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
 		ehca_gen_dbg("CPU: %x (CPU_CANCELED)", cpu);
-		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+		cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
 		kthread_bind(cct->task, any_online_cpu(cpu_online_map));
 		destroy_comp_task(pool, cpu);
 		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
 		ehca_gen_dbg("CPU: %x (CPU_ONLINE)", cpu);
-		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+		cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
 		kthread_bind(cct->task, cpu);
 		wake_up_process(cct->task);
 		break;
@@ -849,7 +848,8 @@ int ehca_create_comp_pool(void)
 	spin_lock_init(&pool->last_cpu_lock);
 	pool->last_cpu = any_online_cpu(cpu_online_map);
 
-	pool->cpu_comp_tasks = alloc_percpu(struct ehca_cpu_comp_task);
+	pool->cpu_comp_tasks = CPU_ALLOC(struct ehca_cpu_comp_task,
+						GFP_KERNEL | __GFP_ZERO);
 	if (pool->cpu_comp_tasks == NULL) {
 		kfree(pool);
 		return -EINVAL;
@@ -883,6 +883,6 @@ void ehca_destroy_comp_pool(void)
 		if (cpu_online(i))
 			destroy_comp_task(pool, i);
 	}
-	free_percpu(pool->cpu_comp_tasks);
+	CPU_FREE(pool->cpu_comp_tasks);
 	kfree(pool);
 }

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 29/30] cpu alloc: Use in the crypto subsystem.
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (27 preceding siblings ...)
  2007-11-16 23:09 ` [patch 28/30] cpu alloc: Use for infiniband Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  2007-11-16 23:09 ` [patch 30/30] cpu alloc: Remove the allocpercpu functionality Christoph Lameter
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0039-cpu-alloc-Use-in-the-crypto-subsystem.patch --]
[-- Type: text/plain, Size: 2245 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 crypto/async_tx/async_tx.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: linux-2.6/crypto/async_tx/async_tx.c
===================================================================
--- linux-2.6.orig/crypto/async_tx/async_tx.c	2007-11-15 21:17:23.610404668 -0800
+++ linux-2.6/crypto/async_tx/async_tx.c	2007-11-15 21:25:39.834904080 -0800
@@ -207,10 +207,10 @@ static void async_tx_rebalance(void)
 	for_each_dma_cap_mask(cap, dma_cap_mask_all)
 		for_each_possible_cpu(cpu) {
 			struct dma_chan_ref *ref =
-				per_cpu_ptr(channel_table[cap], cpu)->ref;
+				CPU_PTR(channel_table[cap], cpu)->ref;
 			if (ref) {
 				atomic_set(&ref->count, 0);
-				per_cpu_ptr(channel_table[cap], cpu)->ref =
+				CPU_PTR(channel_table[cap], cpu)->ref =
 									NULL;
 			}
 		}
@@ -223,7 +223,7 @@ static void async_tx_rebalance(void)
 			else
 				new = get_chan_ref_by_cap(cap, -1);
 
-			per_cpu_ptr(channel_table[cap], cpu)->ref = new;
+			CPU_PTR(channel_table[cap], cpu)->ref = new;
 		}
 
 	spin_unlock_irqrestore(&async_tx_lock, flags);
@@ -327,7 +327,8 @@ async_tx_init(void)
 	clear_bit(DMA_INTERRUPT, dma_cap_mask_all.bits);
 
 	for_each_dma_cap_mask(cap, dma_cap_mask_all) {
-		channel_table[cap] = alloc_percpu(struct chan_ref_percpu);
+		channel_table[cap] = CPU_ALLOC(struct chan_ref_percpu,
+						GFP_KERNEL | __GFP_ZERO);
 		if (!channel_table[cap])
 			goto err;
 	}
@@ -343,7 +344,7 @@ err:
 	printk(KERN_ERR "async_tx: initialization failure\n");
 
 	while (--cap >= 0)
-		free_percpu(channel_table[cap]);
+		CPU_FREE(channel_table[cap]);
 
 	return 1;
 }
@@ -356,7 +357,7 @@ static void __exit async_tx_exit(void)
 
 	for_each_dma_cap_mask(cap, dma_cap_mask_all)
 		if (channel_table[cap])
-			free_percpu(channel_table[cap]);
+			CPU_FREE(channel_table[cap]);
 
 	dma_async_client_unregister(&async_tx_dma);
 }
@@ -378,7 +379,7 @@ async_tx_find_channel(struct dma_async_t
 	else if (likely(channel_table_initialized)) {
 		struct dma_chan_ref *ref;
 		int cpu = get_cpu();
-		ref = per_cpu_ptr(channel_table[tx_type], cpu)->ref;
+		ref = CPU_PTR(channel_table[tx_type], cpu)->ref;
 		put_cpu();
 		return ref ? ref->chan : NULL;
 	} else

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 30/30] cpu alloc: Remove the allocpercpu functionality
  2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
                   ` (28 preceding siblings ...)
  2007-11-16 23:09 ` [patch 29/30] cpu alloc: Use in the crypto subsystem Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
  29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

[-- Attachment #1: 0040-cpu-alloc-Remove-the-allocpercpu-functionality.patch --]
[-- Type: text/plain, Size: 7861 bytes --]

There are no users of allocpercpu left after all the earlier patches have been
applied. Remove the code that implements allocpercpu.
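
For reference, here is a minimal sketch of what a former allocpercpu user
looks like once converted to the cpu_alloc interface. The struct, function
and field names below are made up for illustration; only the CPU_ALLOC /
CPU_PTR / CPU_FREE calls follow the conversion pattern used by the earlier
patches in this series.

#include <linux/percpu.h>
#include <linux/gfp.h>
#include <linux/smp.h>

/* Hypothetical per cpu statistics object, not from any converted subsystem */
struct example_stats {
	unsigned long events;
};

static struct example_stats *example_stats;

static int example_init(void)
{
	/* Was: example_stats = alloc_percpu(struct example_stats); */
	example_stats = CPU_ALLOC(struct example_stats,
					GFP_KERNEL | __GFP_ZERO);
	if (!example_stats)
		return -ENOMEM;
	return 0;
}

static void example_count_event(void)
{
	int cpu = get_cpu();

	/* Was: per_cpu_ptr(example_stats, cpu)->events++; */
	CPU_PTR(example_stats, cpu)->events++;
	put_cpu();
}

static void example_exit(void)
{
	/* Was: free_percpu(example_stats); */
	CPU_FREE(example_stats);
}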

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/percpu.h |   80 ------------------------------
 mm/Makefile            |    1 
 mm/allocpercpu.c       |  127 -------------------------------------------------
 3 files changed, 208 deletions(-)
 delete mode 100644 mm/allocpercpu.c

Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h	2007-11-15 21:24:50.730654207 -0800
+++ linux-2.6/include/linux/percpu.h	2007-11-15 21:25:40.302313443 -0800
@@ -31,86 +31,6 @@
 	&__get_cpu_var(var); }))
 #define put_cpu_var(var) preempt_enable()
 
-#ifdef CONFIG_SMP
-
-struct percpu_data {
-	void *ptrs[NR_CPUS];
-};
-
-#define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
-/* 
- * Use this to get to a cpu's version of the per-cpu object dynamically
- * allocated. Non-atomic access to the current CPU's version should
- * probably be combined with get_cpu()/put_cpu().
- */ 
-#define percpu_ptr(ptr, cpu)                              \
-({                                                        \
-        struct percpu_data *__p = __percpu_disguise(ptr); \
-        (__typeof__(ptr))__p->ptrs[(cpu)];	          \
-})
-
-extern void *percpu_populate(void *__pdata, size_t size, gfp_t gfp, int cpu);
-extern void percpu_depopulate(void *__pdata, int cpu);
-extern int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
-				  cpumask_t *mask);
-extern void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask);
-extern void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask);
-extern void percpu_free(void *__pdata);
-
-#else /* CONFIG_SMP */
-
-#define percpu_ptr(ptr, cpu) ({ (void)(cpu); (ptr); })
-
-static inline void percpu_depopulate(void *__pdata, int cpu)
-{
-}
-
-static inline void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask)
-{
-}
-
-static inline void *percpu_populate(void *__pdata, size_t size, gfp_t gfp,
-				    int cpu)
-{
-	return percpu_ptr(__pdata, cpu);
-}
-
-static inline int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
-					 cpumask_t *mask)
-{
-	return 0;
-}
-
-static __always_inline void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
-{
-	return kzalloc(size, gfp);
-}
-
-static inline void percpu_free(void *__pdata)
-{
-	kfree(__pdata);
-}
-
-#endif /* CONFIG_SMP */
-
-#define percpu_populate_mask(__pdata, size, gfp, mask) \
-	__percpu_populate_mask((__pdata), (size), (gfp), &(mask))
-#define percpu_depopulate_mask(__pdata, mask) \
-	__percpu_depopulate_mask((__pdata), &(mask))
-#define percpu_alloc_mask(size, gfp, mask) \
-	__percpu_alloc_mask((size), (gfp), &(mask))
-
-#define percpu_alloc(size, gfp) percpu_alloc_mask((size), (gfp), cpu_online_map)
-
-/* (legacy) interface for use without CPU hotplug handling */
-
-#define __alloc_percpu(size)	percpu_alloc_mask((size), GFP_KERNEL, \
-						  cpu_possible_map)
-#define alloc_percpu(type)	(type *)__alloc_percpu(sizeof(type))
-#define free_percpu(ptr)	percpu_free((ptr))
-#define per_cpu_ptr(ptr, cpu)	percpu_ptr((ptr), (cpu))
-
-
 /*
  * cpu allocator definitions
  *
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile	2007-11-15 21:24:50.726654353 -0800
+++ linux-2.6/mm/Makefile	2007-11-15 21:25:40.302313443 -0800
@@ -28,6 +28,5 @@ obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
-obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 
Index: linux-2.6/mm/allocpercpu.c
===================================================================
--- linux-2.6.orig/mm/allocpercpu.c	2007-11-15 21:17:23.570404405 -0800
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,127 +0,0 @@
-/*
- * linux/mm/allocpercpu.c
- *
- * Separated from slab.c August 11, 2006 Christoph Lameter <clameter@sgi.com>
- */
-#include <linux/mm.h>
-#include <linux/module.h>
-
-/**
- * percpu_depopulate - depopulate per-cpu data for given cpu
- * @__pdata: per-cpu data to depopulate
- * @cpu: depopulate per-cpu data for this cpu
- *
- * Depopulating per-cpu data for a cpu going offline would be a typical
- * use case. You need to register a cpu hotplug handler for that purpose.
- */
-void percpu_depopulate(void *__pdata, int cpu)
-{
-	struct percpu_data *pdata = __percpu_disguise(__pdata);
-
-	kfree(pdata->ptrs[cpu]);
-	pdata->ptrs[cpu] = NULL;
-}
-EXPORT_SYMBOL_GPL(percpu_depopulate);
-
-/**
- * percpu_depopulate_mask - depopulate per-cpu data for some cpu's
- * @__pdata: per-cpu data to depopulate
- * @mask: depopulate per-cpu data for cpu's selected through mask bits
- */
-void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask)
-{
-	int cpu;
-	for_each_cpu_mask(cpu, *mask)
-		percpu_depopulate(__pdata, cpu);
-}
-EXPORT_SYMBOL_GPL(__percpu_depopulate_mask);
-
-/**
- * percpu_populate - populate per-cpu data for given cpu
- * @__pdata: per-cpu data to populate further
- * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @cpu: populate per-data for this cpu
- *
- * Populating per-cpu data for a cpu coming online would be a typical
- * use case. You need to register a cpu hotplug handler for that purpose.
- * Per-cpu object is populated with zeroed buffer.
- */
-void *percpu_populate(void *__pdata, size_t size, gfp_t gfp, int cpu)
-{
-	struct percpu_data *pdata = __percpu_disguise(__pdata);
-	int node = cpu_to_node(cpu);
-
-	BUG_ON(pdata->ptrs[cpu]);
-	if (node_online(node))
-		pdata->ptrs[cpu] = kmalloc_node(size, gfp|__GFP_ZERO, node);
-	else
-		pdata->ptrs[cpu] = kzalloc(size, gfp);
-	return pdata->ptrs[cpu];
-}
-EXPORT_SYMBOL_GPL(percpu_populate);
-
-/**
- * percpu_populate_mask - populate per-cpu data for more cpu's
- * @__pdata: per-cpu data to populate further
- * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @mask: populate per-cpu data for cpu's selected through mask bits
- *
- * Per-cpu objects are populated with zeroed buffers.
- */
-int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
-			   cpumask_t *mask)
-{
-	cpumask_t populated = CPU_MASK_NONE;
-	int cpu;
-
-	for_each_cpu_mask(cpu, *mask)
-		if (unlikely(!percpu_populate(__pdata, size, gfp, cpu))) {
-			__percpu_depopulate_mask(__pdata, &populated);
-			return -ENOMEM;
-		} else
-			cpu_set(cpu, populated);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(__percpu_populate_mask);
-
-/**
- * percpu_alloc_mask - initial setup of per-cpu data
- * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @mask: populate per-data for cpu's selected through mask bits
- *
- * Populating per-cpu data for all online cpu's would be a typical use case,
- * which is simplified by the percpu_alloc() wrapper.
- * Per-cpu objects are populated with zeroed buffers.
- */
-void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
-{
-	void *pdata = kzalloc(sizeof(struct percpu_data), gfp);
-	void *__pdata = __percpu_disguise(pdata);
-
-	if (unlikely(!pdata))
-		return NULL;
-	if (likely(!__percpu_populate_mask(__pdata, size, gfp, mask)))
-		return __pdata;
-	kfree(pdata);
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(__percpu_alloc_mask);
-
-/**
- * percpu_free - final cleanup of per-cpu data
- * @__pdata: object to clean up
- *
- * We simply clean up any per-cpu object left. No need for the client to
- * track and specify through a bis mask which per-cpu objects are to free.
- */
-void percpu_free(void *__pdata)
-{
-	if (unlikely(!__pdata))
-		return;
-	__percpu_depopulate_mask(__pdata, &cpu_possible_map);
-	kfree(__percpu_disguise(__pdata));
-}
-EXPORT_SYMBOL_GPL(percpu_free);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [patch 07/30] cpu alloc: IA64 support
  2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
@ 2007-11-16 23:32   ` Luck, Tony
  2007-11-17  0:05     ` Christoph Lameter
  0 siblings, 1 reply; 34+ messages in thread
From: Luck, Tony @ 2007-11-16 23:32 UTC (permalink / raw)
  To: Christoph Lameter, akpm
  Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra

+# Maximum of 128 MB cpu_alloc space per cpu
+config CPU_AREA_ORDER
+	int
+	default "13"

Comment only matches code when page size is 16K ... and we are (slowly)
moving to 64k as the default (which with order 13 allocation would mean
512M)

-Tony

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [patch 07/30] cpu alloc: IA64 support
  2007-11-16 23:32   ` Luck, Tony
@ 2007-11-17  0:05     ` Christoph Lameter
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-17  0:05 UTC (permalink / raw)
  To: Luck, Tony
  Cc: akpm, linux-arch, linux-kernel, David Miller, Eric Dumazet,
	Peter Zijlstra

On Fri, 16 Nov 2007, Luck, Tony wrote:

> +# Maximum of 128 MB cpu_alloc space per cpu
> +config CPU_AREA_ORDER
> +	int
> +	default "13"
> 
> Comment only matches code when page size is 16K ... and we are (slowly)
> moving to 64k as the default (which with order 13 allocation would mean
> 512M)

But the page tables also grow, so 512M may be okay?
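
For reference, a back-of-the-envelope check of the numbers in this exchange.
The sketch below assumes the per cpu area spans (1 << CPU_AREA_ORDER) pages,
which is what the quoted Kconfig comment and the 512M figure imply; it is a
plain userspace program, not kernel code.

#include <stdio.h>

int main(void)
{
	unsigned long order = 13;	/* CPU_AREA_ORDER default from the patch */
	unsigned long page_sizes[] = { 4UL << 10, 16UL << 10, 64UL << 10 };
	int i;

	for (i = 0; i < 3; i++)
		printf("%2luK pages: %3lu MB of cpu_alloc space per cpu\n",
		       page_sizes[i] >> 10,
		       (page_sizes[i] << order) >> 20);
	return 0;
}

/* Prints:
 *  4K pages:  32 MB of cpu_alloc space per cpu
 * 16K pages: 128 MB of cpu_alloc space per cpu  (matches the comment)
 * 64K pages: 512 MB of cpu_alloc space per cpu  (the case raised here)
 */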


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 16/30] cpu alloc: XFS counters
  2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
@ 2007-11-19 12:58   ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2007-11-19 12:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-arch, linux-kernel, David Miller, Eric Dumazet,
	Peter Zijlstra

On Fri, Nov 16, 2007 at 03:09:36PM -0800, Christoph Lameter wrote:
>  	cntp = (xfs_icsb_cnts_t *)
> -			per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu);
> +			CPU_PTR(mp->m_sb_cnts, (unsigned long)hcpu);
> -	mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t);
> +	mp->m_sb_cnts = CPU_ALLOC(xfs_icsb_cnts_t, GFP_KERNEL | __GFP_ZERO);

What's the point of renaming these?  And even if you have a case for
renaming them, please give them proper lower-case names.


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2007-11-19 12:58 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
2007-11-16 23:09 ` [patch 02/30] cpu alloc: Use in SLUB Christoph Lameter
2007-11-16 23:09 ` [patch 03/30] cpu alloc: Remove SLUB fields Christoph Lameter
2007-11-16 23:09 ` [patch 04/30] cpu alloc: page allocator conversion Christoph Lameter
2007-11-16 23:09 ` [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas Christoph Lameter
2007-11-16 23:09 ` [patch 06/30] cpu alloc: x86 support Christoph Lameter
2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
2007-11-16 23:32   ` Luck, Tony
2007-11-17  0:05     ` Christoph Lameter
2007-11-16 23:09 ` [patch 08/30] cpu_alloc: Sparc64 support Christoph Lameter
2007-11-16 23:09 ` [patch 09/30] cpu alloc: percpu_counter conversion Christoph Lameter
2007-11-16 23:09 ` [patch 10/30] cpu alloc: crash_notes conversion Christoph Lameter
2007-11-16 23:09 ` [patch 11/30] cpu alloc: workqueue conversion Christoph Lameter
2007-11-16 23:09 ` [patch 12/30] cpu alloc: ACPI cstate handling conversion Christoph Lameter
2007-11-16 23:09 ` [patch 13/30] cpu alloc: genhd statistics conversion Christoph Lameter
2007-11-16 23:09 ` [patch 14/30] cpu alloc: blktrace conversion Christoph Lameter
2007-11-16 23:09 ` [patch 15/30] cpu alloc: SRCU Christoph Lameter
2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
2007-11-19 12:58   ` Christoph Hellwig
2007-11-16 23:09 ` [patch 17/30] cpu alloc: NFS statistics Christoph Lameter
2007-11-16 23:09 ` [patch 18/30] cpu alloc: neigbour statistics Christoph Lameter
2007-11-16 23:09 ` [patch 19/30] cpu alloc: tcp statistics Christoph Lameter
2007-11-16 23:09 ` [patch 20/30] cpu alloc: convert scatches Christoph Lameter
2007-11-16 23:09 ` [patch 21/30] cpu alloc: dmaengine conversion Christoph Lameter
2007-11-16 23:09 ` [patch 22/30] cpu alloc: convert loopback statistics Christoph Lameter
2007-11-16 23:09 ` [patch 23/30] cpu alloc: veth conversion Christoph Lameter
2007-11-16 23:09 ` [patch 24/30] cpu alloc: Chelsio statistics conversion Christoph Lameter
2007-11-16 23:09 ` [patch 25/30] cpu alloc: convert mib handling to cpu alloc Christoph Lameter
2007-11-16 23:09 ` [patch 26/30] cpu_alloc: convert network sockets Christoph Lameter
2007-11-16 23:09 ` [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv Christoph Lameter
2007-11-16 23:09 ` [patch 28/30] cpu alloc: Use for infiniband Christoph Lameter
2007-11-16 23:09 ` [patch 29/30] cpu alloc: Use in the crypto subsystem Christoph Lameter
2007-11-16 23:09 ` [patch 30/30] cpu alloc: Remove the allocpercpu functionality Christoph Lameter
