* [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects
@ 2007-11-16 23:09 Christoph Lameter
2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
` (29 more replies)
0 siblings, 30 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[Note to arch maintainers: Some configuration variables in arch/*/Kconfig are needed
for large users of per cpu space (large NUMA mostly, or lots of processors)
and in order to make optimal use of cpu_alloc.]
V1->V2:
- Split off patch for virtualization. Patch has some instructions on
how to configure an arch for cpu_alloc.
- uiuc patch is upstream so leave it out.
- There was an article on LWN.net on cpu_alloc.
- Add a sparc64 config
- Against current git that merged the Kconfigs for x86_64 and i386.
In various places the kernel maintains arrays of pointers indexed by
processor numbers. These are used to locate objects that need to be used
when executing on a specific processor. Both the slab allocator
and the page allocator use these arrays and there the arrays are used in
performance critical code. The allocpercpu functionality is a simple
allocator to provide these arrays. However, there are certain drawbacks
in using such arrays:
1. The arrays become huge for large systems and may be very sparsely
populated (if they are dimensioned for NR_CPUS) on an architecture
like IA64 that allows up to 4k cpus if a kernel is then booted on a
machine that only supports 8 processors. We could use nr_cpu_ids there
but we would still have to allocate for all possible processors up to
the number of processor ids. cpu_alloc can deal with sparse cpu maps.
2. The arrays cause surrounding variables to no longer fit into a single
cacheline. The layout of core data structure is typically optimized so
that variables frequently used together are placed in the same cacheline.
Arrays of pointers move these variables far apart and destroy this effect.
3. A processor frequently follows only one pointer for its own use. Thus
that cacheline with that pointer has to be kept in memory. The neighboring
pointers are all for other processors that are rarely used. So a whole
cacheline of 128 bytes may be consumed but only 8 bytes of information
are in constant use. It would be better to be able to place more information
in this cacheline.
4. The lookup of the per cpu object is expensive and requires multiple
memory accesses to:
A) smp_processor_id()
B) pointer to the base of the per cpu pointer array
C) pointer to the per cpu object in the pointer array
D) the per cpu object itself.
5. Each use of allocpercpu requires its own per cpu pointer array. On large
systems large arrays have to be allocated again and again.
6. Processor hotplug cannot effectively track the per cpu objects
since the VM cannot find all memory that was allocated for
a specific cpu. It is impossible to add or remove objects in
a consistent way. Although the allocpercpu subsystem was extended
to add that capability, it is not used since using it would require adding
cpu hotplug callbacks to each and every user of allocpercpu in
the kernel.
The patchset here provides a cpu allocator that arranges data differently.
Objects are placed tightly in linear areas reserved for each processor.
The areas are of a fixed size so that address calculation can be used
instead of a lookup. This means that:
6. The VM knows where all the per cpu variables are and it could remove
or add cpu areas as cpus come online or go offline.
5. There is no need for per cpu pointer arrays.
4. The lookup of a per cpu object is easy and requires memory access to:
A) smp_processor_id()
B) cpu pointer to the object
C) the per cpu object itself.
3. The access to the unfriendly cacheline that contains only a single
useful pointer is avoided. The cache footprint is reduced.
2. Surrounding variables can be placed in the same cacheline.
This allows SLUB, for example, to avoid caching objects in per cpu structures
since the kmem_cache structure is finally available without the need
to access a cache cold cacheline.
1. A single pointer can be used regardless of the number of processors
in the system.
The cpu allocator manages data beginning at CPU_AREA_BASE. The pointer to
access item DATA on processor X can then be calculated using
POINTER = CPU_AREA_BASE + DATA + (X << CPU_AREA_ORDER)
This makes the allocator rely on a fixed address of the cpu area and on
a fixed size of memory for each processor (similar to S/390s
way of addressing percpu variables).
The allocator can be configured in two ways:
1. Static configuration
The cpu areas are directly mapped memory addresses. Thus
the memory in the cpu areas is fixed and is allocated
as a static variable.
The default configuration of the cpu allocator (if no arch code
changes the settings) is to reserve a 32k area for each processor.
2. Virtual configuration
The cpu areas are virtualized. Memory in cpu areas is allocated
on demand. The MMU is used to map memory allocated into the
cpu areas (in same way that the virtual memmap functionality does it).
The maximum size of the cpu areas depends only on the amount
of virtual memory available. The virtualization can use large
mappings (PMDs f.e.) in order to avoid the TLB pressure that could occur
on systems that only have small pages when heavy use of the cpu areas
is made.
This patch increases the speed of the SLUB fastpath, and it is likely that
similar results can be obtained for other kernel subsystems:
Allocation of 10000 objects of each size. Measurement of the cycles
for each action:
Size SLUBmm cpu alloc
-------------------------
8 45 38
16 49 43
32 61 53
64 82 75
128 188 176
256 207 204
512 260 250
1024 398 391
2048 530 511
4096 342 376
Allocation and then immediate freeing of an object. Measured in cycles
for each alloc/free action:
alloc/free test
SLUBmm cpu alloc
68-72 56-58
The cpu allocator also removes the differences in handling SMP, UP and NUMA in
the slab and page allocators and simplifies the code. It is advantageous even for UP
to place per cpu data from different zones or different slabs in the same
cacheline. Cpu alloc makes uniform handling of cpu data on all three different
types of configurations possible.
The cpu allocator also decreases the memory needs for per cpu storage.
On a classic configuration with SLAB, 32 processors and the allocation of a 4 byte
counter via allocpercpu, one needs the following on a 64 bit platform:
32 * 8 256 Array indexed by processor
32 * 32 1024 32 objects. The minimum allocation size of SLAB is 32.
------------------------------------------------------------------------------
Total 1280 bytes
cpu alloc needs
32 * 4 128 bytes
This is one tenth of the storage. Granted, this is the worst case scenario for a
32 processor system but it shows the savings that can be had. cpu alloc can
allocate 10 counters in the same cacheline for the price of one with
allocpercpu. The allocpercpu counters are likely dispersed all over
memory, so multiple cachelines (in the worst case 10) need to be kept in
memory if those counters need constant updating. cpu alloc will keep the
10 counters in a single cacheline. cpu alloc can keep up to 16 counters
in the same cacheline if the machine has a 64 byte cacheline size.
The use of the cpu area is usually pretty minimal. 32 bit SMP systems typically
use about 8k of cpu area space after bootup, 64 bit SMP around 16k. Small NUMA
systems (8p 4node) use about 64k. Large NUMA systems may need a megabyte of
cpu area.
The usage of the per cpu areas typically increases due to:
1. New slabs being created (needs about 12 bytes per slab on 32 bit, 20 on 64 bit)
2. New devices being mounted that need cpu data for statistics
3. Network devices statistics
4. Special network features (Dave needs to run 100000 IP tunnels)
The current use of the cpu area can be seen in the field
cpu_bytes
in /proc/vmstat
Drawbacks:
1. The per cpu area size is fixed
If we use a virtually mapped area then this is not a problem as long as
there is sufficient virtual space. The 100000 IP tunnels are only realistic
with a virtually mapped cpu area.
2. The cpu allocator cannot control the allocation of individual objects the way
allocpercpu can. In practice this is never used except in net/iucv/iucv.c,
where we have a single case of a per cpu allocation being used to allocate
GFP_DMA structures(!). A patch is provided that replaces the use of
allocpercpu with explicit calls to allocators for each object in iucv.c.
TODO:
- Currently only i386, ia64 and x86_64 arch definitions are provided.
Other arches fall back to 64k static configurations.
- Cpu hotplug support. Currently we simply allocate for all possible processors.
We could reduce this to only online processors if we could allocate the
cpu area for a new processor before the callbacks are run and if we could
free the cpu areas for a processor going down after all the callbacks for
that event have run.
The patchset implements cpu alloc and then gradually replaces all uses of
allocpercpu in the kernel. The last patch removes the allocpercpu support.
If the last patch is not applied then allocpercpu can coexist with cpu alloc.
The patchset is also available via
git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git cpu_alloc
The following patches are based on the linux-2.6 git tree +
git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git performance
(which is the mm version of SLUB)
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 01/30] cpu alloc: Simple version of the allocator (static allocations)
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 02/30] cpu alloc: Use in SLUB Christoph Lameter
` (28 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc --]
[-- Type: text/plain, Size: 11842 bytes --]
The core portion of the cpu allocator.
The per cpu allocator allows dynamic allocation of memory on all
processors simultaneously. A bitmap is used to track used areas.
The allocator implements tight packing to reduce the cache footprint
and increase speed, since cacheline contention is typically not a concern
for memory mainly used by a single cpu. Small objects will fill up gaps
left by larger allocations that required alignment.
This is a limited version of the cpu allocator that only performs a
static allocation of a single page for each processor. This is enough
for the use of the cpu allocator in the slab and page allocator for most
of the common configurations. The configuration will be useful for
embedded systems to reduce memory requirements. However, there is a hard limit
on the size of the per cpu structures, so the default configuration of an
order 0 allocation can only support up to 150 slab caches (most systems
I have seen use about 70) and probably not more than 16 or so NUMA nodes. The size
of the statically configured area can be changed via make menuconfig etc.
The cpu allocator virtualization patch is needed in order to support dynamically
extending per cpu areas.
V1->V2:
- Split off the dynamically extendable cpu area feature to make it clear that it exists.
- Remove useless variables.
- Add boot_cpu_alloc for boot time cpu area reservations (allows the folding in of
per cpu areas and other arch specific per cpu stuff during boot).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/percpu.h | 59 +++++++++++++++
include/linux/vmstat.h | 2
mm/Kconfig | 7 +
mm/Makefile | 2
mm/cpu_alloc.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++
mm/vmstat.c | 1
6 files changed, 253 insertions(+), 2 deletions(-)
create mode 100644 include/linux/cpu_alloc.h
create mode 100644 mm/cpu_alloc.c
Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h 2007-11-16 14:51:29.326681524 -0800
+++ linux-2.6/include/linux/vmstat.h 2007-11-16 14:51:47.569430596 -0800
@@ -36,7 +36,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
FOR_ALL_ZONES(PGSCAN_KSWAPD),
FOR_ALL_ZONES(PGSCAN_DIRECT),
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
- PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ PAGEOUTRUN, ALLOCSTALL, PGROTATED, CPU_BYTES,
NR_VM_EVENT_ITEMS
};
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2007-11-16 14:51:29.338681182 -0800
+++ linux-2.6/mm/Kconfig 2007-11-16 14:53:31.614680913 -0800
@@ -194,3 +194,10 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config CPU_AREA_ORDER
+ int "Maximum size (order) of CPU area"
+ default "3"
+ help
+ Sets the maximum amount of memory that can be allocated via cpu_alloc
+ The size is set in page order, so 0 = PAGE_SIZE, 1 = PAGE_SIZE << 1 etc.
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2007-11-16 14:51:29.346681415 -0800
+++ linux-2.6/mm/Makefile 2007-11-16 14:51:47.569430596 -0800
@@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- page_isolation.o $(mmu-y)
+ page_isolation.o cpu_alloc.o $(mmu-y)
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/cpu_alloc.c 2007-11-16 14:51:47.573430967 -0800
@@ -0,0 +1,184 @@
+/*
+ * Cpu allocator - Manage objects allocated for each processor
+ *
+ * (C) 2007 SGI, Christoph Lameter <clameter@sgi.com>
+ * Basic implementation with allocation and free from a dedicated per
+ * cpu area.
+ *
+ * The per cpu allocator allows dynamic allocation of memory on all
+ * processor simultaneously. A bitmap is used to track used areas.
+ * The allocator implements tight packing to reduce the cache footprint
+ * and increase speed since cacheline contention is typically not a concern
+ * for memory mainly used by a single cpu. Small objects will fill up gaps
+ * left by larger allocations that required alignments.
+ */
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/bitmap.h>
+
+/*
+ * Basic allocation unit. A bit map is created to track the use of each
+ * UNIT_SIZE element in the cpu area.
+ */
+
+#define UNIT_SIZE sizeof(int)
+#define UNITS (ALLOC_SIZE / UNIT_SIZE)
+
+/*
+ * How many units are needed for an object of a given size
+ */
+static int size_to_units(unsigned long size)
+{
+ return DIV_ROUND_UP(size, UNIT_SIZE);
+}
+
+/*
+ * Lock to protect the bitmap and the meta data for the cpu allocator.
+ */
+static DEFINE_SPINLOCK(cpu_alloc_map_lock);
+static unsigned long units_reserved; /* Units reserved by boot allocations */
+
+/*
+ * Static configuration. The cpu areas are of a fixed size and
+ * cannot be extended. Such configurations are mainly useful on
+ * machines that do not have MMU support. Note that we have to use
+ * bss space for the static declarations. The combination of a large number
+ * of processors and a large cpu area may cause problems with the size
+ * of the bss segment.
+ */
+#define ALLOC_SIZE (1UL << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))
+
+static u8 cpu_area[NR_CPUS * ALLOC_SIZE];
+static DECLARE_BITMAP(cpu_alloc_map, UNITS);
+
+void * __init boot_cpu_alloc(unsigned long size)
+{
+ unsigned long x = units_reserved;
+
+ units_reserved += size_to_units(size);
+ BUG_ON(units_reserved > UNITS);
+ return cpu_area + x * UNIT_SIZE;
+}
+
+static int first_free; /* First known free unit */
+
+/*
+ * Mark an object as used in the cpu_alloc_map
+ *
+ * Must hold cpu_alloc_map_lock
+ */
+static void set_map(int start, int length)
+{
+ while (length-- > 0)
+ __set_bit(start++, cpu_alloc_map);
+}
+
+/*
+ * Mark an area as freed.
+ *
+ * Must hold cpu_alloc_map_lock
+ */
+static void clear_map(int start, int length)
+{
+ while (length-- > 0)
+ __clear_bit(start++, cpu_alloc_map);
+}
+
+/*
+ * Allocate an object of a certain size
+ *
+ * Returns a special pointer that can be used with CPU_PTR to find the
+ * address of the object for a certain cpu.
+ */
+void *cpu_alloc(unsigned long size, gfp_t gfpflags, unsigned long align)
+{
+ unsigned long start;
+ int units = size_to_units(size);
+ void *ptr;
+ int first;
+ unsigned long flags;
+
+ BUG_ON(gfpflags & ~(GFP_RECLAIM_MASK | __GFP_ZERO));
+
+ spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+
+ first = 1;
+ start = first_free;
+
+ for ( ; ; ) {
+
+ start = find_next_zero_bit(cpu_alloc_map, ALLOC_SIZE, start);
+ if (start >= UNITS - units_reserved)
+ goto out_of_memory;
+
+ if (first)
+ first_free = start;
+
+ /*
+ * Check alignment and that there is enough space after
+ * the starting unit.
+ */
+ if ((start + units_reserved) % (align / UNIT_SIZE) == 0 &&
+ find_next_bit(cpu_alloc_map, ALLOC_SIZE, start + 1)
+ >= start + units)
+ break;
+ start++;
+ first = 0;
+ }
+
+ if (first)
+ first_free = start + units;
+
+ if (start + units > UNITS - units_reserved)
+ goto out_of_memory;
+
+ set_map(start, units);
+ __count_vm_events(CPU_BYTES, units * UNIT_SIZE);
+
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+
+ ptr = cpu_area + (start + units_reserved) * UNIT_SIZE;
+
+ if (gfpflags & __GFP_ZERO) {
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ memset(CPU_PTR(ptr, cpu), 0, size);
+ }
+
+ return ptr;
+
+out_of_memory:
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+ return NULL;
+}
+EXPORT_SYMBOL(cpu_alloc);
+
+/*
+ * Free an object. The pointer must be a cpu pointer allocated
+ * via cpu_alloc.
+ */
+void cpu_free(void *start, unsigned long size)
+{
+ int units = size_to_units(size);
+ int index;
+ u8 *p = start;
+ unsigned long flags;
+
+ BUG_ON(p < (cpu_area + units_reserved * UNIT_SIZE));
+ index = (p - cpu_area) / UNIT_SIZE - units_reserved;
+ BUG_ON(!test_bit(index, cpu_alloc_map) ||
+ index >= UNITS - units_reserved);
+
+ spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+
+ clear_map(index, units);
+ __count_vm_events(CPU_BYTES, -units * UNIT_SIZE);
+ if (index < first_free)
+ first_free = index;
+
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+}
+EXPORT_SYMBOL(cpu_free);
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c 2007-11-16 14:51:43.522430963 -0800
+++ linux-2.6/mm/vmstat.c 2007-11-16 14:51:47.578181134 -0800
@@ -639,6 +639,7 @@ static const char * const vmstat_text[]
"allocstall",
"pgrotated",
+ "cpu_bytes",
#endif
};
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2007-11-16 14:51:29.334681520 -0800
+++ linux-2.6/include/linux/percpu.h 2007-11-16 14:51:47.578181134 -0800
@@ -110,4 +110,63 @@ static inline void percpu_free(void *__p
#define free_percpu(ptr) percpu_free((ptr))
#define per_cpu_ptr(ptr, cpu) percpu_ptr((ptr), (cpu))
+
+/*
+ * cpu allocator definitions
+ *
+ * The cpu allocator allows allocating an array of objects on all processors.
+ * A single pointer can then be used to access the instance of the object
+ * on a particular processor.
+ *
+ * Cpu objects are typically small. The allocator packs them tightly
+ * to increase the chance on each access that a per cpu object is already
+ * cached. Alignments may be specified but the intent is to align the data
+ * properly due to cpu alignment constraints and not to avoid cacheline
+ * contention. Any holes left by aligning objects are filled up with smaller
+ * objects that are allocated later.
+ *
+ * Cpu data can be allocated using CPU_ALLOC. The resulting pointer is
+ * pointing to the instance of the variable on cpu 0. It is generally an
+ * error to use the pointer directly unless we are running on cpu 0. So
+ * the use is valid during boot for example.
+ *
+ * The GFP flags have their usual function: __GFP_ZERO zeroes the object
+ * and other flags may be used to control reclaim behavior if the cpu
+ * areas have to be extended. However, zones cannot be selected nor
+ * can locality constraint flags be used.
+ *
+ * CPU_PTR() may be used to calculate the pointer for a specific processor.
+ * CPU_PTR is highly scalable since it simply adds the shifted value of
+ * smp_processor_id() to the base.
+ *
+ * Note: Synchronization is up to caller. If preemption is disabled then
+ * it is generally safe to access cpu variables (unless they are also
+ * handled from an interrupt context).
+ */
+
+#define CPU_OFFSET(__cpu) \
+ ((unsigned long)(__cpu) << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))
+
+#define CPU_PTR(__p, __cpu) ((__typeof__(__p))((void *)(__p) + \
+ CPU_OFFSET(__cpu)))
+
+#define CPU_ALLOC(type, flags) cpu_alloc(sizeof(type), flags, \
+ __alignof__(type))
+#define CPU_FREE(pointer) cpu_free(pointer, sizeof(*(pointer)))
+
+#define THIS_CPU(__p) CPU_PTR(__p, smp_processor_id())
+#define __THIS_CPU(__p) CPU_PTR(__p, raw_smp_processor_id())
+
+/*
+ * Raw calls
+ */
+void *cpu_alloc(unsigned long size, gfp_t gfp, unsigned long align);
+void cpu_free(void *cpu_pointer, unsigned long size);
+
+/*
+ * Early boot allocator for per_cpu variables and special per cpu areas.
+ * Allocations are not tracked and cannot be freed.
+ */
+void *boot_cpu_alloc(unsigned long size);
+
#endif /* __LINUX_PERCPU_H */
--
* [patch 02/30] cpu alloc: Use in SLUB
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 03/30] cpu alloc: Remove SLUB fields Christoph Lameter
` (27 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_slub_conversion --]
[-- Type: text/plain, Size: 11021 bytes --]
Using cpu alloc removes the need for the per cpu arrays in the kmem_cache struct.
These could get quite big if we have to support systems with thousands of cpus.
The use of cpu alloc means that:
1. The size of kmem_cache for SMP configurations shrinks since we will only
need 1 pointer instead of NR_CPUS. The same pointer can be used by all
processors. This reduces the cache footprint of the allocator.
2. We can dynamically size kmem_cache according to the actual nodes in the
system, meaning less memory overhead for configurations that may potentially
support up to 1k NUMA nodes.
3. We can remove the fiddling with allocating and releasing kmem_cache_cpu
structures when bringing up and shutting down cpus. The cpu alloc
logic will do it all for us. This removes some portions of the cpu hotplug
functionality.
4. Fastpath performance increases by another 20% vs. the earlier improvements.
Instead of a fastpath taking 45-50 cycles it is now possible to get
below 40.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/slub_def.h | 6 -
mm/slub.c | 182 ++++++-----------------------------------------
2 files changed, 25 insertions(+), 163 deletions(-)
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h 2007-11-15 21:24:53.494154465 -0800
+++ linux-2.6/include/linux/slub_def.h 2007-11-15 21:25:07.622904866 -0800
@@ -34,6 +34,7 @@ struct kmem_cache_node {
* Slab cache management.
*/
struct kmem_cache {
+ struct kmem_cache_cpu *cpu_slab;
/* Used for retriving partial slabs etc */
unsigned long flags;
int size; /* The size of an object including meta data */
@@ -63,11 +64,6 @@ struct kmem_cache {
int defrag_ratio;
struct kmem_cache_node *node[MAX_NUMNODES];
#endif
-#ifdef CONFIG_SMP
- struct kmem_cache_cpu *cpu_slab[NR_CPUS];
-#else
- struct kmem_cache_cpu cpu_slab;
-#endif
};
/*
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-15 21:24:53.502154325 -0800
+++ linux-2.6/mm/slub.c 2007-11-15 21:25:07.622904866 -0800
@@ -239,15 +239,6 @@ static inline struct kmem_cache_node *ge
#endif
}
-static inline struct kmem_cache_cpu *get_cpu_slab(struct kmem_cache *s, int cpu)
-{
-#ifdef CONFIG_SMP
- return s->cpu_slab[cpu];
-#else
- return &s->cpu_slab;
-#endif
-}
-
/*
* The end pointer in a slab is special. It points to the first object in the
* slab but has bit 0 set to mark it.
@@ -1472,7 +1463,7 @@ static inline void flush_slab(struct kme
*/
static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
{
- struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+ struct kmem_cache_cpu *c = CPU_PTR(s->cpu_slab, cpu);
if (likely(c && c->page))
flush_slab(s, c);
@@ -1487,15 +1478,7 @@ static void flush_cpu_slab(void *d)
static void flush_all(struct kmem_cache *s)
{
-#ifdef CONFIG_SMP
on_each_cpu(flush_cpu_slab, s, 1, 1);
-#else
- unsigned long flags;
-
- local_irq_save(flags);
- flush_cpu_slab(s);
- local_irq_restore(flags);
-#endif
}
/*
@@ -1529,7 +1512,7 @@ static noinline unsigned long get_new_sl
if (!page)
return 0;
- *pc = c = get_cpu_slab(s, smp_processor_id());
+ *pc = c = THIS_CPU(s->cpu_slab);
if (c->page)
flush_slab(s, c);
c->page = page;
@@ -1641,25 +1624,26 @@ static void __always_inline *slab_alloc(
struct kmem_cache_cpu *c;
#ifdef CONFIG_FAST_CMPXCHG_LOCAL
- c = get_cpu_slab(s, get_cpu());
+ preempt_disable();
+ c = THIS_CPU(s->cpu_slab);
do {
object = c->freelist;
if (unlikely(is_end(object) || !node_match(c, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
if (unlikely(!object)) {
- put_cpu();
+ preempt_enable();
goto out;
}
break;
}
} while (cmpxchg_local(&c->freelist, object, object[c->offset])
!= object);
- put_cpu();
+ preempt_enable();
#else
unsigned long flags;
local_irq_save(flags);
- c = get_cpu_slab(s, smp_processor_id());
+ c = THIS_CPU(s->cpu_slab);
if (unlikely((is_end(c->freelist)) || !node_match(c, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
@@ -1784,7 +1768,8 @@ static void __always_inline slab_free(st
#ifdef CONFIG_FAST_CMPXCHG_LOCAL
void **freelist;
- c = get_cpu_slab(s, get_cpu());
+ preempt_disable();
+ c = THIS_CPU(s->cpu_slab);
debug_check_no_locks_freed(object, s->objsize);
do {
freelist = c->freelist;
@@ -1806,13 +1791,13 @@ static void __always_inline slab_free(st
}
object[c->offset] = freelist;
} while (cmpxchg_local(&c->freelist, freelist, object) != freelist);
- put_cpu();
+ preempt_enable();
#else
unsigned long flags;
local_irq_save(flags);
debug_check_no_locks_freed(object, s->objsize);
- c = get_cpu_slab(s, smp_processor_id());
+ c = THIS_CPU(s->cpu_slab);
if (likely(page == c->page && c->node >= 0)) {
object[c->offset] = c->freelist;
c->freelist = object;
@@ -2015,130 +2000,19 @@ static void init_kmem_cache_node(struct
#endif
}
-#ifdef CONFIG_SMP
-/*
- * Per cpu array for per cpu structures.
- *
- * The per cpu array places all kmem_cache_cpu structures from one processor
- * close together meaning that it becomes possible that multiple per cpu
- * structures are contained in one cacheline. This may be particularly
- * beneficial for the kmalloc caches.
- *
- * A desktop system typically has around 60-80 slabs. With 100 here we are
- * likely able to get per cpu structures for all caches from the array defined
- * here. We must be able to cover all kmalloc caches during bootstrap.
- *
- * If the per cpu array is exhausted then fall back to kmalloc
- * of individual cachelines. No sharing is possible then.
- */
-#define NR_KMEM_CACHE_CPU 100
-
-static DEFINE_PER_CPU(struct kmem_cache_cpu,
- kmem_cache_cpu)[NR_KMEM_CACHE_CPU];
-
-static DEFINE_PER_CPU(struct kmem_cache_cpu *, kmem_cache_cpu_free);
-static cpumask_t kmem_cach_cpu_free_init_once = CPU_MASK_NONE;
-
-static struct kmem_cache_cpu *alloc_kmem_cache_cpu(struct kmem_cache *s,
- int cpu, gfp_t flags)
-{
- struct kmem_cache_cpu *c = per_cpu(kmem_cache_cpu_free, cpu);
-
- if (c)
- per_cpu(kmem_cache_cpu_free, cpu) =
- (void *)c->freelist;
- else {
- /* Table overflow: So allocate ourselves */
- c = kmalloc_node(
- ALIGN(sizeof(struct kmem_cache_cpu), cache_line_size()),
- flags, cpu_to_node(cpu));
- if (!c)
- return NULL;
- }
-
- init_kmem_cache_cpu(s, c);
- return c;
-}
-
-static void free_kmem_cache_cpu(struct kmem_cache_cpu *c, int cpu)
-{
- if (c < per_cpu(kmem_cache_cpu, cpu) ||
- c > per_cpu(kmem_cache_cpu, cpu) + NR_KMEM_CACHE_CPU) {
- kfree(c);
- return;
- }
- c->freelist = (void *)per_cpu(kmem_cache_cpu_free, cpu);
- per_cpu(kmem_cache_cpu_free, cpu) = c;
-}
-
-static void free_kmem_cache_cpus(struct kmem_cache *s)
-{
- int cpu;
-
- for_each_online_cpu(cpu) {
- struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
- if (c) {
- s->cpu_slab[cpu] = NULL;
- free_kmem_cache_cpu(c, cpu);
- }
- }
-}
-
static int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
{
int cpu;
- for_each_online_cpu(cpu) {
- struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
- if (c)
- continue;
-
- c = alloc_kmem_cache_cpu(s, cpu, flags);
- if (!c) {
- free_kmem_cache_cpus(s);
- return 0;
- }
- s->cpu_slab[cpu] = c;
- }
- return 1;
-}
-
-/*
- * Initialize the per cpu array.
- */
-static void init_alloc_cpu_cpu(int cpu)
-{
- int i;
+ s->cpu_slab = CPU_ALLOC(struct kmem_cache_cpu, flags);
- if (cpu_isset(cpu, kmem_cach_cpu_free_init_once))
- return;
-
- for (i = NR_KMEM_CACHE_CPU - 1; i >= 0; i--)
- free_kmem_cache_cpu(&per_cpu(kmem_cache_cpu, cpu)[i], cpu);
-
- cpu_set(cpu, kmem_cach_cpu_free_init_once);
-}
-
-static void __init init_alloc_cpu(void)
-{
- int cpu;
+ if (!s->cpu_slab)
+ return 0;
for_each_online_cpu(cpu)
- init_alloc_cpu_cpu(cpu);
- }
-
-#else
-static inline void free_kmem_cache_cpus(struct kmem_cache *s) {}
-static inline void init_alloc_cpu(void) {}
-
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
-{
- init_kmem_cache_cpu(s, &s->cpu_slab);
+ init_kmem_cache_cpu(s, CPU_PTR(s->cpu_slab, cpu));
return 1;
}
-#endif
#ifdef CONFIG_NUMA
/*
@@ -2452,9 +2326,8 @@ static inline int kmem_cache_close(struc
int node;
flush_all(s);
-
+ CPU_FREE(s->cpu_slab);
/* Attempt to free all objects */
- free_kmem_cache_cpus(s);
for_each_node_state(node, N_NORMAL_MEMORY) {
struct kmem_cache_node *n = get_node(s, node);
@@ -2958,8 +2831,6 @@ void __init kmem_cache_init(void)
int i;
int caches = 0;
- init_alloc_cpu();
-
#ifdef CONFIG_NUMA
/*
* Must first have the slab cache available for the allocations of the
@@ -3019,11 +2890,12 @@ void __init kmem_cache_init(void)
for (i = KMALLOC_SHIFT_LOW; i < PAGE_SHIFT; i++)
kmalloc_caches[i]. name =
kasprintf(GFP_KERNEL, "kmalloc-%d", 1 << i);
-
#ifdef CONFIG_SMP
register_cpu_notifier(&slab_notifier);
- kmem_size = offsetof(struct kmem_cache, cpu_slab) +
- nr_cpu_ids * sizeof(struct kmem_cache_cpu *);
+#endif
+#ifdef CONFIG_NUMA
+ kmem_size = offsetof(struct kmem_cache, node) +
+ nr_node_ids * sizeof(struct kmem_cache_node *);
#else
kmem_size = sizeof(struct kmem_cache);
#endif
@@ -3120,7 +2992,7 @@ struct kmem_cache *kmem_cache_create(con
* per cpu structures
*/
for_each_online_cpu(cpu)
- get_cpu_slab(s, cpu)->objsize = s->objsize;
+ CPU_PTR(s->cpu_slab, cpu)->objsize = s->objsize;
s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
up_write(&slub_lock);
if (sysfs_slab_alias(s, name))
@@ -3165,11 +3037,9 @@ static int __cpuinit slab_cpuup_callback
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
- init_alloc_cpu_cpu(cpu);
down_read(&slub_lock);
list_for_each_entry(s, &slab_caches, list)
- s->cpu_slab[cpu] = alloc_kmem_cache_cpu(s, cpu,
- GFP_KERNEL);
+ init_kmem_cache_cpu(s, CPU_PTR(s->cpu_slab, cpu));
up_read(&slub_lock);
break;
@@ -3179,13 +3049,9 @@ static int __cpuinit slab_cpuup_callback
case CPU_DEAD_FROZEN:
down_read(&slub_lock);
list_for_each_entry(s, &slab_caches, list) {
- struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
-
local_irq_save(flags);
__flush_cpu_slab(s, cpu);
local_irq_restore(flags);
- free_kmem_cache_cpu(c, cpu);
- s->cpu_slab[cpu] = NULL;
}
up_read(&slub_lock);
break;
@@ -3657,7 +3523,7 @@ static unsigned long slab_objects(struct
for_each_possible_cpu(cpu) {
struct page *page;
int node;
- struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+ struct kmem_cache_cpu *c = CPU_PTR(s->cpu_slab, cpu);
if (!c)
continue;
@@ -3724,7 +3590,7 @@ static int any_slab_objects(struct kmem_
int cpu;
for_each_possible_cpu(cpu) {
- struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+ struct kmem_cache_cpu *c = CPU_PTR(s->cpu_slab, cpu);
if (c && c->page)
return 1;
--
* [patch 03/30] cpu alloc: Remove SLUB fields
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
2007-11-16 23:09 ` [patch 02/30] cpu alloc: Use in SLUB Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 04/30] cpu alloc: page allocator conversion Christoph Lameter
` (26 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_remove_slub_fields --]
[-- Type: text/plain, Size: 5883 bytes --]
Remove the fields in kmem_cache_cpu that cached data from kmem_cache while
the two structures were in different cachelines. The cacheline that holds
the per cpu array pointer now also holds these values, so kmem_cache_cpu
shrinks to almost half its size.
The get_freepointer() and set_freepointer() functions, previously intended
only for the slow path, are now also useful on the hot path since accessing
the offset field no longer requires touching an additional cacheline. This
results in consistent use of set_freepointer() for objects throughout SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/slub_def.h | 3 --
mm/slub.c | 50 +++++++++++++++--------------------------------
2 files changed, 17 insertions(+), 36 deletions(-)
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h 2007-11-15 21:25:07.622904866 -0800
+++ linux-2.6/include/linux/slub_def.h 2007-11-15 21:25:10.335154196 -0800
@@ -15,9 +15,6 @@ struct kmem_cache_cpu {
void **freelist;
struct page *page;
int node;
- unsigned int offset;
- unsigned int objsize;
- unsigned int objects;
};
struct kmem_cache_node {
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-15 21:25:07.622904866 -0800
+++ linux-2.6/mm/slub.c 2007-11-15 21:25:10.339154532 -0800
@@ -273,13 +273,6 @@ static inline int check_valid_pointer(st
return 1;
}
-/*
- * Slow version of get and set free pointer.
- *
- * This version requires touching the cache lines of kmem_cache which
- * we avoid to do in the fast alloc free paths. There we obtain the offset
- * from the page struct.
- */
static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
return *(void **)(object + s->offset);
@@ -1438,10 +1431,10 @@ static void deactivate_slab(struct kmem_
/* Retrieve object from cpu_freelist */
object = c->freelist;
- c->freelist = c->freelist[c->offset];
+ c->freelist = get_freepointer(s, c->freelist);
/* And put onto the regular freelist */
- object[c->offset] = page->freelist;
+ set_freepointer(s, object, page->freelist);
page->freelist = object;
page->inuse--;
}
@@ -1573,8 +1566,8 @@ load_freelist:
goto debug;
object = c->page->freelist;
- c->freelist = object[c->offset];
- c->page->inuse = c->objects;
+ c->freelist = get_freepointer(s, object);
+ c->page->inuse = s->objects;
c->page->freelist = c->page->end;
c->node = page_to_nid(c->page);
unlock_out:
@@ -1602,7 +1595,7 @@ debug:
goto another_slab;
c->page->inuse++;
- c->page->freelist = object[c->offset];
+ c->page->freelist = get_freepointer(s, object);
c->node = -1;
goto unlock_out;
}
@@ -1636,8 +1629,8 @@ static void __always_inline *slab_alloc(
}
break;
}
- } while (cmpxchg_local(&c->freelist, object, object[c->offset])
- != object);
+ } while (cmpxchg_local(&c->freelist, object,
+ get_freepointer(s, object)) != object);
preempt_enable();
#else
unsigned long flags;
@@ -1653,13 +1646,13 @@ static void __always_inline *slab_alloc(
}
} else {
object = c->freelist;
- c->freelist = object[c->offset];
+ c->freelist = get_freepointer(s, object);
}
local_irq_restore(flags);
#endif
if (unlikely((gfpflags & __GFP_ZERO)))
- memset(object, 0, c->objsize);
+ memset(object, 0, s->objsize);
out:
return object;
}
@@ -1687,7 +1680,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
* handling required then we can return immediately.
*/
static void __slab_free(struct kmem_cache *s, struct page *page,
- void *x, void *addr, unsigned int offset)
+ void *x, void *addr)
{
void *prior;
void **object = (void *)x;
@@ -1703,7 +1696,8 @@ static void __slab_free(struct kmem_cach
if (unlikely(state & SLABDEBUG))
goto debug;
checks_ok:
- prior = object[offset] = page->freelist;
+ prior = page->freelist;
+ set_freepointer(s, object, prior);
page->freelist = object;
page->inuse--;
@@ -1786,10 +1780,10 @@ static void __always_inline slab_free(st
* since the freelist pointers are unique per slab.
*/
if (unlikely(page != c->page || c->node < 0)) {
- __slab_free(s, page, x, addr, c->offset);
+ __slab_free(s, page, x, addr);
break;
}
- object[c->offset] = freelist;
+ set_freepointer(s, object, freelist);
} while (cmpxchg_local(&c->freelist, freelist, object) != freelist);
preempt_enable();
#else
@@ -1799,10 +1793,10 @@ static void __always_inline slab_free(st
debug_check_no_locks_freed(object, s->objsize);
c = THIS_CPU(s->cpu_slab);
if (likely(page == c->page && c->node >= 0)) {
- object[c->offset] = c->freelist;
+ set_freepointer(s, object, c->freelist);
c->freelist = object;
} else
- __slab_free(s, page, x, addr, c->offset);
+ __slab_free(s, page, x, addr);
local_irq_restore(flags);
#endif
@@ -1984,9 +1978,6 @@ static void init_kmem_cache_cpu(struct k
c->page = NULL;
c->freelist = (void *)PAGE_MAPPING_ANON;
c->node = 0;
- c->offset = s->offset / sizeof(void *);
- c->objsize = s->objsize;
- c->objects = s->objects;
}
static void init_kmem_cache_node(struct kmem_cache_node *n)
@@ -2978,21 +2969,14 @@ struct kmem_cache *kmem_cache_create(con
down_write(&slub_lock);
s = find_mergeable(size, align, flags, name, ctor);
if (s) {
- int cpu;
-
s->refcount++;
+
/*
* Adjust the object sizes so that we clear
* the complete object on kzalloc.
*/
s->objsize = max(s->objsize, (int)size);
- /*
- * And then we need to update the object size in the
- * per cpu structures
- */
- for_each_online_cpu(cpu)
- CPU_PTR(s->cpu_slab, cpu)->objsize = s->objsize;
s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
up_write(&slub_lock);
if (sysfs_slab_alias(s, name))
--
* [patch 04/30] cpu alloc: page allocator conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (2 preceding siblings ...)
2007-11-16 23:09 ` [patch 03/30] cpu alloc: Remove SLUB fields Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas Christoph Lameter
` (25 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_page_allocator_conversion --]
[-- Type: text/plain, Size: 12930 bytes --]
Use the new cpu_alloc functionality to avoid per cpu arrays in struct zone.
This drastically reduces the size of struct zone for systems with a large
number of processors and allows placement of critical variables of struct
zone in one cacheline even on very large systems.
Another effect is that the pagesets of one processor are placed near one
another. If multiple pagesets from different zones fit into one cacheline
then additional cacheline fetches can be avoided on the hot paths when
allocating memory from multiple zones.
Surprisingly this clears up much of the painful NUMA bringup. Bootstrap
becomes simpler if we use the same scheme for UP, SMP, NUMA. #ifdefs are
reduced and we can drop the zone_pcp macro.
Hotplug handling is also simplified since cpu alloc can bring up and
shut down cpu areas for a specific cpu as a whole. So there is no need to
allocate or free individual pagesets.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/mm.h | 4 -
include/linux/mmzone.h | 12 ---
mm/page_alloc.c | 161 ++++++++++++++++++-------------------------------
mm/vmstat.c | 14 ++--
4 files changed, 72 insertions(+), 119 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2007-11-15 21:24:47.238154208 -0800
+++ linux-2.6/include/linux/mm.h 2007-11-15 21:25:12.735154250 -0800
@@ -931,11 +931,7 @@ extern void show_mem(void);
extern void si_meminfo(struct sysinfo * val);
extern void si_meminfo_node(struct sysinfo *val, int nid);
-#ifdef CONFIG_NUMA
extern void setup_per_cpu_pageset(void);
-#else
-static inline void setup_per_cpu_pageset(void) {}
-#endif
/* prio_tree.c */
void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h 2007-11-15 21:24:49.970154448 -0800
+++ linux-2.6/include/linux/mmzone.h 2007-11-15 21:25:12.735154250 -0800
@@ -121,13 +121,7 @@ struct per_cpu_pageset {
s8 stat_threshold;
s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
#endif
-} ____cacheline_aligned_in_smp;
-
-#ifdef CONFIG_NUMA
-#define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
-#else
-#define zone_pcp(__z, __cpu) (&(__z)->pageset[(__cpu)])
-#endif
+};
enum zone_type {
#ifdef CONFIG_ZONE_DMA
@@ -231,10 +225,8 @@ struct zone {
*/
unsigned long min_unmapped_pages;
unsigned long min_slab_pages;
- struct per_cpu_pageset *pageset[NR_CPUS];
-#else
- struct per_cpu_pageset pageset[NR_CPUS];
#endif
+ struct per_cpu_pageset *pageset;
/*
* free areas of different sizes
*/
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2007-11-15 21:24:50.330904214 -0800
+++ linux-2.6/mm/page_alloc.c 2007-11-15 21:25:12.739154691 -0800
@@ -892,7 +892,7 @@ static void __drain_pages(unsigned int c
if (!populated_zone(zone))
continue;
- pset = zone_pcp(zone, cpu);
+ pset = CPU_PTR(zone->pageset, cpu);
pcp = &pset->pcp;
local_irq_save(flags);
@@ -988,8 +988,8 @@ static void fastcall free_hot_cold_page(
arch_free_page(page, 0);
kernel_map_pages(page, 1, 0);
- pcp = &zone_pcp(zone, get_cpu())->pcp;
local_irq_save(flags);
+ pcp = &THIS_CPU(zone->pageset)->pcp;
__count_vm_event(PGFREE);
if (cold)
list_add_tail(&page->lru, &pcp->list);
@@ -1002,7 +1002,6 @@ static void fastcall free_hot_cold_page(
pcp->count -= pcp->batch;
}
local_irq_restore(flags);
- put_cpu();
}
void fastcall free_hot_page(struct page *page)
@@ -1044,16 +1043,14 @@ static struct page *buffered_rmqueue(str
unsigned long flags;
struct page *page;
int cold = !!(gfp_flags & __GFP_COLD);
- int cpu;
int migratetype = allocflags_to_migratetype(gfp_flags);
again:
- cpu = get_cpu();
if (likely(order == 0)) {
struct per_cpu_pages *pcp;
- pcp = &zone_pcp(zone, cpu)->pcp;
local_irq_save(flags);
+ pcp = &THIS_CPU(zone->pageset)->pcp;
if (!pcp->count) {
pcp->count = rmqueue_bulk(zone, 0,
pcp->batch, &pcp->list, migratetype);
@@ -1092,7 +1089,6 @@ again:
__count_zone_vm_events(PGALLOC, zone, 1 << order);
zone_statistics(zonelist, zone);
local_irq_restore(flags);
- put_cpu();
VM_BUG_ON(bad_range(zone, page));
if (prep_new_page(page, order, gfp_flags))
@@ -1101,7 +1097,6 @@ again:
failed:
local_irq_restore(flags);
- put_cpu();
return NULL;
}
@@ -1795,7 +1790,7 @@ void show_free_areas(void)
for_each_online_cpu(cpu) {
struct per_cpu_pageset *pageset;
- pageset = zone_pcp(zone, cpu);
+ pageset = CPU_PTR(zone->pageset, cpu);
printk("CPU %4d: hi:%5d, btch:%4d usd:%4d\n",
cpu, pageset->pcp.high,
@@ -2621,82 +2616,33 @@ static void setup_pagelist_highmark(stru
pcp->batch = PAGE_SHIFT * 8;
}
-
-#ifdef CONFIG_NUMA
/*
- * Boot pageset table. One per cpu which is going to be used for all
- * zones and all nodes. The parameters will be set in such a way
- * that an item put on a list will immediately be handed over to
- * the buddy list. This is safe since pageset manipulation is done
- * with interrupts disabled.
- *
- * Some NUMA counter updates may also be caught by the boot pagesets.
- *
- * The boot_pagesets must be kept even after bootup is complete for
- * unused processors and/or zones. They do play a role for bootstrapping
- * hotplugged processors.
- *
- * zoneinfo_show() and maybe other functions do
- * not check if the processor is online before following the pageset pointer.
- * Other parts of the kernel may not check if the zone is available.
+ * Dynamically allocate memory for the per cpu pageset array in struct zone.
*/
-static struct per_cpu_pageset boot_pageset[NR_CPUS];
-
-/*
- * Dynamically allocate memory for the
- * per cpu pageset array in struct zone.
- */
-static int __cpuinit process_zones(int cpu)
+static void __cpuinit process_zones(int cpu)
{
- struct zone *zone, *dzone;
+ struct zone *zone;
int node = cpu_to_node(cpu);
node_set_state(node, N_CPU); /* this node has a cpu */
for_each_zone(zone) {
+ struct per_cpu_pageset *pcp =
+ CPU_PTR(zone->pageset, cpu);
if (!populated_zone(zone))
continue;
- zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
- GFP_KERNEL, node);
- if (!zone_pcp(zone, cpu))
- goto bad;
-
- setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
+ setup_pageset(pcp, zone_batchsize(zone));
if (percpu_pagelist_fraction)
- setup_pagelist_highmark(zone_pcp(zone, cpu),
- (zone->present_pages / percpu_pagelist_fraction));
- }
-
- return 0;
-bad:
- for_each_zone(dzone) {
- if (!populated_zone(dzone))
- continue;
- if (dzone == zone)
- break;
- kfree(zone_pcp(dzone, cpu));
- zone_pcp(dzone, cpu) = NULL;
- }
- return -ENOMEM;
-}
+ setup_pagelist_highmark(pcp, zone->present_pages /
+ percpu_pagelist_fraction);
-static inline void free_zone_pagesets(int cpu)
-{
- struct zone *zone;
-
- for_each_zone(zone) {
- struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
-
- /* Free per_cpu_pageset if it is slab allocated */
- if (pset != &boot_pageset[cpu])
- kfree(pset);
- zone_pcp(zone, cpu) = NULL;
}
}
+#ifdef CONFIG_SMP
static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
@@ -2707,14 +2653,7 @@ static int __cpuinit pageset_cpuup_callb
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
- if (process_zones(cpu))
- ret = NOTIFY_BAD;
- break;
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- case CPU_DEAD:
- case CPU_DEAD_FROZEN:
- free_zone_pagesets(cpu);
+ process_zones(cpu);
break;
default:
break;
@@ -2724,21 +2663,34 @@ static int __cpuinit pageset_cpuup_callb
static struct notifier_block __cpuinitdata pageset_notifier =
{ &pageset_cpuup_callback, NULL, 0 };
+#endif
void __init setup_per_cpu_pageset(void)
{
- int err;
-
- /* Initialize per_cpu_pageset for cpu 0.
+ /*
+ * Initialize per_cpu settings for the boot cpu.
* A cpuup callback will do this for every cpu
- * as it comes online
+ * as it comes online.
+ *
+ * This is also initializing the cpu areas for the
+ * pagesets.
*/
- err = process_zones(smp_processor_id());
- BUG_ON(err);
- register_cpu_notifier(&pageset_notifier);
-}
+ struct zone *zone;
+ for_each_zone(zone) {
+
+ if (!populated_zone(zone))
+ continue;
+
+ zone->pageset = CPU_ALLOC(struct per_cpu_pageset,
+ GFP_KERNEL|__GFP_ZERO);
+ BUG_ON(!zone->pageset);
+ }
+ process_zones(smp_processor_id());
+#ifdef CONFIG_SMP
+ register_cpu_notifier(&pageset_notifier);
#endif
+}
static noinline __init_refok
int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
@@ -2785,21 +2737,30 @@ int zone_wait_table_init(struct zone *zo
static __meminit void zone_pcp_init(struct zone *zone)
{
- int cpu;
- unsigned long batch = zone_batchsize(zone);
+ static struct per_cpu_pageset boot_pageset;
- for (cpu = 0; cpu < NR_CPUS; cpu++) {
-#ifdef CONFIG_NUMA
- /* Early boot. Slab allocator not functional yet */
- zone_pcp(zone, cpu) = &boot_pageset[cpu];
- setup_pageset(&boot_pageset[cpu],0);
-#else
- setup_pageset(zone_pcp(zone,cpu), batch);
-#endif
- }
+ /*
+ * Fake a cpu_alloc pointer that can take the required
+ * offset to get to the boot pageset. This is only
+ * needed for the boot pageset while bootstrapping
+ * the new zone. In the course of zone bootstrap
+ * setup_cpu_pagesets() will do the proper CPU_ALLOC and
+ * set things up the right way.
+ *
+ * Deferral allows CPU_ALLOC() to use the boot pageset
+ * to allocate the initial memory to get going and then provide
+ * the proper memory when called from setup_cpu_pagesets() to
+ * install the proper pagesets.
+ *
+ * Deferral also allows slab allocators to perform their
+ * initialization without resorting to bootmem.
+ */
+ zone->pageset = &boot_pageset - CPU_OFFSET(smp_processor_id());
+ setup_pageset(&boot_pageset, 0);
if (zone->present_pages)
- printk(KERN_DEBUG " %s zone: %lu pages, LIFO batch:%lu\n",
- zone->name, zone->present_pages, batch);
+ printk(KERN_DEBUG " %s zone: %lu pages, LIFO batch:%u\n",
+ zone->name, zone->present_pages,
+ zone_batchsize(zone));
}
__meminit int init_currently_empty_zone(struct zone *zone,
@@ -4214,11 +4175,13 @@ int percpu_pagelist_fraction_sysctl_hand
ret = proc_dointvec_minmax(table, write, file, buffer, length, ppos);
if (!write || (ret == -EINVAL))
return ret;
- for_each_zone(zone) {
- for_each_online_cpu(cpu) {
+ for_each_online_cpu(cpu) {
+ for_each_zone(zone) {
unsigned long high;
+
high = zone->present_pages / percpu_pagelist_fraction;
- setup_pagelist_highmark(zone_pcp(zone, cpu), high);
+ setup_pagelist_highmark(CPU_PTR(zone->pageset, cpu),
+ high);
}
}
return 0;
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c 2007-11-15 21:24:50.730654207 -0800
+++ linux-2.6/mm/vmstat.c 2007-11-15 21:25:12.739154691 -0800
@@ -147,7 +147,8 @@ static void refresh_zone_stat_thresholds
threshold = calculate_threshold(zone);
for_each_online_cpu(cpu)
- zone_pcp(zone, cpu)->stat_threshold = threshold;
+ CPU_PTR(zone->pageset, cpu)->stat_threshold
+ = threshold;
}
}
@@ -157,7 +158,8 @@ static void refresh_zone_stat_thresholds
void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
int delta)
{
- struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+ struct per_cpu_pageset *pcp = THIS_CPU(zone->pageset);
+
s8 *p = pcp->vm_stat_diff + item;
long x;
@@ -210,7 +212,7 @@ EXPORT_SYMBOL(mod_zone_page_state);
*/
void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
{
- struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+ struct per_cpu_pageset *pcp = THIS_CPU(zone->pageset);
s8 *p = pcp->vm_stat_diff + item;
(*p)++;
@@ -231,7 +233,7 @@ EXPORT_SYMBOL(__inc_zone_page_state);
void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
{
- struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+ struct per_cpu_pageset *pcp = THIS_CPU(zone->pageset);
s8 *p = pcp->vm_stat_diff + item;
(*p)--;
@@ -307,7 +309,7 @@ void refresh_cpu_vm_stats(int cpu)
if (!populated_zone(zone))
continue;
- p = zone_pcp(zone, cpu);
+ p = CPU_PTR(zone->pageset, cpu);
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
if (p->vm_stat_diff[i]) {
@@ -680,7 +682,7 @@ static void zoneinfo_show_print(struct s
for_each_online_cpu(i) {
struct per_cpu_pageset *pageset;
- pageset = zone_pcp(zone, i);
+ pageset = CPU_PTR(zone->pageset, i);
seq_printf(m,
"\n cpu: %i"
"\n count: %i"
--
* [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (3 preceding siblings ...)
2007-11-16 23:09 ` [patch 04/30] cpu alloc: page allocator conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 06/30] cpu alloc: x86 support Christoph Lameter
` (24 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_virtual --]
[-- Type: text/plain, Size: 13552 bytes --]
Virtually map the cpu areas. This allows larger maximum sizes and makes it
possible to populate the virtual mappings only on demand.
In order to use the virtual mapping capability the arch must setup some
configuration variables in arch/xxx/Kconfig:
CONFIG_CPU_AREA_VIRTUAL to y
CONFIG_CPU_AREA_ORDER
to the largest allowed size that the per cpu area can grow to.
CONFIG_CPU_AREA_ALLOC_ORDER
to the allocation size when the cpu area needs to grow. Use 0
here to guarantee order 0 allocations.
The address to use must be defined in CPU_AREA_BASE. This is typically done
in include/asm-xxx/pgtable.h.
The maximum space used by the cpu area is
NR_CPUS * (PAGE_SIZE << CONFIG_CPU_AREA_ORDER)
An arch may provide its own population function for the virtual mappings
(in order to exploit huge page mappings and other frills of the MMU of an
architecture). The default populate function uses single page mappings.
int cpu_area_populate(void *start, unsigned long size, gfp_t flags, int node)
The list of cpu_area_xx functions exported in include/linux/mm.h may be used
as helpers to generate the mapping that the arch needs.
In the simplest form the arch code calls:
cpu_area_populate_basepages(start, size, flags, node);
The arch code must call
cpu_area_alloc_block(unsigned long size, gfp_t flags, int node)
for all its memory needs during the construction of the custom page table.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/mm.h | 13 ++
mm/Kconfig | 10 +
mm/cpu_alloc.c | 287 +++++++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 299 insertions(+), 11 deletions(-)
Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- linux-2.6.orig/mm/cpu_alloc.c 2007-11-16 14:54:29.890430938 -0800
+++ linux-2.6/mm/cpu_alloc.c 2007-11-16 14:54:37.106404761 -0800
@@ -17,6 +17,12 @@
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/bitmap.h>
+#include <linux/vmalloc.h>
+#include <linux/bootmem.h>
+#include <linux/sched.h> /* i386 definition of init_mm */
+#include <linux/highmem.h> /* i386 dependency on highmem config */
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
/*
* Basic allocation unit. A bit map is created to track the use of each
@@ -24,7 +30,7 @@
*/
#define UNIT_SIZE sizeof(int)
-#define UNITS (ALLOC_SIZE / UNIT_SIZE)
+#define UNITS_PER_BLOCK (ALLOC_SIZE / UNIT_SIZE)
/*
* How many units are needed for an object of a given size
@@ -40,6 +46,249 @@ static int size_to_units(unsigned long s
static DEFINE_SPINLOCK(cpu_alloc_map_lock);
static unsigned long units_reserved; /* Units reserved by boot allocations */
+#ifdef CONFIG_CPU_AREA_VIRTUAL
+
+/*
+ * Virtualized cpu area. The cpu area can be extended if more space is needed.
+ */
+
+#define cpu_area ((u8 *)(CPU_AREA_BASE))
+#define ALLOC_SIZE (1UL << (CONFIG_CPU_AREA_ALLOC_ORDER + PAGE_SHIFT))
+#define BOOT_ALLOC (1 << __GFP_BITS_SHIFT)
+
+
+/*
+ * The maximum number of blocks is the maximum size of the
+ * cpu area for one processor divided by the size of an allocation
+ * block.
+ */
+#define MAX_BLOCKS (1UL << (CONFIG_CPU_AREA_ORDER - \
+ CONFIG_CPU_AREA_ALLOC_ORDER))
+
+
+static unsigned long *cpu_alloc_map = NULL;
+static int cpu_alloc_map_order = -1; /* Size of the bitmap in page order */
+static unsigned long active_blocks; /* Number of block allocated on each cpu */
+static unsigned long units_total; /* Total units that are managed */
+/*
+ * Allocate a block of memory to be used to provide cpu area memory
+ * or to extend the bitmap for the cpu map.
+ */
+void *cpu_area_alloc_block(unsigned long size, gfp_t flags, int node)
+{
+ if (!(flags & BOOT_ALLOC)) {
+ struct page *page = alloc_pages_node(node,
+ flags, get_order(size));
+
+ if (page)
+ return page_address(page);
+ return NULL;
+ } else
+ return __alloc_bootmem_node(NODE_DATA(node), size, size,
+ __pa(MAX_DMA_ADDRESS));
+}
+
+pte_t *cpu_area_pte_populate(pmd_t *pmd, unsigned long addr,
+ gfp_t flags, int node)
+{
+ pte_t *pte = pte_offset_kernel(pmd, addr);
+ if (pte_none(*pte)) {
+ pte_t entry;
+ void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+ if (!p)
+ return 0;
+ entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+ set_pte_at(&init_mm, addr, pte, entry);
+ }
+ return pte;
+}
+
+pmd_t *cpu_area_pmd_populate(pud_t *pud, unsigned long addr,
+ gfp_t flags, int node)
+{
+ pmd_t *pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd)) {
+ void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+ if (!p)
+ return 0;
+ pmd_populate_kernel(&init_mm, pmd, p);
+ }
+ return pmd;
+}
+
+pud_t *cpu_area_pud_populate(pgd_t *pgd, unsigned long addr,
+ gfp_t flags, int node)
+{
+ pud_t *pud = pud_offset(pgd, addr);
+ if (pud_none(*pud)) {
+ void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+ if (!p)
+ return 0;
+ pud_populate(&init_mm, pud, p);
+ }
+ return pud;
+}
+
+pgd_t *cpu_area_pgd_populate(unsigned long addr, gfp_t flags, int node)
+{
+ pgd_t *pgd = pgd_offset_k(addr);
+ if (pgd_none(*pgd)) {
+ void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);
+ if (!p)
+ return 0;
+ pgd_populate(&init_mm, pgd, p);
+ }
+ return pgd;
+}
+
+int cpu_area_populate_basepages(void *start, unsigned long size,
+ gfp_t flags, int node)
+{
+ unsigned long addr = (unsigned long)start;
+ unsigned long end = addr + size;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ for (; addr < end; addr += PAGE_SIZE) {
+ pgd = cpu_area_pgd_populate(addr, flags, node);
+ if (!pgd)
+ return -ENOMEM;
+ pud = cpu_area_pud_populate(pgd, addr, flags, node);
+ if (!pud)
+ return -ENOMEM;
+ pmd = cpu_area_pmd_populate(pud, addr, flags, node);
+ if (!pmd)
+ return -ENOMEM;
+ pte = cpu_area_pte_populate(pmd, addr, flags, node);
+ if (!pte)
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/*
+ * If no other population function is defined then this function will stand
+ * in and provide the capability to map PAGE_SIZE pages into the cpu area.
+ */
+int __attribute__((weak)) cpu_area_populate(void *start, unsigned long size,
+ gfp_t flags, int node)
+{
+ return cpu_area_populate_basepages(start, size, flags, node);
+}
+
+/*
+ * Extend the areas on all processors. This function may be called repeatedly
+ * until we have enough space to accommodate a newly allocated object.
+ *
+ * Must hold the cpu_alloc_map_lock on entry. Will drop the lock and then
+ * regain it.
+ */
+static int expand_cpu_area(gfp_t flags)
+{
+ unsigned long blocks = active_blocks;
+ unsigned long bits;
+ int cpu;
+ int err = -ENOMEM;
+ int map_order;
+ unsigned long *new_map = NULL;
+ void *start;
+
+ if (active_blocks == MAX_BLOCKS)
+ goto out;
+
+ spin_unlock(&cpu_alloc_map_lock);
+ if (flags & __GFP_WAIT)
+ local_irq_enable();
+
+ /*
+ * Determine the size of the bit map needed
+ */
+ bits = (blocks + 1) * UNITS_PER_BLOCK - units_reserved;
+
+ map_order = get_order(DIV_ROUND_UP(bits, 8));
+ BUG_ON(map_order >= MAX_ORDER);
+ start = cpu_area + \
+ (blocks << (PAGE_SHIFT + CONFIG_CPU_AREA_ALLOC_ORDER));
+
+ for_each_possible_cpu(cpu) {
+ err = cpu_area_populate(CPU_PTR(start, cpu), ALLOC_SIZE,
+ flags, cpu_to_node(cpu));
+
+ if (err) {
+ spin_lock(&cpu_alloc_map_lock);
+ goto out;
+ }
+ }
+
+ if (map_order > cpu_alloc_map_order) {
+ new_map = cpu_area_alloc_block(PAGE_SIZE << map_order,
+ flags | __GFP_ZERO, 0);
+ if (!new_map)
+ goto out;
+ }
+
+ if (flags & __GFP_WAIT)
+ local_irq_disable();
+ spin_lock(&cpu_alloc_map_lock);
+
+ /*
+ * We dropped the lock. Another processor may have already extended
+ * the cpu area size as needed.
+ */
+ if (blocks != active_blocks) {
+ if (new_map)
+ free_pages((unsigned long)new_map,
+ map_order);
+ err = 0;
+ goto out;
+ }
+
+ if (new_map) {
+ /*
+ * Need to extend the bitmap
+ */
+ if (cpu_alloc_map)
+ memcpy(new_map, cpu_alloc_map,
+ PAGE_SIZE << cpu_alloc_map_order);
+ cpu_alloc_map = new_map;
+ cpu_alloc_map_order = map_order;
+ }
+
+ active_blocks++;
+ units_total += UNITS_PER_BLOCK;
+ err = 0;
+out:
+ return err;
+}
+
+void * __init boot_cpu_alloc(unsigned long size)
+{
+ unsigned long flags;
+ unsigned long x = units_reserved;
+ unsigned long units = size_to_units(size);
+
+ /*
+ * Locking is really not necessary during boot
+ * but expand_cpu_area() unlocks and relocks.
+ * If we do not perform locking here then
+ *
+ * 1. The cpu_alloc_map_lock is locked when
+ * we exit boot causing a hang on the next cpu_alloc().
+ * 2. lockdep will get upset if we do not consistently
+ * handle things.
+ */
+ spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+ while (units_reserved + units > units_total)
+ expand_cpu_area(BOOT_ALLOC);
+ units_reserved += units;
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+ return cpu_area + x * UNIT_SIZE;
+}
+#else
+
/*
* Static configuration. The cpu areas are of a fixed size and
* cannot be extended. Such configurations are mainly useful on
@@ -51,16 +300,24 @@ static unsigned long units_reserved; /*
#define ALLOC_SIZE (1UL << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT))
static u8 cpu_area[NR_CPUS * ALLOC_SIZE];
-static DECLARE_BITMAP(cpu_alloc_map, UNITS);
+static DECLARE_BITMAP(cpu_alloc_map, UNITS_PER_BLOCK);
+#define cpu_alloc_map_order CONFIG_CPU_AREA_ORDER
+#define units_total UNITS_PER_BLOCK
+
+static inline int expand_cpu_area(gfp_t flags)
+{
+ return -ENOSYS;
+}
void * __init boot_cpu_alloc(unsigned long size)
{
unsigned long x = units_reserved;
units_reserved += size_to_units(size);
- BUG_ON(units_reserved > UNITS);
+ BUG_ON(units_reserved > units_total);
return cpu_area + x * UNIT_SIZE;
}
+#endif
static int first_free; /* First known free unit */
@@ -98,20 +355,30 @@ void *cpu_alloc(unsigned long size, gfp_
int units = size_to_units(size);
void *ptr;
int first;
+ unsigned long map_size;
unsigned long flags;
BUG_ON(gfpflags & ~(GFP_RECLAIM_MASK | __GFP_ZERO));
spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+restart:
+ if (cpu_alloc_map_order >= 0)
+ map_size = PAGE_SIZE << cpu_alloc_map_order;
+ else
+ map_size = 0;
+
first = 1;
start = first_free;
for ( ; ; ) {
- start = find_next_zero_bit(cpu_alloc_map, ALLOC_SIZE, start);
- if (start >= UNITS - units_reserved)
+ start = find_next_zero_bit(cpu_alloc_map, map_size, start);
+ if (start >= units_total - units_reserved) {
+ if (!expand_cpu_area(gfpflags))
+ goto restart;
goto out_of_memory;
+ }
if (first)
first_free = start;
@@ -121,7 +388,7 @@ void *cpu_alloc(unsigned long size, gfp_
* the starting unit.
*/
if ((start + units_reserved) % (align / UNIT_SIZE) == 0 &&
- find_next_bit(cpu_alloc_map, ALLOC_SIZE, start + 1)
+ find_next_bit(cpu_alloc_map, map_size, start + 1)
>= start + units)
break;
start++;
@@ -131,8 +398,10 @@ void *cpu_alloc(unsigned long size, gfp_
if (first)
first_free = start + units;
- if (start + units > UNITS - units_reserved)
- goto out_of_memory;
+ while (start + units > units_total - units_reserved) {
+ if (expand_cpu_area(gfpflags))
+ goto out_of_memory;
+ }
set_map(start, units);
__count_vm_events(CPU_BYTES, units * UNIT_SIZE);
@@ -170,7 +439,7 @@ void cpu_free(void *start, unsigned long
BUG_ON(p < (cpu_area + units_reserved * UNIT_SIZE));
index = (p - cpu_area) / UNIT_SIZE - units_reserved;
BUG_ON(!test_bit(index, cpu_alloc_map) ||
- index >= UNITS - units_reserved);
+ index >= units_total - units_reserved);
spin_lock_irqsave(&cpu_alloc_map_lock, flags);
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2007-11-16 14:54:33.186431271 -0800
+++ linux-2.6/include/linux/mm.h 2007-11-16 14:54:37.106404761 -0800
@@ -1137,5 +1137,18 @@ int vmemmap_populate_basepages(struct pa
unsigned long pages, int node);
int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
+pgd_t *cpu_area_pgd_populate(unsigned long addr, gfp_t flags, int node);
+pud_t *cpu_area_pud_populate(pgd_t *pgd, unsigned long addr,
+ gfp_t flags, int node);
+pmd_t *cpu_area_pmd_populate(pud_t *pud, unsigned long addr,
+ gfp_t flags, int node);
+pte_t *cpu_area_pte_populate(pmd_t *pmd, unsigned long addr,
+ gfp_t flags, int node);
+void *cpu_area_alloc_block(unsigned long size, gfp_t flags, int node);
+int cpu_area_populate_basepages(void *start, unsigned long size,
+ gfp_t flags, int node);
+int cpu_area_populate(void *start, unsigned long size,
+ gfp_t flags, int node);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2007-11-16 14:54:29.890430938 -0800
+++ linux-2.6/mm/Kconfig 2007-11-16 14:55:07.364981597 -0800
@@ -197,7 +197,13 @@ config VIRT_TO_BUS
config CPU_AREA_ORDER
int "Maximum size (order) of CPU area"
- default "3"
+ default "10" if CPU_AREA_VIRTUAL
+ default "3" if !CPU_AREA_VIRTUAL
help
Sets the maximum amount of memory that can be allocated via cpu_alloc
- The size is set in page order, so 0 = PAGE_SIZE, 1 = PAGE_SIZE << 1 etc.
+ The size is specified in page order. This size (times the maximum
+ number of processors) determines the amount of virtual memory set
+ aside for the per cpu areas when cpu areas are virtualized, or the
+ amount of memory allocated in the bss segment when cpu areas are
+ not virtualized.
+
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 06/30] cpu alloc: x86 support
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (4 preceding siblings ...)
2007-11-16 23:09 ` [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
` (23 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_x86_support --]
[-- Type: text/plain, Size: 6345 bytes --]
64 bit:
Set up a cpu area that allows the use of up to 16MB for each processor.
Cpu memory use can grow a bit. For example, if we assume that a pageset
occupies 64 bytes of memory and we have 3 zones in each of 1024 nodes,
then a 16k processor system needs 3 * 1k * 16k = 50 million pagesets, or
3072 pagesets per processor. This amounts to a total of 3.2 GB of pagesets.
Each cpu needs around 200k of cpu storage for the page allocator alone,
so it is worth using a 2M huge mapping here.
For the UP and SMP case map the area using 4k ptes. Typical use of per cpu
data is around 16k for UP and SMP configurations. It goes up to 45k when the
per cpu area is managed by cpu_alloc (see special x86_64 patchset).
Allocating in 2M segments would be overkill.
For NUMA map the area using 2M PMDs. A large NUMA system may use
lots of cpu data for the page allocator data alone. Systems of that
size typically have large amounts of memory available, and using a
2M page size reduces TLB pressure in that case.
Some numbers for envisioned maximum configurations of NUMA systems:
4k cpu configuration with 1k nodes:
4096 * 16MB = 64GB of virtual space.
Maximum theoretical configuration of 16384 processors with 1k nodes:
16384 * 16MB = 256GB of virtual space.
Both fit within the established limits.
32 bit:
Set up a 256 kB area for the cpu areas below the FIXADDR area.
The use of the cpu alloc area is pretty minimal on i386. An 8p system
with no extras uses only ~8kb, so 256kb should be plenty. A configuration
that supports up to 8 processors takes up 2MB of the scarce virtual
address space.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
arch/x86/Kconfig | 15 +++++++++++++++
arch/x86/mm/init_32.c | 3 +++
arch/x86/mm/init_64.c | 38 ++++++++++++++++++++++++++++++++++++++
include/asm-x86/pgtable_32.h | 7 +++++--
include/asm-x86/pgtable_64.h | 1 +
5 files changed, 62 insertions(+), 2 deletions(-)
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c 2007-11-15 21:24:47.059904335 -0800
+++ linux-2.6/arch/x86/mm/init_64.c 2007-11-15 21:25:18.578584246 -0800
@@ -781,3 +781,41 @@ int __meminit vmemmap_populate(struct pa
return 0;
}
#endif
+
+#ifdef CONFIG_NUMA
+int __meminit cpu_area_populate(void *start, unsigned long size,
+ gfp_t flags, int node)
+{
+ unsigned long addr = (unsigned long)start;
+ unsigned long end = addr + size;
+ unsigned long next;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ for (; addr < end; addr = next) {
+ next = pmd_addr_end(addr, end);
+
+ pgd = cpu_area_pgd_populate(addr, flags, node);
+ if (!pgd)
+ return -ENOMEM;
+ pud = cpu_area_pud_populate(pgd, addr, flags, node);
+ if (!pud)
+ return -ENOMEM;
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd)) {
+ pte_t entry;
+ void *p = cpu_area_alloc_block(PMD_SIZE, flags, node);
+ if (!p)
+ return -ENOMEM;
+
+ entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+ mk_pte_huge(entry);
+ set_pmd(pmd, __pmd(pte_val(entry)));
+ }
+ }
+
+ return 0;
+}
+#endif
Index: linux-2.6/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_64.h 2007-11-15 21:24:47.079904686 -0800
+++ linux-2.6/include/asm-x86/pgtable_64.h 2007-11-15 21:25:18.578584246 -0800
@@ -138,6 +138,7 @@ static inline pte_t ptep_get_and_clear_f
#define VMALLOC_START _AC(0xffffc20000000000, UL)
#define VMALLOC_END _AC(0xffffe1ffffffffff, UL)
#define VMEMMAP_START _AC(0xffffe20000000000, UL)
+#define CPU_AREA_BASE _AC(0xfffff20000000000, UL)
#define MODULES_VADDR _AC(0xffffffff88000000, UL)
#define MODULES_END _AC(0xfffffffffff00000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig 2007-11-15 21:24:47.075904383 -0800
+++ linux-2.6/arch/x86/Kconfig 2007-11-15 21:25:18.578584246 -0800
@@ -163,6 +163,21 @@ config X86_TRAMPOLINE
config KTIME_SCALAR
def_bool X86_32
+
+config CPU_AREA_VIRTUAL
+ bool
+ default y
+
+config CPU_AREA_ORDER
+ int
+ default "16" if X86_64
+ default "6" if X86_32
+
+config CPU_AREA_ALLOC_ORDER
+ int
+ default "9" if NUMA && X86_64
+ default "0" if !NUMA || X86_32
+
source "init/Kconfig"
menu "Processor type and features"
Index: linux-2.6/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_32.c 2007-11-15 21:24:47.067904108 -0800
+++ linux-2.6/arch/x86/mm/init_32.c 2007-11-15 21:25:18.578584246 -0800
@@ -674,6 +674,7 @@ void __init mem_init(void)
#if 1 /* double-sanity-check paranoia */
printk("virtual kernel memory layout:\n"
" fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
+ 	       " cpu area: 0x%08lx - 0x%08lx (%4ld kB)\n"
#ifdef CONFIG_HIGHMEM
" pkmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
#endif
@@ -684,6 +685,8 @@ void __init mem_init(void)
" .text : 0x%08lx - 0x%08lx (%4ld kB)\n",
FIXADDR_START, FIXADDR_TOP,
(FIXADDR_TOP - FIXADDR_START) >> 10,
+ CPU_AREA_BASE, FIXADDR_START,
+ (FIXADDR_START - CPU_AREA_BASE) >> 10,
#ifdef CONFIG_HIGHMEM
PKMAP_BASE, PKMAP_BASE+LAST_PKMAP*PAGE_SIZE,
Index: linux-2.6/include/asm-x86/pgtable_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_32.h 2007-11-15 21:24:47.087904440 -0800
+++ linux-2.6/include/asm-x86/pgtable_32.h 2007-11-15 21:25:18.578584246 -0800
@@ -79,11 +79,14 @@ void paging_init(void);
#define VMALLOC_START (((unsigned long) high_memory + \
2*VMALLOC_OFFSET-1) & ~(VMALLOC_OFFSET-1))
#ifdef CONFIG_HIGHMEM
-# define VMALLOC_END (PKMAP_BASE-2*PAGE_SIZE)
+# define CPU_AREA_BASE (PKMAP_BASE - NR_CPUS * \
+ (1 << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT)))
#else
-# define VMALLOC_END (FIXADDR_START-2*PAGE_SIZE)
+# define CPU_AREA_BASE (FIXADDR_START - NR_CPUS * \
+ (1 << (CONFIG_CPU_AREA_ORDER + PAGE_SHIFT)))
#endif
+#define VMALLOC_END (CPU_AREA_BASE - 2 * PAGE_SIZE)
/*
* _PAGE_PSE set in the page directory entry just means that
* the page directory entry points directly to a 4MB-aligned block of
--
* [patch 07/30] cpu alloc: IA64 support
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (5 preceding siblings ...)
2007-11-16 23:09 ` [patch 06/30] cpu alloc: x86 support Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:32 ` Luck, Tony
2007-11-16 23:09 ` [patch 08/30] cpu_alloc: Sparc64 support Christoph Lameter
` (22 subsequent siblings)
29 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_ia64_support --]
[-- Type: text/plain, Size: 3008 bytes --]
Typical use of per cpu memory for a small system (8G, 8p, 4 nodes) is
less than 64k per cpu. This increases rapidly for larger systems, which
can use up to 512k or 1M of memory for cpu storage.
The maximum allowed size of the cpu area is 128MB of memory.
The cpu area is placed in region 5 with the kernel, vmemmap and vmalloc areas.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
arch/ia64/Kconfig | 13 +++++++++++++
include/asm-ia64/pgtable.h | 32 ++++++++++++++++++++++++++------
2 files changed, 39 insertions(+), 6 deletions(-)
Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig 2007-11-15 21:24:46.991154957 -0800
+++ linux-2.6/arch/ia64/Kconfig 2007-11-16 14:43:17.277214329 -0800
@@ -99,6 +99,19 @@ config AUDIT_ARCH
bool
default y
+config CPU_AREA_VIRTUAL
+ bool
+ default y
+
+# Maximum of 128 MB cpu_alloc space per cpu
+config CPU_AREA_ORDER
+ int
+ default "13"
+
+config CPU_AREA_ALLOC_ORDER
+ int
+ default "0"
+
choice
prompt "System type"
default IA64_GENERIC
Index: linux-2.6/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/pgtable.h 2007-11-15 21:24:47.003154534 -0800
+++ linux-2.6/include/asm-ia64/pgtable.h 2007-11-16 14:42:57.629964336 -0800
@@ -224,21 +224,41 @@ ia64_phys_addr_valid (unsigned long addr
*/
+/*
+ * Layout of RGN_GATE
+ *
+ * 47 bits wide (16kb pages)
+ *
+ * 0xa000000000000000-0xa000000200000000 8G Kernel data area
+ * 0xa000000200000000-0xa000400000000000 64T vmalloc
+ * 0xa000400000000000-0xa000600000000000 32T vmemmap
+ * 0xa000600000000000-0xa000800000000000 32T cpu area
+ *
+ * 55 bits wide (64kb pages)
+ *
+ * 0xa000000000000000-0xa000000200000000 8G Kernel data area
+ * 0xa000000200000000-0xa040000000000000 16P vmalloc
+ * 0xa040000000000000-0xa060000000000000 8P vmemmap
+ * 0xa060000000000000-0xa080000000000000 8P cpu area
+ */
+
#define VMALLOC_START (RGN_BASE(RGN_GATE) + 0x200000000UL)
+#define VMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
+
#ifdef CONFIG_VIRTUAL_MEM_MAP
-# define VMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
# define VMALLOC_END vmalloc_end
extern unsigned long vmalloc_end;
#else
+# define VMALLOC_END VMALLOC_END_INIT
+#endif
+
#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_SPARSEMEM_VMEMMAP)
/* SPARSEMEM_VMEMMAP uses half of vmalloc... */
-# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
-# define vmemmap ((struct page *)VMALLOC_END)
-#else
-# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
-#endif
+# define vmemmap ((struct page *)VMALLOC_END_INIT)
#endif
+#define CPU_AREA_BASE (RGN_BASE(RGN_GATE) + (3UL << (4*PAGE_SHIFT - 11)))
+
/* fs/proc/kcore.c */
#define kc_vaddr_to_offset(v) ((v) - RGN_BASE(RGN_GATE))
#define kc_offset_to_vaddr(o) ((o) + RGN_BASE(RGN_GATE))
--
* [patch 08/30] cpu_alloc: Sparc64 support
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (6 preceding siblings ...)
2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 09/30] cpu alloc: percpu_counter conversion Christoph Lameter
` (21 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: cpu_alloc_sparc64 --]
[-- Type: text/plain, Size: 1596 bytes --]
Enable a simple virtual configuration with 32MB available per cpu so that
we do not use a static area on sparc64.
[Not tested. I have no sparc64]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
arch/sparc64/Kconfig | 15 +++++++++++++++
include/asm-sparc64/pgtable.h | 1 +
2 files changed, 16 insertions(+)
Index: linux-2.6/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.orig/arch/sparc64/Kconfig 2007-11-15 21:24:46.942815842 -0800
+++ linux-2.6/arch/sparc64/Kconfig 2007-11-15 21:25:27.790904212 -0800
@@ -103,6 +103,21 @@ config SPARC64_PAGE_SIZE_4MB
endchoice
+config CPU_AREA_VIRTUAL
+ bool
+ default y
+
+config CPU_AREA_ORDER
+ int
+ default "11" if SPARC64_PAGE_SIZE_8KB
+ default "9" if SPARC64_PAGE_SIZE_64KB
+ default "6" if SPARC64_PAGE_SIZE_512KB
+ default "3" if SPARC64_PAGE_SIZE_4MB
+
+config CPU_AREA_ALLOC_ORDER
+ int
+ default "0"
+
config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS
Index: linux-2.6/include/asm-sparc64/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/pgtable.h 2007-11-15 21:24:46.950904404 -0800
+++ linux-2.6/include/asm-sparc64/pgtable.h 2007-11-15 21:25:27.794904145 -0800
@@ -43,6 +43,7 @@
#define VMALLOC_START _AC(0x0000000100000000,UL)
#define VMALLOC_END _AC(0x0000000200000000,UL)
#define VMEMMAP_BASE _AC(0x0000000200000000,UL)
+#define CPU_AREA_BASE _AC(0x0000000300000000,UL)
#define vmemmap ((struct page *)VMEMMAP_BASE)
--
* [patch 09/30] cpu alloc: percpu_counter conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (7 preceding siblings ...)
2007-11-16 23:09 ` [patch 08/30] cpu_alloc: Sparc64 support Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 10/30] cpu alloc: crash_notes conversion Christoph Lameter
` (20 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0019-cpu-alloc-percpu_counter-conversion.patch --]
[-- Type: text/plain, Size: 2041 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
lib/percpu_counter.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c 2007-11-15 21:24:46.878154362 -0800
+++ linux-2.6/lib/percpu_counter.c 2007-11-15 21:25:28.963154085 -0800
@@ -20,7 +20,7 @@ void percpu_counter_set(struct percpu_co
spin_lock(&fbc->lock);
for_each_possible_cpu(cpu) {
- s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+ s32 *pcount = CPU_PTR(fbc->counters, cpu);
*pcount = 0;
}
fbc->count = amount;
@@ -34,7 +34,7 @@ void __percpu_counter_add(struct percpu_
s32 *pcount;
int cpu = get_cpu();
- pcount = per_cpu_ptr(fbc->counters, cpu);
+ pcount = CPU_PTR(fbc->counters, cpu);
count = *pcount + amount;
if (count >= batch || count <= -batch) {
spin_lock(&fbc->lock);
@@ -60,7 +60,7 @@ s64 __percpu_counter_sum(struct percpu_c
spin_lock(&fbc->lock);
ret = fbc->count;
for_each_online_cpu(cpu) {
- s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+ s32 *pcount = CPU_PTR(fbc->counters, cpu);
ret += *pcount;
}
spin_unlock(&fbc->lock);
@@ -74,7 +74,7 @@ int percpu_counter_init(struct percpu_co
{
spin_lock_init(&fbc->lock);
fbc->count = amount;
- fbc->counters = alloc_percpu(s32);
+ fbc->counters = CPU_ALLOC(s32, GFP_KERNEL|__GFP_ZERO);
if (!fbc->counters)
return -ENOMEM;
#ifdef CONFIG_HOTPLUG_CPU
@@ -101,7 +101,7 @@ void percpu_counter_destroy(struct percp
if (!fbc->counters)
return;
- free_percpu(fbc->counters);
+ CPU_FREE(fbc->counters);
#ifdef CONFIG_HOTPLUG_CPU
mutex_lock(&percpu_counters_lock);
list_del(&fbc->list);
@@ -127,7 +127,7 @@ static int __cpuinit percpu_counter_hotc
unsigned long flags;
spin_lock_irqsave(&fbc->lock, flags);
- pcount = per_cpu_ptr(fbc->counters, cpu);
+ pcount = CPU_PTR(fbc->counters, cpu);
fbc->count += *pcount;
*pcount = 0;
spin_unlock_irqrestore(&fbc->lock, flags);
--
* [patch 10/30] cpu alloc: crash_notes conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (8 preceding siblings ...)
2007-11-16 23:09 ` [patch 09/30] cpu alloc: percpu_counter conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 11/30] cpu alloc: workqueue conversion Christoph Lameter
` (19 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0020-cpu-alloc-crash_notes-conversion.patch --]
[-- Type: text/plain, Size: 2331 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
arch/ia64/kernel/crash.c | 2 +-
drivers/base/cpu.c | 2 +-
kernel/kexec.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
Index: linux-2.6/arch/ia64/kernel/crash.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/crash.c 2007-11-15 21:18:10.647904573 -0800
+++ linux-2.6/arch/ia64/kernel/crash.c 2007-11-15 21:25:29.423155123 -0800
@@ -71,7 +71,7 @@ crash_save_this_cpu(void)
dst[46] = (unsigned long)ia64_rse_skip_regs((unsigned long *)dst[46],
sof - sol);
- buf = (u64 *) per_cpu_ptr(crash_notes, cpu);
+ buf = (u64 *) CPU_PTR(crash_notes, cpu);
if (!buf)
return;
buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS, prstatus,
Index: linux-2.6/drivers/base/cpu.c
===================================================================
--- linux-2.6.orig/drivers/base/cpu.c 2007-11-15 21:18:10.655904442 -0800
+++ linux-2.6/drivers/base/cpu.c 2007-11-15 21:25:29.423155123 -0800
@@ -95,7 +95,7 @@ static ssize_t show_crash_notes(struct s
* boot up and this data does not change there after. Hence this
* operation should be safe. No locking required.
*/
- addr = __pa(per_cpu_ptr(crash_notes, cpunum));
+ addr = __pa(CPU_PTR(crash_notes, cpunum));
rc = sprintf(buf, "%Lx\n", addr);
return rc;
}
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c 2007-11-15 21:18:10.663904549 -0800
+++ linux-2.6/kernel/kexec.c 2007-11-15 21:25:29.423155123 -0800
@@ -1122,7 +1122,7 @@ void crash_save_cpu(struct pt_regs *regs
* squirrelled away. ELF notes happen to provide
* all of that, so there is no need to invent something new.
*/
- buf = (u32*)per_cpu_ptr(crash_notes, cpu);
+ buf = (u32*)CPU_PTR(crash_notes, cpu);
if (!buf)
return;
memset(&prstatus, 0, sizeof(prstatus));
@@ -1136,7 +1136,7 @@ void crash_save_cpu(struct pt_regs *regs
static int __init crash_notes_memory_init(void)
{
/* Allocate memory for saving cpu registers. */
- crash_notes = alloc_percpu(note_buf_t);
+ crash_notes = CPU_ALLOC(note_buf_t, GFP_KERNEL|__GFP_ZERO);
if (!crash_notes) {
printk("Kexec: Memory allocation for saving cpu register"
" states failed\n");
--
* [patch 11/30] cpu alloc: workqueue conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (9 preceding siblings ...)
2007-11-16 23:09 ` [patch 10/30] cpu alloc: crash_notes conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 12/30] cpu alloc: ACPI cstate handling conversion Christoph Lameter
` (18 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0021-cpu-alloc-workqueue-conversion.patch --]
[-- Type: text/plain, Size: 3414 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
kernel/workqueue.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c 2007-11-15 21:18:11.726153923 -0800
+++ linux-2.6/kernel/workqueue.c 2007-11-15 21:25:29.966154099 -0800
@@ -100,7 +100,7 @@ struct cpu_workqueue_struct *wq_per_cpu(
{
if (unlikely(is_single_threaded(wq)))
cpu = singlethread_cpu;
- return per_cpu_ptr(wq->cpu_wq, cpu);
+ return CPU_PTR(wq->cpu_wq, cpu);
}
/*
@@ -398,7 +398,7 @@ void fastcall flush_workqueue(struct wor
lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
lock_release(&wq->lockdep_map, 1, _THIS_IP_);
for_each_cpu_mask(cpu, *cpu_map)
- flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+ flush_cpu_workqueue(CPU_PTR(wq->cpu_wq, cpu));
}
EXPORT_SYMBOL_GPL(flush_workqueue);
@@ -478,7 +478,7 @@ static void wait_on_work(struct work_str
cpu_map = wq_cpu_map(wq);
for_each_cpu_mask(cpu, *cpu_map)
- wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
+ wait_on_cpu_work(CPU_PTR(wq->cpu_wq, cpu), work);
}
static int __cancel_work_timer(struct work_struct *work,
@@ -601,21 +601,21 @@ int schedule_on_each_cpu(work_func_t fun
int cpu;
struct work_struct *works;
- works = alloc_percpu(struct work_struct);
+ works = CPU_ALLOC(struct work_struct, GFP_KERNEL);
if (!works)
return -ENOMEM;
preempt_disable(); /* CPU hotplug */
for_each_online_cpu(cpu) {
- struct work_struct *work = per_cpu_ptr(works, cpu);
+ struct work_struct *work = CPU_PTR(works, cpu);
INIT_WORK(work, func);
set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
- __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work);
+ __queue_work(CPU_PTR(keventd_wq->cpu_wq, cpu), work);
}
preempt_enable();
flush_workqueue(keventd_wq);
- free_percpu(works);
+ CPU_FREE(works);
return 0;
}
@@ -664,7 +664,7 @@ int current_is_keventd(void)
BUG_ON(!keventd_wq);
- cwq = per_cpu_ptr(keventd_wq->cpu_wq, cpu);
+ cwq = CPU_PTR(keventd_wq->cpu_wq, cpu);
if (current == cwq->thread)
ret = 1;
@@ -675,7 +675,7 @@ int current_is_keventd(void)
static struct cpu_workqueue_struct *
init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
{
- struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+ struct cpu_workqueue_struct *cwq = CPU_PTR(wq->cpu_wq, cpu);
cwq->wq = wq;
spin_lock_init(&cwq->lock);
@@ -732,7 +732,8 @@ struct workqueue_struct *__create_workqu
if (!wq)
return NULL;
- wq->cpu_wq = alloc_percpu(struct cpu_workqueue_struct);
+ wq->cpu_wq = CPU_ALLOC(struct cpu_workqueue_struct,
+ GFP_KERNEL|__GFP_ZERO);
if (!wq->cpu_wq) {
kfree(wq);
return NULL;
@@ -814,11 +815,11 @@ void destroy_workqueue(struct workqueue_
mutex_unlock(&workqueue_mutex);
for_each_cpu_mask(cpu, *cpu_map) {
- cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+ cwq = CPU_PTR(wq->cpu_wq, cpu);
cleanup_workqueue_thread(cwq, cpu);
}
- free_percpu(wq->cpu_wq);
+ CPU_FREE(wq->cpu_wq);
kfree(wq);
}
EXPORT_SYMBOL_GPL(destroy_workqueue);
@@ -847,7 +848,7 @@ static int __devinit workqueue_cpu_callb
}
list_for_each_entry(wq, &workqueues, list) {
- cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+ cwq = CPU_PTR(wq->cpu_wq, cpu);
switch (action) {
case CPU_UP_PREPARE:
--
* [patch 12/30] cpu alloc: ACPI cstate handling conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (10 preceding siblings ...)
2007-11-16 23:09 ` [patch 11/30] cpu alloc: workqueue conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 13/30] cpu alloc: genhd statistics conversion Christoph Lameter
` (17 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0022-cpu-alloc-ACPI-cstate-handling-conversion.patch --]
[-- Type: text/plain, Size: 3543 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
arch/x86/kernel/acpi/cstate.c | 9 +++++----
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 7 ++++---
drivers/acpi/processor_perflib.c | 4 ++--
3 files changed, 11 insertions(+), 9 deletions(-)
Index: linux-2.6/arch/x86/kernel/acpi/cstate.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/cstate.c 2007-11-15 21:18:09.238904115 -0800
+++ linux-2.6/arch/x86/kernel/acpi/cstate.c 2007-11-15 21:25:30.499154221 -0800
@@ -87,7 +87,7 @@ int acpi_processor_ffh_cstate_probe(unsi
if (reg->bit_offset != NATIVE_CSTATE_BEYOND_HALT)
return -1;
- percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+ percpu_entry = CPU_PTR(cpu_cstate_entry, cpu);
percpu_entry->states[cx->index].eax = 0;
percpu_entry->states[cx->index].ecx = 0;
@@ -138,7 +138,7 @@ void acpi_processor_ffh_cstate_enter(str
unsigned int cpu = smp_processor_id();
struct cstate_entry *percpu_entry;
- percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+ percpu_entry = CPU_PTR(cpu_cstate_entry, cpu);
mwait_idle_with_hints(percpu_entry->states[cx->index].eax,
percpu_entry->states[cx->index].ecx);
}
@@ -150,13 +150,14 @@ static int __init ffh_cstate_init(void)
if (c->x86_vendor != X86_VENDOR_INTEL)
return -1;
- cpu_cstate_entry = alloc_percpu(struct cstate_entry);
+ cpu_cstate_entry = CPU_ALLOC(struct cstate_entry,
+ GFP_KERNEL|__GFP_ZERO);
return 0;
}
static void __exit ffh_cstate_exit(void)
{
- free_percpu(cpu_cstate_entry);
+ CPU_FREE(cpu_cstate_entry);
cpu_cstate_entry = NULL;
}
Index: linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c 2007-11-15 21:18:09.246904080 -0800
+++ linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c 2007-11-15 21:25:30.499154221 -0800
@@ -513,7 +513,8 @@ static int __init acpi_cpufreq_early_ini
{
dprintk("acpi_cpufreq_early_init\n");
- acpi_perf_data = alloc_percpu(struct acpi_processor_performance);
+ acpi_perf_data = CPU_ALLOC(struct acpi_processor_performance,
+ GFP_KERNEL|__GFP_ZERO);
if (!acpi_perf_data) {
dprintk("Memory allocation error for acpi_perf_data.\n");
return -ENOMEM;
@@ -569,7 +570,7 @@ static int acpi_cpufreq_cpu_init(struct
if (!data)
return -ENOMEM;
- data->acpi_data = percpu_ptr(acpi_perf_data, cpu);
+ data->acpi_data = CPU_PTR(acpi_perf_data, cpu);
drv_data[cpu] = data;
if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
@@ -782,7 +783,7 @@ static void __exit acpi_cpufreq_exit(voi
cpufreq_unregister_driver(&acpi_cpufreq_driver);
- free_percpu(acpi_perf_data);
+ CPU_FREE(acpi_perf_data);
return;
}
Index: linux-2.6/drivers/acpi/processor_perflib.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_perflib.c 2007-11-15 21:18:09.254904773 -0800
+++ linux-2.6/drivers/acpi/processor_perflib.c 2007-11-15 21:25:30.499154221 -0800
@@ -567,12 +567,12 @@ int acpi_processor_preregister_performan
continue;
}
- if (!performance || !percpu_ptr(performance, i)) {
+ if (!performance || !CPU_PTR(performance, i)) {
retval = -EINVAL;
continue;
}
- pr->performance = percpu_ptr(performance, i);
+ pr->performance = CPU_PTR(performance, i);
cpu_set(i, pr->performance->shared_cpu_map);
if (acpi_processor_get_psd(pr)) {
retval = -EINVAL;
--
* [patch 13/30] cpu alloc: genhd statistics conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (11 preceding siblings ...)
2007-11-16 23:09 ` [patch 12/30] cpu alloc: ACPI cstate handling conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 14/30] cpu alloc: blktrace conversion Christoph Lameter
` (16 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0023-cpu-alloc-genhd-statistics-conversion.patch --]
[-- Type: text/plain, Size: 1772 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/genhd.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6/include/linux/genhd.h
===================================================================
--- linux-2.6.orig/include/linux/genhd.h 2007-11-15 21:18:07.967654575 -0800
+++ linux-2.6/include/linux/genhd.h 2007-11-15 21:25:31.066904143 -0800
@@ -158,21 +158,21 @@ struct disk_attribute {
*/
#ifdef CONFIG_SMP
#define __disk_stat_add(gendiskp, field, addnd) \
- (per_cpu_ptr(gendiskp->dkstats, smp_processor_id())->field += addnd)
+ (THIS_CPU(gendiskp->dkstats)->field += addnd)
#define disk_stat_read(gendiskp, field) \
({ \
typeof(gendiskp->dkstats->field) res = 0; \
int i; \
for_each_possible_cpu(i) \
- res += per_cpu_ptr(gendiskp->dkstats, i)->field; \
+ res += CPU_PTR(gendiskp->dkstats, i)->field; \
res; \
})
static inline void disk_stat_set_all(struct gendisk *gendiskp, int value) {
int i;
for_each_possible_cpu(i)
- memset(per_cpu_ptr(gendiskp->dkstats, i), value,
+ memset(CPU_PTR(gendiskp->dkstats, i), value,
sizeof (struct disk_stats));
}
@@ -209,7 +209,7 @@ static inline void disk_stat_set_all(str
#ifdef CONFIG_SMP
static inline int init_disk_stats(struct gendisk *disk)
{
- disk->dkstats = alloc_percpu(struct disk_stats);
+ disk->dkstats = CPU_ALLOC(struct disk_stats, GFP_KERNEL | __GFP_ZERO);
if (!disk->dkstats)
return 0;
return 1;
@@ -217,7 +217,7 @@ static inline int init_disk_stats(struct
static inline void free_disk_stats(struct gendisk *disk)
{
- free_percpu(disk->dkstats);
+ CPU_FREE(disk->dkstats);
}
#else /* CONFIG_SMP */
static inline int init_disk_stats(struct gendisk *disk)
--
^ permalink raw reply [flat|nested] 34+ messages in thread
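The genhd conversion above follows the standard per-cpu statistics pattern: the update side bumps a field on the executing CPU's private copy (no lock needed), and the read side folds all copies into a total. A minimal user-space sketch of that pattern, where `NCPUS`, `stats_area`, `this_cpu_id`, and the macro bodies are illustrative stand-ins (the real cpu_alloc interface resolves a per-cpu offset against each processor's per-cpu area rather than indexing a plain array):

```c
#include <assert.h>

#define NCPUS 4

struct disk_stats {
	unsigned long sectors;
	unsigned long ios;
};

static struct disk_stats stats_area[NCPUS];	/* stand-in for CPU_ALLOC'ed storage */
static int this_cpu_id;				/* stand-in for smp_processor_id() */

#define CPU_PTR(base, cpu)	(&(base)[(cpu)])
#define THIS_CPU(base)		CPU_PTR(base, this_cpu_id)

/* __disk_stat_add analogue: each CPU touches only its own copy,
 * so the update side needs no locking. */
#define disk_stat_add(base, field, addnd) \
	(THIS_CPU(base)->field += (addnd))

/* disk_stat_read analogue: a reader sums the per-cpu copies. */
static unsigned long disk_stat_read_ios(struct disk_stats *base)
{
	unsigned long res = 0;
	int i;

	for (i = 0; i < NCPUS; i++)
		res += CPU_PTR(base, i)->ios;
	return res;
}
```

The read side tolerates being slightly stale; what the conversion changes is only how `CPU_PTR`/`THIS_CPU` locate each copy, not this update/sum structure.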
* [patch 14/30] cpu alloc: blktrace conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (12 preceding siblings ...)
2007-11-16 23:09 ` [patch 13/30] cpu alloc: genhd statistics conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 15/30] cpu alloc: SRCU Christoph Lameter
` (15 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0024-cpu-alloc-blktrace-conversion.patch --]
[-- Type: text/plain, Size: 1398 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
block/blktrace.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Index: linux-2.6/block/blktrace.c
===================================================================
--- linux-2.6.orig/block/blktrace.c 2007-11-15 21:17:24.586154116 -0800
+++ linux-2.6/block/blktrace.c 2007-11-15 21:25:31.591154091 -0800
@@ -155,7 +155,7 @@ void __blk_add_trace(struct blk_trace *b
t = relay_reserve(bt->rchan, sizeof(*t) + pdu_len);
if (t) {
cpu = smp_processor_id();
- sequence = per_cpu_ptr(bt->sequence, cpu);
+ sequence = CPU_PTR(bt->sequence, cpu);
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t->sequence = ++(*sequence);
@@ -227,7 +227,7 @@ static void blk_trace_cleanup(struct blk
relay_close(bt->rchan);
debugfs_remove(bt->dropped_file);
blk_remove_tree(bt->dir);
- free_percpu(bt->sequence);
+ CPU_FREE(bt->sequence);
kfree(bt);
}
@@ -338,7 +338,7 @@ int do_blk_trace_setup(struct request_qu
if (!bt)
goto err;
- bt->sequence = alloc_percpu(unsigned long);
+ bt->sequence = CPU_ALLOC(unsigned long, GFP_KERNEL | __GFP_ZERO);
if (!bt->sequence)
goto err;
@@ -387,7 +387,7 @@ err:
if (bt) {
if (bt->dropped_file)
debugfs_remove(bt->dropped_file);
- free_percpu(bt->sequence);
+ CPU_FREE(bt->sequence);
if (bt->rchan)
relay_close(bt->rchan);
kfree(bt);
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 15/30] cpu alloc: SRCU
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (13 preceding siblings ...)
2007-11-16 23:09 ` [patch 14/30] cpu alloc: blktrace conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
` (14 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0025-cpu-alloc-SRCU.patch --]
[-- Type: text/plain, Size: 2573 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
kernel/rcutorture.c | 4 ++--
kernel/srcu.c | 11 ++++++-----
2 files changed, 8 insertions(+), 7 deletions(-)
Index: linux-2.6/kernel/rcutorture.c
===================================================================
--- linux-2.6.orig/kernel/rcutorture.c 2007-11-15 21:17:24.515654132 -0800
+++ linux-2.6/kernel/rcutorture.c 2007-11-15 21:25:32.102406141 -0800
@@ -441,8 +441,8 @@ static int srcu_torture_stats(char *page
torture_type, TORTURE_FLAG, idx);
for_each_possible_cpu(cpu) {
cnt += sprintf(&page[cnt], " %d(%d,%d)", cpu,
- per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[!idx],
- per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[idx]);
+ CPU_PTR(srcu_ctl.per_cpu_ref, cpu)->c[!idx],
+ CPU_PTR(srcu_ctl.per_cpu_ref, cpu)->c[idx]);
}
cnt += sprintf(&page[cnt], "\n");
return cnt;
Index: linux-2.6/kernel/srcu.c
===================================================================
--- linux-2.6.orig/kernel/srcu.c 2007-11-15 21:17:24.523654368 -0800
+++ linux-2.6/kernel/srcu.c 2007-11-15 21:25:32.102406141 -0800
@@ -46,7 +46,8 @@ int init_srcu_struct(struct srcu_struct
{
sp->completed = 0;
mutex_init(&sp->mutex);
- sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+ sp->per_cpu_ref = CPU_ALLOC(struct srcu_struct_array,
+ GFP_KERNEL|__GFP_ZERO);
return (sp->per_cpu_ref ? 0 : -ENOMEM);
}
@@ -62,7 +63,7 @@ static int srcu_readers_active_idx(struc
sum = 0;
for_each_possible_cpu(cpu)
- sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+ sum += CPU_PTR(sp->per_cpu_ref, cpu)->c[idx];
return sum;
}
@@ -94,7 +95,7 @@ void cleanup_srcu_struct(struct srcu_str
WARN_ON(sum); /* Leakage unless caller handles error. */
if (sum != 0)
return;
- free_percpu(sp->per_cpu_ref);
+ CPU_FREE(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}
@@ -113,7 +114,7 @@ int srcu_read_lock(struct srcu_struct *s
preempt_disable();
idx = sp->completed & 0x1;
barrier(); /* ensure compiler looks -once- at sp->completed. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
+ THIS_CPU(sp->per_cpu_ref)->c[idx]++;
srcu_barrier(); /* ensure compiler won't misorder critical section. */
preempt_enable();
return idx;
@@ -133,7 +134,7 @@ void srcu_read_unlock(struct srcu_struct
{
preempt_disable();
srcu_barrier(); /* ensure compiler won't misorder critical section. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
+ THIS_CPU(sp->per_cpu_ref)->c[idx]--;
preempt_enable();
}
--
^ permalink raw reply [flat|nested] 34+ messages in thread
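The SRCU hunks above keep two per-cpu reader counts, `c[0]` and `c[1]`, selected by `sp->completed & 1`; `srcu_read_lock()` increments the current generation's counter on the executing CPU and `srcu_readers_active_idx()` sums one generation over all CPUs. A toy single-threaded model of that counting scheme (where `NCPUS`, `this_cpu`, and the globals are assumptions of this sketch, with no barriers or preemption control):

```c
#include <assert.h>

#define NCPUS 4

/* Toy model of struct srcu_struct_array: c[0]/c[1] are the reader
 * counts for the two generations selected by completed & 1. */
struct srcu_cpu { int c[2]; };

static struct srcu_cpu per_cpu_ref[NCPUS];
static int completed;	/* flipped by the update side */
static int this_cpu;	/* stand-in for the executing CPU */

static int srcu_read_lock_sketch(void)
{
	int idx = completed & 1;

	per_cpu_ref[this_cpu].c[idx]++;	/* THIS_CPU(sp->per_cpu_ref)->c[idx]++ */
	return idx;
}

static void srcu_read_unlock_sketch(int idx)
{
	per_cpu_ref[this_cpu].c[idx]--;
}

/* srcu_readers_active_idx analogue: sum one generation over all CPUs. */
static int readers_active(int idx)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NCPUS; cpu++)
		sum += per_cpu_ref[cpu].c[idx];
	return sum;
}
```

The update side flips `completed` and waits for the old generation's sum to drain to zero; the patch only changes how the per-cpu counter is addressed, not this protocol.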
* [patch 16/30] cpu alloc: XFS counters
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (14 preceding siblings ...)
2007-11-16 23:09 ` [patch 15/30] cpu alloc: SRCU Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-19 12:58 ` Christoph Hellwig
2007-11-16 23:09 ` [patch 17/30] cpu alloc: NFS statistics Christoph Lameter
` (13 subsequent siblings)
29 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0026-cpu-alloc-XFS-counters.patch --]
[-- Type: text/plain, Size: 3014 bytes --]
Also remove the useless zeroing after allocation; allocpercpu already
zeroes the objects.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
fs/xfs/xfs_mount.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
Index: linux-2.6/fs/xfs/xfs_mount.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_mount.c 2007-11-15 21:17:24.467654585 -0800
+++ linux-2.6/fs/xfs/xfs_mount.c 2007-11-15 21:25:32.643904117 -0800
@@ -1924,7 +1924,7 @@ xfs_icsb_cpu_notify(
mp = (xfs_mount_t *)container_of(nfb, xfs_mount_t, m_icsb_notifier);
cntp = (xfs_icsb_cnts_t *)
- per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu);
+ CPU_PTR(mp->m_sb_cnts, (unsigned long)hcpu);
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
@@ -1976,10 +1976,7 @@ int
xfs_icsb_init_counters(
xfs_mount_t *mp)
{
- xfs_icsb_cnts_t *cntp;
- int i;
-
- mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t);
+ mp->m_sb_cnts = CPU_ALLOC(xfs_icsb_cnts_t, GFP_KERNEL | __GFP_ZERO);
if (mp->m_sb_cnts == NULL)
return -ENOMEM;
@@ -1989,11 +1986,6 @@ xfs_icsb_init_counters(
register_hotcpu_notifier(&mp->m_icsb_notifier);
#endif /* CONFIG_HOTPLUG_CPU */
- for_each_online_cpu(i) {
- cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
- memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
- }
-
mutex_init(&mp->m_icsb_mutex);
/*
@@ -2026,7 +2018,7 @@ xfs_icsb_destroy_counters(
{
if (mp->m_sb_cnts) {
unregister_hotcpu_notifier(&mp->m_icsb_notifier);
- free_percpu(mp->m_sb_cnts);
+ CPU_FREE(mp->m_sb_cnts);
}
mutex_destroy(&mp->m_icsb_mutex);
}
@@ -2056,7 +2048,7 @@ xfs_icsb_lock_all_counters(
int i;
for_each_online_cpu(i) {
- cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
+ cntp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, i);
xfs_icsb_lock_cntr(cntp);
}
}
@@ -2069,7 +2061,7 @@ xfs_icsb_unlock_all_counters(
int i;
for_each_online_cpu(i) {
- cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
+ cntp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, i);
xfs_icsb_unlock_cntr(cntp);
}
}
@@ -2089,7 +2081,7 @@ xfs_icsb_count(
xfs_icsb_lock_all_counters(mp);
for_each_online_cpu(i) {
- cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
+ cntp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, i);
cnt->icsb_icount += cntp->icsb_icount;
cnt->icsb_ifree += cntp->icsb_ifree;
cnt->icsb_fdblocks += cntp->icsb_fdblocks;
@@ -2167,7 +2159,7 @@ xfs_icsb_enable_counter(
xfs_icsb_lock_all_counters(mp);
for_each_online_cpu(i) {
- cntp = per_cpu_ptr(mp->m_sb_cnts, i);
+ cntp = CPU_PTR(mp->m_sb_cnts, i);
switch (field) {
case XFS_SBS_ICOUNT:
cntp->icsb_icount = count + resid;
@@ -2307,7 +2299,7 @@ xfs_icsb_modify_counters(
might_sleep();
again:
cpu = get_cpu();
- icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu);
+ icsbp = (xfs_icsb_cnts_t *)CPU_PTR(mp->m_sb_cnts, cpu);
/*
* if the counter is disabled, go to slow path
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 17/30] cpu alloc: NFS statistics
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (15 preceding siblings ...)
2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 18/30] cpu alloc: neighbour statistics Christoph Lameter
` (12 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0027-cpu-alloc-NFS-statistics.patch --]
[-- Type: text/plain, Size: 1804 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
fs/nfs/iostat.h | 8 ++++----
fs/nfs/super.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6/fs/nfs/iostat.h
===================================================================
--- linux-2.6.orig/fs/nfs/iostat.h 2007-11-15 21:17:24.391404458 -0800
+++ linux-2.6/fs/nfs/iostat.h 2007-11-15 21:25:33.167654066 -0800
@@ -123,7 +123,7 @@ static inline void nfs_inc_server_stats(
int cpu;
cpu = get_cpu();
- iostats = per_cpu_ptr(server->io_stats, cpu);
+ iostats = CPU_PTR(server->io_stats, cpu);
iostats->events[stat] ++;
put_cpu_no_resched();
}
@@ -139,7 +139,7 @@ static inline void nfs_add_server_stats(
int cpu;
cpu = get_cpu();
- iostats = per_cpu_ptr(server->io_stats, cpu);
+ iostats = CPU_PTR(server->io_stats, cpu);
iostats->bytes[stat] += addend;
put_cpu_no_resched();
}
@@ -151,13 +151,13 @@ static inline void nfs_add_stats(struct
static inline struct nfs_iostats *nfs_alloc_iostats(void)
{
- return alloc_percpu(struct nfs_iostats);
+ return CPU_ALLOC(struct nfs_iostats, GFP_KERNEL | __GFP_ZERO);
}
static inline void nfs_free_iostats(struct nfs_iostats *stats)
{
if (stats != NULL)
- free_percpu(stats);
+ CPU_FREE(stats);
}
#endif
Index: linux-2.6/fs/nfs/super.c
===================================================================
--- linux-2.6.orig/fs/nfs/super.c 2007-11-15 21:17:24.399404478 -0800
+++ linux-2.6/fs/nfs/super.c 2007-11-15 21:25:33.171654143 -0800
@@ -529,7 +529,7 @@ static int nfs_show_stats(struct seq_fil
struct nfs_iostats *stats;
preempt_disable();
- stats = per_cpu_ptr(nfss->io_stats, cpu);
+ stats = CPU_PTR(nfss->io_stats, cpu);
for (i = 0; i < __NFSIOS_COUNTSMAX; i++)
totals.events[i] += stats->events[i];
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 18/30] cpu alloc: neighbour statistics
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (16 preceding siblings ...)
2007-11-16 23:09 ` [patch 17/30] cpu alloc: NFS statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 19/30] cpu alloc: tcp statistics Christoph Lameter
` (11 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0028-cpu-alloc-neigbour-statistics.patch --]
[-- Type: text/plain, Size: 2364 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/net/neighbour.h | 2 +-
net/core/neighbour.c | 11 ++++++-----
2 files changed, 7 insertions(+), 6 deletions(-)
Index: linux-2.6/include/net/neighbour.h
===================================================================
--- linux-2.6.orig/include/net/neighbour.h 2007-11-15 21:17:24.319654300 -0800
+++ linux-2.6/include/net/neighbour.h 2007-11-15 21:25:33.678404221 -0800
@@ -83,7 +83,7 @@ struct neigh_statistics
#define NEIGH_CACHE_STAT_INC(tbl, field) \
do { \
preempt_disable(); \
- (per_cpu_ptr((tbl)->stats, smp_processor_id())->field)++; \
+ (THIS_CPU((tbl)->stats)->field)++; \
preempt_enable(); \
} while (0)
Index: linux-2.6/net/core/neighbour.c
===================================================================
--- linux-2.6.orig/net/core/neighbour.c 2007-11-15 21:17:24.327654639 -0800
+++ linux-2.6/net/core/neighbour.c 2007-11-15 21:25:33.678404221 -0800
@@ -1348,7 +1348,8 @@ void neigh_table_init_no_netlink(struct
kmem_cache_create(tbl->id, tbl->entry_size, 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC,
NULL);
- tbl->stats = alloc_percpu(struct neigh_statistics);
+ tbl->stats = CPU_ALLOC(struct neigh_statistics,
+ GFP_KERNEL | __GFP_ZERO);
if (!tbl->stats)
panic("cannot create neighbour cache statistics");
@@ -1437,7 +1438,7 @@ int neigh_table_clear(struct neigh_table
remove_proc_entry(tbl->id, init_net.proc_net_stat);
- free_percpu(tbl->stats);
+ CPU_FREE(tbl->stats);
tbl->stats = NULL;
kmem_cache_destroy(tbl->kmem_cachep);
@@ -1694,7 +1695,7 @@ static int neightbl_fill_info(struct sk_
for_each_possible_cpu(cpu) {
struct neigh_statistics *st;
- st = per_cpu_ptr(tbl->stats, cpu);
+ st = CPU_PTR(tbl->stats, cpu);
ndst.ndts_allocs += st->allocs;
ndst.ndts_destroys += st->destroys;
ndst.ndts_hash_grows += st->hash_grows;
@@ -2343,7 +2344,7 @@ static void *neigh_stat_seq_start(struct
if (!cpu_possible(cpu))
continue;
*pos = cpu+1;
- return per_cpu_ptr(tbl->stats, cpu);
+ return CPU_PTR(tbl->stats, cpu);
}
return NULL;
}
@@ -2358,7 +2359,7 @@ static void *neigh_stat_seq_next(struct
if (!cpu_possible(cpu))
continue;
*pos = cpu+1;
- return per_cpu_ptr(tbl->stats, cpu);
+ return CPU_PTR(tbl->stats, cpu);
}
return NULL;
}
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 19/30] cpu alloc: tcp statistics
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (17 preceding siblings ...)
2007-11-16 23:09 ` [patch 18/30] cpu alloc: neighbour statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 20/30] cpu alloc: convert scratches Christoph Lameter
` (10 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0029-cpu-alloc-tcp-statistics.patch --]
[-- Type: text/plain, Size: 1629 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
net/ipv4/tcp.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6/net/ipv4/tcp.c
===================================================================
--- linux-2.6.orig/net/ipv4/tcp.c 2007-11-15 21:17:24.267654551 -0800
+++ linux-2.6/net/ipv4/tcp.c 2007-11-15 21:25:34.214404334 -0800
@@ -2273,7 +2273,7 @@ static void __tcp_free_md5sig_pool(struc
{
int cpu;
for_each_possible_cpu(cpu) {
- struct tcp_md5sig_pool *p = *per_cpu_ptr(pool, cpu);
+ struct tcp_md5sig_pool *p = *CPU_PTR(pool, cpu);
if (p) {
if (p->md5_desc.tfm)
crypto_free_hash(p->md5_desc.tfm);
@@ -2281,7 +2281,7 @@ static void __tcp_free_md5sig_pool(struc
p = NULL;
}
}
- free_percpu(pool);
+ CPU_FREE(pool);
}
void tcp_free_md5sig_pool(void)
@@ -2305,7 +2305,7 @@ static struct tcp_md5sig_pool **__tcp_al
int cpu;
struct tcp_md5sig_pool **pool;
- pool = alloc_percpu(struct tcp_md5sig_pool *);
+ pool = CPU_ALLOC(struct tcp_md5sig_pool *, GFP_KERNEL);
if (!pool)
return NULL;
@@ -2316,7 +2316,7 @@ static struct tcp_md5sig_pool **__tcp_al
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
goto out_free;
- *per_cpu_ptr(pool, cpu) = p;
+ *CPU_PTR(pool, cpu) = p;
hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
if (!hash || IS_ERR(hash))
@@ -2381,7 +2381,7 @@ struct tcp_md5sig_pool *__tcp_get_md5sig
if (p)
tcp_md5sig_users++;
spin_unlock_bh(&tcp_md5sig_pool_lock);
- return (p ? *per_cpu_ptr(p, cpu) : NULL);
+ return (p ? *CPU_PTR(p, cpu) : NULL);
}
EXPORT_SYMBOL(__tcp_get_md5sig_pool);
--
^ permalink raw reply [flat|nested] 34+ messages in thread
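Unlike the statistics conversions, the md5sig pool above uses a two-level layout: `CPU_ALLOC(struct tcp_md5sig_pool *, GFP_KERNEL)` hands out one pointer slot per CPU, and each slot is then filled with a separately kzalloc'ed object (which is why the allocation keeps `GFP_KERNEL` without `__GFP_ZERO` mattering for the slots' payloads). A hedged user-space model of the allocate/rollback shape, where `NCPUS`, `struct pool`, and the `calloc` stand-ins are assumptions:

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS 4

struct pool { int users; };	/* stand-in for struct tcp_md5sig_pool */

static struct pool **alloc_pool_sketch(void)
{
	struct pool **slots = calloc(NCPUS, sizeof(*slots));	/* CPU_ALLOC */
	int cpu;

	if (!slots)
		return NULL;
	for (cpu = 0; cpu < NCPUS; cpu++) {
		slots[cpu] = calloc(1, sizeof(**slots));	/* kzalloc per CPU */
		if (!slots[cpu])
			goto out_free;
	}
	return slots;

out_free:
	for (cpu = 0; cpu < NCPUS; cpu++)
		free(slots[cpu]);	/* free(NULL) is a no-op */
	free(slots);			/* CPU_FREE */
	return NULL;
}
```

The ipcomp scratch-buffer hunks in patch 20 follow the same shape, with `vmalloc` filling each slot instead of `kzalloc`.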
* [patch 20/30] cpu alloc: convert scratches
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (18 preceding siblings ...)
2007-11-16 23:09 ` [patch 19/30] cpu alloc: tcp statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 21/30] cpu alloc: dmaengine conversion Christoph Lameter
` (9 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0030-cpu-alloc-convert-scatches.patch --]
[-- Type: text/plain, Size: 6122 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
net/ipv4/ipcomp.c | 26 +++++++++++++-------------
net/ipv6/ipcomp6.c | 26 +++++++++++++-------------
2 files changed, 26 insertions(+), 26 deletions(-)
Index: linux-2.6/net/ipv4/ipcomp.c
===================================================================
--- linux-2.6.orig/net/ipv4/ipcomp.c 2007-11-15 21:17:24.199404507 -0800
+++ linux-2.6/net/ipv4/ipcomp.c 2007-11-15 21:25:34.771154012 -0800
@@ -48,8 +48,8 @@ static int ipcomp_decompress(struct xfrm
int dlen = IPCOMP_SCRATCH_SIZE;
const u8 *start = skb->data;
const int cpu = get_cpu();
- u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
- struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+ u8 *scratch = *CPU_PTR(ipcomp_scratches, cpu);
+ struct crypto_comp *tfm = *CPU_PTR(ipcd->tfms, cpu);
int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
if (err)
@@ -103,8 +103,8 @@ static int ipcomp_compress(struct xfrm_s
int dlen = IPCOMP_SCRATCH_SIZE;
u8 *start = skb->data;
const int cpu = get_cpu();
- u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
- struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+ u8 *scratch = *CPU_PTR(ipcomp_scratches, cpu);
+ struct crypto_comp *tfm = *CPU_PTR(ipcd->tfms, cpu);
int err = crypto_comp_compress(tfm, start, plen, scratch, &dlen);
if (err)
@@ -252,9 +252,9 @@ static void ipcomp_free_scratches(void)
return;
for_each_possible_cpu(i)
- vfree(*per_cpu_ptr(scratches, i));
+ vfree(*CPU_PTR(scratches, i));
- free_percpu(scratches);
+ CPU_FREE(scratches);
}
static void **ipcomp_alloc_scratches(void)
@@ -265,7 +265,7 @@ static void **ipcomp_alloc_scratches(voi
if (ipcomp_scratch_users++)
return ipcomp_scratches;
- scratches = alloc_percpu(void *);
+ scratches = CPU_ALLOC(void *, GFP_KERNEL);
if (!scratches)
return NULL;
@@ -275,7 +275,7 @@ static void **ipcomp_alloc_scratches(voi
void *scratch = vmalloc(IPCOMP_SCRATCH_SIZE);
if (!scratch)
return NULL;
- *per_cpu_ptr(scratches, i) = scratch;
+ *CPU_PTR(scratches, i) = scratch;
}
return scratches;
@@ -303,10 +303,10 @@ static void ipcomp_free_tfms(struct cryp
return;
for_each_possible_cpu(cpu) {
- struct crypto_comp *tfm = *per_cpu_ptr(tfms, cpu);
+ struct crypto_comp *tfm = *CPU_PTR(tfms, cpu);
crypto_free_comp(tfm);
}
- free_percpu(tfms);
+ CPU_FREE(tfms);
}
static struct crypto_comp **ipcomp_alloc_tfms(const char *alg_name)
@@ -322,7 +322,7 @@ static struct crypto_comp **ipcomp_alloc
struct crypto_comp *tfm;
tfms = pos->tfms;
- tfm = *per_cpu_ptr(tfms, cpu);
+ tfm = *CPU_PTR(tfms, cpu);
if (!strcmp(crypto_comp_name(tfm), alg_name)) {
pos->users++;
@@ -338,7 +338,7 @@ static struct crypto_comp **ipcomp_alloc
INIT_LIST_HEAD(&pos->list);
list_add(&pos->list, &ipcomp_tfms_list);
- pos->tfms = tfms = alloc_percpu(struct crypto_comp *);
+ pos->tfms = tfms = CPU_ALLOC(struct crypto_comp *, GFP_KERNEL);
if (!tfms)
goto error;
@@ -347,7 +347,7 @@ static struct crypto_comp **ipcomp_alloc
CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm))
goto error;
- *per_cpu_ptr(tfms, cpu) = tfm;
+ *CPU_PTR(tfms, cpu) = tfm;
}
return tfms;
Index: linux-2.6/net/ipv6/ipcomp6.c
===================================================================
--- linux-2.6.orig/net/ipv6/ipcomp6.c 2007-11-15 21:17:24.207404544 -0800
+++ linux-2.6/net/ipv6/ipcomp6.c 2007-11-15 21:25:34.774656957 -0800
@@ -88,8 +88,8 @@ static int ipcomp6_input(struct xfrm_sta
start = skb->data;
cpu = get_cpu();
- scratch = *per_cpu_ptr(ipcomp6_scratches, cpu);
- tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+ scratch = *CPU_PTR(ipcomp6_scratches, cpu);
+ tfm = *CPU_PTR(ipcd->tfms, cpu);
err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
if (err)
@@ -140,8 +140,8 @@ static int ipcomp6_output(struct xfrm_st
start = skb->data;
cpu = get_cpu();
- scratch = *per_cpu_ptr(ipcomp6_scratches, cpu);
- tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+ scratch = *CPU_PTR(ipcomp6_scratches, cpu);
+ tfm = *CPU_PTR(ipcd->tfms, cpu);
err = crypto_comp_compress(tfm, start, plen, scratch, &dlen);
if (err || (dlen + sizeof(*ipch)) >= plen) {
@@ -263,12 +263,12 @@ static void ipcomp6_free_scratches(void)
return;
for_each_possible_cpu(i) {
- void *scratch = *per_cpu_ptr(scratches, i);
+ void *scratch = *CPU_PTR(scratches, i);
vfree(scratch);
}
- free_percpu(scratches);
+ CPU_FREE(scratches);
}
static void **ipcomp6_alloc_scratches(void)
@@ -279,7 +279,7 @@ static void **ipcomp6_alloc_scratches(vo
if (ipcomp6_scratch_users++)
return ipcomp6_scratches;
- scratches = alloc_percpu(void *);
+ scratches = CPU_ALLOC(void *, GFP_KERNEL);
if (!scratches)
return NULL;
@@ -289,7 +289,7 @@ static void **ipcomp6_alloc_scratches(vo
void *scratch = vmalloc(IPCOMP_SCRATCH_SIZE);
if (!scratch)
return NULL;
- *per_cpu_ptr(scratches, i) = scratch;
+ *CPU_PTR(scratches, i) = scratch;
}
return scratches;
@@ -317,10 +317,10 @@ static void ipcomp6_free_tfms(struct cry
return;
for_each_possible_cpu(cpu) {
- struct crypto_comp *tfm = *per_cpu_ptr(tfms, cpu);
+ struct crypto_comp *tfm = *CPU_PTR(tfms, cpu);
crypto_free_comp(tfm);
}
- free_percpu(tfms);
+ CPU_FREE(tfms);
}
static struct crypto_comp **ipcomp6_alloc_tfms(const char *alg_name)
@@ -336,7 +336,7 @@ static struct crypto_comp **ipcomp6_allo
struct crypto_comp *tfm;
tfms = pos->tfms;
- tfm = *per_cpu_ptr(tfms, cpu);
+ tfm = *CPU_PTR(tfms, cpu);
if (!strcmp(crypto_comp_name(tfm), alg_name)) {
pos->users++;
@@ -352,7 +352,7 @@ static struct crypto_comp **ipcomp6_allo
INIT_LIST_HEAD(&pos->list);
list_add(&pos->list, &ipcomp6_tfms_list);
- pos->tfms = tfms = alloc_percpu(struct crypto_comp *);
+ pos->tfms = tfms = CPU_ALLOC(struct crypto_comp *, GFP_KERNEL);
if (!tfms)
goto error;
@@ -361,7 +361,7 @@ static struct crypto_comp **ipcomp6_allo
CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm))
goto error;
- *per_cpu_ptr(tfms, cpu) = tfm;
+ *CPU_PTR(tfms, cpu) = tfm;
}
return tfms;
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 21/30] cpu alloc: dmaengine conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (19 preceding siblings ...)
2007-11-16 23:09 ` [patch 20/30] cpu alloc: convert scratches Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 22/30] cpu alloc: convert loopback statistics Christoph Lameter
` (8 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0031-cpu-alloc-dmaengine-conversion.patch --]
[-- Type: text/plain, Size: 4319 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
drivers/dma/dmaengine.c | 27 ++++++++++++++-------------
include/linux/dmaengine.h | 4 ++--
2 files changed, 16 insertions(+), 15 deletions(-)
Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c 2007-11-15 21:17:24.127154620 -0800
+++ linux-2.6/drivers/dma/dmaengine.c 2007-11-15 21:25:35.354654191 -0800
@@ -84,7 +84,7 @@ static ssize_t show_memcpy_count(struct
int i;
for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += CPU_PTR(chan->local, i)->memcpy_count;
return sprintf(buf, "%lu\n", count);
}
@@ -96,7 +96,7 @@ static ssize_t show_bytes_transferred(st
int i;
for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->bytes_transferred;
+ count += CPU_PTR(chan->local, i)->bytes_transferred;
return sprintf(buf, "%lu\n", count);
}
@@ -110,7 +110,7 @@ static ssize_t show_in_use(struct class_
atomic_read(&chan->refcount.refcount) > 1)
in_use = 1;
else {
- if (local_read(&(per_cpu_ptr(chan->local,
+ if (local_read(&(CPU_PTR(chan->local,
get_cpu())->refcount)) > 0)
in_use = 1;
put_cpu();
@@ -226,7 +226,7 @@ static void dma_chan_free_rcu(struct rcu
int bias = 0x7FFFFFFF;
int i;
for_each_possible_cpu(i)
- bias -= local_read(&per_cpu_ptr(chan->local, i)->refcount);
+ bias -= local_read(&CPU_PTR(chan->local, i)->refcount);
atomic_sub(bias, &chan->refcount.refcount);
kref_put(&chan->refcount, dma_chan_cleanup);
}
@@ -372,7 +372,8 @@ int dma_async_device_register(struct dma
/* represent channels in sysfs. Probably want devs too */
list_for_each_entry(chan, &device->channels, device_node) {
- chan->local = alloc_percpu(typeof(*chan->local));
+ chan->local = CPU_ALLOC(typeof(*chan->local),
+ GFP_KERNEL | __GFP_ZERO);
if (chan->local == NULL)
continue;
@@ -385,7 +386,7 @@ int dma_async_device_register(struct dma
rc = class_device_register(&chan->class_dev);
if (rc) {
chancnt--;
- free_percpu(chan->local);
+ CPU_FREE(chan->local);
chan->local = NULL;
goto err_out;
}
@@ -413,7 +414,7 @@ err_out:
kref_put(&device->refcount, dma_async_device_cleanup);
class_device_unregister(&chan->class_dev);
chancnt--;
- free_percpu(chan->local);
+ CPU_FREE(chan->local);
}
return rc;
}
@@ -489,8 +490,8 @@ dma_async_memcpy_buf_to_buf(struct dma_c
cookie = tx->tx_submit(tx);
cpu = get_cpu();
- per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ CPU_PTR(chan->local, cpu)->bytes_transferred += len;
+ CPU_PTR(chan->local, cpu)->memcpy_count++;
put_cpu();
return cookie;
@@ -533,8 +534,8 @@ dma_async_memcpy_buf_to_pg(struct dma_ch
cookie = tx->tx_submit(tx);
cpu = get_cpu();
- per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ CPU_PTR(chan->local, cpu)->bytes_transferred += len;
+ CPU_PTR(chan->local, cpu)->memcpy_count++;
put_cpu();
return cookie;
@@ -579,8 +580,8 @@ dma_async_memcpy_pg_to_pg(struct dma_cha
cookie = tx->tx_submit(tx);
cpu = get_cpu();
- per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ CPU_PTR(chan->local, cpu)->bytes_transferred += len;
+ CPU_PTR(chan->local, cpu)->memcpy_count++;
put_cpu();
return cookie;
Index: linux-2.6/include/linux/dmaengine.h
===================================================================
--- linux-2.6.orig/include/linux/dmaengine.h 2007-11-15 21:17:24.135154570 -0800
+++ linux-2.6/include/linux/dmaengine.h 2007-11-15 21:25:35.358654166 -0800
@@ -150,7 +150,7 @@ static inline void dma_chan_get(struct d
if (unlikely(chan->slow_ref))
kref_get(&chan->refcount);
else {
- local_inc(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+ local_inc(&CPU_PTR(chan->local, get_cpu())->refcount);
put_cpu();
}
}
@@ -160,7 +160,7 @@ static inline void dma_chan_put(struct d
if (unlikely(chan->slow_ref))
kref_put(&chan->refcount, dma_chan_cleanup);
else {
- local_dec(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+ local_dec(&CPU_PTR(chan->local, get_cpu())->refcount);
put_cpu();
}
}
--
^ permalink raw reply [flat|nested] 34+ messages in thread
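The `dma_chan_free_rcu()` hunk above relies on a bias trick: while the channel is in fast-path mode, references are tracked in cheap per-cpu `local_t` counters and the shared kref has been pre-inflated by `0x7FFFFFFF`; at teardown, `atomic_sub(bias - sum_of_per_cpu_refs)` folds the per-cpu counts back so the kref again reflects the true outstanding references. A sketch of just that arithmetic (with `NCPUS` and the plain-int globals standing in for `local_t`/`atomic_t`):

```c
#include <assert.h>

#define NCPUS 4
#define BIAS 0x7FFFFFFF

static int per_cpu_refcount[NCPUS];	/* stand-in for per-cpu local_t refcount */
static long chan_kref;			/* stand-in for chan->refcount, pre-inflated by BIAS */

/* dma_chan_free_rcu analogue: subtract each per-cpu count from BIAS,
 * then remove the remainder from the shared kref in one atomic_sub. */
static long fold_refs(void)
{
	long bias = BIAS;
	int i;

	for (i = 0; i < NCPUS; i++)
		bias -= per_cpu_refcount[i];	/* local_read() in the patch */
	chan_kref -= bias;			/* atomic_sub(bias, ...) */
	return chan_kref;
}
```

The point of the conversion is that the hot-path `local_inc`/`local_dec` in `dma_chan_get()`/`dma_chan_put()` now address the per-cpu counter through `CPU_PTR` instead of `per_cpu_ptr`; the bias accounting itself is unchanged.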
* [patch 22/30] cpu alloc: convert loopback statistics
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (20 preceding siblings ...)
2007-11-16 23:09 ` [patch 21/30] cpu alloc: dmaengine conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 23/30] cpu alloc: veth conversion Christoph Lameter
` (7 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0032-cpu-alloc-convert-loopback-statistics.patch --]
[-- Type: text/plain, Size: 1423 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
drivers/net/loopback.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Index: linux-2.6/drivers/net/loopback.c
===================================================================
--- linux-2.6.orig/drivers/net/loopback.c 2007-11-15 21:17:24.067154382 -0800
+++ linux-2.6/drivers/net/loopback.c 2007-11-15 21:25:36.006154068 -0800
@@ -156,7 +156,7 @@ static int loopback_xmit(struct sk_buff
/* it's OK to use per_cpu_ptr() because BHs are off */
pcpu_lstats = netdev_priv(dev);
- lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
+ lb_stats = THIS_CPU(pcpu_lstats);
lb_stats->bytes += skb->len;
lb_stats->packets++;
@@ -177,7 +177,7 @@ static struct net_device_stats *get_stat
for_each_possible_cpu(i) {
const struct pcpu_lstats *lb_stats;
- lb_stats = per_cpu_ptr(pcpu_lstats, i);
+ lb_stats = CPU_PTR(pcpu_lstats, i);
bytes += lb_stats->bytes;
packets += lb_stats->packets;
}
@@ -205,7 +205,7 @@ static int loopback_dev_init(struct net_
{
struct pcpu_lstats *lstats;
- lstats = alloc_percpu(struct pcpu_lstats);
+ lstats = CPU_ALLOC(struct pcpu_lstats, GFP_KERNEL | __GFP_ZERO);
if (!lstats)
return -ENOMEM;
@@ -217,7 +217,7 @@ static void loopback_dev_free(struct net
{
struct pcpu_lstats *lstats = netdev_priv(dev);
- free_percpu(lstats);
+ CPU_FREE(lstats);
free_netdev(dev);
}
--
^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 23/30] cpu alloc: veth conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (21 preceding siblings ...)
2007-11-16 23:09 ` [patch 22/30] cpu alloc: convert loopback statistics Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 24/30] cpu alloc: Chelsio statistics conversion Christoph Lameter
` (6 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0033-cpu-alloc-veth-conversion.patch --]
[-- Type: text/plain, Size: 1666 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
drivers/net/veth.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6/drivers/net/veth.c
===================================================================
--- linux-2.6.orig/drivers/net/veth.c 2007-11-15 21:17:24.010404318 -0800
+++ linux-2.6/drivers/net/veth.c 2007-11-15 21:25:36.483154219 -0800
@@ -162,7 +162,7 @@ static int veth_xmit(struct sk_buff *skb
rcv_priv = netdev_priv(rcv);
cpu = smp_processor_id();
- stats = per_cpu_ptr(priv->stats, cpu);
+ stats = CPU_PTR(priv->stats, cpu);
if (!(rcv->flags & IFF_UP))
goto outf;
@@ -183,7 +183,7 @@ static int veth_xmit(struct sk_buff *skb
stats->tx_bytes += length;
stats->tx_packets++;
- stats = per_cpu_ptr(rcv_priv->stats, cpu);
+ stats = CPU_PTR(rcv_priv->stats, cpu);
stats->rx_bytes += length;
stats->rx_packets++;
@@ -217,7 +217,7 @@ static struct net_device_stats *veth_get
dev_stats->tx_dropped = 0;
for_each_online_cpu(cpu) {
- stats = per_cpu_ptr(priv->stats, cpu);
+ stats = CPU_PTR(priv->stats, cpu);
dev_stats->rx_packets += stats->rx_packets;
dev_stats->tx_packets += stats->tx_packets;
@@ -261,7 +261,7 @@ static int veth_dev_init(struct net_devi
struct veth_net_stats *stats;
struct veth_priv *priv;
- stats = alloc_percpu(struct veth_net_stats);
+ stats = CPU_ALLOC(struct veth_net_stats, GFP_KERNEL | __GFP_ZERO);
if (stats == NULL)
return -ENOMEM;
@@ -275,7 +275,7 @@ static void veth_dev_free(struct net_dev
struct veth_priv *priv;
priv = netdev_priv(dev);
- free_percpu(priv->stats);
+ CPU_FREE(priv->stats);
free_netdev(dev);
}
--
* [patch 24/30] cpu alloc: Chelsio statistics conversion
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (22 preceding siblings ...)
2007-11-16 23:09 ` [patch 23/30] cpu alloc: veth conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 25/30] cpu alloc: convert mib handling to cpu alloc Christoph Lameter
` (5 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0034-cpu-alloc-Chelsio-statistics-conversion.patch --]
[-- Type: text/plain, Size: 2209 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
drivers/net/chelsio/sge.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
Index: linux-2.6/drivers/net/chelsio/sge.c
===================================================================
--- linux-2.6.orig/drivers/net/chelsio/sge.c 2007-11-15 21:17:23.927654318 -0800
+++ linux-2.6/drivers/net/chelsio/sge.c 2007-11-15 21:25:37.015154316 -0800
@@ -805,7 +805,7 @@ void t1_sge_destroy(struct sge *sge)
int i;
for_each_port(sge->adapter, i)
- free_percpu(sge->port_stats[i]);
+ CPU_FREE(sge->port_stats[i]);
kfree(sge->tx_sched);
free_tx_resources(sge);
@@ -984,7 +984,7 @@ void t1_sge_get_port_stats(const struct
memset(ss, 0, sizeof(*ss));
for_each_possible_cpu(cpu) {
- struct sge_port_stats *st = per_cpu_ptr(sge->port_stats[port], cpu);
+ struct sge_port_stats *st = CPU_PTR(sge->port_stats[port], cpu);
ss->rx_packets += st->rx_packets;
ss->rx_cso_good += st->rx_cso_good;
@@ -1379,7 +1379,7 @@ static void sge_rx(struct sge *sge, stru
}
__skb_pull(skb, sizeof(*p));
- st = per_cpu_ptr(sge->port_stats[p->iff], smp_processor_id());
+ st = THIS_CPU(sge->port_stats[p->iff]);
st->rx_packets++;
skb->protocol = eth_type_trans(skb, adapter->port[p->iff].dev);
@@ -1848,7 +1848,7 @@ int t1_start_xmit(struct sk_buff *skb, s
{
struct adapter *adapter = dev->priv;
struct sge *sge = adapter->sge;
- struct sge_port_stats *st = per_cpu_ptr(sge->port_stats[dev->if_port], smp_processor_id());
+ struct sge_port_stats *st = THIS_CPU(sge->port_stats[dev->if_port]);
struct cpl_tx_pkt *cpl;
struct sk_buff *orig_skb = skb;
int ret;
@@ -2165,7 +2165,8 @@ struct sge * __devinit t1_sge_create(str
sge->jumbo_fl = t1_is_T1B(adapter) ? 1 : 0;
for_each_port(adapter, i) {
- sge->port_stats[i] = alloc_percpu(struct sge_port_stats);
+ sge->port_stats[i] = CPU_ALLOC(struct sge_port_stats,
+ GFP_KERNEL | __GFP_ZERO);
if (!sge->port_stats[i])
goto nomem_port;
}
@@ -2209,7 +2210,7 @@ struct sge * __devinit t1_sge_create(str
return sge;
nomem_port:
while (i >= 0) {
- free_percpu(sge->port_stats[i]);
+ CPU_FREE(sge->port_stats[i]);
--i;
}
kfree(sge);
--
* [patch 25/30] cpu alloc: convert mib handling to cpu alloc
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (23 preceding siblings ...)
2007-11-16 23:09 ` [patch 24/30] cpu alloc: Chelsio statistics conversion Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 26/30] cpu_alloc: convert network sockets Christoph Lameter
` (4 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0035-cpu-alloc-convert-mib-handling-to-cpu-alloc.patch --]
[-- Type: text/plain, Size: 10677 bytes --]
Use the cpu alloc functions for the mib handling functions in the net
layer. The API of snmp_mib_free() is changed to take a size parameter,
since cpu_free() requires one.
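The size parameter matters because cpu_free() hands a range back to a
shared per cpu area rather than freeing an independently tracked object.
A minimal userspace sketch of that constraint, assuming a unit-bitmap
allocator (cpu_alloc_sim, cpu_free_sim, UNITS and UNIT_SIZE are all
illustrative names, not the kernel implementation):

```c
#include <assert.h>
#include <string.h>

/* Toy analogue of an offset-based per cpu allocator: allocations are
 * runs of fixed-size units in one area, tracked by a bitmap. Since no
 * per-object header records the length, the caller must pass the size
 * back at free time so the right number of units can be cleared. */
#define UNITS 64
#define UNIT_SIZE 8

static unsigned char area[UNITS * UNIT_SIZE];
static unsigned char used[UNITS];

static void *cpu_alloc_sim(size_t size)
{
	size_t units = (size + UNIT_SIZE - 1) / UNIT_SIZE;
	for (size_t start = 0; start + units <= UNITS; start++) {
		size_t i;
		for (i = 0; i < units; i++)
			if (used[start + i])
				break;
		if (i == units) {
			memset(&used[start], 1, units);
			return &area[start * UNIT_SIZE];
		}
	}
	return NULL;
}

static void cpu_free_sim(void *p, size_t size)
{
	size_t start = ((unsigned char *)p - area) / UNIT_SIZE;
	size_t units = (size + UNIT_SIZE - 1) / UNIT_SIZE;

	/* size is the only way to know how many units to release */
	memset(&used[start], 0, units);
}
```

This is why every snmp_mib_free() call site below gains a
sizeof(struct ..._mib) argument.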
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/net/ip.h | 2 +-
include/net/snmp.h | 14 +++++++-------
net/dccp/proto.c | 12 +++++++-----
net/ipv4/af_inet.c | 31 +++++++++++++++++--------------
net/ipv6/addrconf.c | 10 +++++-----
net/ipv6/af_inet6.c | 18 +++++++++---------
net/sctp/proc.c | 4 ++--
net/sctp/protocol.c | 12 +++++++-----
8 files changed, 55 insertions(+), 48 deletions(-)
Index: linux-2.6/include/net/ip.h
===================================================================
--- linux-2.6.orig/include/net/ip.h 2007-11-15 21:17:23.831654180 -0800
+++ linux-2.6/include/net/ip.h 2007-11-15 21:25:37.575154222 -0800
@@ -170,7 +170,7 @@ DECLARE_SNMP_STAT(struct linux_mib, net_
extern unsigned long snmp_fold_field(void *mib[], int offt);
extern int snmp_mib_init(void *ptr[2], size_t mibsize, size_t mibalign);
-extern void snmp_mib_free(void *ptr[2]);
+extern void snmp_mib_free(void *ptr[2], size_t mibsize);
extern void inet_get_local_port_range(int *low, int *high);
Index: linux-2.6/include/net/snmp.h
===================================================================
--- linux-2.6.orig/include/net/snmp.h 2007-11-15 21:17:23.839654350 -0800
+++ linux-2.6/include/net/snmp.h 2007-11-15 21:25:37.575154222 -0800
@@ -133,18 +133,18 @@ struct linux_mib {
#define SNMP_STAT_USRPTR(name) (name[1])
#define SNMP_INC_STATS_BH(mib, field) \
- (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]++)
+ (__THIS_CPU(mib[0])->mibs[field]++)
#define SNMP_INC_STATS_OFFSET_BH(mib, field, offset) \
- (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field + (offset)]++)
+ (__THIS_CPU(mib[0])->mibs[field + (offset)]++)
#define SNMP_INC_STATS_USER(mib, field) \
- (per_cpu_ptr(mib[1], raw_smp_processor_id())->mibs[field]++)
+ (__THIS_CPU(mib[1])->mibs[field]++)
#define SNMP_INC_STATS(mib, field) \
- (per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]++)
+ (__THIS_CPU(mib[!in_softirq()])->mibs[field]++)
#define SNMP_DEC_STATS(mib, field) \
- (per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]--)
+ (__THIS_CPU(mib[!in_softirq()])->mibs[field]--)
#define SNMP_ADD_STATS_BH(mib, field, addend) \
- (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field] += addend)
+ (__THIS_CPU(mib[0])->mibs[field] += addend)
#define SNMP_ADD_STATS_USER(mib, field, addend) \
- (per_cpu_ptr(mib[1], raw_smp_processor_id())->mibs[field] += addend)
+ (__THIS_CPU(mib[1])->mibs[field] += addend)
#endif
Index: linux-2.6/net/dccp/proto.c
===================================================================
--- linux-2.6.orig/net/dccp/proto.c 2007-11-15 21:17:23.847654486 -0800
+++ linux-2.6/net/dccp/proto.c 2007-11-15 21:25:37.575154222 -0800
@@ -990,11 +990,13 @@ static int __init dccp_mib_init(void)
{
int rc = -ENOMEM;
- dccp_statistics[0] = alloc_percpu(struct dccp_mib);
+ dccp_statistics[0] = CPU_ALLOC(struct dccp_mib,
+ GFP_KERNEL | __GFP_ZERO);
if (dccp_statistics[0] == NULL)
goto out;
- dccp_statistics[1] = alloc_percpu(struct dccp_mib);
+ dccp_statistics[1] = CPU_ALLOC(struct dccp_mib,
+ GFP_KERNEL | __GFP_ZERO);
if (dccp_statistics[1] == NULL)
goto out_free_one;
@@ -1002,7 +1004,7 @@ static int __init dccp_mib_init(void)
out:
return rc;
out_free_one:
- free_percpu(dccp_statistics[0]);
+ CPU_FREE(dccp_statistics[0]);
dccp_statistics[0] = NULL;
goto out;
@@ -1010,8 +1012,8 @@ out_free_one:
static void dccp_mib_exit(void)
{
- free_percpu(dccp_statistics[0]);
- free_percpu(dccp_statistics[1]);
+ CPU_FREE(dccp_statistics[0]);
+ CPU_FREE(dccp_statistics[1]);
dccp_statistics[0] = dccp_statistics[1] = NULL;
}
Index: linux-2.6/net/ipv4/af_inet.c
===================================================================
--- linux-2.6.orig/net/ipv4/af_inet.c 2007-11-15 21:17:23.855654347 -0800
+++ linux-2.6/net/ipv4/af_inet.c 2007-11-15 21:25:37.575154222 -0800
@@ -1230,8 +1230,8 @@ unsigned long snmp_fold_field(void *mib[
int i;
for_each_possible_cpu(i) {
- res += *(((unsigned long *) per_cpu_ptr(mib[0], i)) + offt);
- res += *(((unsigned long *) per_cpu_ptr(mib[1], i)) + offt);
+ res += *(((unsigned long *) CPU_PTR(mib[0], i)) + offt);
+ res += *(((unsigned long *) CPU_PTR(mib[1], i)) + offt);
}
return res;
}
@@ -1240,26 +1240,28 @@ EXPORT_SYMBOL_GPL(snmp_fold_field);
int snmp_mib_init(void *ptr[2], size_t mibsize, size_t mibalign)
{
BUG_ON(ptr == NULL);
- ptr[0] = __alloc_percpu(mibsize);
+ ptr[0] = cpu_alloc(mibsize, GFP_KERNEL | __GFP_ZERO,
+ mibalign);
if (!ptr[0])
goto err0;
- ptr[1] = __alloc_percpu(mibsize);
+ ptr[1] = cpu_alloc(mibsize, GFP_KERNEL | __GFP_ZERO,
+ mibalign);
if (!ptr[1])
goto err1;
return 0;
err1:
- free_percpu(ptr[0]);
+ cpu_free(ptr[0], mibsize);
ptr[0] = NULL;
err0:
return -ENOMEM;
}
EXPORT_SYMBOL_GPL(snmp_mib_init);
-void snmp_mib_free(void *ptr[2])
+void snmp_mib_free(void *ptr[2], size_t mibsize)
{
BUG_ON(ptr == NULL);
- free_percpu(ptr[0]);
- free_percpu(ptr[1]);
+ cpu_free(ptr[0], mibsize);
+ cpu_free(ptr[1], mibsize);
ptr[0] = ptr[1] = NULL;
}
EXPORT_SYMBOL_GPL(snmp_mib_free);
@@ -1324,17 +1326,18 @@ static int __init init_ipv4_mibs(void)
return 0;
err_udplite_mib:
- snmp_mib_free((void **)udp_statistics);
+ snmp_mib_free((void **)udp_statistics, sizeof(struct udp_mib));
err_udp_mib:
- snmp_mib_free((void **)tcp_statistics);
+ snmp_mib_free((void **)tcp_statistics, sizeof(struct tcp_mib));
err_tcp_mib:
- snmp_mib_free((void **)icmpmsg_statistics);
+ snmp_mib_free((void **)icmpmsg_statistics,
+ sizeof(struct icmpmsg_mib));
err_icmpmsg_mib:
- snmp_mib_free((void **)icmp_statistics);
+ snmp_mib_free((void **)icmp_statistics, sizeof(struct icmp_mib));
err_icmp_mib:
- snmp_mib_free((void **)ip_statistics);
+ snmp_mib_free((void **)ip_statistics, sizeof(struct ipstats_mib));
err_ip_mib:
- snmp_mib_free((void **)net_statistics);
+ snmp_mib_free((void **)net_statistics, sizeof(struct linux_mib));
err_net_mib:
return -ENOMEM;
}
Index: linux-2.6/net/ipv6/addrconf.c
===================================================================
--- linux-2.6.orig/net/ipv6/addrconf.c 2007-11-15 21:17:23.859654454 -0800
+++ linux-2.6/net/ipv6/addrconf.c 2007-11-15 21:25:37.579154173 -0800
@@ -271,18 +271,18 @@ static int snmp6_alloc_dev(struct inet6_
return 0;
err_icmpmsg:
- snmp_mib_free((void **)idev->stats.icmpv6);
+ snmp_mib_free((void **)idev->stats.icmpv6, sizeof(struct icmpv6_mib));
err_icmp:
- snmp_mib_free((void **)idev->stats.ipv6);
+ snmp_mib_free((void **)idev->stats.ipv6, sizeof(struct ipstats_mib));
err_ip:
return -ENOMEM;
}
static void snmp6_free_dev(struct inet6_dev *idev)
{
- snmp_mib_free((void **)idev->stats.icmpv6msg);
- snmp_mib_free((void **)idev->stats.icmpv6);
- snmp_mib_free((void **)idev->stats.ipv6);
+ snmp_mib_free((void **)idev->stats.icmpv6msg, sizeof(struct icmpv6_mib));
+ snmp_mib_free((void **)idev->stats.icmpv6, sizeof(struct icmpv6_mib));
+ snmp_mib_free((void **)idev->stats.ipv6, sizeof(struct ipstats_mib));
}
/* Nobody refers to this device, we may destroy it. */
Index: linux-2.6/net/ipv6/af_inet6.c
===================================================================
--- linux-2.6.orig/net/ipv6/af_inet6.c 2007-11-15 21:17:23.867654431 -0800
+++ linux-2.6/net/ipv6/af_inet6.c 2007-11-15 21:25:37.579154173 -0800
@@ -731,13 +731,13 @@ static int __init init_ipv6_mibs(void)
return 0;
err_udplite_mib:
- snmp_mib_free((void **)udp_stats_in6);
+ snmp_mib_free((void **)udp_stats_in6, sizeof(struct udp_mib));
err_udp_mib:
- snmp_mib_free((void **)icmpv6msg_statistics);
+ snmp_mib_free((void **)icmpv6msg_statistics, sizeof(struct icmpv6_mib));
err_icmpmsg_mib:
- snmp_mib_free((void **)icmpv6_statistics);
+ snmp_mib_free((void **)icmpv6_statistics, sizeof(struct icmpv6_mib));
err_icmp_mib:
- snmp_mib_free((void **)ipv6_statistics);
+ snmp_mib_free((void **)ipv6_statistics, sizeof(struct ipstats_mib));
err_ip_mib:
return -ENOMEM;
@@ -745,11 +745,11 @@ err_ip_mib:
static void cleanup_ipv6_mibs(void)
{
- snmp_mib_free((void **)ipv6_statistics);
- snmp_mib_free((void **)icmpv6_statistics);
- snmp_mib_free((void **)icmpv6msg_statistics);
- snmp_mib_free((void **)udp_stats_in6);
- snmp_mib_free((void **)udplite_stats_in6);
+ snmp_mib_free((void **)ipv6_statistics, sizeof(struct ipstats_mib));
+ snmp_mib_free((void **)icmpv6_statistics, sizeof(struct icmpv6_mib));
+ snmp_mib_free((void **)icmpv6msg_statistics, sizeof(struct icmpv6_mib));
+ snmp_mib_free((void **)udp_stats_in6, sizeof(struct udp_mib));
+ snmp_mib_free((void **)udplite_stats_in6, sizeof(struct udp_mib));
}
static int __init inet6_init(void)
Index: linux-2.6/net/sctp/proc.c
===================================================================
--- linux-2.6.orig/net/sctp/proc.c 2007-11-15 21:17:23.875654189 -0800
+++ linux-2.6/net/sctp/proc.c 2007-11-15 21:25:37.579154173 -0800
@@ -86,10 +86,10 @@ fold_field(void *mib[], int nr)
for_each_possible_cpu(i) {
res +=
- *((unsigned long *) (((void *) per_cpu_ptr(mib[0], i)) +
+ *((unsigned long *) (((void *)CPU_PTR(mib[0], i)) +
sizeof (unsigned long) * nr));
res +=
- *((unsigned long *) (((void *) per_cpu_ptr(mib[1], i)) +
+ *((unsigned long *) (((void *)CPU_PTR(mib[1], i)) +
sizeof (unsigned long) * nr));
}
return res;
Index: linux-2.6/net/sctp/protocol.c
===================================================================
--- linux-2.6.orig/net/sctp/protocol.c 2007-11-15 21:17:23.883654344 -0800
+++ linux-2.6/net/sctp/protocol.c 2007-11-15 21:25:37.579154173 -0800
@@ -970,12 +970,14 @@ int sctp_register_pf(struct sctp_pf *pf,
static int __init init_sctp_mibs(void)
{
- sctp_statistics[0] = alloc_percpu(struct sctp_mib);
+ sctp_statistics[0] = CPU_ALLOC(struct sctp_mib,
+ GFP_KERNEL | __GFP_ZERO);
if (!sctp_statistics[0])
return -ENOMEM;
- sctp_statistics[1] = alloc_percpu(struct sctp_mib);
+ sctp_statistics[1] = CPU_ALLOC(struct sctp_mib,
+ GFP_KERNEL | __GFP_ZERO);
if (!sctp_statistics[1]) {
- free_percpu(sctp_statistics[0]);
+ CPU_FREE(sctp_statistics[0]);
return -ENOMEM;
}
return 0;
@@ -984,8 +986,8 @@ static int __init init_sctp_mibs(void)
static void cleanup_sctp_mibs(void)
{
- free_percpu(sctp_statistics[0]);
- free_percpu(sctp_statistics[1]);
+ CPU_FREE(sctp_statistics[0]);
+ CPU_FREE(sctp_statistics[1]);
}
/* Initialize the universe into something sensible. */
--
* [patch 26/30] cpu_alloc: convert network sockets
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (24 preceding siblings ...)
2007-11-16 23:09 ` [patch 25/30] cpu alloc: convert mib handling to cpu alloc Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv Christoph Lameter
` (3 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0036-cpu_alloc-convert-network-sockets.patch --]
[-- Type: text/plain, Size: 1342 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
net/core/sock.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Index: linux-2.6/net/core/sock.c
===================================================================
--- linux-2.6.orig/net/core/sock.c 2007-11-15 21:17:23.775404482 -0800
+++ linux-2.6/net/core/sock.c 2007-11-15 21:25:38.183201940 -0800
@@ -1809,21 +1809,21 @@ static LIST_HEAD(proto_list);
*/
static void inuse_add(struct proto *prot, int inc)
{
- per_cpu_ptr(prot->inuse_ptr, smp_processor_id())[0] += inc;
+ THIS_CPU(prot->inuse_ptr)[0] += inc;
}
static int inuse_get(const struct proto *prot)
{
int res = 0, cpu;
for_each_possible_cpu(cpu)
- res += per_cpu_ptr(prot->inuse_ptr, cpu)[0];
+ res += CPU_PTR(prot->inuse_ptr, cpu)[0];
return res;
}
static int inuse_init(struct proto *prot)
{
if (!prot->inuse_getval || !prot->inuse_add) {
- prot->inuse_ptr = alloc_percpu(int);
+ prot->inuse_ptr = CPU_ALLOC(int, GFP_KERNEL);
if (prot->inuse_ptr == NULL)
return -ENOBUFS;
@@ -1836,7 +1836,7 @@ static int inuse_init(struct proto *prot
static void inuse_fini(struct proto *prot)
{
if (prot->inuse_ptr != NULL) {
- free_percpu(prot->inuse_ptr);
+ CPU_FREE(prot->inuse_ptr);
prot->inuse_ptr = NULL;
prot->inuse_getval = NULL;
prot->inuse_add = NULL;
--
* [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (25 preceding siblings ...)
2007-11-16 23:09 ` [patch 26/30] cpu_alloc: convert network sockets Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 28/30] cpu alloc: Use for infiniband Christoph Lameter
` (2 subsequent siblings)
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm
Cc: linux-arch, Martin Schwidefsky, linux-kernel, David Miller,
Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0037-cpu-alloc-Explicitly-code-allocpercpu-calls-in-iucv.patch --]
[-- Type: text/plain, Size: 10015 bytes --]
The iucv driver is the only user of the various functions that bring
parts of cpus up and down. It is the only allocpercpu user that does
I/O on per cpu objects (which is difficult with virtually mapped memory),
and the only allocpercpu user that needs GFP_DMA allocations.
Remove the allocpercpu calls from iucv and code the allocation and
freeing manually.
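The open-coded replacement amounts to an NR_CPUS-sized array of
pointers with each slot allocated and freed individually. A minimal
userspace sketch of the pattern, using calloc/free in place of
kmalloc_node/kfree (the struct and function names here are
illustrative, not the iucv code itself):

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Stand-in for the per cpu interrupt buffer */
struct irq_data_sim {
	int ippathid;
};

/* Plain array of per cpu pointers, as the patch introduces */
static struct irq_data_sim *irq_data[NR_CPUS];

/* Allocate one cpu's slot; in the kernel this would be
 * kmalloc_node(..., GFP_KERNEL|GFP_DMA, cpu_to_node(cpu)) */
static int alloc_cpu_slot(int cpu)
{
	irq_data[cpu] = calloc(1, sizeof(*irq_data[cpu]));
	return irq_data[cpu] ? 0 : -1;
}

/* Free and clear the slot so teardown can run over all cpus safely */
static void free_cpu_slot(int cpu)
{
	free(irq_data[cpu]);
	irq_data[cpu] = NULL;
}
```

Because each slot is NULLed after freeing, the error and exit paths in
the patch can simply loop over all possible cpus and free
unconditionally.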
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
net/iucv/iucv.c | 107 +++++++++++++++++++++++++++++++-------------------------
1 file changed, 61 insertions(+), 46 deletions(-)
Index: linux-2.6/net/iucv/iucv.c
===================================================================
--- linux-2.6.orig/net/iucv/iucv.c 2007-11-15 21:17:23.719404553 -0800
+++ linux-2.6/net/iucv/iucv.c 2007-11-15 21:25:38.758654051 -0800
@@ -97,7 +97,7 @@ struct iucv_irq_list {
struct iucv_irq_data data;
};
-static struct iucv_irq_data *iucv_irq_data;
+static struct iucv_irq_data *iucv_irq_data[NR_CPUS];
static cpumask_t iucv_buffer_cpumask = CPU_MASK_NONE;
static cpumask_t iucv_irq_cpumask = CPU_MASK_NONE;
@@ -277,7 +277,7 @@ union iucv_param {
/*
* Anchor for per-cpu IUCV command parameter block.
*/
-static union iucv_param *iucv_param;
+static union iucv_param *iucv_param[NR_CPUS];
/**
* iucv_call_b2f0
@@ -356,7 +356,7 @@ static void iucv_allow_cpu(void *data)
* 0x10 - Flag to allow priority message completion interrupts
* 0x08 - Flag to allow IUCV control interrupts
*/
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[cpu];
memset(parm, 0, sizeof(union iucv_param));
parm->set_mask.ipmask = 0xf8;
iucv_call_b2f0(IUCV_SETMASK, parm);
@@ -377,7 +377,7 @@ static void iucv_block_cpu(void *data)
union iucv_param *parm;
/* Disable all iucv interrupts. */
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[cpu];
memset(parm, 0, sizeof(union iucv_param));
iucv_call_b2f0(IUCV_SETMASK, parm);
@@ -401,9 +401,9 @@ static void iucv_declare_cpu(void *data)
return;
/* Declare interrupt buffer. */
- parm = percpu_ptr(iucv_param, cpu);
+ parm = iucv_param[cpu];
memset(parm, 0, sizeof(union iucv_param));
- parm->db.ipbfadr1 = virt_to_phys(percpu_ptr(iucv_irq_data, cpu));
+ parm->db.ipbfadr1 = virt_to_phys(iucv_irq_data[cpu]);
rc = iucv_call_b2f0(IUCV_DECLARE_BUFFER, parm);
if (rc) {
char *err = "Unknown";
@@ -458,7 +458,7 @@ static void iucv_retrieve_cpu(void *data
iucv_block_cpu(NULL);
/* Retrieve interrupt buffer. */
- parm = percpu_ptr(iucv_param, cpu);
+ parm = iucv_param[cpu];
iucv_call_b2f0(IUCV_RETRIEVE_BUFFER, parm);
/* Clear indication that an iucv buffer exists for this cpu. */
@@ -558,22 +558,23 @@ static int __cpuinit iucv_cpu_notify(str
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
- if (!percpu_populate(iucv_irq_data,
- sizeof(struct iucv_irq_data),
- GFP_KERNEL|GFP_DMA, cpu))
+ iucv_irq_data[cpu] = kmalloc_node(sizeof(struct iucv_irq_data),
+ GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+ if (!iucv_irq_data[cpu])
return NOTIFY_BAD;
- if (!percpu_populate(iucv_param, sizeof(union iucv_param),
- GFP_KERNEL|GFP_DMA, cpu)) {
- percpu_depopulate(iucv_irq_data, cpu);
+ iucv_param[cpu] = kmalloc_node(sizeof(union iucv_param),
+ GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+ if (!iucv_param[cpu])
return NOTIFY_BAD;
- }
break;
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
case CPU_DEAD:
case CPU_DEAD_FROZEN:
- percpu_depopulate(iucv_param, cpu);
- percpu_depopulate(iucv_irq_data, cpu);
+ kfree(iucv_param[cpu]);
+ iucv_param[cpu] = NULL;
+ kfree(iucv_irq_data[cpu]);
+ iucv_irq_data[cpu] = NULL;
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
@@ -612,7 +613,7 @@ static int iucv_sever_pathid(u16 pathid,
{
union iucv_param *parm;
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
if (userdata)
memcpy(parm->ctrl.ipuser, userdata, sizeof(parm->ctrl.ipuser));
@@ -755,7 +756,7 @@ int iucv_path_accept(struct iucv_path *p
local_bh_disable();
/* Prepare parameter block. */
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
parm->ctrl.ippathid = path->pathid;
parm->ctrl.ipmsglim = path->msglim;
@@ -799,7 +800,7 @@ int iucv_path_connect(struct iucv_path *
BUG_ON(in_atomic());
spin_lock_bh(&iucv_table_lock);
iucv_cleanup_queue();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
parm->ctrl.ipmsglim = path->msglim;
parm->ctrl.ipflags1 = path->flags;
@@ -854,7 +855,7 @@ int iucv_path_quiesce(struct iucv_path *
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
if (userdata)
memcpy(parm->ctrl.ipuser, userdata, sizeof(parm->ctrl.ipuser));
@@ -881,7 +882,7 @@ int iucv_path_resume(struct iucv_path *p
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
if (userdata)
memcpy(parm->ctrl.ipuser, userdata, sizeof(parm->ctrl.ipuser));
@@ -936,7 +937,7 @@ int iucv_message_purge(struct iucv_path
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
parm->purge.ippathid = path->pathid;
parm->purge.ipmsgid = msg->id;
@@ -1003,7 +1004,7 @@ int iucv_message_receive(struct iucv_pat
}
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
parm->db.ipbfadr1 = (u32)(addr_t) buffer;
parm->db.ipbfln1f = (u32) size;
@@ -1040,7 +1041,7 @@ int iucv_message_reject(struct iucv_path
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
parm->db.ippathid = path->pathid;
parm->db.ipmsgid = msg->id;
@@ -1074,7 +1075,7 @@ int iucv_message_reply(struct iucv_path
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
if (flags & IUCV_IPRMDATA) {
parm->dpl.ippathid = path->pathid;
@@ -1118,7 +1119,7 @@ int iucv_message_send(struct iucv_path *
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
if (flags & IUCV_IPRMDATA) {
/* Message of 8 bytes can be placed into the parameter list. */
@@ -1172,7 +1173,7 @@ int iucv_message_send2way(struct iucv_pa
int rc;
local_bh_disable();
- parm = percpu_ptr(iucv_param, smp_processor_id());
+ parm = iucv_param[smp_processor_id()];
memset(parm, 0, sizeof(union iucv_param));
if (flags & IUCV_IPRMDATA) {
parm->dpl.ippathid = path->pathid;
@@ -1559,7 +1560,7 @@ static void iucv_external_interrupt(u16
struct iucv_irq_data *p;
struct iucv_irq_list *work;
- p = percpu_ptr(iucv_irq_data, smp_processor_id());
+ p = iucv_irq_data[smp_processor_id()];
if (p->ippathid >= iucv_max_pathid) {
printk(KERN_WARNING "iucv_do_int: Got interrupt with "
"pathid %d > max_connections (%ld)\n",
@@ -1598,6 +1599,7 @@ static void iucv_external_interrupt(u16
static int __init iucv_init(void)
{
int rc;
+ int cpu;
if (!MACHINE_IS_VM) {
rc = -EPROTONOSUPPORT;
@@ -1617,19 +1619,23 @@ static int __init iucv_init(void)
rc = PTR_ERR(iucv_root);
goto out_bus;
}
- /* Note: GFP_DMA used to get memory below 2G */
- iucv_irq_data = percpu_alloc(sizeof(struct iucv_irq_data),
- GFP_KERNEL|GFP_DMA);
- if (!iucv_irq_data) {
- rc = -ENOMEM;
- goto out_root;
- }
- /* Allocate parameter blocks. */
- iucv_param = percpu_alloc(sizeof(union iucv_param),
- GFP_KERNEL|GFP_DMA);
- if (!iucv_param) {
- rc = -ENOMEM;
- goto out_extint;
+
+ for_each_online_cpu(cpu) {
+ /* Note: GFP_DMA used to get memory below 2G */
+ iucv_irq_data[cpu] = kmalloc_node(sizeof(struct iucv_irq_data),
+ GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+ if (!iucv_irq_data[cpu]) {
+ rc = -ENOMEM;
+ goto out_free;
+ }
+
+ /* Allocate parameter blocks. */
+ iucv_param[cpu] = kmalloc_node(sizeof(union iucv_param),
+ GFP_KERNEL|GFP_DMA, cpu_to_node(cpu));
+ if (!iucv_param[cpu]) {
+ rc = -ENOMEM;
+ goto out_free;
+ }
}
register_hotcpu_notifier(&iucv_cpu_notifier);
ASCEBC(iucv_error_no_listener, 16);
@@ -1638,9 +1644,13 @@ static int __init iucv_init(void)
iucv_available = 1;
return 0;
-out_extint:
- percpu_free(iucv_irq_data);
-out_root:
+out_free:
+ for_each_possible_cpu(cpu) {
+ kfree(iucv_param[cpu]);
+ iucv_param[cpu] = NULL;
+ kfree(iucv_irq_data[cpu]);
+ iucv_irq_data[cpu] = NULL;
+ }
s390_root_dev_unregister(iucv_root);
out_bus:
bus_unregister(&iucv_bus);
@@ -1658,6 +1668,7 @@ out:
static void __exit iucv_exit(void)
{
struct iucv_irq_list *p, *n;
+ int cpu;
spin_lock_irq(&iucv_queue_lock);
list_for_each_entry_safe(p, n, &iucv_task_queue, list)
@@ -1666,8 +1677,12 @@ static void __exit iucv_exit(void)
kfree(p);
spin_unlock_irq(&iucv_queue_lock);
unregister_hotcpu_notifier(&iucv_cpu_notifier);
- percpu_free(iucv_param);
- percpu_free(iucv_irq_data);
+ for_each_possible_cpu(cpu) {
+ kfree(iucv_param[cpu]);
+ iucv_param[cpu] = NULL;
+ kfree(iucv_irq_data[cpu]);
+ iucv_irq_data[cpu] = NULL;
+ }
s390_root_dev_unregister(iucv_root);
bus_unregister(&iucv_bus);
unregister_external_interrupt(0x4000, iucv_external_interrupt);
--
* [patch 28/30] cpu alloc: Use for infiniband
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (26 preceding siblings ...)
2007-11-16 23:09 ` [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 29/30] cpu alloc: Use in the crypto subsystem Christoph Lameter
2007-11-16 23:09 ` [patch 30/30] cpu alloc: Remove the allocpercpu functionality Christoph Lameter
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0038-cpu-alloc-Use-for-infiniband.patch --]
[-- Type: text/plain, Size: 3555 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
drivers/infiniband/hw/ehca/ehca_irq.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
Index: linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-11-15 21:17:23.663404239 -0800
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c 2007-11-15 21:25:39.310404188 -0800
@@ -646,7 +646,7 @@ static void queue_comp_task(struct ehca_
cpu_id = find_next_online_cpu(pool);
BUG_ON(!cpu_online(cpu_id));
- cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+ cct = CPU_PTR(pool->cpu_comp_tasks, cpu_id);
BUG_ON(!cct);
spin_lock_irqsave(&cct->task_lock, flags);
@@ -654,7 +654,7 @@ static void queue_comp_task(struct ehca_
spin_unlock_irqrestore(&cct->task_lock, flags);
if (cq_jobs > 0) {
cpu_id = find_next_online_cpu(pool);
- cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+ cct = CPU_PTR(pool->cpu_comp_tasks, cpu_id);
BUG_ON(!cct);
}
@@ -727,7 +727,7 @@ static struct task_struct *create_comp_t
{
struct ehca_cpu_comp_task *cct;
- cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+ cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
spin_lock_init(&cct->task_lock);
INIT_LIST_HEAD(&cct->cq_list);
init_waitqueue_head(&cct->wait_queue);
@@ -743,7 +743,7 @@ static void destroy_comp_task(struct ehc
struct task_struct *task;
unsigned long flags_cct;
- cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+ cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
spin_lock_irqsave(&cct->task_lock, flags_cct);
@@ -759,7 +759,7 @@ static void destroy_comp_task(struct ehc
static void __cpuinit take_over_work(struct ehca_comp_pool *pool, int cpu)
{
- struct ehca_cpu_comp_task *cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+ struct ehca_cpu_comp_task *cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
LIST_HEAD(list);
struct ehca_cq *cq;
unsigned long flags_cct;
@@ -772,8 +772,7 @@ static void __cpuinit take_over_work(str
cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
list_del(&cq->entry);
- __queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks,
- smp_processor_id()));
+ __queue_comp_task(cq, THIS_CPU(pool->cpu_comp_tasks));
}
spin_unlock_irqrestore(&cct->task_lock, flags_cct);
@@ -799,14 +798,14 @@ static int __cpuinit comp_pool_callback(
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
ehca_gen_dbg("CPU: %x (CPU_CANCELED)", cpu);
- cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+ cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
kthread_bind(cct->task, any_online_cpu(cpu_online_map));
destroy_comp_task(pool, cpu);
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
ehca_gen_dbg("CPU: %x (CPU_ONLINE)", cpu);
- cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+ cct = CPU_PTR(pool->cpu_comp_tasks, cpu);
kthread_bind(cct->task, cpu);
wake_up_process(cct->task);
break;
@@ -849,7 +848,8 @@ int ehca_create_comp_pool(void)
spin_lock_init(&pool->last_cpu_lock);
pool->last_cpu = any_online_cpu(cpu_online_map);
- pool->cpu_comp_tasks = alloc_percpu(struct ehca_cpu_comp_task);
+ pool->cpu_comp_tasks = CPU_ALLOC(struct ehca_cpu_comp_task,
+ GFP_KERNEL | __GFP_ZERO);
if (pool->cpu_comp_tasks == NULL) {
kfree(pool);
return -EINVAL;
@@ -883,6 +883,6 @@ void ehca_destroy_comp_pool(void)
if (cpu_online(i))
destroy_comp_task(pool, i);
}
- free_percpu(pool->cpu_comp_tasks);
+ CPU_FREE(pool->cpu_comp_tasks);
kfree(pool);
}
--
* [patch 29/30] cpu alloc: Use in the crypto subsystem.
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (27 preceding siblings ...)
2007-11-16 23:09 ` [patch 28/30] cpu alloc: Use for infiniband Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
2007-11-16 23:09 ` [patch 30/30] cpu alloc: Remove the allocpercpu functionality Christoph Lameter
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0039-cpu-alloc-Use-in-the-crypto-subsystem.patch --]
[-- Type: text/plain, Size: 2245 bytes --]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
crypto/async_tx/async_tx.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
Index: linux-2.6/crypto/async_tx/async_tx.c
===================================================================
--- linux-2.6.orig/crypto/async_tx/async_tx.c 2007-11-15 21:17:23.610404668 -0800
+++ linux-2.6/crypto/async_tx/async_tx.c 2007-11-15 21:25:39.834904080 -0800
@@ -207,10 +207,10 @@ static void async_tx_rebalance(void)
for_each_dma_cap_mask(cap, dma_cap_mask_all)
for_each_possible_cpu(cpu) {
struct dma_chan_ref *ref =
- per_cpu_ptr(channel_table[cap], cpu)->ref;
+ CPU_PTR(channel_table[cap], cpu)->ref;
if (ref) {
atomic_set(&ref->count, 0);
- per_cpu_ptr(channel_table[cap], cpu)->ref =
+ CPU_PTR(channel_table[cap], cpu)->ref =
NULL;
}
}
@@ -223,7 +223,7 @@ static void async_tx_rebalance(void)
else
new = get_chan_ref_by_cap(cap, -1);
- per_cpu_ptr(channel_table[cap], cpu)->ref = new;
+ CPU_PTR(channel_table[cap], cpu)->ref = new;
}
spin_unlock_irqrestore(&async_tx_lock, flags);
@@ -327,7 +327,8 @@ async_tx_init(void)
clear_bit(DMA_INTERRUPT, dma_cap_mask_all.bits);
for_each_dma_cap_mask(cap, dma_cap_mask_all) {
- channel_table[cap] = alloc_percpu(struct chan_ref_percpu);
+ channel_table[cap] = CPU_ALLOC(struct chan_ref_percpu,
+ GFP_KERNEL | __GFP_ZERO);
if (!channel_table[cap])
goto err;
}
@@ -343,7 +344,7 @@ err:
printk(KERN_ERR "async_tx: initialization failure\n");
while (--cap >= 0)
- free_percpu(channel_table[cap]);
+ CPU_FREE(channel_table[cap]);
return 1;
}
@@ -356,7 +357,7 @@ static void __exit async_tx_exit(void)
for_each_dma_cap_mask(cap, dma_cap_mask_all)
if (channel_table[cap])
- free_percpu(channel_table[cap]);
+ CPU_FREE(channel_table[cap]);
dma_async_client_unregister(&async_tx_dma);
}
@@ -378,7 +379,7 @@ async_tx_find_channel(struct dma_async_t
else if (likely(channel_table_initialized)) {
struct dma_chan_ref *ref;
int cpu = get_cpu();
- ref = per_cpu_ptr(channel_table[tx_type], cpu)->ref;
+ ref = CPU_PTR(channel_table[tx_type], cpu)->ref;
put_cpu();
return ref ? ref->chan : NULL;
} else
--
* [patch 30/30] cpu alloc: Remove the allocpercpu functionality
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
` (28 preceding siblings ...)
2007-11-16 23:09 ` [patch 29/30] cpu alloc: Use in the crypto subsystem Christoph Lameter
@ 2007-11-16 23:09 ` Christoph Lameter
29 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-16 23:09 UTC (permalink / raw)
To: akpm; +Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
[-- Attachment #1: 0040-cpu-alloc-Remove-the-allocpercpu-functionality.patch --]
[-- Type: text/plain, Size: 7861 bytes --]
There is no user of allocpercpu left after all the earlier patches were
applied. Remove the code that realizes allocpercpu.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/percpu.h | 80 ------------------------------
mm/Makefile | 1
mm/allocpercpu.c | 127 -------------------------------------------------
3 files changed, 208 deletions(-)
delete mode 100644 mm/allocpercpu.c
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2007-11-15 21:24:50.730654207 -0800
+++ linux-2.6/include/linux/percpu.h 2007-11-15 21:25:40.302313443 -0800
@@ -31,86 +31,6 @@
&__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
-#ifdef CONFIG_SMP
-
-struct percpu_data {
- void *ptrs[NR_CPUS];
-};
-
-#define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
-/*
- * Use this to get to a cpu's version of the per-cpu object dynamically
- * allocated. Non-atomic access to the current CPU's version should
- * probably be combined with get_cpu()/put_cpu().
- */
-#define percpu_ptr(ptr, cpu) \
-({ \
- struct percpu_data *__p = __percpu_disguise(ptr); \
- (__typeof__(ptr))__p->ptrs[(cpu)]; \
-})
-
-extern void *percpu_populate(void *__pdata, size_t size, gfp_t gfp, int cpu);
-extern void percpu_depopulate(void *__pdata, int cpu);
-extern int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
- cpumask_t *mask);
-extern void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask);
-extern void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask);
-extern void percpu_free(void *__pdata);
-
-#else /* CONFIG_SMP */
-
-#define percpu_ptr(ptr, cpu) ({ (void)(cpu); (ptr); })
-
-static inline void percpu_depopulate(void *__pdata, int cpu)
-{
-}
-
-static inline void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask)
-{
-}
-
-static inline void *percpu_populate(void *__pdata, size_t size, gfp_t gfp,
- int cpu)
-{
- return percpu_ptr(__pdata, cpu);
-}
-
-static inline int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
- cpumask_t *mask)
-{
- return 0;
-}
-
-static __always_inline void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
-{
- return kzalloc(size, gfp);
-}
-
-static inline void percpu_free(void *__pdata)
-{
- kfree(__pdata);
-}
-
-#endif /* CONFIG_SMP */
-
-#define percpu_populate_mask(__pdata, size, gfp, mask) \
- __percpu_populate_mask((__pdata), (size), (gfp), &(mask))
-#define percpu_depopulate_mask(__pdata, mask) \
- __percpu_depopulate_mask((__pdata), &(mask))
-#define percpu_alloc_mask(size, gfp, mask) \
- __percpu_alloc_mask((size), (gfp), &(mask))
-
-#define percpu_alloc(size, gfp) percpu_alloc_mask((size), (gfp), cpu_online_map)
-
-/* (legacy) interface for use without CPU hotplug handling */
-
-#define __alloc_percpu(size) percpu_alloc_mask((size), GFP_KERNEL, \
- cpu_possible_map)
-#define alloc_percpu(type) (type *)__alloc_percpu(sizeof(type))
-#define free_percpu(ptr) percpu_free((ptr))
-#define per_cpu_ptr(ptr, cpu) percpu_ptr((ptr), (cpu))
-
-
/*
* cpu allocator definitions
*
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2007-11-15 21:24:50.726654353 -0800
+++ linux-2.6/mm/Makefile 2007-11-15 21:25:40.302313443 -0800
@@ -28,6 +28,5 @@ obj-$(CONFIG_SLUB) += slub.o
obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
-obj-$(CONFIG_SMP) += allocpercpu.o
obj-$(CONFIG_QUICKLIST) += quicklist.o
Index: linux-2.6/mm/allocpercpu.c
===================================================================
--- linux-2.6.orig/mm/allocpercpu.c 2007-11-15 21:17:23.570404405 -0800
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,127 +0,0 @@
-/*
- * linux/mm/allocpercpu.c
- *
- * Separated from slab.c August 11, 2006 Christoph Lameter <clameter@sgi.com>
- */
-#include <linux/mm.h>
-#include <linux/module.h>
-
-/**
- * percpu_depopulate - depopulate per-cpu data for given cpu
- * @__pdata: per-cpu data to depopulate
- * @cpu: depopulate per-cpu data for this cpu
- *
- * Depopulating per-cpu data for a cpu going offline would be a typical
- * use case. You need to register a cpu hotplug handler for that purpose.
- */
-void percpu_depopulate(void *__pdata, int cpu)
-{
- struct percpu_data *pdata = __percpu_disguise(__pdata);
-
- kfree(pdata->ptrs[cpu]);
- pdata->ptrs[cpu] = NULL;
-}
-EXPORT_SYMBOL_GPL(percpu_depopulate);
-
-/**
- * percpu_depopulate_mask - depopulate per-cpu data for some cpu's
- * @__pdata: per-cpu data to depopulate
- * @mask: depopulate per-cpu data for cpu's selected through mask bits
- */
-void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask)
-{
- int cpu;
- for_each_cpu_mask(cpu, *mask)
- percpu_depopulate(__pdata, cpu);
-}
-EXPORT_SYMBOL_GPL(__percpu_depopulate_mask);
-
-/**
- * percpu_populate - populate per-cpu data for given cpu
- * @__pdata: per-cpu data to populate further
- * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @cpu: populate per-data for this cpu
- *
- * Populating per-cpu data for a cpu coming online would be a typical
- * use case. You need to register a cpu hotplug handler for that purpose.
- * Per-cpu object is populated with zeroed buffer.
- */
-void *percpu_populate(void *__pdata, size_t size, gfp_t gfp, int cpu)
-{
- struct percpu_data *pdata = __percpu_disguise(__pdata);
- int node = cpu_to_node(cpu);
-
- BUG_ON(pdata->ptrs[cpu]);
- if (node_online(node))
- pdata->ptrs[cpu] = kmalloc_node(size, gfp|__GFP_ZERO, node);
- else
- pdata->ptrs[cpu] = kzalloc(size, gfp);
- return pdata->ptrs[cpu];
-}
-EXPORT_SYMBOL_GPL(percpu_populate);
-
-/**
- * percpu_populate_mask - populate per-cpu data for more cpu's
- * @__pdata: per-cpu data to populate further
- * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @mask: populate per-cpu data for cpu's selected through mask bits
- *
- * Per-cpu objects are populated with zeroed buffers.
- */
-int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
- cpumask_t *mask)
-{
- cpumask_t populated = CPU_MASK_NONE;
- int cpu;
-
- for_each_cpu_mask(cpu, *mask)
- if (unlikely(!percpu_populate(__pdata, size, gfp, cpu))) {
- __percpu_depopulate_mask(__pdata, &populated);
- return -ENOMEM;
- } else
- cpu_set(cpu, populated);
- return 0;
-}
-EXPORT_SYMBOL_GPL(__percpu_populate_mask);
-
-/**
- * percpu_alloc_mask - initial setup of per-cpu data
- * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @mask: populate per-data for cpu's selected through mask bits
- *
- * Populating per-cpu data for all online cpu's would be a typical use case,
- * which is simplified by the percpu_alloc() wrapper.
- * Per-cpu objects are populated with zeroed buffers.
- */
-void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
-{
- void *pdata = kzalloc(sizeof(struct percpu_data), gfp);
- void *__pdata = __percpu_disguise(pdata);
-
- if (unlikely(!pdata))
- return NULL;
- if (likely(!__percpu_populate_mask(__pdata, size, gfp, mask)))
- return __pdata;
- kfree(pdata);
- return NULL;
-}
-EXPORT_SYMBOL_GPL(__percpu_alloc_mask);
-
-/**
- * percpu_free - final cleanup of per-cpu data
- * @__pdata: object to clean up
- *
- * We simply clean up any per-cpu object left. No need for the client to
- * track and specify through a bis mask which per-cpu objects are to free.
- */
-void percpu_free(void *__pdata)
-{
- if (unlikely(!__pdata))
- return;
- __percpu_depopulate_mask(__pdata, &cpu_possible_map);
- kfree(__percpu_disguise(__pdata));
-}
-EXPORT_SYMBOL_GPL(percpu_free);
--
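The deleted allocpercpu implementation kept an NR_CPUS-sized pointer array behind a bit-complemented ("disguised") pointer, so that directly dereferencing an alloc_percpu() result would fault. The scheme can be sketched in user space; the my_* names below are illustrative stand-ins, not kernel API:

```c
#include <stdlib.h>

#define NR_CPUS 8	/* stand-in for the kernel constant */

struct percpu_data {
	void *ptrs[NR_CPUS];
};

/* Complementing the pointer bits makes the handle unusable as a plain
 * pointer; applying the macro twice recovers the original address. */
#define percpu_disguise(p)  ((struct percpu_data *)~(unsigned long)(p))

static void *my_alloc_percpu(size_t size)
{
	struct percpu_data *pdata = calloc(1, sizeof(*pdata));
	int cpu, i;

	if (!pdata)
		return NULL;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		pdata->ptrs[cpu] = calloc(1, size);	/* zeroed per-cpu object */
		if (!pdata->ptrs[cpu]) {
			for (i = 0; i < cpu; i++)	/* unwind on failure */
				free(pdata->ptrs[i]);
			free(pdata);
			return NULL;
		}
	}
	return (void *)percpu_disguise(pdata);	/* hand out the disguised handle */
}

static void *my_per_cpu_ptr(void *handle, int cpu)
{
	return percpu_disguise(handle)->ptrs[cpu];
}

static void my_free_percpu(void *handle)
{
	struct percpu_data *pdata = percpu_disguise(handle);
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		free(pdata->ptrs[cpu]);
	free(pdata);
}
```

Every access pays the extra dereference through ptrs[cpu], and the array itself is what the cover letter criticizes: it is sized for NR_CPUS and breaks cacheline locality, which is what cpu_alloc's offset-based scheme removes.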
* RE: [patch 07/30] cpu alloc: IA64 support
2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
@ 2007-11-16 23:32 ` Luck, Tony
2007-11-17 0:05 ` Christoph Lameter
0 siblings, 1 reply; 34+ messages in thread
From: Luck, Tony @ 2007-11-16 23:32 UTC (permalink / raw)
To: Christoph Lameter, akpm
Cc: linux-arch, linux-kernel, David Miller, Eric Dumazet, Peter Zijlstra
+# Maximum of 128 MB cpu_alloc space per cpu
+config CPU_AREA_ORDER
+ int
+ default "13"
Comment only matches code when page size is 16K ... and we are (slowly)
moving to 64k as the default (which with order 13 allocation would mean
512M)
-Tony
* RE: [patch 07/30] cpu alloc: IA64 support
2007-11-16 23:32 ` Luck, Tony
@ 2007-11-17 0:05 ` Christoph Lameter
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-11-17 0:05 UTC (permalink / raw)
To: Luck, Tony
Cc: akpm, linux-arch, linux-kernel, David Miller, Eric Dumazet,
Peter Zijlstra
On Fri, 16 Nov 2007, Luck, Tony wrote:
> +# Maximum of 128 MB cpu_alloc space per cpu
> +config CPU_AREA_ORDER
> + int
> + default "13"
>
> Comment only matches code when page size is 16K ... and we are (slowly)
> moving to 64k as the default (which with order 13 allocation would mean
> 512M)
But the page tables also grow, so 512M may be okay?
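The figures in this exchange follow from the per-cpu area being PAGE_SIZE << CPU_AREA_ORDER bytes (an assumed relationship, implied by the Kconfig comment and the numbers quoted):

```c
/* cpu_alloc reserves PAGE_SIZE << CPU_AREA_ORDER bytes of per-cpu space
 * per processor (assumed relationship, per the Kconfig comment). */
#define CPU_AREA_ORDER 13

static unsigned long cpu_area_bytes(unsigned long page_size)
{
	return page_size << CPU_AREA_ORDER;
}
```

With 16K pages this yields the 128 MB the comment states; with 64K pages the same order yields the 512 MB Tony points out.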
* Re: [patch 16/30] cpu alloc: XFS counters
2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
@ 2007-11-19 12:58 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2007-11-19 12:58 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, linux-arch, linux-kernel, David Miller, Eric Dumazet,
Peter Zijlstra
On Fri, Nov 16, 2007 at 03:09:36PM -0800, Christoph Lameter wrote:
> cntp = (xfs_icsb_cnts_t *)
> - per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu);
> + CPU_PTR(mp->m_sb_cnts, (unsigned long)hcpu);
> - mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t);
> + mp->m_sb_cnts = CPU_ALLOC(xfs_icsb_cnts_t, GFP_KERNEL | __GFP_ZERO);
What's the point of renaming these? And even if you have a case for
renaming them, please give them proper lower-case names.
end of thread, other threads:[~2007-11-19 12:58 UTC | newest]
Thread overview: 34+ messages
2007-11-16 23:09 [patch 00/30] cpu alloc v2: Optimize by removing arrays of pointers to per cpu objects Christoph Lameter
2007-11-16 23:09 ` [patch 01/30] cpu alloc: Simple version of the allocator (static allocations) Christoph Lameter
2007-11-16 23:09 ` [patch 02/30] cpu alloc: Use in SLUB Christoph Lameter
2007-11-16 23:09 ` [patch 03/30] cpu alloc: Remove SLUB fields Christoph Lameter
2007-11-16 23:09 ` [patch 04/30] cpu alloc: page allocator conversion Christoph Lameter
2007-11-16 23:09 ` [patch 05/30] cpu_alloc: Implement dynamically extendable cpu areas Christoph Lameter
2007-11-16 23:09 ` [patch 06/30] cpu alloc: x86 support Christoph Lameter
2007-11-16 23:09 ` [patch 07/30] cpu alloc: IA64 support Christoph Lameter
2007-11-16 23:32 ` Luck, Tony
2007-11-17 0:05 ` Christoph Lameter
2007-11-16 23:09 ` [patch 08/30] cpu_alloc: Sparc64 support Christoph Lameter
2007-11-16 23:09 ` [patch 09/30] cpu alloc: percpu_counter conversion Christoph Lameter
2007-11-16 23:09 ` [patch 10/30] cpu alloc: crash_notes conversion Christoph Lameter
2007-11-16 23:09 ` [patch 11/30] cpu alloc: workqueue conversion Christoph Lameter
2007-11-16 23:09 ` [patch 12/30] cpu alloc: ACPI cstate handling conversion Christoph Lameter
2007-11-16 23:09 ` [patch 13/30] cpu alloc: genhd statistics conversion Christoph Lameter
2007-11-16 23:09 ` [patch 14/30] cpu alloc: blktrace conversion Christoph Lameter
2007-11-16 23:09 ` [patch 15/30] cpu alloc: SRCU Christoph Lameter
2007-11-16 23:09 ` [patch 16/30] cpu alloc: XFS counters Christoph Lameter
2007-11-19 12:58 ` Christoph Hellwig
2007-11-16 23:09 ` [patch 17/30] cpu alloc: NFS statistics Christoph Lameter
2007-11-16 23:09 ` [patch 18/30] cpu alloc: neigbour statistics Christoph Lameter
2007-11-16 23:09 ` [patch 19/30] cpu alloc: tcp statistics Christoph Lameter
2007-11-16 23:09 ` [patch 20/30] cpu alloc: convert scatches Christoph Lameter
2007-11-16 23:09 ` [patch 21/30] cpu alloc: dmaengine conversion Christoph Lameter
2007-11-16 23:09 ` [patch 22/30] cpu alloc: convert loopback statistics Christoph Lameter
2007-11-16 23:09 ` [patch 23/30] cpu alloc: veth conversion Christoph Lameter
2007-11-16 23:09 ` [patch 24/30] cpu alloc: Chelsio statistics conversion Christoph Lameter
2007-11-16 23:09 ` [patch 25/30] cpu alloc: convert mib handling to cpu alloc Christoph Lameter
2007-11-16 23:09 ` [patch 26/30] cpu_alloc: convert network sockets Christoph Lameter
2007-11-16 23:09 ` [patch 27/30] cpu alloc: Explicitly code allocpercpu calls in iucv Christoph Lameter
2007-11-16 23:09 ` [patch 28/30] cpu alloc: Use for infiniband Christoph Lameter
2007-11-16 23:09 ` [patch 29/30] cpu alloc: Use in the crypto subsystem Christoph Lameter
2007-11-16 23:09 ` [patch 30/30] cpu alloc: Remove the allocpercpu functionality Christoph Lameter