* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
@ 2004-12-17 22:29 Manfred Spraul
  2004-12-20 18:20 ` Ravikiran G Thirumalai
From: Manfred Spraul @ 2004-12-17 22:29 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Linux Kernel Mailing List

Hi Kiran,

>+ * 
>+ * Originally by Dipankar Sarma and Ravikiran Thirumalai,
>+ * This reimplements alloc_percpu to make it 
>+ * 1. Independent of slab/kmalloc
>  
>
Probably the right approach. slab should use per-cpu for its internal 
head arrays, but I've never converted the slab code due to 
chicken-and-egg problems and due to the additional pointer dereference.

>+ * Allocator is slow -- expected to be called during module/subsystem
>+ * init. alloc_percpu can block.
>  
>
How slow is slow?
I think the block subsystem uses alloc_percpu for some statistics 
counters, i.e. one alloc during creation of a new disk. The slab 
implementation was really slow and that caused problems with LVM (or 
something like that) stress tests.

>+	/* Map pages for each cpu by splitting vm_struct for each cpu */
>+	for (i = 0; i < NR_CPUS; i++) {
>+		if (cpu_possible(i)) {
>+			tmppage = &blkp->pages[i*cpu_pages];
>+			tmp.addr = area->addr + i * PCPU_BLKSIZE;
>+			/* map_vm_area assumes a guard page of size PAGE_SIZE */
>+			tmp.size = PCPU_BLKSIZE + PAGE_SIZE; 
>+			if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
>+				goto fail_map;
>  
>
That means no large PTE entries for the per-cpu allocations, right?
I think that's a bad idea for non-NUMA systems. What about a fallback to 
simple get_free_pages() for non-NUMA systems?

>+ * This allocator is slow as we assume allocs to come
>+ * by during boot/module init.
>+ * Should not be called from interrupt context 
>  
>
"Must not" - it contains down() and thus can sleep.

--
    Manfred




* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
  2004-12-17 22:29 [RFC] Reimplementation of linux dynamic percpu memory allocator Manfred Spraul
@ 2004-12-20 18:20 ` Ravikiran G Thirumalai
  2004-12-20 18:24   ` Manfred Spraul
From: Ravikiran G Thirumalai @ 2004-12-20 18:20 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Linux Kernel Mailing List

On Fri, Dec 17, 2004 at 11:29:42PM +0100, Manfred Spraul wrote:
> >
> Probably the right approach. slab should use per-cpu for its internal 
> head arrays, but I've never converted the slab code due to 
> chicken-and-egg problems and due to the additional pointer dereference.
> 
> >+ * Allocator is slow -- expected to be called during module/subsystem
> >+ * init. alloc_percpu can block.
> > 
> >
> How slow is slow?

Haven't measured it, but the allocator is not designed for speed.
Once the block to be allocated from is identified, the allocator builds
and sorts a map of free chunks in ascending order so that we allocate
from the smallest suitable chunk.  The goal is to enhance memory/cacheline
utilization and reduce fragmentation rather than speed.  It is not expected
that the allocator will be used from the fast path.
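
Roughly, the selection step works like the sketch below (userspace C,
illustrative only: the free_chunk/pick_best_fit names are mine, and the real
code builds the map from the block's bitmap and also honours the alignment
request):

#include <stddef.h>
#include <stdlib.h>

/* One entry per free run in a block; hypothetical layout. */
struct free_chunk {
	size_t start;	/* offset of the free run within the block */
	size_t size;	/* length of the free run, in bytes */
};

static int cmp_chunk_size(const void *a, const void *b)
{
	const struct free_chunk *x = a, *y = b;

	return (x->size > y->size) - (x->size < y->size);
}

/* Return the index of the smallest free chunk that can hold 'size', or -1. */
static int pick_best_fit(struct free_chunk *map, int nr, size_t size)
{
	int i;

	qsort(map, nr, sizeof(*map), cmp_chunk_size);	/* ascending by size */
	for (i = 0; i < nr; i++)
		if (map[i].size >= size)
			return i;
	return -1;
}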

> I think the block subsystem uses alloc_percpu for some statistics 
> counters, i.e. one alloc during creation of a new disk. The slab 
> implementation was really slow and that caused problems with LVM (or 
> something like that) stress tests.

Hmmm... I knew from some experiments earlier that access to per-cpu versions
of memory was slow with the slab-based implementation -- which this patch
addresses, but I didn't know allocs themselves were slow...
Creation of a disk should not be a fast path, no?
 
> 
> >+	/* Map pages for each cpu by splitting vm_struct for each cpu */
> >+	for (i = 0; i < NR_CPUS; i++) {
> >+		if (cpu_possible(i)) {
> >+			tmppage = &blkp->pages[i*cpu_pages];
> >+			tmp.addr = area->addr + i * PCPU_BLKSIZE;
> >+			/* map_vm_area assumes a guard page of size 
> >PAGE_SIZE */
> >+			tmp.size = PCPU_BLKSIZE + PAGE_SIZE; 
> >+			if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
> >+				goto fail_map;
> > 
> >
> That means no large PTE entries for the per-cpu allocations, right?
> I think that's a bad idea for non-NUMA systems. What about a fallback to 
> simple get_free_pages() for non-NUMA systems?

Can we have large PTE entries with PAGE_SIZE'd pages?

> >+ * This allocator is slow as we assume allocs to come
> >+ * by during boot/module init.
> >+ * Should not be called from interrupt context 
> > 
> >
> "Must not" - it contains down() and thus can sleep.
> 

:D Yes, I will replace 'should not' with 'must not' in my next iteration.

Thanks for the comments and feedback.

Kiran


* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
  2004-12-20 18:20 ` Ravikiran G Thirumalai
@ 2004-12-20 18:24   ` Manfred Spraul
  2004-12-20 19:25     ` Ravikiran G Thirumalai
From: Manfred Spraul @ 2004-12-20 18:24 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Linux Kernel Mailing List

Ravikiran G Thirumalai wrote:

>Hmmm... I knew from some experiments earlier that access to per-cpu versions
>of memory was slow with the slab-based implementation -- which this patch
>addresses, but I didn't know allocs themselves were slow...
>Creation of a disk should not be a fast path, no?
>  
>
No, not fast path. But it can happen a few thousand times. The slab 
implementation failed due to heavy internal fragmentation. If your code 
runs fine with a few thousand users, then there shouldn't be a problem.

>>>      
>>>
>>That means no large PTE entries for the per-cpu allocations, right?
>>I think that's a bad idea for non-NUMA systems. What about a fallback to 
>>simple get_free_pages() for non-NUMA systems?
>>    
>>
>
>Can we have large pte entries with PAGE_SIZEd pages?  
>
>  
>
For non-NUMA systems, I would use get_free_pages() to allocate a 
multi-page area instead of map_vm_area(). Typically, get_free_pages() is 
backed by large PTE mappings and map_vm_area() by normal virtual memory.
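
Something along these lines (an untested sketch; it ignores the
block-management area, the error paths, and the fact that it would also
allocate for cpus that are not cpu_possible()):

#ifndef CONFIG_NUMA
/* Back the whole block with one physically contiguous allocation,
 * which the arch can map with large PTEs, instead of vmalloc space. */
static void *valloc_percpu(void)
{
	int order = get_order(NR_CPUS * PCPU_BLKSIZE + BLOCK_MANAGEMENT_SIZE);
	unsigned long addr = __get_free_pages(GFP_KERNEL, order);

	if (!addr)
		return NULL;
	memset((void *)addr, 0, PAGE_SIZE << order);
	return (void *)addr;
}
#endif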

--
    Manfred


* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
  2004-12-20 18:24   ` Manfred Spraul
@ 2004-12-20 19:25     ` Ravikiran G Thirumalai
  2004-12-29 16:33       ` Manfred Spraul
From: Ravikiran G Thirumalai @ 2004-12-20 19:25 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Linux Kernel Mailing List

On Mon, Dec 20, 2004 at 07:24:07PM +0100, Manfred Spraul wrote:
> >
> No, not fast path. But it can happen a few thousand times. The slab 
> implementation failed due to heavy internal fragmentation. If your code 
> runs fine with a few thousand users, then there shouldn't be a problem.

If there is a stress test I can use, I can try running it.

> >>>     
> >..
> For non-NUMA systems, I would use get_free_pages() to allocate a 
> multi-page area instead of map_vm_area(). Typically, get_free_pages() is 
> backed by large PTE mappings and map_vm_area() by normal virtual memory.

Hmm... the arithmetic becomes tricky then.  Right now I allocate
NR_CPUS * PCPU_BLKSIZE + BLOCK_MANAGEMENT_SIZE worth of KVA for a block,
allocate pages for the cpu_possible() cpus, and map the corresponding VA space
to the allocated pages using map_vm_area().  We may fragment if
NR_CPUS * PCPU_BLKSIZE doesn't fit into a proper page order, and we'd
also be wasting pages for the !cpu_possible() cpus out of NR_CPUS.
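
To put a number on the rounding cost, an illustrative calculation (assumed
values only: NR_CPUS == 32, 4K pages, PCPU_BLKSIZE == 8192, and the
management area taken to be a single page, which in general it isn't):

/* How much a buddy-style allocation would overshoot the block size. */
static size_t blk_rounding_waste(void)
{
	size_t want = NR_CPUS * PCPU_BLKSIZE + BLOCK_MANAGEMENT_SIZE;
					/* 32 * 8192 + 4096 = 266240 bytes */
	int order = get_order(want);	/* 65 pages, so order 7 = 128 pages */

	return (PAGE_SIZE << order) - want;	/* ~252 KB lost to rounding */
}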

Thanks,
Kiran


* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
  2004-12-20 19:25     ` Ravikiran G Thirumalai
@ 2004-12-29 16:33       ` Manfred Spraul
  2004-12-29 17:52         ` Ravikiran G Thirumalai
  2005-01-12 18:12         ` Ravikiran G Thirumalai
From: Manfred Spraul @ 2004-12-29 16:33 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Linux Kernel Mailing List

Ravikiran G Thirumalai wrote:

>On Mon, Dec 20, 2004 at 07:24:07PM +0100, Manfred Spraul wrote:
>  
>
>>No, not fast path. But it can happen a few thousand times. The slab 
>>implementation failed due to heavy internal fragmentation. If your code 
>>runs fine with a few thousand users, then there shouldn't be a problem.
>>    
>>
>
>  
>
Could you ask Badari Pulavarty (pbadari@us.ibm.com)?
He noticed the fragmentation problem with the original 
kmem_cache_alloc_node implementation. Perhaps he could just run your 
version with his test setup:
The thread with the fix is at:

http://marc.theaimsgroup.com/?t=109735434400002&r=1&w=2

--
    Manfred


* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
  2004-12-29 16:33       ` Manfred Spraul
@ 2004-12-29 17:52         ` Ravikiran G Thirumalai
  2005-01-12 18:12         ` Ravikiran G Thirumalai
From: Ravikiran G Thirumalai @ 2004-12-29 17:52 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Linux Kernel Mailing List

On Wed, Dec 29, 2004 at 05:33:44PM +0100, Manfred Spraul wrote:
> Ravikiran G Thirumalai wrote:
> 
> >On Mon, Dec 20, 2004 at 07:24:07PM +0100, Manfred Spraul wrote:
> > 
> >
> >>No, not fast path. But it can happen a few thousand times. The slab 
> >>implementation failed due to heavy internal fragmentation. If your code 
> >>runs fine with a few thousand users, then there shouldn't be a problem.
> >>   
> >>
> >
> > 
> >
> Could you ask Badari Pulavarty (pbadari@us.ibm.com)?
> He noticed the fragmentation problem with the original 
> kmem_cache_alloc_node implementation. Perhaps he could just run your 
> version with his test setup:

Yes, I will.  Thanks for the pointer.


Thanks,
Kiran


* Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
  2004-12-29 16:33       ` Manfred Spraul
  2004-12-29 17:52         ` Ravikiran G Thirumalai
@ 2005-01-12 18:12         ` Ravikiran G Thirumalai
From: Ravikiran G Thirumalai @ 2005-01-12 18:12 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Linux Kernel Mailing List, pbadari

On Wed, Dec 29, 2004 at 05:33:44PM +0100, Manfred Spraul wrote:
> Ravikiran G Thirumalai wrote:
> 
> >
> Could you ask Badari Pulavarty (pbadari@us.ibm.com)?
> He noticed the fragmentation problem with the original 
> kmem_cache_alloc_node implementation. Perhaps he could just run your 
> version with his test setup:
> The thread with the fix is at:
> 
> http://marc.theaimsgroup.com/?t=109735434400002&r=1&w=2
> 

Manfred,
Badari's test was to create thousands of SCSI devices with
scsi_debug on a multiprocessor x86_64 box with CONFIG_NUMA.
I tried out something similar -- creating 2000 SCSI disks on a 2-way x86_64 box.
Here are the results; all numbers are in kB, from /proc/meminfo.

1) Without your patch to reduce fragmentation due to kmem_cache_alloc_node:
		Before disk	After disk	Difference
		creation	creation
		------------------------------------------
MemTotal	5009956		5009956		0
MemFree		4949428		4868300		81128

2) With your patch to reduce fragmentation due to kmem_cache_alloc_node:
		Before disk	After disk	Difference
		creation	creation
		------------------------------------------
MemTotal	5009956		5009956		0
MemFree		4947380		4874876		72504

3) With the new alloc_percpu implementation which does not use slab:
		Before disk	After disk	Difference
		creation	creation
		------------------------------------------
MemTotal	5009956		5009956		0
MemFree		4923244		4851648		71596

As you can see, the alloc_percpu reimplementation doesn't fragment.

Also, I'd run some userspace stress tests to check the allocator's
utilization levels.  I reran them and here are the results:

A) Test description and results for a 'counters only' test run:
With a block size of 8192 bytes,
1. Allocate 2000 4-byte counters.
	At the end of allocation, one 8192-byte block exists with a usecount
	of 8000.
2. Free a random number of objects in random order.
	After freeing 3992 bytes of memory, one block exists in the
	allocator with a usecount of 4008.
3. Allocate 2000 4-byte counters again.
	At the end of allocation, 2 blocks exist with usecounts of
	8192 and 3816.
4. Free all remaining objects.
	All objects go away and the allocator doesn't have any blocks left.

B) Test description and results for a 'random sized objects' test run:
With a block size of 8192 bytes,
1. Allocate 2000 random-sized objects with random alignment specifications.
	At the end of allocation, 504 8192-byte blocks exist in the system
	after allocating 4076556 bytes of objects -- 98.7% utilization.
2. Free a random number of objects in random order.
	After freeing 2020220 bytes of memory, 458 blocks exist.
	Utilization is 2056336/3751936 -- 54.8%.
3. Allocate 2000 random-sized objects as in step 1 again.
	After adding 4155972 bytes of objects, 772 blocks summing up to
	6324224 bytes exist in the allocator.  Objects adding up to
	6212308 bytes have been allocated, which means a utilization
	level of 98.2%.

I guess this proves that the allocator behaves quite well in terms of
fragmentation.
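
For reference, the 'counters only' run above boils down to roughly the
following (a simplified sketch, not the actual harness; it assumes mm/percpu.c
is built against the userspace.h shim mentioned in the patch):

#include <stddef.h>
#include <stdlib.h>

extern void *__alloc_percpu(size_t size, size_t align);
extern void free_percpu(const void *objp);

#define NR_COUNTERS 2000

int main(void)
{
	void *a[NR_COUNTERS], *b[NR_COUNTERS];
	int i;

	for (i = 0; i < NR_COUNTERS; i++)	/* step 1: 2000 4-byte counters */
		a[i] = __alloc_percpu(sizeof(int), __alignof__(int));

	for (i = 0; i < NR_COUNTERS; i++)	/* step 2: free a random subset */
		if (rand() & 1) {
			free_percpu(a[i]);
			a[i] = NULL;
		}

	for (i = 0; i < NR_COUNTERS; i++)	/* step 3: allocate 2000 more */
		b[i] = __alloc_percpu(sizeof(int), __alignof__(int));

	for (i = 0; i < NR_COUNTERS; i++) {	/* step 4: free everything */
		if (a[i])
			free_percpu(a[i]);
		free_percpu(b[i]);
	}
	return 0;
}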

Thanks,
Kiran


* [RFC] Reimplementation of linux dynamic percpu memory allocator
@ 2004-12-17 22:03 Ravikiran G Thirumalai
From: Ravikiran G Thirumalai @ 2004-12-17 22:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: dipankar

The following patch reimplements the linux dynamic percpu memory allocator
so that:
1. Percpu memory dereference is faster 
	- One memory reference fewer than the existing simple alloc_percpu
	- As fast as static percpu areas (actually one memory reference
	  fewer)
2. Better memory usage
	- Doesn't need an NR_CPUS-sized pointer array for each allocation
	- Interleaves objects, making better use of memory/cachelines
3. Provides truly node-local allocation
	- The percpu memory with the existing alloc_percpu is node-local,
	  but the NR_CPUS pointer-array placeholder is not.  This
	  problem doesn't exist with the new implementation.

Design:
We have blocks of memory akin to slabs.  Each block has
PCPU_BLKSIZE * NR_CPUS of kernel VA space allocated to it.
Node-local pages are allocated and mapped against the corresponding cpus'
VA space.  The allocator hands out memory in multiples of a fixed currency
size for each alloc request and returns the address of the percpu
object corresponding to cpu0.  The cpu-local copy for any given cpu
can then be obtained by simple arithmetic:
obj_address + cpu_id * PCPU_BLKSIZE.
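
Callers see the same interface as before; a hypothetical user would look
roughly like this (illustrative names only, not part of the patch):

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/smp.h>

struct example_stats {
	unsigned long ios;
	unsigned long sectors;
};

static struct example_stats *stats;

static int __init example_init(void)
{
	stats = alloc_percpu(struct example_stats);	/* one copy per cpu */
	if (!stats)
		return -ENOMEM;
	return 0;
}

static void example_account_io(unsigned long nr_sectors)
{
	/* per_cpu_ptr() is now plain pointer arithmetic into this cpu's copy */
	struct example_stats *s = per_cpu_ptr(stats, get_cpu());

	s->ios++;
	s->sectors += nr_sectors;
	put_cpu();
}

static void __exit example_exit(void)
{
	free_percpu(stats);
}

module_init(example_init);
module_exit(example_exit);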

Testing:
The block allocator has undergone some userspace stress testing.  The
remnants of the userspace testing still exist in the code, as this patch is an RFC.

Signed-off-by: Ravikiran Thirumalai <kiran@in.ibm.com>
---

 include/linux/kernel.h |    2 
 include/linux/percpu.h |   18 -
 mm/Makefile            |    1 
 mm/percpu.c            |  759 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slab.c              |   69 ----
 5 files changed, 769 insertions(+), 80 deletions(-)


diff -ruN -X dontdiff2 linux-2.6.10-rc3/include/linux/kernel.h alloc_percpu-2.6.10-rc3/include/linux/kernel.h
--- linux-2.6.10-rc3/include/linux/kernel.h	2004-12-04 03:22:07.000000000 +0530
+++ alloc_percpu-2.6.10-rc3/include/linux/kernel.h	2004-12-16 01:29:14.000000000 +0530
@@ -27,6 +27,8 @@
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #define ALIGN(x,a) (((x)+(a)-1)&~((a)-1))
+#define IS_ALIGNED(x,a) (!(((a) - 1) & (x)))
+#define IS_POWEROFTWO(x) (!(((x) - 1) & (x)))
 
 #define	KERN_EMERG	"<0>"	/* system is unusable			*/
 #define	KERN_ALERT	"<1>"	/* action must be taken immediately	*/
diff -ruN -X dontdiff2 linux-2.6.10-rc3/include/linux/percpu.h alloc_percpu-2.6.10-rc3/include/linux/percpu.h
--- linux-2.6.10-rc3/include/linux/percpu.h	2004-12-04 03:24:43.000000000 +0530
+++ alloc_percpu-2.6.10-rc3/include/linux/percpu.h	2004-12-16 02:06:56.000000000 +0530
@@ -15,23 +15,19 @@
 #define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
 #define put_cpu_var(var) preempt_enable()
 
-#ifdef CONFIG_SMP
-
-struct percpu_data {
-	void *ptrs[NR_CPUS];
-	void *blkp;
-};
+/* This is the upper bound for an object using alloc_percpu. */
+#define PCPU_BLKSIZE (PAGE_SIZE * 2)
+#define PCPU_CURR_SIZE        (sizeof (void *))
 
+#ifdef CONFIG_SMP
 /* 
  * Use this to get to a cpu's version of the per-cpu object allocated using
  * alloc_percpu.  Non-atomic access to the current CPU's version should
  * probably be combined with get_cpu()/put_cpu().
  */ 
 #define per_cpu_ptr(ptr, cpu)                   \
-({                                              \
-        struct percpu_data *__p = (struct percpu_data *)~(unsigned long)(ptr); \
-        (__typeof__(ptr))__p->ptrs[(cpu)];	\
-})
+	((__typeof__(ptr))                      \
+	(RELOC_HIDE(ptr,  PCPU_BLKSIZE * cpu)))
 
 extern void *__alloc_percpu(size_t size, size_t align);
 extern void free_percpu(const void *);
@@ -56,6 +52,6 @@
 
 /* Simple wrapper for the common case: zeros memory. */
 #define alloc_percpu(type) \
-	((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
+	((type *)(__alloc_percpu(ALIGN(sizeof (type), PCPU_CURR_SIZE),  __alignof__(type))))
 
 #endif /* __LINUX_PERCPU_H */
diff -ruN -X dontdiff2 linux-2.6.10-rc3/mm/Makefile alloc_percpu-2.6.10-rc3/mm/Makefile
--- linux-2.6.10-rc3/mm/Makefile	2004-12-04 03:23:57.000000000 +0530
+++ alloc_percpu-2.6.10-rc3/mm/Makefile	2004-12-16 01:29:14.000000000 +0530
@@ -17,4 +17,5 @@
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
 obj-$(CONFIG_SHMEM) += shmem.o
 obj-$(CONFIG_TINY_SHMEM) += tiny-shmem.o
+obj-$(CONFIG_SMP)	+= percpu.o
 
diff -ruN -X dontdiff2 linux-2.6.10-rc3/mm/percpu.c alloc_percpu-2.6.10-rc3/mm/percpu.c
--- linux-2.6.10-rc3/mm/percpu.c	1970-01-01 05:30:00.000000000 +0530
+++ alloc_percpu-2.6.10-rc3/mm/percpu.c	2004-12-16 02:08:38.000000000 +0530
@@ -0,0 +1,759 @@
+/*
+ * Dynamic percpu memory allocator.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2003
+ *
+ * Author: Ravikiran Thirumalai <kiran@in.ibm.com>
+ * 
+ * Originally by Dipankar Sarma and Ravikiran Thirumalai,
+ * This reimplements alloc_percpu to make it 
+ * 1. Independent of slab/kmalloc
+ * 2. Use node local memory
+ * 3. Use simple pointer arithmetic 
+ * 4. Minimise fragmentation.
+ *
+ * Allocator is slow -- expected to be called during module/subsystem
+ * init. alloc_percpu can block.
+ *
+ */
+
+#ifdef __KERNEL__
+#include <linux/percpu.h>
+#include <linux/vmalloc.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <asm/semaphore.h>
+#include <asm/pgtable.h>
+#include <asm/hardirq.h>
+#else
+#include "userspace.h"
+#endif
+
+#define MAX_OBJSIZE	PCPU_BLKSIZE
+#define OBJS_PER_BLOCK	(PCPU_BLKSIZE/PCPU_CURR_SIZE)
+#define	BITMAP_ARR_SIZE (OBJS_PER_BLOCK/(sizeof (unsigned long) * 8))
+#define MAX_NR_BITS	(OBJS_PER_BLOCK)
+#define PCPUPAGES_PER_BLOCK ((PCPU_BLKSIZE >> PAGE_SHIFT) * NR_CPUS)
+
+/* Block descriptor */
+struct pcpu_block {
+	void *start_addr;
+	struct page *pages[PCPUPAGES_PER_BLOCK * 2]; /* Extra for block mgt */
+	struct list_head blklist;
+	unsigned long bitmap[BITMAP_ARR_SIZE];	/* Object Freelist */
+	int bufctl_fl[OBJS_PER_BLOCK];		/* bufctl_fl freelist */
+	int bufctl_fl_head;
+	unsigned int size_used;
+};
+
+#define BLK_SIZE_USED(listpos) (list_entry(listpos, 		 	      \
+					struct pcpu_block, blklist)->size_used)
+
+/* Block list maintenance */
+
+/* Ordered list of pcpu_blocks -- Full, partial first */
+#ifdef PCPU_DEBUG
+struct list_head blkhead = LIST_HEAD_INIT(blkhead);
+#else
+static struct list_head blkhead = LIST_HEAD_INIT(blkhead);
+#endif
+static struct list_head *firstnotfull = &blkhead;
+static DECLARE_MUTEX(blklist_lock);
+
+/* 
+ * Bufctl descriptor and bufctl list for all allocated objs...
+ * Having one list for all buffers in the allocator might not be very efficient
+ * but we are not expecting allocs and frees in the fast path (only during module
+ * load and unload, hopefully).
+ */
+struct buf_ctl {
+	void *addr;
+	size_t size;
+	struct buf_ctl *next;
+};
+
+static struct buf_ctl *buf_head = NULL;
+
+#define BLOCK_MANAGEMENT_SIZE						\
+({									\
+	int extra = sizeof (struct buf_ctl)*OBJS_PER_BLOCK 		\
+				+ sizeof (struct pcpu_block); 		\
+	ALIGN(extra, PAGE_SIZE);					\
+})
+
+#define BLOCK_MANAGEMENT_PAGES (BLOCK_MANAGEMENT_SIZE >> PAGE_SHIFT)
+
+void init_pcpu_block(struct pcpu_block *blkp)
+{
+	int i;
+	memset(blkp, 0, sizeof (struct pcpu_block)); /* Delme --for US only */
+
+	/* Setup the freelist */
+	blkp->bufctl_fl_head = 0;
+	for (i = 0; i < OBJS_PER_BLOCK-1; i++)
+		blkp->bufctl_fl[i] = i+1;
+	blkp->bufctl_fl[i] = -1;	/* Sentinel to mark End of list */
+}
+
+#ifndef USERSPACE
+
+/*
+ * Allocate PCPU_BLKSIZE * NR_CPUS + BLOCK_MANAGEMENT_SIZE  worth of 
+ * contiguous kva space, and PCPU_BLKSIZE amount of node local 
+ * memory (pages) for all cpus possible + BLOCK_MANAGEMENT_SIZE pages
+ */
+static void *
+valloc_percpu(void)
+{
+	int i,j = 0;
+	unsigned int nr_pages;
+	struct vm_struct *area, tmp;
+	struct page **tmppage;
+	struct page *pages[BLOCK_MANAGEMENT_PAGES];
+	unsigned int cpu_pages = PCPU_BLKSIZE >> PAGE_SHIFT;
+	struct pcpu_block *blkp = NULL;
+
+	BUG_ON(!IS_ALIGNED(PCPU_BLKSIZE, PAGE_SIZE));
+	BUG_ON(!PCPU_BLKSIZE);
+	nr_pages = PCPUPAGES_PER_BLOCK + BLOCK_MANAGEMENT_PAGES;	
+
+	/* Alloc management block pages */
+	for ( i = 0; i < BLOCK_MANAGEMENT_PAGES; i++) {
+		pages[i] = alloc_pages(GFP_KERNEL, 0);
+		if (!pages[i]) {
+			while ( --i >= 0 ) 
+				__free_pages(pages[i], 0);
+			return NULL;
+		}
+		/* Zero the alloced page */
+		clear_page(page_address(pages[i]));
+	}
+
+	/* Get the contiguous VA space for this block */
+	area = get_vm_area(nr_pages << PAGE_SHIFT, VM_MAP);
+	if (!area)
+		goto rollback_mgt;
+
+	/* Map pages for the block management pages */
+	tmppage = pages;
+	tmp.addr = area->addr + NR_CPUS * PCPU_BLKSIZE;
+	tmp.size =  BLOCK_MANAGEMENT_SIZE + PAGE_SIZE;
+	if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
+		goto rollback_vm_area;
+	
+	/* Init the block descriptor */
+	blkp = area->addr + NR_CPUS * PCPU_BLKSIZE;
+	init_pcpu_block(blkp);
+	for ( i = 0; i < BLOCK_MANAGEMENT_PAGES; i++)
+		blkp->pages[i+PCPUPAGES_PER_BLOCK] = pages[i];
+	
+	/* Alloc node local pages for all cpus possible */
+	for (i = 0; i < NR_CPUS; i++) {
+		if (cpu_possible(i)) {
+			int start_idx =  i * cpu_pages;
+			for (j = start_idx; j < start_idx + cpu_pages; j++) {
+				blkp->pages[j] = alloc_pages_node(cpu_to_node(i)
+						 ,GFP_KERNEL | __GFP_HIGHMEM,
+						    0);
+				if (unlikely(!blkp->pages[j]))
+					goto rollback_pages;
+			}
+		}
+	}
+
+	/* Map pages for each cpu by splitting vm_struct for each cpu */
+	for (i = 0; i < NR_CPUS; i++) {
+		if (cpu_possible(i)) {
+			tmppage = &blkp->pages[i*cpu_pages];
+			tmp.addr = area->addr + i * PCPU_BLKSIZE;
+			/* map_vm_area assumes a guard page of size PAGE_SIZE */
+			tmp.size = PCPU_BLKSIZE + PAGE_SIZE; 
+			if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
+				goto fail_map;
+		}
+	}
+
+	return area->addr;
+
+fail_map:
+	i--;
+	for (; i >= 0; i--) {
+		if (cpu_possible(i)) {
+			tmp.addr = area->addr + i * PCPU_BLKSIZE;
+			/* we've mapped a guard page extra earlier... */
+			tmp.size = PCPU_BLKSIZE + PAGE_SIZE;
+			unmap_vm_area(&tmp);
+		}
+	}
+	
+	/* set i and j with proper values for the roll back at fail: */
+	i = NR_CPUS - 1;
+	j = PCPUPAGES_PER_BLOCK;
+	
+rollback_pages:
+	j--;
+	for (; j >= 0; j--)
+		if (cpu_possible(j/cpu_pages))
+			__free_pages(blkp->pages[j], 0);
+	
+	/* Unmap  block management */
+	tmp.addr = area->addr + NR_CPUS * PCPU_BLKSIZE;
+	tmp.size =  BLOCK_MANAGEMENT_SIZE + PAGE_SIZE;
+	unmap_vm_area(&tmp);
+
+rollback_vm_area:
+	/* Give back the contiguous mem area */
+	area = remove_vm_area(area->addr);
+	BUG_ON(!area);
+	
+rollback_mgt:
+
+	/* Free the block management pages */
+	for (i = 0 ; i < BLOCK_MANAGEMENT_PAGES; i++)
+		__free_pages(pages[i], 0);
+		
+	return NULL;
+}
+
+/* Free memory block allocated by valloc_percpu */
+static void
+vfree_percpu(void *addr)
+{
+	int i;
+	struct pcpu_block *blkp = addr + PCPUPAGES_PER_BLOCK * PAGE_SIZE;
+	struct vm_struct *area, tmp;
+	unsigned int cpu_pages = PCPU_BLKSIZE >> PAGE_SHIFT;
+	struct page *pages[BLOCK_MANAGEMENT_PAGES];
+
+	/* Backup the block management struct pages */
+	for (i=0; i < BLOCK_MANAGEMENT_PAGES; i++)
+		pages[i] = blkp->pages[i+PCPUPAGES_PER_BLOCK];
+	
+	/* Unmap all cpu_pages from the block's vm space */
+	for (i = 0; i < NR_CPUS; i++) {
+		if (cpu_possible(i)) {
+			tmp.addr = addr + i * PCPU_BLKSIZE;
+			/* We've mapped a guard page extra earlier */
+			tmp.size = PCPU_BLKSIZE + PAGE_SIZE;
+			unmap_vm_area(&tmp);
+		}
+	}
+	
+	/* Give back all allocated pages */
+	for (i = 0; i < PCPUPAGES_PER_BLOCK; i++) {
+		if (cpu_possible(i/cpu_pages)) 
+			__free_pages(blkp->pages[i], 0);
+	}	
+
+	/* Unmap block management pages */
+	tmp.addr = addr + NR_CPUS * PCPU_BLKSIZE;
+	tmp.size = BLOCK_MANAGEMENT_SIZE + PAGE_SIZE;
+	unmap_vm_area(&tmp);
+
+	/* Free block management pages */
+	for (i=0; i < BLOCK_MANAGEMENT_PAGES; i++)
+		__free_pages(pages[i], 0);
+
+	/* Give back vm area for this block */
+	area = remove_vm_area(addr);
+	BUG_ON(!area);
+
+}
+
+#else
+static void *
+valloc_percpu(void)
+{
+	void *ret;
+	struct pcpu_block *blkp;
+	ret = valloc(PCPU_BLKSIZE + BLOCK_MANAGEMENT_SIZE);
+	if (!ret)
+		return ret;
+	blkp = ret + PCPU_BLKSIZE;
+	init_pcpu_block(blkp);
+	return ret;
+}
+
+#endif				/* USERSPACE */
+
+static int
+add_percpu_block(void)
+{
+	struct pcpu_block *blkp;
+	void *start_addr;
+
+	start_addr = valloc_percpu();
+	if (!start_addr) 
+		return 0;
+	blkp = start_addr + PCPUPAGES_PER_BLOCK * PAGE_SIZE;
+	blkp->start_addr = start_addr;
+	down(&blklist_lock);
+	list_add_tail(&blkp->blklist, &blkhead);
+	if (firstnotfull == &blkhead)
+		firstnotfull = &blkp->blklist;
+	up(&blklist_lock);
+
+	return 1;
+}
+
+struct obj_map_elmt {
+	int startbit;
+	int obj_size;
+};
+
+/* Fill the array with obj map info and return no of elements in the array */
+static int
+make_obj_map(struct obj_map_elmt arr[], struct pcpu_block *blkp)
+{
+	int nr_elements = 0;
+	int i, j, obj_size;
+
+	for (i = 0, j = 0; i < MAX_NR_BITS; i++) {
+		if (!test_bit(i, blkp->bitmap)) {
+			/* Free block start */
+			arr[j].startbit = i;
+			nr_elements++;
+			obj_size = 1;
+			i++;
+			while (i < MAX_NR_BITS && (!test_bit(i, blkp->bitmap))) {
+				i++;
+				obj_size++;
+			}
+			arr[j].obj_size = obj_size * PCPU_CURR_SIZE;
+			j++;
+		}
+	}
+
+	return nr_elements;
+}
+
+/* Sort obj_map array in ascending order -- simple bubble sort */
+static void
+sort_obj_map(struct obj_map_elmt map[], int nr)
+{
+	int i, j, k;
+
+	for (i = 0; i < nr - 1; i++) {
+		k = i;
+
+		for (j = k + 1; j < nr; j++)
+			if (map[j].obj_size < map[k].obj_size)
+				k = j;
+		if (k != i) {
+			struct obj_map_elmt tmp;
+			tmp = map[i];
+			map[i] = map[k];
+			map[k] = tmp;
+		}
+	}
+}
+
+/* Add bufctl to list of bufctl */
+static void
+add_bufctl(struct buf_ctl *bufp)
+{
+	if (buf_head == NULL)
+		buf_head = bufp;
+	else {
+		bufp->next = buf_head;
+		buf_head = bufp;
+	}
+}
+
+/* After you alloc from a block, it can only go up the ordered list */
+static void
+sort_blk_list_up(struct pcpu_block *blkp)
+{
+	struct list_head *pos;
+
+	for (pos = blkp->blklist.prev; pos != &blkhead; pos = pos->prev) {
+		if (BLK_SIZE_USED(pos) < blkp->size_used) {
+			/* Move blkp up */
+			list_del(&blkp->blklist);
+			list_add_tail(&blkp->blklist, pos);
+			pos = &blkp->blklist;
+		} else
+			break;
+	}
+	/* Fix firstnotfull if needed */
+	if (blkp->size_used == PCPU_BLKSIZE) {
+		firstnotfull = blkp->blklist.next;
+		return;
+	}
+	if (blkp->size_used > BLK_SIZE_USED(firstnotfull)) {
+		firstnotfull = &blkp->blklist;
+		return;
+	}
+}
+
+struct buf_ctl *alloc_bufctl(struct pcpu_block *blkp)
+{
+	void *bufctl;
+	int head = blkp->bufctl_fl_head;
+	BUG_ON(head == -1);	/* If bufctls for this block has exhausted */
+	blkp->bufctl_fl_head = blkp->bufctl_fl[blkp->bufctl_fl_head];
+	bufctl = (void *)blkp + sizeof (struct pcpu_block) + 
+				sizeof (struct buf_ctl) * head;
+	return bufctl; 
+}
+	
+/* Don't want to kmalloc this -- to avoid dependence on slab for future */
+static struct obj_map_elmt obj_map[OBJS_PER_BLOCK];
+
+/* Scan the freelist and return suitable obj if found */
+static void
+*get_obj_from_block(size_t size, size_t align, struct pcpu_block *blkp)
+{
+	int nr_elements, nr_currency, obj_startbit, obj_endbit;
+	int i, j;
+	void *objp;
+	struct buf_ctl *bufctl;
+
+	nr_elements = make_obj_map(obj_map, blkp);
+	if (!nr_elements)
+		return NULL;
+
+	/* Sort list in ascending order */
+	sort_obj_map(obj_map, nr_elements);
+
+	/* Get the smallest obj_sized chunk for this size */
+	i = 0;
+	while (i < nr_elements - 1 && size > obj_map[i].obj_size)
+		i++;
+	if (obj_map[i].obj_size < size)	/* No suitable obj_size found */
+		return NULL;
+
+	/* chunk of obj_size >= size is found, check for suitability (align) 
+	 * and alloc 
+	 */
+	nr_currency = size / PCPU_CURR_SIZE;
+	obj_startbit = obj_map[i].startbit;
+
+try_again_for_align:
+
+	obj_endbit = obj_map[i].startbit + obj_map[i].obj_size / PCPU_CURR_SIZE
+	    - 1;
+	objp = obj_startbit * PCPU_CURR_SIZE + blkp->start_addr;
+
+	if (IS_ALIGNED((unsigned long) objp, align)) {
+		/* Alignment is ok so alloc this chunk */
+		bufctl = alloc_bufctl(blkp);
+		if (!bufctl)
+			return NULL;
+		bufctl->addr = objp;
+		bufctl->size = size;
+		bufctl->next = NULL;
+
+		/* Mark the bitmap as allocated */
+		for (j = obj_startbit; j < nr_currency + obj_startbit; j++)
+			set_bit(j, blkp->bitmap);
+		blkp->size_used += size;
+		/* Re-arrange list to preserve full, partial and free order */
+		sort_blk_list_up(blkp);
+		/* Add to the allocated buffers list and return */
+		add_bufctl(bufctl);
+		return objp;
+	} else {
+		/* Alignment is not ok */
+		int obj_size = (obj_endbit - obj_startbit + 1) * PCPU_CURR_SIZE;
+		if (obj_size > size && obj_startbit <= obj_endbit) {
+			/* Since obj_size is bigger than requested, check if
+			   alignment can be met by changing startbit */
+			obj_startbit++;
+			goto try_again_for_align;
+		} else {
+			/* Try in the next chunk */
+			if (++i < nr_elements) {
+				/* Reset start bit and try again */
+				obj_startbit = obj_map[i].startbit;
+				goto try_again_for_align;
+			}
+		}
+	}
+
+	/* Everything failed so return NULL */
+	return NULL;
+}
+
+/* 
+ * __alloc_percpu - allocate one copy of the object for every present
+ * cpu in the system, zeroing them.
+ * Objects should be dereferenced using per_cpu_ptr/get_cpu_ptr
+ * macros only
+ *
+ * This allocator is slow as we assume allocs to come
+ * by during boot/module init.
+ * Should not be called from interrupt context 
+ */
+void *
+__alloc_percpu(size_t size, size_t align)
+{
+	struct pcpu_block *blkp;
+	struct list_head *l;
+	void *obj;
+
+	if (!size)
+		return NULL;
+
+	if (size < PCPU_CURR_SIZE)
+		size = PCPU_CURR_SIZE;
+
+	if (align == 0)
+		align = PCPU_CURR_SIZE;
+
+	if (size > MAX_OBJSIZE) {
+		printk("alloc_percpu: ");
+		printk("size %d requested is more than I can handle\n", size);
+		return NULL;
+	}
+	
+	BUG_ON(!IS_ALIGNED(size, PCPU_CURR_SIZE));
+
+try_after_refill:
+
+	/* Get the block to allocate from */
+	down(&blklist_lock);
+	l = firstnotfull;
+
+try_next_block:
+
+	/* If you have reached end of list, add another block and try */
+	if (l == &blkhead)
+		goto unlock_and_get_mem;
+	blkp = list_entry(l, struct pcpu_block, blklist);
+	obj = get_obj_from_block(size, align, blkp);
+	if (!obj) {
+		l = l->next;
+		goto try_next_block;
+	}
+	up(&blklist_lock);
+	return obj;
+
+unlock_and_get_mem:
+
+	up(&blklist_lock);
+	if (add_percpu_block())
+		goto try_after_refill;
+	return NULL;
+
+}
+
+EXPORT_SYMBOL(__alloc_percpu);
+
+/* After you free from a block, it can only go down the ordered list */
+static void
+sort_blk_list_down(struct pcpu_block *blkp)
+{
+	struct list_head *pos, *prev, *next;
+	/* Store the actual prev and next pointers for fnof fixing later */
+	prev = blkp->blklist.prev;
+	next = blkp->blklist.next;
+
+	/* Fix the ordering on the list */
+	for (pos = blkp->blklist.next; pos != &blkhead; pos = pos->next) {
+		if (BLK_SIZE_USED(pos) > blkp->size_used) {
+			/* Move blkp down */
+			list_del(&blkp->blklist);
+			list_add(&blkp->blklist, pos);
+			pos = &blkp->blklist;
+		} else
+			break;
+	}
+	/* Fix firstnotfull if needed and return */
+	if (firstnotfull == &blkhead) {
+		/* There was no block free, so now this block is fnotfull */
+		firstnotfull = &blkp->blklist;
+		return;
+	}
+
+	if (firstnotfull == &blkp->blklist) {
+		/* This was firstnotfull, so fix the fnof pointer accordingly */
+		if (prev != &blkhead && BLK_SIZE_USED(prev) != PCPU_BLKSIZE) {
+			/* Move fnof pointer up */
+			firstnotfull = prev;
+			prev = prev->prev;
+			/* If size_used of prev is same as fnof, fix fnof to 
+			   point to topmost of the equal sized blocks */
+			while (prev != &blkhead &&
+			       BLK_SIZE_USED(prev) != PCPU_BLKSIZE) {
+				if (BLK_SIZE_USED(prev) !=
+				    BLK_SIZE_USED(firstnotfull))
+					return;
+				firstnotfull = prev;
+				prev = prev->prev;
+			}
+		} else if (next != &blkhead) {
+			/* Move fnof pointer down */
+			firstnotfull = next;
+			next = next->next;
+			if (BLK_SIZE_USED(firstnotfull) != PCPU_BLKSIZE)
+				return;
+			/* fnof is pointing to block which is full...fix it */
+			while (next != &blkhead &&
+			       BLK_SIZE_USED(next) == PCPU_BLKSIZE) {
+				firstnotfull = next;
+				next = next->next;
+			}
+		}
+
+	}
+
+}
+
+void free_bufctl(struct pcpu_block *blkp, struct buf_ctl *bufp)
+{
+	int idx = ((void *) bufp - ((void *) blkp + sizeof (struct pcpu_block)))
+			/ sizeof (struct buf_ctl);
+	blkp->bufctl_fl[idx] = blkp->bufctl_fl_head;
+	blkp->bufctl_fl_head = idx;
+}
+
+/*
+ * Free the percpu obj and whatever memory can be freed
+ */
+static void
+free_percpu_obj(struct list_head *pos, struct buf_ctl *bufp)
+{
+	struct pcpu_block *blkp;
+	blkp = list_entry(pos, struct pcpu_block, blklist);
+
+	/* Update blkp->size_used and free if size_used is 0 */
+	blkp->size_used -= bufp->size;
+	if (blkp->size_used) {
+		/* Mark the bitmap corresponding to this object free */
+		int i, obj_startbit;
+		int nr_currency = bufp->size / PCPU_CURR_SIZE;
+		obj_startbit = (bufp->addr - blkp->start_addr) / PCPU_CURR_SIZE;
+		for (i = obj_startbit; i < obj_startbit + nr_currency; i++)
+			clear_bit(i, blkp->bitmap);
+		sort_blk_list_down(blkp);
+	} else {
+		/* Usecount is zero, so prepare to give this block back to vm */
+		/* Fix firstnotfull if freeing block was firstnotfull 
+		 * If there are more blocks with the same usecount as fnof,
+		 * point to the first block from the head */
+		if (firstnotfull == pos) {
+			firstnotfull = pos->prev;
+			while (firstnotfull != &blkhead) {
+				unsigned int fnf_size_used;
+				fnf_size_used = BLK_SIZE_USED(firstnotfull);
+
+				if (fnf_size_used == PCPU_BLKSIZE)
+					firstnotfull = &blkhead;
+				else if (firstnotfull->prev == &blkhead)
+					break;
+				else if (BLK_SIZE_USED(firstnotfull->prev)
+					 == fnf_size_used)
+					firstnotfull = firstnotfull->prev;
+				else
+					break;
+			}
+		}
+		list_del(pos);
+	}
+
+	/* Free bufctl after fixing the bufctl list */
+	if (bufp == buf_head) {
+		buf_head = bufp->next;
+	} else {
+		struct buf_ctl *tmp = buf_head;
+		while (tmp && tmp->next != bufp)
+			tmp = tmp->next;
+		BUG_ON(!tmp || tmp->next != bufp);
+		tmp->next = bufp->next;
+	}
+	free_bufctl(blkp, bufp);
+	/* If usecount is zero, give this block back to vm */
+	if (!blkp->size_used)
+		vfree_percpu(blkp->start_addr);
+	return;
+}
+
+/*
+ * Free memory allocated using alloc_percpu.
+ */
+
+void
+free_percpu(const void *objp)
+{
+	struct buf_ctl *bufp;
+	struct pcpu_block *blkp;
+	struct list_head *pos;
+	if (!objp)
+		return;
+
+	/* Find block from which obj was allocated by scanning  bufctl list */
+	down(&blklist_lock);
+	bufp = buf_head;
+	while (bufp) {
+		if (bufp->addr == objp)
+			break;
+		bufp = bufp->next;
+	}
+	BUG_ON(!bufp);
+
+	/* We have the bufctl for the obj here, Now get the block */
+	list_for_each(pos, &blkhead) {
+		blkp = list_entry(pos, struct pcpu_block, blklist);
+		if (objp >= blkp->start_addr &&
+		    objp < blkp->start_addr + PCPU_BLKSIZE)
+			break;
+	}
+
+	BUG_ON(pos == &blkhead);	/* Couldn't find obj in block list */
+
+	/* 
+	 * Mark the bitmap free, Update use count, fix the ordered 
+	 * blklist, free the obj bufctl. 
+	 */
+	free_percpu_obj(pos, bufp);
+
+	up(&blklist_lock);
+	return;
+}
+
+EXPORT_SYMBOL(free_percpu);
+
+#ifdef PCPU_DEBUG
+/* Print All the blocks in this allocator */
+void
+pcpu_block_info(void)
+{
+	struct list_head *pos;
+	unsigned int nr_blocks = 0;
+	printk("Block size is %d bytes\n", PCPU_BLKSIZE);
+	down(&blklist_lock);
+	list_for_each(pos, &blkhead) {
+		struct pcpu_block *blkp;
+		nr_blocks++;
+		blkp = list_entry(pos, struct pcpu_block, blklist);
+		printk("Block %lx with size_used %d\n",
+		       (unsigned long) blkp->start_addr, blkp->size_used);
+	}
+	if (firstnotfull != &blkhead)
+		printk("firstnotfull is %lx\n", (unsigned long)
+		       (list_entry(firstnotfull, struct pcpu_block, blklist))->
+		       start_addr);
+	else
+		printk("firstnotfull is NULL\n");
+
+	up(&blklist_lock);
+
+	printk("Total %u blocks takes %u bytes\n", nr_blocks,
+	       nr_blocks * PCPU_BLKSIZE);
+}
+#endif
diff -ruN -X dontdiff2 linux-2.6.10-rc3/mm/slab.c alloc_percpu-2.6.10-rc3/mm/slab.c
--- linux-2.6.10-rc3/mm/slab.c	2004-12-04 03:25:13.000000000 +0530
+++ alloc_percpu-2.6.10-rc3/mm/slab.c	2004-12-16 01:29:14.000000000 +0530
@@ -2446,51 +2446,6 @@
 
 EXPORT_SYMBOL(__kmalloc);
 
-#ifdef CONFIG_SMP
-/**
- * __alloc_percpu - allocate one copy of the object for every present
- * cpu in the system, zeroing them.
- * Objects should be dereferenced using the per_cpu_ptr macro only.
- *
- * @size: how many bytes of memory are required.
- * @align: the alignment, which can't be greater than SMP_CACHE_BYTES.
- */
-void *__alloc_percpu(size_t size, size_t align)
-{
-	int i;
-	struct percpu_data *pdata = kmalloc(sizeof (*pdata), GFP_KERNEL);
-
-	if (!pdata)
-		return NULL;
-
-	for (i = 0; i < NR_CPUS; i++) {
-		if (!cpu_possible(i))
-			continue;
-		pdata->ptrs[i] = kmem_cache_alloc_node(
-				kmem_find_general_cachep(size, GFP_KERNEL),
-				cpu_to_node(i));
-
-		if (!pdata->ptrs[i])
-			goto unwind_oom;
-		memset(pdata->ptrs[i], 0, size);
-	}
-
-	/* Catch derefs w/o wrappers */
-	return (void *) (~(unsigned long) pdata);
-
-unwind_oom:
-	while (--i >= 0) {
-		if (!cpu_possible(i))
-			continue;
-		kfree(pdata->ptrs[i]);
-	}
-	kfree(pdata);
-	return NULL;
-}
-
-EXPORT_SYMBOL(__alloc_percpu);
-#endif
-
 /**
  * kmem_cache_free - Deallocate an object
  * @cachep: The cache the allocation was from.
@@ -2554,30 +2509,6 @@
 
 EXPORT_SYMBOL(kfree);
 
-#ifdef CONFIG_SMP
-/**
- * free_percpu - free previously allocated percpu memory
- * @objp: pointer returned by alloc_percpu.
- *
- * Don't free memory not originally allocated by alloc_percpu()
- * The complemented objp is to check for that.
- */
-void
-free_percpu(const void *objp)
-{
-	int i;
-	struct percpu_data *p = (struct percpu_data *) (~(unsigned long) objp);
-
-	for (i = 0; i < NR_CPUS; i++) {
-		if (!cpu_possible(i))
-			continue;
-		kfree(p->ptrs[i]);
-	}
-}
-
-EXPORT_SYMBOL(free_percpu);
-#endif
-
 unsigned int kmem_cache_size(kmem_cache_t *cachep)
 {
 	return obj_reallen(cachep);

^ permalink raw reply	[flat|nested] 8+ messages in thread
