* [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
@ 2007-07-08  3:49 Christoph Lameter
  2007-07-08  3:49 ` [patch 01/10] SLUB: Direct pass through of page size or higher kmalloc requests Christoph Lameter
                   ` (11 more replies)
  0 siblings, 12 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

This series here contains a number of patches that need some discussion or
evaluation. They apply on top of 2.6.22-rc6-mm1 + slub patches already
in mm.

1. Page allocator pass through

SLOB passes all larger kmalloc requests directly through to the page
allocator. The advantage is that the allocator overhead is eliminated
and that we do not need to provide kmalloc slabs >= PAGE_SIZE.
This patch does the same thing in SLUB.

For allocations whose size is known at compile time, the call into the slab
allocator can be converted into a call to the page allocator at compile time.
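
A minimal sketch of what the inline conversion looks like (the actual change
is in patch 01 below; PAGE_SIZE/2 is the cutoff used there):

	if (__builtin_constant_p(size) && size > PAGE_SIZE / 2)
		return (void *)__get_free_pages(flags | __GFP_COMP,
						get_order(size));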

As a result, the behavior of SLUB for page-size and larger kmalloc
allocations will also conform to SLAB: large kmalloc allocations are no
longer debuggable and are guaranteed to be page aligned.

Do we want this?

2. A series of performance enhancements patches

The patches improve the producer / consumer scenario. If objects are
always allocated on one processor and released on another then both
will use distinct cachelines to store their information in order to
avoid a bouncing cacheline.

In order to do so we have to introduce a per cpu structure to keep
per cpu allocation lists in distinct cachelines from the remote free
information in the page struct. If we introduce a per cpu structure
then we also need to allocate that in a NUMA aware fashion from the
local node.

Having a per cpu structure allows us to avoid the use of certain fields
in the page struct, which in turn lets us stop using page->mapping
and increase the maximum number of objects per slab. More optimizations
become possible by shifting information from the kmem_cache structure
that is used in the hotpath into the per cpu structure, thereby minimizing
the cacheline footprint.
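
Roughly, the per cpu structure that the series ends up with looks like this
(a sketch assembled from patches 02, 04 and 07):

	struct kmem_cache_cpu {
		void **freelist;	/* lockless per cpu freelist */
		struct page *page;	/* the current per cpu slab */
		int node;		/* node of the per cpu slab */
		unsigned int offset;	/* free pointer offset in words */
		unsigned int objsize;	/* object size for kzalloc clearing */
	};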

Finally there is an implementation of slab_alloc/slab_free using a cmpxchg
instead of the current interrupt enable/disable approach. This was inspired by
LTTng's approach. A cmpxchg is less costly than interrupt enable/disable
but means more complexity in managing the resulting race conditions.
The disadvantage is that the allocation / free paths become very
complex and fragile.
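
Conceptually the allocation fast path then becomes something like the
following (only a sketch to illustrate the idea, not the code from patch 08;
handling preemption, migration and concurrent frees between the load and the
cmpxchg is exactly the complexity referred to above):

	void **object, **next;

	do {
		object = c->freelist;
		if (unlikely(!object || !node_match(c, node)))
			return __slab_alloc(s, gfpflags, node, addr, c);
		next = object[c->offset];
	} while (cmpxchg(&c->freelist, object, next) != object);
	return object;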

All of these patches need to be evaluated as to what impact they
have on a variety of loads.

3. Removal of SLOB and SLAB.

We would like to consolidate and only have one slab allocator in the future.
Two patches are included that remove SLOB and SLAB. There is only minimal
justification for retaining SLOB. So I think we could remove SLOB for 2.6.23.

SLAB is the reference that SLUB must be measured against to avoid regressions.
On the other hand, it will be a problem to support new functionality like
slab defragmentation in SLAB since its design makes it difficult to implement
comparable features. So I think that we need to keep SLAB around for one more
cycle and then we may be able to get rid of it. Or we can keep it in the tree
for a while and produce more and more shims for new slab functionality.

-- 


* [patch 01/10] SLUB: Direct pass through of page size or higher kmalloc requests
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:49 ` [patch 02/10] SLUB: Avoid page struct cacheline bouncing due to remote frees to cpu slab Christoph Lameter
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_page_allocator_pass_through --]
[-- Type: text/plain, Size: 9547 bytes --]

This gets rid of all kmalloc caches of page size and larger. A
kmalloc request larger than PAGE_SIZE/2 is passed straight
through to the page allocator. This works both inline, where
we call __get_free_pages instead of kmem_cache_alloc, and
in __kmalloc.

kfree is modified to check if the object is in a slab page. If not
then the page is freed via the page allocator instead.
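
For illustration (4K page size assumed, purely as an example):

	buf = kmalloc(8192, GFP_KERNEL);
		/* size > PAGE_SIZE/2: becomes __get_free_pages(GFP_KERNEL |
		   __GFP_COMP, get_order(8192)), no slab cache involved */
	...
	kfree(buf);
		/* virt_to_head_page(buf) is not PageSlab -> put_page() */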

Drawbacks:
- No accounting for large kmalloc slab allocations anymore
- No debugging of large kmalloc slab allocations.
- Meshing of slab allocations and page allocator allocations
  becomes possible.
- Strange discontinuity in kmalloc operations. If larger than
  page size then full page allocator semantics apply.
  But SLOB is already doing that.
- kmalloc objects are aligned to ARCH_KMALLOC_MINALIGN
  if smaller than PAGE_SIZE otherwise they are
  page aligned.
- Additional check of the size in kmalloc and kfree.

Advantages:
- Significantly reduces memory overhead for kmalloc array
- Large kmalloc operations are faster since they do not
  need to pass through the slab allocator to get to the
  page allocator.
- Large kmallocs yield page aligned objects, which is what
  SLAB does. Bad things like using page sized kmalloc allocations can
  be transparently handled and are not distinguishable from page
  allocator uses.
- Checking for too large objects can be removed since
  it is done by the page allocator.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/slub_def.h |   57 +++++++++++++++++-------------------------
 mm/slub.c                |   63 ++++++++++++++++++++++++++++-------------------
 2 files changed, 62 insertions(+), 58 deletions(-)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-06 16:08:36.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-06 16:08:36.000000000 -0700
@@ -2239,11 +2239,11 @@ EXPORT_SYMBOL(kmem_cache_destroy);
  *		Kmalloc subsystem
  *******************************************************************/
 
-struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __cacheline_aligned;
+struct kmem_cache kmalloc_caches[PAGE_SHIFT] __cacheline_aligned;
 EXPORT_SYMBOL(kmalloc_caches);
 
 #ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
+static struct kmem_cache *kmalloc_caches_dma[PAGE_SHIFT];
 #endif
 
 static int __init setup_slub_min_order(char *str)
@@ -2379,12 +2379,8 @@ static struct kmem_cache *get_slab(size_
 			return ZERO_SIZE_PTR;
 
 		index = size_index[(size - 1) / 8];
-	} else {
-		if (size > KMALLOC_MAX_SIZE)
-			return NULL;
-
+	} else
 		index = fls(size - 1);
-	}
 
 #ifdef CONFIG_ZONE_DMA
 	if (unlikely((flags & SLUB_DMA)))
@@ -2396,9 +2392,15 @@ static struct kmem_cache *get_slab(size_
 
 void *__kmalloc(size_t size, gfp_t flags)
 {
-	struct kmem_cache *s = get_slab(size, flags);
+	struct kmem_cache *s;
 
-	if (ZERO_OR_NULL_PTR(s))
+	if (unlikely(size > PAGE_SIZE / 2))
+		return (void *)__get_free_pages(flags | __GFP_COMP,
+							get_order(size));
+
+	s = get_slab(size, flags);
+
+	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
 	return slab_alloc(s, flags, -1, __builtin_return_address(0));
@@ -2408,9 +2410,15 @@ EXPORT_SYMBOL(__kmalloc);
 #ifdef CONFIG_NUMA
 void *__kmalloc_node(size_t size, gfp_t flags, int node)
 {
-	struct kmem_cache *s = get_slab(size, flags);
+	struct kmem_cache *s;
 
-	if (ZERO_OR_NULL_PTR(s))
+	if (unlikely(size > PAGE_SIZE / 2))
+		return (void *)__get_free_pages(flags | __GFP_COMP,
+							get_order(size));
+
+	s = get_slab(size, flags);
+
+	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
 	return slab_alloc(s, flags, node, __builtin_return_address(0));
@@ -2455,22 +2463,17 @@ EXPORT_SYMBOL(ksize);
 
 void kfree(const void *x)
 {
-	struct kmem_cache *s;
 	struct page *page;
 
-	/*
-	 * This has to be an unsigned comparison. According to Linus
-	 * some gcc version treat a pointer as a signed entity. Then
-	 * this comparison would be true for all "negative" pointers
-	 * (which would cover the whole upper half of the address space).
-	 */
 	if (ZERO_OR_NULL_PTR(x))
 		return;
 
 	page = virt_to_head_page(x);
-	s = page->slab;
-
-	slab_free(s, page, (void *)x, __builtin_return_address(0));
+	if (unlikely(!PageSlab(page))) {
+		put_page(page);
+		return;
+	}
+	slab_free(page->slab, page, (void *)x, __builtin_return_address(0));
 }
 EXPORT_SYMBOL(kfree);
 
@@ -2927,7 +2930,7 @@ void __init kmem_cache_init(void)
 		caches++;
 	}
 
-	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
+	for (i = KMALLOC_SHIFT_LOW; i < PAGE_SHIFT; i++) {
 		create_kmalloc_cache(&kmalloc_caches[i],
 			"kmalloc", 1 << i, GFP_KERNEL);
 		caches++;
@@ -2954,7 +2957,7 @@ void __init kmem_cache_init(void)
 	slab_state = UP;
 
 	/* Provide the correct kmalloc names now that the caches are up */
-	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
+	for (i = KMALLOC_SHIFT_LOW; i < PAGE_SHIFT; i++)
 		kmalloc_caches[i]. name =
 			kasprintf(GFP_KERNEL, "kmalloc-%d", 1 << i);
 
@@ -3142,7 +3145,12 @@ static struct notifier_block __cpuinitda
 
 void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, void *caller)
 {
-	struct kmem_cache *s = get_slab(size, gfpflags);
+	struct kmem_cache *s;
+
+	if (unlikely(size > PAGE_SIZE / 2))
+		return (void *)__get_free_pages(gfpflags | __GFP_COMP,
+							get_order(size));
+	s = get_slab(size, gfpflags);
 
 	if (ZERO_OR_NULL_PTR(s))
 		return s;
@@ -3153,7 +3161,12 @@ void *__kmalloc_track_caller(size_t size
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 					int node, void *caller)
 {
-	struct kmem_cache *s = get_slab(size, gfpflags);
+	struct kmem_cache *s;
+
+	if (unlikely(size > PAGE_SIZE / 2))
+		return (void *)__get_free_pages(gfpflags | __GFP_COMP,
+							get_order(size));
+	s = get_slab(size, gfpflags);
 
 	if (ZERO_OR_NULL_PTR(s))
 		return s;
Index: linux-2.6.22-rc6-mm1/include/linux/slub_def.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slub_def.h	2007-07-06 16:07:53.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slub_def.h	2007-07-06 16:08:36.000000000 -0700
@@ -81,7 +81,7 @@ struct kmem_cache {
  * We keep the general caches in an array of slab caches that are used for
  * 2^x bytes of allocations.
  */
-extern struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache kmalloc_caches[PAGE_SHIFT];
 
 /*
  * Sorry that the following has to be that ugly but some versions of GCC
@@ -92,9 +92,6 @@ static inline int kmalloc_index(size_t s
 	if (!size)
 		return 0;
 
-	if (size > KMALLOC_MAX_SIZE)
-		return -1;
-
 	if (size <= KMALLOC_MIN_SIZE)
 		return KMALLOC_SHIFT_LOW;
 
@@ -111,6 +108,10 @@ static inline int kmalloc_index(size_t s
 	if (size <=        512) return 9;
 	if (size <=       1024) return 10;
 	if (size <=   2 * 1024) return 11;
+/*
+ * The following is only needed to support architectures with a larger page
+ * size than 4k.
+ */
 	if (size <=   4 * 1024) return 12;
 	if (size <=   8 * 1024) return 13;
 	if (size <=  16 * 1024) return 14;
@@ -118,13 +119,9 @@ static inline int kmalloc_index(size_t s
 	if (size <=  64 * 1024) return 16;
 	if (size <= 128 * 1024) return 17;
 	if (size <= 256 * 1024) return 18;
-	if (size <=  512 * 1024) return 19;
+	if (size <= 512 * 1024) return 19;
 	if (size <= 1024 * 1024) return 20;
 	if (size <=  2 * 1024 * 1024) return 21;
-	if (size <=  4 * 1024 * 1024) return 22;
-	if (size <=  8 * 1024 * 1024) return 23;
-	if (size <= 16 * 1024 * 1024) return 24;
-	if (size <= 32 * 1024 * 1024) return 25;
 	return -1;
 
 /*
@@ -149,19 +146,6 @@ static inline struct kmem_cache *kmalloc
 	if (index == 0)
 		return NULL;
 
-	/*
-	 * This function only gets expanded if __builtin_constant_p(size), so
-	 * testing it here shouldn't be needed.  But some versions of gcc need
-	 * help.
-	 */
-	if (__builtin_constant_p(size) && index < 0) {
-		/*
-		 * Generate a link failure. Would be great if we could
-		 * do something to stop the compile here.
-		 */
-		extern void __kmalloc_size_too_large(void);
-		__kmalloc_size_too_large();
-	}
 	return &kmalloc_caches[index];
 }
 
@@ -177,15 +161,21 @@ void *__kmalloc(size_t size, gfp_t flags
 
 static inline void *kmalloc(size_t size, gfp_t flags)
 {
-	if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
-		struct kmem_cache *s = kmalloc_slab(size);
+	if (__builtin_constant_p(size)) {
+		if (size > PAGE_SIZE / 2)
+			return (void *)__get_free_pages(flags | __GFP_COMP,
+							get_order(size));
 
-		if (!s)
-			return ZERO_SIZE_PTR;
+		if (!(flags & SLUB_DMA)) {
+			struct kmem_cache *s = kmalloc_slab(size);
+
+			if (!s)
+				return ZERO_SIZE_PTR;
 
-		return kmem_cache_alloc(s, flags);
-	} else
-		return __kmalloc(size, flags);
+			return kmem_cache_alloc(s, flags);
+		}
+	}
+	return __kmalloc(size, flags);
 }
 
 #ifdef CONFIG_NUMA
@@ -194,15 +184,16 @@ void *kmem_cache_alloc_node(struct kmem_
 
 static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
-	if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
-		struct kmem_cache *s = kmalloc_slab(size);
+	if (__builtin_constant_p(size) &&
+		size <= PAGE_SIZE / 2 && !(flags & SLUB_DMA)) {
+			struct kmem_cache *s = kmalloc_slab(size);
 
 		if (!s)
 			return ZERO_SIZE_PTR;
 
 		return kmem_cache_alloc_node(s, flags, node);
-	} else
-		return __kmalloc_node(size, flags, node);
+	}
+	return __kmalloc_node(size, flags, node);
 }
 #endif
 

-- 


* [patch 02/10] SLUB: Avoid page struct cacheline bouncing due to remote frees to cpu slab
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
  2007-07-08  3:49 ` [patch 01/10] SLUB: Direct pass through of page size or higher kmalloc requests Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:49 ` [patch 03/10] SLUB: Do not use page->mapping Christoph Lameter
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_performance_conc_free_alloc --]
[-- Type: text/plain, Size: 16449 bytes --]

A remote free may access the same page struct that also contains the lockless
freelist for the cpu slab. If objects have a short lifetime and are freed by
a different processor then remote frees back to the slab from which we are
currently allocating are frequent. The cacheline with the page struct needs
to be repeatedly acquired in exclusive mode by both the allocating thread and
the freeing thread. If this happens frequently enough then performance will
suffer because of cacheline bouncing.

This patch puts the lockless_freelist pointer in its own cacheline. In
order to make that happen we introduce a per cpu structure called
kmem_cache_cpu.

Instead of keeping an array of pointers to page structs we now keep an array
of per cpu structures that--among other things--contain the pointer to the
lockless freelist. The freeing thread can then keep possession of exclusive
access to the page struct cacheline while the allocating thread keeps its
exclusive access to the cacheline containing the per cpu structure.

This works as long as the allocating cpu is able to service its request
from the lockless freelist. If the lockless freelist runs empty then the
allocating thread needs to acquire exclusive access to the cacheline with
the page struct and lock the slab.

The allocating thread will then check if new objects were freed to the per
cpu slab. If so it will keep the slab as the cpu slab and continue with the
recently remote freed objects. So the allocating thread can take a series
of just freed remote objects and dish them out again. Ideally allocations
would just keep recycling objects in the same slab this way, which leads
to an ideal allocation / remote free pattern.

The number of objects that can be handled in this way is limited by the
capacity of one slab. Increasing slab size via slub_min_objects/
slub_max_order may increase the number of objects and therefore performance.
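For example, booting with something like slub_min_objects=32 or
slub_max_order=4 (values chosen purely for illustration) raises the object
count per slab.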

If the allocating thread runs out of objects and finds that no objects were
put back by the remote processor then it will retrieve a new slab (from the
partial lists or from the page allocator) and start with a whole
new set of objects while the remote thread may still be freeing objects to
the old cpu slab. This may then repeat until the new slab is also exhausted.
If remote freeing has freed objects in the earlier slab then that earlier
slab will now be on the partial list and the allocating thread will
pick that slab next for allocation. So the loop is extended. However,
both threads need to take the list_lock to make the swizzling via
the partial list happen.

It is likely that this kind of scheme will keep the objects being passed
around to a small set that can be kept in the cpu caches leading to increased
performance.

More code cleanups become possible:

- Instead of passing a cpu we can now pass a kmem_cache_cpu structure around.
  Allows reducing the number of parameters to various functions.
- Can define a new node_match() function for NUMA to encapsulate locality
  checks.


Effect on allocations:

Cachelines touched before this patch:

	Write:	page struct and first cacheline of object

Cachelines touched after this patch:

	Write:	kmem_cache_cpu cacheline and first cacheline of object
	Read: page struct (but see later patch that avoids touching
		that cacheline)


The handling when the lockless alloc list runs empty gets to be a bit more
complicated since another cacheline has now to be written to. But that is
halfway out of the hot path.

Effect on freeing:

Cachelines touched before this patch:

	Write: page struct and first cacheline of object

Cachelines touched after this patch depending on how we free:

  Write(to cpu_slab):	kmem_cache_cpu struct and first cacheline of object
  Write(to other):	page struct and first cacheline of object

  Read(to cpu_slab):	page struct to id slab etc. (but see later patch that
  			avoids touching the page struct on free)
  Read(to other):	cpu local kmem_cache_cpu struct to verify that it is
  			not the cpu slab.



Summary:

Pro:
	- Distinct cachelines so that concurrent remote frees and local
	  allocs on a cpuslab can occur without cacheline bouncing.
	- Avoids potentially bouncing cachelines because of neighboring
	  per cpu pointer updates in kmem_cache's cpu_slab structure since
	  it now grows to a cacheline (therefore the comment that talked
	  about that concern is removed).

Cons:
	- Freeing objects now requires the reading of one additional
	  cacheline. That can be mitigated for some cases by the following
	  patches but it is not possible to completely eliminate these
	  references.

	- Memory usage grows slightly.

	The size of each per cpu object is blown up from one word
	(pointing to the page struct) to one cacheline with various data.
	So this is NR_CPUS*NR_SLABS*L1_BYTES more memory use. Let's say
	NR_SLABS is 100 and the cache line size is 128 bytes; then we have
	just increased slab metadata requirements by 12.8k per cpu.
	(Another later patch reduces these requirements)

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/slub_def.h |    9 +-
 mm/slub.c                |  191 ++++++++++++++++++++++++++++-------------------
 2 files changed, 124 insertions(+), 76 deletions(-)

Index: linux-2.6.22-rc6-mm1/include/linux/slub_def.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slub_def.h	2007-07-06 16:08:36.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slub_def.h	2007-07-06 16:08:40.000000000 -0700
@@ -11,6 +11,13 @@
 #include <linux/workqueue.h>
 #include <linux/kobject.h>
 
+struct kmem_cache_cpu {
+	void **freelist;
+	struct page *page;
+	int node;
+	/* Lots of wasted space */
+} ____cacheline_aligned_in_smp;
+
 struct kmem_cache_node {
 	spinlock_t list_lock;	/* Protect partial list and nr_partial */
 	unsigned long nr_partial;
@@ -63,7 +70,7 @@ struct kmem_cache {
 				 */
 	struct kmem_cache_node *node[MAX_NUMNODES];
 #endif
-	struct page *cpu_slab[NR_CPUS];
+	struct kmem_cache_cpu cpu_slab[NR_CPUS];
 };
 
 /*
Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-06 16:08:36.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-06 16:08:40.000000000 -0700
@@ -90,7 +90,7 @@
  * 			One use of this flag is to mark slabs that are
  * 			used for allocations. Then such a slab becomes a cpu
  * 			slab. The cpu slab may be equipped with an additional
- * 			lockless_freelist that allows lockless access to
+ * 			freelist that allows lockless access to
  * 			free objects in addition to the regular freelist
  * 			that requires the slab lock.
  *
@@ -140,11 +140,6 @@ static inline void ClearSlabDebug(struct
 /*
  * Issues still to be resolved:
  *
- * - The per cpu array is updated for each new slab and and is a remote
- *   cacheline for most nodes. This could become a bouncing cacheline given
- *   enough frequent updates. There are 16 pointers in a cacheline, so at
- *   max 16 cpus could compete for the cacheline which may be okay.
- *
  * - Support PAGE_ALLOC_DEBUG. Should be easy to do.
  *
  * - Variable sizing of the per node arrays
@@ -286,6 +281,11 @@ static inline struct kmem_cache_node *ge
 #endif
 }
 
+static inline struct kmem_cache_cpu *get_cpu_slab(struct kmem_cache *s, int cpu)
+{
+	return &s->cpu_slab[cpu];
+}
+
 static inline int check_valid_pointer(struct kmem_cache *s,
 				struct page *page, const void *object)
 {
@@ -1399,33 +1399,34 @@ static void unfreeze_slab(struct kmem_ca
 /*
  * Remove the cpu slab
  */
-static void deactivate_slab(struct kmem_cache *s, struct page *page, int cpu)
+static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
+	struct page *page = c->page;
 	/*
 	 * Merge cpu freelist into freelist. Typically we get here
 	 * because both freelists are empty. So this is unlikely
 	 * to occur.
 	 */
-	while (unlikely(page->lockless_freelist)) {
+	while (unlikely(c->freelist)) {
 		void **object;
 
 		/* Retrieve object from cpu_freelist */
-		object = page->lockless_freelist;
-		page->lockless_freelist = page->lockless_freelist[page->offset];
+		object = c->freelist;
+		c->freelist = c->freelist[page->offset];
 
 		/* And put onto the regular freelist */
 		object[page->offset] = page->freelist;
 		page->freelist = object;
 		page->inuse--;
 	}
-	s->cpu_slab[cpu] = NULL;
+	c->page = NULL;
 	unfreeze_slab(s, page);
 }
 
-static inline void flush_slab(struct kmem_cache *s, struct page *page, int cpu)
+static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
-	slab_lock(page);
-	deactivate_slab(s, page, cpu);
+	slab_lock(c->page);
+	deactivate_slab(s, c);
 }
 
 /*
@@ -1434,18 +1435,17 @@ static inline void flush_slab(struct kme
  */
 static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 {
-	struct page *page = s->cpu_slab[cpu];
+	struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
 
-	if (likely(page))
-		flush_slab(s, page, cpu);
+	if (likely(c && c->page))
+		flush_slab(s, c);
 }
 
 static void flush_cpu_slab(void *d)
 {
 	struct kmem_cache *s = d;
-	int cpu = smp_processor_id();
 
-	__flush_cpu_slab(s, cpu);
+	__flush_cpu_slab(s, smp_processor_id());
 }
 
 static void flush_all(struct kmem_cache *s)
@@ -1462,6 +1462,19 @@ static void flush_all(struct kmem_cache 
 }
 
 /*
+ * Check if the objects in a per cpu structure fit numa
+ * locality expectations.
+ */
+static inline int node_match(struct kmem_cache_cpu *c, int node)
+{
+#ifdef CONFIG_NUMA
+	if (node != -1 && c->node != node)
+		return 0;
+#endif
+	return 1;
+}
+
+/*
  * Slow path. The lockless freelist is empty or we need to perform
  * debugging duties.
  *
@@ -1479,45 +1492,46 @@ static void flush_all(struct kmem_cache 
  * we need to allocate a new slab. This is slowest path since we may sleep.
  */
 static void *__slab_alloc(struct kmem_cache *s,
-		gfp_t gfpflags, int node, void *addr, struct page *page)
+		gfp_t gfpflags, int node, void *addr, struct kmem_cache_cpu *c)
 {
 	void **object;
-	int cpu = smp_processor_id();
+	struct page *new;
 
-	if (!page)
+	if (!c->page)
 		goto new_slab;
 
-	slab_lock(page);
-	if (unlikely(node != -1 && page_to_nid(page) != node))
+	slab_lock(c->page);
+	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 load_freelist:
-	object = page->freelist;
+	object = c->page->freelist;
 	if (unlikely(!object))
 		goto another_slab;
-	if (unlikely(SlabDebug(page)))
+	if (unlikely(SlabDebug(c->page)))
 		goto debug;
 
-	object = page->freelist;
-	page->lockless_freelist = object[page->offset];
-	page->inuse = s->objects;
-	page->freelist = NULL;
-	slab_unlock(page);
+	object = c->page->freelist;
+	c->freelist = object[c->page->offset];
+	c->page->inuse = s->objects;
+	c->page->freelist = NULL;
+	c->node = page_to_nid(c->page);
+	slab_unlock(c->page);
 	return object;
 
 another_slab:
-	deactivate_slab(s, page, cpu);
+	deactivate_slab(s, c);
 
 new_slab:
-	page = get_partial(s, gfpflags, node);
-	if (page) {
-		s->cpu_slab[cpu] = page;
+	new = get_partial(s, gfpflags, node);
+	if (new) {
+		c->page = new;
 		goto load_freelist;
 	}
 
-	page = new_slab(s, gfpflags, node);
-	if (page) {
-		cpu = smp_processor_id();
-		if (s->cpu_slab[cpu]) {
+	new = new_slab(s, gfpflags, node);
+	if (new) {
+		c = get_cpu_slab(s, smp_processor_id());
+		if (c->page) {
 			/*
 			 * Someone else populated the cpu_slab while we
 			 * enabled interrupts, or we have gotten scheduled
@@ -1525,34 +1539,32 @@ new_slab:
 			 * requested node even if __GFP_THISNODE was
 			 * specified. So we need to recheck.
 			 */
-			if (node == -1 ||
-				page_to_nid(s->cpu_slab[cpu]) == node) {
+			if (node_match(c, node)) {
 				/*
 				 * Current cpuslab is acceptable and we
 				 * want the current one since its cache hot
 				 */
-				discard_slab(s, page);
-				page = s->cpu_slab[cpu];
-				slab_lock(page);
+				discard_slab(s, new);
+				slab_lock(c->page);
 				goto load_freelist;
 			}
 			/* New slab does not fit our expectations */
-			flush_slab(s, s->cpu_slab[cpu], cpu);
+			flush_slab(s, c);
 		}
-		slab_lock(page);
-		SetSlabFrozen(page);
-		s->cpu_slab[cpu] = page;
+		slab_lock(new);
+		SetSlabFrozen(new);
+		c->page = new;
 		goto load_freelist;
 	}
 	return NULL;
 debug:
-	object = page->freelist;
-	if (!alloc_debug_processing(s, page, object, addr))
+	object = c->page->freelist;
+	if (!alloc_debug_processing(s, c->page, object, addr))
 		goto another_slab;
 
-	page->inuse++;
-	page->freelist = object[page->offset];
-	slab_unlock(page);
+	c->page->inuse++;
+	c->page->freelist = object[c->page->offset];
+	slab_unlock(c->page);
 	return object;
 }
 
@@ -1569,20 +1581,20 @@ debug:
 static void __always_inline *slab_alloc(struct kmem_cache *s,
 		gfp_t gfpflags, int node, void *addr)
 {
-	struct page *page;
 	void **object;
 	unsigned long flags;
+	struct kmem_cache_cpu *c;
 
 	local_irq_save(flags);
-	page = s->cpu_slab[smp_processor_id()];
-	if (unlikely(!page || !page->lockless_freelist ||
-			(node != -1 && page_to_nid(page) != node)))
+	c = get_cpu_slab(s, smp_processor_id());
+	if (unlikely(!c->page || !c->freelist ||
+					!node_match(c, node)))
 
-		object = __slab_alloc(s, gfpflags, node, addr, page);
+		object = __slab_alloc(s, gfpflags, node, addr, c);
 
 	else {
-		object = page->lockless_freelist;
-		page->lockless_freelist = object[page->offset];
+		object = c->freelist;
+		c->freelist = object[c->page->offset];
 	}
 	local_irq_restore(flags);
 
@@ -1680,12 +1692,13 @@ static void __always_inline slab_free(st
 {
 	void **object = (void *)x;
 	unsigned long flags;
+	struct kmem_cache_cpu *c;
 
 	local_irq_save(flags);
-	if (likely(page == s->cpu_slab[smp_processor_id()] &&
-						!SlabDebug(page))) {
-		object[page->offset] = page->lockless_freelist;
-		page->lockless_freelist = object;
+	c = get_cpu_slab(s, smp_processor_id());
+	if (likely(page == c->page && !SlabDebug(page))) {
+		object[page->offset] = c->freelist;
+		c->freelist = object;
 	} else
 		__slab_free(s, page, x, addr);
 
@@ -1878,6 +1891,24 @@ static unsigned long calculate_alignment
 	return ALIGN(align, sizeof(void *));
 }
 
+static void init_kmem_cache_cpu(struct kmem_cache *s,
+			struct kmem_cache_cpu *c)
+{
+	c->page = NULL;
+	c->freelist = NULL;
+	c->node = 0;
+}
+
+static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		init_kmem_cache_cpu(s, get_cpu_slab(s, cpu));
+
+	return 1;
+}
+
 static void init_kmem_cache_node(struct kmem_cache_node *n)
 {
 	n->nr_partial = 0;
@@ -2120,8 +2151,12 @@ static int kmem_cache_open(struct kmem_c
 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 100;
 #endif
-	if (init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
+	if (!init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
+		goto error;
+
+	if (alloc_kmem_cache_cpus(s, gfpflags & ~SLUB_DMA))
 		return 1;
+
 error:
 	if (flags & SLAB_PANIC)
 		panic("Cannot create slab %s size=%lu realsize=%u "
@@ -2966,7 +3001,7 @@ void __init kmem_cache_init(void)
 #endif
 
 	kmem_size = offsetof(struct kmem_cache, cpu_slab) +
-				nr_cpu_ids * sizeof(struct page *);
+				nr_cpu_ids * sizeof(struct kmem_cache_cpu);
 
 	printk(KERN_INFO "SLUB: Genslabs=%d, HWalign=%d, Order=%d-%d, MinObjects=%d,"
 		" CPUs=%d, Nodes=%d\n",
@@ -3582,11 +3617,14 @@ static unsigned long slab_objects(struct
 	per_cpu = nodes + nr_node_ids;
 
 	for_each_possible_cpu(cpu) {
-		struct page *page = s->cpu_slab[cpu];
-		int node;
+		struct page *page;
+		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+
+		if (!c)
+			continue;
 
+		page = c->page;
 		if (page) {
-			node = page_to_nid(page);
 			if (flags & SO_CPU) {
 				int x = 0;
 
@@ -3595,9 +3633,9 @@ static unsigned long slab_objects(struct
 				else
 					x = 1;
 				total += x;
-				nodes[node] += x;
+				nodes[c->node] += x;
 			}
-			per_cpu[node]++;
+			per_cpu[c->node]++;
 		}
 	}
 
@@ -3643,14 +3681,17 @@ static int any_slab_objects(struct kmem_
 	int node;
 	int cpu;
 
-	for_each_possible_cpu(cpu)
-		if (s->cpu_slab[cpu])
+	for_each_possible_cpu(cpu) {
+		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+
+		if (c && c->page)
 			return 1;
+	}
 
-	for_each_node(node) {
+	for_each_online_node(node) {
 		struct kmem_cache_node *n = get_node(s, node);
 
-		if (n->nr_partial || atomic_read(&n->nr_slabs))
+		if (n && (n->nr_partial || atomic_read(&n->nr_slabs)))
 			return 1;
 	}
 	return 0;

-- 


* [patch 03/10] SLUB: Do not use page->mapping
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
  2007-07-08  3:49 ` [patch 01/10] SLUB: Direct pass through of page size or higher kmalloc requests Christoph Lameter
  2007-07-08  3:49 ` [patch 02/10] SLUB: Avoid page struct cacheline bouncing due to remote frees to cpu slab Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:49 ` [patch 04/10] SLUB: Move page->offset to kmem_cache_cpu->offset Christoph Lameter
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_free_up_mapping --]
[-- Type: text/plain, Size: 2699 bytes --]

After moving the lockless_freelist to kmem_cache_cpu we no longer need
page->lockless_freelist. Restructure the use of the struct page fields in
such a way that we never touch the mapping field.

This in turn allows us to remove the special casing of SLUB when determining
the mapping of a page (needed for corner cases on machines with virtual
caches that need to flush the caches of processors mapping a page).

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/mm.h       |    4 ----
 include/linux/mm_types.h |    9 ++-------
 mm/slub.c                |    2 --
 3 files changed, 2 insertions(+), 13 deletions(-)

Index: linux-2.6.22-rc6-mm1/include/linux/mm.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/mm.h	2007-07-05 19:05:02.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/mm.h	2007-07-05 19:05:24.000000000 -0700
@@ -632,10 +632,6 @@ static inline struct address_space *page
 	VM_BUG_ON(PageSlab(page));
 	if (unlikely(PageSwapCache(page)))
 		mapping = &swapper_space;
-#ifdef CONFIG_SLUB
-	else if (unlikely(PageSlab(page)))
-		mapping = NULL;
-#endif
 	else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
 		mapping = NULL;
 	return mapping;
Index: linux-2.6.22-rc6-mm1/include/linux/mm_types.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/mm_types.h	2007-07-05 19:00:19.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/mm_types.h	2007-07-05 19:05:24.000000000 -0700
@@ -49,13 +49,8 @@ struct page {
 #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
 	    spinlock_t ptl;
 #endif
-	    struct {			/* SLUB uses */
-	    	void **lockless_freelist;
-		struct kmem_cache *slab;	/* Pointer to slab */
-	    };
-	    struct {
-		struct page *first_page;	/* Compound pages */
-	    };
+	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
+	    struct page *first_page;	/* Compound tail pages */
 	};
 	union {
 		pgoff_t index;		/* Our offset within mapping. */
Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-05 19:05:16.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-05 19:05:24.000000000 -0700
@@ -1109,7 +1109,6 @@ static struct page *new_slab(struct kmem
 		atomic_long_inc(&n->nr_slabs);
 
 	page->inuse = 0;
-	page->lockless_freelist = NULL;
 	page->offset = s->offset / sizeof(void *);
 	page->slab = s;
 
@@ -1163,7 +1162,6 @@ static void __free_slab(struct kmem_cach
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 		- pages);
 
-	page->mapping = NULL;
 	__free_pages(page, s->order);
 }
 

-- 


* [patch 04/10] SLUB: Move page->offset to kmem_cache_cpu->offset
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-07-08  3:49 ` [patch 03/10] SLUB: Do not use page->mapping Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:49 ` [patch 05/10] SLUB: Avoid touching page struct when freeing to per cpu slab Christoph Lameter
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_move_offset_to_cpu --]
[-- Type: text/plain, Size: 8228 bytes --]

We need the offset during slab_alloc and slab_free. In both cases we also
reference the cacheline of the kmem_cache_cpu structure. We can therefore
move the offset field into the kmem_cache_cpu structure, freeing up 16 bits
in the page struct.

Moving the offset allows an allocation from slab_alloc() without
touching the page struct in the hot path.

The only thing left in slab_free() that touches the page struct cacheline for
per cpu freeing is the checking of SlabDebug(page). The next patch deals with
that.

Use the available 16 bits to broaden page->inuse. That way we can have more
than 64k objects per slab and can get rid of the checks for that limitation.

No need anymore to shrink the order of slabs if we boot with 2M sized slabs
(slub_min_order=9).

No need anymore to switch off the offset calculation for very large slabs
since the field in the kmem_cache_cpu structure is 32 bits and so the offset
field can now handle slab sizes of up to 8GB.
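
As an example (64 bit, free pointer placed at byte offset 64 into the object;
numbers purely for illustration): c->offset = s->offset / sizeof(void *) =
64 / 8 = 8, and the free pointer of an object is then reached as
object[c->offset], i.e. word index 8 of the object.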

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/mm_types.h |    5 ---
 include/linux/slub_def.h |    1 
 mm/slub.c                |   75 ++++++++++-------------------------------------
 3 files changed, 19 insertions(+), 62 deletions(-)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-06 23:35:49.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-07 12:02:41.000000000 -0700
@@ -207,11 +207,6 @@ static inline void ClearSlabDebug(struct
 #define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
 #endif
 
-/*
- * The page->inuse field is 16 bit thus we have this limitation
- */
-#define MAX_OBJECTS_PER_SLAB 65535
-
 /* Internal SLUB flags */
 #define __OBJECT_POISON 0x80000000	/* Poison object */
 
@@ -741,11 +736,6 @@ static int check_slab(struct kmem_cache 
 		slab_err(s, page, "Not a valid slab page");
 		return 0;
 	}
-	if (page->offset * sizeof(void *) != s->offset) {
-		slab_err(s, page, "Corrupted offset %lu",
-			(unsigned long)(page->offset * sizeof(void *)));
-		return 0;
-	}
 	if (page->inuse > s->objects) {
 		slab_err(s, page, "inuse %u > max %u",
 			s->name, page->inuse, s->objects);
@@ -884,8 +874,6 @@ bad:
 		slab_fix(s, "Marking all objects used");
 		page->inuse = s->objects;
 		page->freelist = NULL;
-		/* Fix up fields that may be corrupted */
-		page->offset = s->offset / sizeof(void *);
 	}
 	return 0;
 }
@@ -1001,30 +989,12 @@ __setup("slub_debug", setup_slub_debug);
 static void kmem_cache_open_debug_check(struct kmem_cache *s)
 {
 	/*
-	 * The page->offset field is only 16 bit wide. This is an offset
-	 * in units of words from the beginning of an object. If the slab
-	 * size is bigger then we cannot move the free pointer behind the
-	 * object anymore.
-	 *
-	 * On 32 bit platforms the limit is 256k. On 64bit platforms
-	 * the limit is 512k.
-	 *
-	 * Debugging or ctor may create a need to move the free
-	 * pointer. Fail if this happens.
+	 * Enable debugging if selected on the kernel commandline.
 	 */
-	if (s->objsize >= 65535 * sizeof(void *)) {
-		BUG_ON(s->flags & (SLAB_RED_ZONE | SLAB_POISON |
-				SLAB_STORE_USER | SLAB_DESTROY_BY_RCU));
-		BUG_ON(s->ctor);
-	}
-	else
-		/*
-		 * Enable debugging if selected on the kernel commandline.
-		 */
-		if (slub_debug && (!slub_debug_slabs ||
-		    strncmp(slub_debug_slabs, s->name,
-		    	strlen(slub_debug_slabs)) == 0))
-				s->flags |= slub_debug;
+	if (slub_debug && (!slub_debug_slabs ||
+	    strncmp(slub_debug_slabs, s->name,
+	    	strlen(slub_debug_slabs)) == 0))
+			s->flags |= slub_debug;
 }
 #else
 static inline void setup_object_debug(struct kmem_cache *s,
@@ -1109,7 +1079,6 @@ static struct page *new_slab(struct kmem
 		atomic_long_inc(&n->nr_slabs);
 
 	page->inuse = 0;
-	page->offset = s->offset / sizeof(void *);
 	page->slab = s;
 
 	start = page_address(page);
@@ -1410,10 +1379,10 @@ static void deactivate_slab(struct kmem_
 
 		/* Retrieve object from cpu_freelist */
 		object = c->freelist;
-		c->freelist = c->freelist[page->offset];
+		c->freelist = c->freelist[c->offset];
 
 		/* And put onto the regular freelist */
-		object[page->offset] = page->freelist;
+		object[c->offset] = page->freelist;
 		page->freelist = object;
 		page->inuse--;
 	}
@@ -1509,7 +1478,7 @@ load_freelist:
 		goto debug;
 
 	object = c->page->freelist;
-	c->freelist = object[c->page->offset];
+	c->freelist = object[c->offset];
 	c->page->inuse = s->objects;
 	c->page->freelist = NULL;
 	c->node = page_to_nid(c->page);
@@ -1561,7 +1530,7 @@ debug:
 		goto another_slab;
 
 	c->page->inuse++;
-	c->page->freelist = object[c->page->offset];
+	c->page->freelist = object[c->offset];
 	slab_unlock(c->page);
 	return object;
 }
@@ -1592,7 +1561,7 @@ static void __always_inline *slab_alloc(
 
 	else {
 		object = c->freelist;
-		c->freelist = object[c->page->offset];
+		c->freelist = object[c->offset];
 	}
 	local_irq_restore(flags);
 
@@ -1625,7 +1594,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
  * handling required then we can return immediately.
  */
 static void __slab_free(struct kmem_cache *s, struct page *page,
-					void *x, void *addr)
+				void *x, void *addr, unsigned int offset)
 {
 	void *prior;
 	void **object = (void *)x;
@@ -1635,7 +1604,7 @@ static void __slab_free(struct kmem_cach
 	if (unlikely(SlabDebug(page)))
 		goto debug;
 checks_ok:
-	prior = object[page->offset] = page->freelist;
+	prior = object[offset] = page->freelist;
 	page->freelist = object;
 	page->inuse--;
 
@@ -1695,10 +1664,10 @@ static void __always_inline slab_free(st
 	local_irq_save(flags);
 	c = get_cpu_slab(s, smp_processor_id());
 	if (likely(page == c->page && !SlabDebug(page))) {
-		object[page->offset] = c->freelist;
+		object[c->offset] = c->freelist;
 		c->freelist = object;
 	} else
-		__slab_free(s, page, x, addr);
+		__slab_free(s, page, x, addr, c->offset);
 
 	local_irq_restore(flags);
 }
@@ -1794,8 +1763,7 @@ static inline int slab_order(int size, i
 	 * If we would create too many object per slab then reduce
 	 * the slab order even if it goes below slub_min_order.
 	 */
-	while (min_order > 0 &&
-		(PAGE_SIZE << min_order) >= MAX_OBJECTS_PER_SLAB * size)
+	while (min_order > 0 && PAGE_SIZE << min_order)
 			min_order--;
 
 	for (order = max(min_order,
@@ -1812,9 +1780,6 @@ static inline int slab_order(int size, i
 		if (rem <= slab_size / fract_leftover)
 			break;
 
-		/* If the next size is too high then exit now */
-		if (slab_size * 2 >= MAX_OBJECTS_PER_SLAB * size)
-			break;
 	}
 
 	return order;
@@ -1894,6 +1859,7 @@ static void init_kmem_cache_cpu(struct k
 {
 	c->page = NULL;
 	c->freelist = NULL;
+	c->offset = s->offset / sizeof(void *);
 	c->node = 0;
 }
 
@@ -2115,14 +2081,7 @@ static int calculate_sizes(struct kmem_c
 	 */
 	s->objects = (PAGE_SIZE << s->order) / size;
 
-	/*
-	 * Verify that the number of objects is within permitted limits.
-	 * The page->inuse field is only 16 bit wide! So we cannot have
-	 * more than 64k objects per slab.
-	 */
-	if (!s->objects || s->objects > MAX_OBJECTS_PER_SLAB)
-		return 0;
-	return 1;
+	return !!s->objects;
 
 }
 
Index: linux-2.6.22-rc6-mm1/include/linux/mm_types.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/mm_types.h	2007-07-06 23:35:49.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/mm_types.h	2007-07-06 23:35:50.000000000 -0700
@@ -24,10 +24,7 @@ struct page {
 					 * to show when page is mapped
 					 * & limit reverse map searches.
 					 */
-		struct {	/* SLUB uses */
-			short unsigned int inuse;
-			short unsigned int offset;
-		};
+		unsigned int inuse;	/* SLUB: Nr of objects */
 	};
 	union {
 	    struct {
Index: linux-2.6.22-rc6-mm1/include/linux/slub_def.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slub_def.h	2007-07-06 23:35:49.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slub_def.h	2007-07-07 12:01:39.000000000 -0700
@@ -15,6 +15,7 @@ struct kmem_cache_cpu {
 	void **freelist;
 	struct page *page;
 	int node;
+	unsigned int offset;
 	/* Lots of wasted space */
 } ____cacheline_aligned_in_smp;
 

-- 


* [patch 05/10] SLUB: Avoid touching page struct when freeing to per cpu slab
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-07-08  3:49 ` [patch 04/10] SLUB: Move page->offset to kmem_cache_cpu->offset Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:49 ` [patch 06/10] SLUB: Place kmem_cache_cpu structures in a NUMA aware way Christoph Lameter
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_free_cpu_slab_no_page_struct --]
[-- Type: text/plain, Size: 1314 bytes --]

Instead of checking for SlabDebug, which requires access to the page struct
contents, simply check whether the per cpu free pointer is NULL. It can only
be non-NULL if !SlabDebug (the debug path in __slab_alloc now keeps
c->freelist NULL).

This means we will not free to the cpu slab if the per cpu list is
empty. In that case it is likely that the cpu slab is soon going to be
retired anyway. Not freeing to an empty freelist is also required
to avoid races in the cmpxchg alloc/free version.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/slub.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-05 19:05:29.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-05 19:05:33.000000000 -0700
@@ -1525,6 +1531,7 @@ new_slab:
 	}
 	return NULL;
 debug:
+	c->freelist = NULL;
 	object = c->page->freelist;
 	if (!alloc_debug_processing(s, c->page, object, addr))
 		goto another_slab;
@@ -1663,7 +1670,7 @@ static void __always_inline slab_free(st
 
 	local_irq_save(flags);
 	c = get_cpu_slab(s, smp_processor_id());
-	if (likely(page == c->page && !SlabDebug(page))) {
+	if (likely(page == c->page && c->freelist)) {
 		object[c->offset] = c->freelist;
 		c->freelist = object;
 	} else

-- 


* [patch 06/10] SLUB: Place kmem_cache_cpu structures in a NUMA aware way.
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-07-08  3:49 ` [patch 05/10] SLUB: Avoid touching page struct when freeing to per cpu slab Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:49 ` [patch 07/10] SLUB: Optimize cacheline use for zeroing Christoph Lameter
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_performance_numa_placement --]
[-- Type: text/plain, Size: 9179 bytes --]

The kmem_cache_cpu structures introduced earlier in this series are currently
placed in an array in the kmem_cache struct. That means the kmem_cache_cpu
structures are overwhelmingly on the wrong node for systems with a larger
number of nodes. These are performance critical structures since the per cpu
information has to be touched for every alloc and free in a slab.

In order to place the kmem_cache_cpu structure optimally we put an array
of pointers to kmem_cache_cpu structs in kmem_cache (similar to SLAB).

The kmem_cache_cpu structures can now be allocated in a more intelligent way.

We would like to put per cpu structures for the same cpu but different
slab caches in cachelines together to save space and decrease the cache
footprint. However, the slab allocator itself controls only allocations
per node. Thus we set up a simple per cpu array with 100 per cpu structures
for every processor, which is usually enough to get them all set up right.
If we run out then we fall back to kmalloc_node. This also solves the
bootstrap problem since we do not have to use slab allocator functions
early in boot to get memory for the small per cpu structures.
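
To get a rough feel for the cost (assuming a kmem_cache_cpu of 32 bytes,
which depends on the architecture and padding): NR_KMEM_CACHE_CPU = 100
entries make the statically reserved array about 3.2KB of per cpu data per
processor, shared among all slab caches on that cpu.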

Pro:
	- NUMA aware placement improves memory performance
	- All global structures in struct kmem_cache become readonly
	- Dense packing of per cpu structures reduces cacheline
	  footprint in SMP and NUMA.
	- Potential avoidance of exclusive cacheline fetches
	  on the free and alloc hotpath since multiple kmem_cache_cpu
	  structures are in one cacheline. This is particularly important
	  for the kmalloc array.

Cons:
	- Additional reference to one read only cacheline (per cpu
	  array of pointers to kmem_cache_cpu) in both slab_alloc()
	  and slab_free().

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/slub_def.h |    9 +-
 mm/slub.c                |  163 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 155 insertions(+), 17 deletions(-)

Index: linux-2.6.22-rc6-mm1/include/linux/slub_def.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slub_def.h	2007-07-07 12:01:39.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slub_def.h	2007-07-07 12:06:43.000000000 -0700
@@ -16,8 +16,7 @@ struct kmem_cache_cpu {
 	struct page *page;
 	int node;
 	unsigned int offset;
-	/* Lots of wasted space */
-} ____cacheline_aligned_in_smp;
+};
 
 struct kmem_cache_node {
 	spinlock_t list_lock;	/* Protect partial list and nr_partial */
@@ -71,7 +70,11 @@ struct kmem_cache {
 				 */
 	struct kmem_cache_node *node[MAX_NUMNODES];
 #endif
-	struct kmem_cache_cpu cpu_slab[NR_CPUS];
+#ifdef CONFIG_SMP
+	struct kmem_cache_cpu *cpu_slab[NR_CPUS];
+#else
+	struct kmem_cache_cpu cpu_slab;
+#endif
 };
 
 /*
Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-07 12:06:19.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-07 12:06:43.000000000 -0700
@@ -278,7 +278,11 @@ static inline struct kmem_cache_node *ge
 
 static inline struct kmem_cache_cpu *get_cpu_slab(struct kmem_cache *s, int cpu)
 {
-	return &s->cpu_slab[cpu];
+#ifdef CONFIG_SMP
+	return s->cpu_slab[cpu];
+#else
+	return &s->cpu_slab;
+#endif
 }
 
 static inline int check_valid_pointer(struct kmem_cache *s,
@@ -1864,16 +1868,6 @@ static void init_kmem_cache_cpu(struct k
 	c->node = 0;
 }
 
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		init_kmem_cache_cpu(s, get_cpu_slab(s, cpu));
-
-	return 1;
-}
-
 static void init_kmem_cache_node(struct kmem_cache_node *n)
 {
 	n->nr_partial = 0;
@@ -1885,6 +1879,125 @@ static void init_kmem_cache_node(struct 
 #endif
 }
 
+#ifdef CONFIG_SMP
+/*
+ * Per cpu array for per cpu structures.
+ *
+ * The per cpu array places all kmem_cache_cpu structures from one processor
+ * close together meaning that it becomes possible that multiple per cpu
+ * structures are contained in one cacheline. This may be particularly
+ * beneficial for the kmalloc caches.
+ *
+ * A desktop system typically has around 60-80 slabs. With 100 here we are
+ * likely able to get per cpu structures for all caches from the array defined
+ * here. We must be able to cover all kmalloc caches during bootstrap.
+ *
+ * If the per cpu array is exhausted then fall back to kmalloc
+ * of individual cachelines. No sharing is possible then.
+ */
+#define NR_KMEM_CACHE_CPU 100
+
+static DEFINE_PER_CPU(struct kmem_cache_cpu,
+				kmem_cache_cpu)[NR_KMEM_CACHE_CPU];
+
+static DEFINE_PER_CPU(struct kmem_cache_cpu *, kmem_cache_cpu_free);
+
+static struct kmem_cache_cpu *alloc_kmem_cache_cpu(struct kmem_cache *s,
+							int cpu, gfp_t flags)
+{
+	struct kmem_cache_cpu *c = per_cpu(kmem_cache_cpu_free, cpu);
+
+	if (c)
+		per_cpu(kmem_cache_cpu_free, cpu) =
+				(void *)c->freelist;
+	else {
+		/* Table overflow: So allocate ourselves */
+		c = kmalloc_node(
+			ALIGN(sizeof(struct kmem_cache_cpu), cache_line_size()),
+			flags, cpu_to_node(cpu));
+		if (!c)
+			return NULL;
+	}
+
+	init_kmem_cache_cpu(s, c);
+	return c;
+}
+
+static void free_kmem_cache_cpu(struct kmem_cache_cpu *c, int cpu)
+{
+	if (c < per_cpu(kmem_cache_cpu, cpu) ||
+			c > per_cpu(kmem_cache_cpu, cpu) + NR_KMEM_CACHE_CPU) {
+		kfree(c);
+		return;
+	}
+	c->freelist = (void *)per_cpu(kmem_cache_cpu_free, cpu);
+ 	per_cpu(kmem_cache_cpu_free, cpu) = c;
+}
+
+static void free_kmem_cache_cpus(struct kmem_cache *s)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+
+		if (c) {
+			s->cpu_slab[cpu] = NULL;
+			free_kmem_cache_cpu(c, cpu);
+		}
+	}
+}
+
+static int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+
+		if (c)
+			continue;
+
+		c = alloc_kmem_cache_cpu(s, cpu, flags);
+		if (!c) {
+			free_kmem_cache_cpus(s);
+			return 0;
+		}
+		s->cpu_slab[cpu] = c;
+	}
+	return 1;
+}
+
+/*
+ * Initialize the per cpu array.
+ */
+static void init_alloc_cpu_cpu(int cpu)
+{
+	int i;
+
+	for (i = NR_KMEM_CACHE_CPU - 1; i >= 0; i--)
+		free_kmem_cache_cpu(&per_cpu(kmem_cache_cpu, cpu)[i], cpu);
+}
+
+static void __init init_alloc_cpu(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		init_alloc_cpu_cpu(cpu);
+  }
+
+#else
+static inline void free_kmem_cache_cpus(struct kmem_cache *s) {}
+static inline void init_alloc_cpu(void) {}
+
+static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
+{
+ 	init_kmem_cache_cpu(s, &s->cpu_slab);
+	return 1;
+}
+#endif
+
 #ifdef CONFIG_NUMA
 /*
  * No kmalloc_node yet so do it by hand. We know that this is the first
@@ -1892,7 +2005,8 @@ static void init_kmem_cache_node(struct 
  * possible.
  *
  * Note that this function only works on the kmalloc_node_cache
- * when allocating for the kmalloc_node_cache.
+ * when allocating for the kmalloc_node_cache. This is used for bootstrapping
+ * memory on a fresh node that has no slab structures yet.
  */
 static struct kmem_cache_node * __init early_kmem_cache_node_alloc(gfp_t gfpflags,
 								int node)
@@ -2115,6 +2229,7 @@ static int kmem_cache_open(struct kmem_c
 	if (alloc_kmem_cache_cpus(s, gfpflags & ~SLUB_DMA))
 		return 1;
 
+	free_kmem_cache_nodes(s);
 error:
 	if (flags & SLAB_PANIC)
 		panic("Cannot create slab %s size=%lu realsize=%u "
@@ -2197,6 +2312,8 @@ static inline int kmem_cache_close(struc
 	flush_all(s);
 
 	/* Attempt to free all objects */
+	free_kmem_cache_cpus(s);
+
 	for_each_online_node(node) {
 		struct kmem_cache_node *n = get_node(s, node);
 
@@ -2899,6 +3016,8 @@ void __init kmem_cache_init(void)
 		slub_min_objects = DEFAULT_ANTIFRAG_MIN_OBJECTS;
 	}
 
+	init_alloc_cpu();
+
 #ifdef CONFIG_NUMA
 	/*
 	 * Must first have the slab cache available for the allocations of the
@@ -2959,10 +3078,12 @@ void __init kmem_cache_init(void)
 
 #ifdef CONFIG_SMP
 	register_cpu_notifier(&slab_notifier);
+	kmem_size = offsetof(struct kmem_cache, cpu_slab) +
+				nr_cpu_ids * sizeof(struct kmem_cache_cpu *);
+#else
+	kmem_size = sizeof(struct kmem_cache);
 #endif
 
-	kmem_size = offsetof(struct kmem_cache, cpu_slab) +
-				nr_cpu_ids * sizeof(struct kmem_cache_cpu);
 
 	printk(KERN_INFO "SLUB: Genslabs=%d, HWalign=%d, Order=%d-%d, MinObjects=%d,"
 		" CPUs=%d, Nodes=%d\n",
@@ -3116,15 +3237,29 @@ static int __cpuinit slab_cpuup_callback
 	unsigned long flags;
 
 	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		init_alloc_cpu_cpu(cpu);
+		down_read(&slub_lock);
+		list_for_each_entry(s, &slab_caches, list)
+			s->cpu_slab[cpu] = alloc_kmem_cache_cpu(s, cpu,
+							GFP_KERNEL);
+		up_read(&slub_lock);
+		break;
+
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		down_read(&slub_lock);
 		list_for_each_entry(s, &slab_caches, list) {
+			struct kmem_cache_cpu *c = get_cpu_slab(s, cpu);
+
 			local_irq_save(flags);
 			__flush_cpu_slab(s, cpu);
 			local_irq_restore(flags);
+			free_kmem_cache_cpu(c, cpu);
+			s->cpu_slab[cpu] = NULL;
 		}
 		up_read(&slub_lock);
 		break;

-- 


* [patch 07/10] SLUB: Optimize cacheline use for zeroing
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-07-08  3:49 ` [patch 06/10] SLUB: Place kmem_cache_cpu structures in a NUMA aware way Christoph Lameter
@ 2007-07-08  3:49 ` Christoph Lameter
  2007-07-08  3:50 ` [patch 08/10] SLUB: Single atomic instruction alloc/free using cmpxchg Christoph Lameter
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_optimize_zeroing --]
[-- Type: text/plain, Size: 2472 bytes --]

On a zeroing allocation we currently touch a cacheline in the kmem_cache
structure just to read the object size, even though the hot paths in
slab_alloc and slab_free do not reference any other kmem_cache fields.

Add a new field to kmem_cache_cpu that contains the object size. That
cacheline is already touched by the hot path, so we save one cacheline
reference on every slab_alloc.
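
As a rough sanity check (userspace sketch only, not kernel code; the field
list mirrors the slub_def.h hunk below, and a 64 byte cacheline with 64 bit
pointers is assumed), the whole per cpu structure including the new field
still fits comfortably in a single cacheline:

#include <stdio.h>

/* Approximate mirror of struct kmem_cache_cpu with the new objsize field */
struct kmem_cache_cpu {
	void **freelist;	/* per cpu freelist (from the earlier patch) */
	void *page;		/* stands in for struct page * */
	int node;
	unsigned int offset;
	unsigned int objsize;	/* new: object size used for zeroing */
};

int main(void)
{
	/* 24 bytes on a typical 64 bit build: well within one 64 byte line */
	printf("sizeof(struct kmem_cache_cpu) = %zu\n",
	       sizeof(struct kmem_cache_cpu));
	return 0;
}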

We need to update the object size in kmem_cache_cpu if an aliasing operation
changes the objsize of a non-debug slab.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/slub_def.h |    1 +
 mm/slub.c                |   14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc6-mm1/include/linux/slub_def.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slub_def.h	2007-07-07 13:56:24.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slub_def.h	2007-07-07 15:52:37.000000000 -0700
@@ -16,6 +16,7 @@ struct kmem_cache_cpu {
 	struct page *page;
 	int node;
 	unsigned int offset;
+	unsigned int objsize;
 };
 
 struct kmem_cache_node {
Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-07 13:56:24.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-07 17:49:25.000000000 -0700
@@ -1571,7 +1571,7 @@ static void __always_inline *slab_alloc(
 	local_irq_restore(flags);
 
 	if (unlikely((gfpflags & __GFP_ZERO) && object))
-		memset(object, 0, s->objsize);
+		memset(object, 0, c->objsize);
 
 	return object;
 }
@@ -1864,8 +1864,9 @@ static void init_kmem_cache_cpu(struct k
 {
 	c->page = NULL;
 	c->freelist = NULL;
-	c->offset = s->offset / sizeof(void *);
 	c->node = 0;
+	c->offset = s->offset / sizeof(void *);
+	c->objsize = s->objsize;
 }
 
 static void init_kmem_cache_node(struct kmem_cache_node *n)
@@ -3173,12 +3174,21 @@ struct kmem_cache *kmem_cache_create(con
 	down_write(&slub_lock);
 	s = find_mergeable(size, align, flags, ctor, ops);
 	if (s) {
+		int cpu;
+
 		s->refcount++;
 		/*
 		 * Adjust the object sizes so that we clear
 		 * the complete object on kzalloc.
 		 */
 		s->objsize = max(s->objsize, (int)size);
+
+		/*
+		 * And then we need to update the object size in the
+		 * per cpu structures
+		 */
+		for_each_online_cpu(cpu)
+			get_cpu_slab(s, cpu)->objsize = s->objsize;
 		s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
 		up_write(&slub_lock);
 

-- 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [patch 08/10] SLUB: Single atomic instruction alloc/free using cmpxchg
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (6 preceding siblings ...)
  2007-07-08  3:49 ` [patch 07/10] SLUB: Optimize cacheline use for zeroing Christoph Lameter
@ 2007-07-08  3:50 ` Christoph Lameter
  2007-07-08  3:50 ` [patch 09/10] Remove the SLOB allocator for 2.6.23 Christoph Lameter
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: slub_cmpxchg --]
[-- Type: text/plain, Size: 3632 bytes --]

A cmpxchg allows us to avoid disabling and enabling interrupts. The cmpxchg
remains safe on the per cpu freelist even if we are moved to another
processor on the way to the cmpxchg, so we do not need to be pinned to a
cpu. This may be particularly useful for the RT kernel, where we currently
seem to have major SLAB issues with the per cpu structures. But dropping the
constant interrupt disable / enable around slab operations also increases
performance in general.

The hard binding to per cpu structures only comes into play when we enter
the slow path (__slab_alloc and __slab_free). At that point we have to disable
interrupts like before.

Determining the page struct in slab_free is a problem because the freelist
pointer is the only data value that we can reliably operate on, so we need
to do a virt_to_page() on the freelist. This makes it impossible to use the
fastpath for a full slab and adds the overhead of a second virt_to_page()
to each slab_free(). We really need the virtual memmap patchset to get
slab_free to good performance here.
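
The cmpxchg fastpath itself is not visible in the hunks quoted below, so
here is a minimal userspace sketch of the lockless freelist pop the fastpath
is built around (illustration only, using GCC atomic builtins; the real code
operates on the kmem_cache_cpu freelist and falls back to __slab_alloc on
failure):

/*
 * Illustration only: a lock free pop from a singly linked freelist
 * using compare-and-exchange instead of disabling interrupts.
 */
#include <stdio.h>

struct object {
	struct object *next;		/* freelist link stored inside the object */
};

static struct object *freelist;		/* stands in for c->freelist */

static struct object *pop(void)
{
	struct object *old, *new;

	do {
		old = __atomic_load_n(&freelist, __ATOMIC_ACQUIRE);
		if (!old)
			return NULL;	/* real allocator drops into the slow path */
		new = old->next;
		/* retry if another cpu changed the freelist under us */
	} while (!__atomic_compare_exchange_n(&freelist, &old, new, 0,
					      __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE));
	return old;
}

int main(void)
{
	struct object a = { NULL }, b = { &a };

	freelist = &b;
	printf("%p %p %p\n", (void *)pop(), (void *)pop(), (void *)pop());
	return 0;
}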

Pro:

        - Dirty single cacheline with a single instruction in
          slab_alloc to accomplish allocation.
        - Critical section is also a single instruction in slab_free.
          (but we need to write to the cacheline of the object too)

Con:
        - Complex freelist management. __slab_alloc has to deal
          with results of race conditions.
        - Recalculation of per cpu structure address is necessary
          in __slab_alloc since process may be rescheduled while
          executing in slab_alloc.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/slub.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-07 18:40:10.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-07 18:46:04.000000000 -0700
@@ -1370,34 +1370,38 @@ static void unfreeze_slab(struct kmem_ca
 /*
  * Remove the cpu slab
  */
-static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
+static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c,
+			void **freelist)
 {
 	struct page *page = c->page;
+
+	c->page = NULL;
 	/*
 	 * Merge cpu freelist into freelist. Typically we get here
 	 * because both freelists are empty. So this is unlikely
 	 * to occur.
 	 */
-	while (unlikely(c->freelist)) {
+	while (unlikely(freelist)) {
 		void **object;
 
 		/* Retrieve object from cpu_freelist */
-		object = c->freelist;
-		c->freelist = c->freelist[c->offset];
+		object = freelist;
+		freelist = freelist[c->offset];
 
 		/* And put onto the regular freelist */
 		object[c->offset] = page->freelist;
 		page->freelist = object;
 		page->inuse--;
 	}
-	c->page = NULL;
 	unfreeze_slab(s, page);
 }
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
+	void **freelist = xchg(&c->freelist, NULL);
+
 	slab_lock(c->page);
-	deactivate_slab(s, c);
+	deactivate_slab(s, c, freelist);
 }
 
 /*
@@ -1467,10 +1471,13 @@ static void *__slab_alloc(struct kmem_ca
 {
 	void **object;
 	struct page *new;
+	void **freelist = NULL;
 
 	if (!c->page)
 		goto new_slab;
 
+	freelist = xchg(&c->freelist, NULL);
+
 	slab_lock(c->page);
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
@@ -1490,7 +1497,7 @@ load_freelist:
 	return object;
 
 another_slab:
-	deactivate_slab(s, c);
+	deactivate_slab(s, c, freelist);
 
 new_slab:
 	new = get_partial(s, gfpflags, node);

-- 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (7 preceding siblings ...)
  2007-07-08  3:50 ` [patch 08/10] SLUB: Single atomic instruction alloc/free using cmpxchg Christoph Lameter
@ 2007-07-08  3:50 ` Christoph Lameter
  2007-07-08  7:51   ` Ingo Molnar
  2007-07-09 20:52   ` Matt Mackall
  2007-07-08  3:50 ` [patch 10/10] Remove slab in 2.6.24 Christoph Lameter
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: rm_slob --]
[-- Type: text/plain, Size: 20286 bytes --]

Maintenance of slab allocators becomes a problem as new allocator features
are developed. The SLOB allocator in particular has been lagging behind in
several ways:

- Had no support for SLAB_DESTROY_BY_RCU for years (but no one noticed)

- Still has no support for slab reclaim counters. This might be tolerable
  if the supported configurations for the functionality relying on these
  counters were restricted accordingly. But even that has not been done.

The only current advantage over SLUB in terms of memory savings is SLOB's
kmalloc layout, which is not power-of-two based like SLAB's and SLUB's and
therefore eliminates some memory waste.

Because of that, SLOB still has a slight memory advantage over SLUB of ~350k
for a standard server configuration. The savings are likely smaller for real
embedded configurations that have less functionality.
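
As an illustration with made-up numbers (assuming a purely power-of-two
kmalloc layout): a kmalloc(300) is served from a 512 byte object by SLAB and
SLUB, wasting 212 bytes, whereas SLOB hands out 300 bytes plus its 4 byte
size header, wasting only those 4 bytes.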

The density of non-kmalloc slabs is superior in SLUB since SLOB needs to
keep a per object structure in the page, which leads to a loss of at least
4 bytes per object. If cacheline alignment is necessary then the loss may be
greater in order to align the objects. SLUB avoids that through a linked
free list and can pack objects tightly together in a page with no per object
headers in between.

We will likely be adding new slab features soon, and removing SLOB means we
can avoid maintaining it. SLOB has customarily been neglected. Given the
minimal benefit that remains, it is best to remove SLOB altogether.

That no one noticed the lack of SLAB_DESTROY_BY_RCU for the longest time
also indicates that the user base is minimal.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/slab.h |    6 
 init/Kconfig         |   10 
 mm/Makefile          |    1 
 mm/slob.c            |  615 ---------------------------------------------------
 4 files changed, 2 insertions(+), 630 deletions(-)

Index: linux-2.6.22-rc6-mm1/init/Kconfig
===================================================================
--- linux-2.6.22-rc6-mm1.orig/init/Kconfig	2007-07-03 17:19:28.000000000 -0700
+++ linux-2.6.22-rc6-mm1/init/Kconfig	2007-07-05 22:17:20.000000000 -0700
@@ -624,16 +624,6 @@ config SLUB
 	   of queues of objects. SLUB can use memory efficiently
 	   and has enhanced diagnostics.
 
-config SLOB
-	depends on EMBEDDED && !SPARSEMEM
-	bool "SLOB (Simple Allocator)"
-	help
-	   SLOB replaces the SLAB allocator with a drastically simpler
-	   allocator.  SLOB is more space efficient than SLAB but does not
-	   scale well (single lock for all operations) and is also highly
-	   susceptible to fragmentation. SLUB can accomplish a higher object
-	   density. It is usually better to use SLUB instead of SLOB.
-
 endchoice
 
 config PROC_SMAPS
Index: linux-2.6.22-rc6-mm1/mm/slob.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slob.c	2007-07-05 19:08:30.000000000 -0700
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,615 +0,0 @@
-/*
- * SLOB Allocator: Simple List Of Blocks
- *
- * Matt Mackall <mpm@selenic.com> 12/30/03
- *
- * NUMA support by Paul Mundt, 2007.
- *
- * How SLOB works:
- *
- * The core of SLOB is a traditional K&R style heap allocator, with
- * support for returning aligned objects. The granularity of this
- * allocator is as little as 2 bytes, however typically most architectures
- * will require 4 bytes on 32-bit and 8 bytes on 64-bit.
- *
- * The slob heap is a linked list of pages from alloc_pages(), and
- * within each page, there is a singly-linked list of free blocks (slob_t).
- * The heap is grown on demand and allocation from the heap is currently
- * first-fit.
- *
- * Above this is an implementation of kmalloc/kfree. Blocks returned
- * from kmalloc are prepended with a 4-byte header with the kmalloc size.
- * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls
- * alloc_pages() directly, allocating compound pages so the page order
- * does not have to be separately tracked, and also stores the exact
- * allocation size in page->private so that it can be used to accurately
- * provide ksize(). These objects are detected in kfree() because slob_page()
- * is false for them.
- *
- * SLAB is emulated on top of SLOB by simply calling constructors and
- * destructors for every SLAB allocation. Objects are returned with the
- * 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which
- * case the low-level allocator will fragment blocks to create the proper
- * alignment. Again, objects of page-size or greater are allocated by
- * calling alloc_pages(). As SLAB objects know their size, no separate
- * size bookkeeping is necessary and there is essentially no allocation
- * space overhead, and compound pages aren't needed for multi-page
- * allocations.
- *
- * NUMA support in SLOB is fairly simplistic, pushing most of the real
- * logic down to the page allocator, and simply doing the node accounting
- * on the upper levels. In the event that a node id is explicitly
- * provided, alloc_pages_node() with the specified node id is used
- * instead. The common case (or when the node id isn't explicitly provided)
- * will default to the current node, as per numa_node_id().
- *
- * Node aware pages are still inserted in to the global freelist, and
- * these are scanned for by matching against the node id encoded in the
- * page flags. As a result, block allocations that can be satisfied from
- * the freelist will only be done so on pages residing on the same node,
- * in order to prevent random node placement.
- */
-
-#include <linux/kernel.h>
-#include <linux/slab.h>
-#include <linux/mm.h>
-#include <linux/cache.h>
-#include <linux/init.h>
-#include <linux/module.h>
-#include <linux/rcupdate.h>
-#include <linux/list.h>
-#include <asm/atomic.h>
-
-/*
- * slob_block has a field 'units', which indicates size of block if +ve,
- * or offset of next block if -ve (in SLOB_UNITs).
- *
- * Free blocks of size 1 unit simply contain the offset of the next block.
- * Those with larger size contain their size in the first SLOB_UNIT of
- * memory, and the offset of the next free block in the second SLOB_UNIT.
- */
-#if PAGE_SIZE <= (32767 * 2)
-typedef s16 slobidx_t;
-#else
-typedef s32 slobidx_t;
-#endif
-
-struct slob_block {
-	slobidx_t units;
-};
-typedef struct slob_block slob_t;
-
-/*
- * We use struct page fields to manage some slob allocation aspects,
- * however to avoid the horrible mess in include/linux/mm_types.h, we'll
- * just define our own struct page type variant here.
- */
-struct slob_page {
-	union {
-		struct {
-			unsigned long flags;	/* mandatory */
-			atomic_t _count;	/* mandatory */
-			slobidx_t units;	/* free units left in page */
-			unsigned long pad[2];
-			slob_t *free;		/* first free slob_t in page */
-			struct list_head list;	/* linked list of free pages */
-		};
-		struct page page;
-	};
-};
-static inline void struct_slob_page_wrong_size(void)
-{ BUILD_BUG_ON(sizeof(struct slob_page) != sizeof(struct page)); }
-
-/*
- * free_slob_page: call before a slob_page is returned to the page allocator.
- */
-static inline void free_slob_page(struct slob_page *sp)
-{
-	reset_page_mapcount(&sp->page);
-	sp->page.mapping = NULL;
-}
-
-/*
- * All (partially) free slob pages go on this list.
- */
-static LIST_HEAD(free_slob_pages);
-
-/*
- * slob_page: True for all slob pages (false for bigblock pages)
- */
-static inline int slob_page(struct slob_page *sp)
-{
-	return test_bit(PG_active, &sp->flags);
-}
-
-static inline void set_slob_page(struct slob_page *sp)
-{
-	__set_bit(PG_active, &sp->flags);
-}
-
-static inline void clear_slob_page(struct slob_page *sp)
-{
-	__clear_bit(PG_active, &sp->flags);
-}
-
-/*
- * slob_page_free: true for pages on free_slob_pages list.
- */
-static inline int slob_page_free(struct slob_page *sp)
-{
-	return test_bit(PG_private, &sp->flags);
-}
-
-static inline void set_slob_page_free(struct slob_page *sp)
-{
-	list_add(&sp->list, &free_slob_pages);
-	__set_bit(PG_private, &sp->flags);
-}
-
-static inline void clear_slob_page_free(struct slob_page *sp)
-{
-	list_del(&sp->list);
-	__clear_bit(PG_private, &sp->flags);
-}
-
-#define SLOB_UNIT sizeof(slob_t)
-#define SLOB_UNITS(size) (((size) + SLOB_UNIT - 1)/SLOB_UNIT)
-#define SLOB_ALIGN L1_CACHE_BYTES
-
-/*
- * struct slob_rcu is inserted at the tail of allocated slob blocks, which
- * were created with a SLAB_DESTROY_BY_RCU slab. slob_rcu is used to free
- * the block using call_rcu.
- */
-struct slob_rcu {
-	struct rcu_head head;
-	int size;
-};
-
-/*
- * slob_lock protects all slob allocator structures.
- */
-static DEFINE_SPINLOCK(slob_lock);
-
-/*
- * Encode the given size and next info into a free slob block s.
- */
-static void set_slob(slob_t *s, slobidx_t size, slob_t *next)
-{
-	slob_t *base = (slob_t *)((unsigned long)s & PAGE_MASK);
-	slobidx_t offset = next - base;
-
-	if (size > 1) {
-		s[0].units = size;
-		s[1].units = offset;
-	} else
-		s[0].units = -offset;
-}
-
-/*
- * Return the size of a slob block.
- */
-static slobidx_t slob_units(slob_t *s)
-{
-	if (s->units > 0)
-		return s->units;
-	return 1;
-}
-
-/*
- * Return the next free slob block pointer after this one.
- */
-static slob_t *slob_next(slob_t *s)
-{
-	slob_t *base = (slob_t *)((unsigned long)s & PAGE_MASK);
-	slobidx_t next;
-
-	if (s[0].units < 0)
-		next = -s[0].units;
-	else
-		next = s[1].units;
-	return base+next;
-}
-
-/*
- * Returns true if s is the last free block in its page.
- */
-static int slob_last(slob_t *s)
-{
-	return !((unsigned long)slob_next(s) & ~PAGE_MASK);
-}
-
-static void *slob_new_page(gfp_t gfp, int order, int node)
-{
-	void *page;
-
-#ifdef CONFIG_NUMA
-	if (node != -1)
-		page = alloc_pages_node(node, gfp, order);
-	else
-#endif
-		page = alloc_pages(gfp, order);
-
-	if (!page)
-		return NULL;
-
-	return page_address(page);
-}
-
-/*
- * Allocate a slob block within a given slob_page sp.
- */
-static void *slob_page_alloc(struct slob_page *sp, size_t size, int align)
-{
-	slob_t *prev, *cur, *aligned = 0;
-	int delta = 0, units = SLOB_UNITS(size);
-
-	for (prev = NULL, cur = sp->free; ; prev = cur, cur = slob_next(cur)) {
-		slobidx_t avail = slob_units(cur);
-
-		if (align) {
-			aligned = (slob_t *)ALIGN((unsigned long)cur, align);
-			delta = aligned - cur;
-		}
-		if (avail >= units + delta) { /* room enough? */
-			slob_t *next;
-
-			if (delta) { /* need to fragment head to align? */
-				next = slob_next(cur);
-				set_slob(aligned, avail - delta, next);
-				set_slob(cur, delta, aligned);
-				prev = cur;
-				cur = aligned;
-				avail = slob_units(cur);
-			}
-
-			next = slob_next(cur);
-			if (avail == units) { /* exact fit? unlink. */
-				if (prev)
-					set_slob(prev, slob_units(prev), next);
-				else
-					sp->free = next;
-			} else { /* fragment */
-				if (prev)
-					set_slob(prev, slob_units(prev), cur + units);
-				else
-					sp->free = cur + units;
-				set_slob(cur + units, avail - units, next);
-			}
-
-			sp->units -= units;
-			if (!sp->units)
-				clear_slob_page_free(sp);
-			return cur;
-		}
-		if (slob_last(cur))
-			return NULL;
-	}
-}
-
-/*
- * slob_alloc: entry point into the slob allocator.
- */
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
-{
-	struct slob_page *sp;
-	slob_t *b = NULL;
-	unsigned long flags;
-
-	spin_lock_irqsave(&slob_lock, flags);
-	/* Iterate through each partially free page, try to find room */
-	list_for_each_entry(sp, &free_slob_pages, list) {
-#ifdef CONFIG_NUMA
-		/*
-		 * If there's a node specification, search for a partial
-		 * page with a matching node id in the freelist.
-		 */
-		if (node != -1 && page_to_nid(&sp->page) != node)
-			continue;
-#endif
-
-		if (sp->units >= SLOB_UNITS(size)) {
-			b = slob_page_alloc(sp, size, align);
-			if (b)
-				break;
-		}
-	}
-	spin_unlock_irqrestore(&slob_lock, flags);
-
-	/* Not enough space: must allocate a new page */
-	if (!b) {
-		b = slob_new_page(gfp, 0, node);
-		if (!b)
-			return 0;
-		sp = (struct slob_page *)virt_to_page(b);
-		set_slob_page(sp);
-
-		spin_lock_irqsave(&slob_lock, flags);
-		sp->units = SLOB_UNITS(PAGE_SIZE);
-		sp->free = b;
-		INIT_LIST_HEAD(&sp->list);
-		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
-		set_slob_page_free(sp);
-		b = slob_page_alloc(sp, size, align);
-		BUG_ON(!b);
-		spin_unlock_irqrestore(&slob_lock, flags);
-	}
-	if (unlikely((gfp & __GFP_ZERO) && b))
-		memset(b, 0, size);
-	return b;
-}
-
-/*
- * slob_free: entry point into the slob allocator.
- */
-static void slob_free(void *block, int size)
-{
-	struct slob_page *sp;
-	slob_t *prev, *next, *b = (slob_t *)block;
-	slobidx_t units;
-	unsigned long flags;
-
-	if (ZERO_OR_NULL_PTR(block))
-		return;
-	BUG_ON(!size);
-
-	sp = (struct slob_page *)virt_to_page(block);
-	units = SLOB_UNITS(size);
-
-	spin_lock_irqsave(&slob_lock, flags);
-
-	if (sp->units + units == SLOB_UNITS(PAGE_SIZE)) {
-		/* Go directly to page allocator. Do not pass slob allocator */
-		if (slob_page_free(sp))
-			clear_slob_page_free(sp);
-		clear_slob_page(sp);
-		free_slob_page(sp);
-		free_page((unsigned long)b);
-		goto out;
-	}
-
-	if (!slob_page_free(sp)) {
-		/* This slob page is about to become partially free. Easy! */
-		sp->units = units;
-		sp->free = b;
-		set_slob(b, units,
-			(void *)((unsigned long)(b +
-					SLOB_UNITS(PAGE_SIZE)) & PAGE_MASK));
-		set_slob_page_free(sp);
-		goto out;
-	}
-
-	/*
-	 * Otherwise the page is already partially free, so find reinsertion
-	 * point.
-	 */
-	sp->units += units;
-
-	if (b < sp->free) {
-		set_slob(b, units, sp->free);
-		sp->free = b;
-	} else {
-		prev = sp->free;
-		next = slob_next(prev);
-		while (b > next) {
-			prev = next;
-			next = slob_next(prev);
-		}
-
-		if (!slob_last(prev) && b + units == next) {
-			units += slob_units(next);
-			set_slob(b, units, slob_next(next));
-		} else
-			set_slob(b, units, next);
-
-		if (prev + slob_units(prev) == b) {
-			units = slob_units(b) + slob_units(prev);
-			set_slob(prev, units, slob_next(b));
-		} else
-			set_slob(prev, slob_units(prev), b);
-	}
-out:
-	spin_unlock_irqrestore(&slob_lock, flags);
-}
-
-/*
- * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
- */
-
-#ifndef ARCH_KMALLOC_MINALIGN
-#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long)
-#endif
-
-#ifndef ARCH_SLAB_MINALIGN
-#define ARCH_SLAB_MINALIGN __alignof__(unsigned long)
-#endif
-
-void *__kmalloc_node(size_t size, gfp_t gfp, int node)
-{
-	unsigned int *m;
-	int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
-
-	if (size < PAGE_SIZE - align) {
-		if (!size)
-			return ZERO_SIZE_PTR;
-
-		m = slob_alloc(size + align, gfp, align, node);
-		if (m)
-			*m = size;
-		return (void *)m + align;
-	} else {
-		void *ret;
-
-		ret = slob_new_page(gfp | __GFP_COMP, get_order(size), node);
-		if (ret) {
-			struct page *page;
-			page = virt_to_page(ret);
-			page->private = size;
-		}
-		return ret;
-	}
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void kfree(const void *block)
-{
-	struct slob_page *sp;
-
-	if (ZERO_OR_NULL_PTR(block))
-		return;
-
-	sp = (struct slob_page *)virt_to_page(block);
-	if (slob_page(sp)) {
-		int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
-		unsigned int *m = (unsigned int *)(block - align);
-		slob_free(m, *m + align);
-	} else
-		put_page(&sp->page);
-}
-EXPORT_SYMBOL(kfree);
-
-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
-size_t ksize(const void *block)
-{
-	struct slob_page *sp;
-
-	if (ZERO_OR_NULL_PTR(block))
-		return 0;
-
-	sp = (struct slob_page *)virt_to_page(block);
-	if (slob_page(sp))
-		return ((slob_t *)block - 1)->units + SLOB_UNIT;
-	else
-		return sp->page.private;
-}
-
-struct kmem_cache {
-	unsigned int size, align;
-	unsigned long flags;
-	const char *name;
-	void (*ctor)(void *, struct kmem_cache *, unsigned long);
-};
-
-struct kmem_cache *kmem_cache_create(const char *name, size_t size,
-	size_t align, unsigned long flags,
-	void (*ctor)(void*, struct kmem_cache *, unsigned long),
-	const struct kmem_cache_ops *o)
-{
-	struct kmem_cache *c;
-
-	c = slob_alloc(sizeof(struct kmem_cache), flags, 0, -1);
-
-	if (c) {
-		c->name = name;
-		c->size = size;
-		if (flags & SLAB_DESTROY_BY_RCU) {
-			/* leave room for rcu footer at the end of object */
-			c->size += sizeof(struct slob_rcu);
-		}
-		c->flags = flags;
-		c->ctor = ctor;
-		/* ignore alignment unless it's forced */
-		c->align = (flags & SLAB_HWCACHE_ALIGN) ? SLOB_ALIGN : 0;
-		if (c->align < ARCH_SLAB_MINALIGN)
-			c->align = ARCH_SLAB_MINALIGN;
-		if (c->align < align)
-			c->align = align;
-	} else if (flags & SLAB_PANIC)
-		panic("Cannot create slab cache %s\n", name);
-
-	return c;
-}
-EXPORT_SYMBOL(kmem_cache_create);
-
-void kmem_cache_destroy(struct kmem_cache *c)
-{
-	slob_free(c, sizeof(struct kmem_cache));
-}
-EXPORT_SYMBOL(kmem_cache_destroy);
-
-void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
-{
-	void *b;
-
-	if (c->size < PAGE_SIZE)
-		b = slob_alloc(c->size, flags, c->align, node);
-	else
-		b = slob_new_page(flags, get_order(c->size), node);
-
-	if (c->ctor)
-		c->ctor(b, c, 0);
-
-	return b;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
-
-static void __kmem_cache_free(void *b, int size)
-{
-	if (size < PAGE_SIZE)
-		slob_free(b, size);
-	else
-		free_pages((unsigned long)b, get_order(size));
-}
-
-static void kmem_rcu_free(struct rcu_head *head)
-{
-	struct slob_rcu *slob_rcu = (struct slob_rcu *)head;
-	void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));
-
-	__kmem_cache_free(b, slob_rcu->size);
-}
-
-void kmem_cache_free(struct kmem_cache *c, void *b)
-{
-	if (unlikely(c->flags & SLAB_DESTROY_BY_RCU)) {
-		struct slob_rcu *slob_rcu;
-		slob_rcu = b + (c->size - sizeof(struct slob_rcu));
-		INIT_RCU_HEAD(&slob_rcu->head);
-		slob_rcu->size = c->size;
-		call_rcu(&slob_rcu->head, kmem_rcu_free);
-	} else {
-		__kmem_cache_free(b, c->size);
-	}
-}
-EXPORT_SYMBOL(kmem_cache_free);
-
-unsigned int kmem_cache_size(struct kmem_cache *c)
-{
-	return c->size;
-}
-EXPORT_SYMBOL(kmem_cache_size);
-
-const char *kmem_cache_name(struct kmem_cache *c)
-{
-	return c->name;
-}
-EXPORT_SYMBOL(kmem_cache_name);
-
-int kmem_cache_shrink(struct kmem_cache *d)
-{
-	return 0;
-}
-EXPORT_SYMBOL(kmem_cache_shrink);
-
-int kmem_cache_defrag(int percentage, int node)
-{
-	return 0;
-}
-
-/*
- * SLOB does not support slab defragmentation
- */
-int kmem_cache_vacate(struct page *page)
-{
-	return 0;
-}
-EXPORT_SYMBOL(kmem_cache_vacate);
-
-int kmem_ptr_validate(struct kmem_cache *a, const void *b)
-{
-	return 0;
-}
-
-void __init kmem_cache_init(void)
-{
-}
Index: linux-2.6.22-rc6-mm1/include/linux/slab.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slab.h	2007-07-05 19:08:30.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slab.h	2007-07-05 22:18:14.000000000 -0700
@@ -156,8 +156,6 @@ size_t ksize(const void *);
  */
 #ifdef CONFIG_SLUB
 #include <linux/slub_def.h>
-#elif defined(CONFIG_SLOB)
-#include <linux/slob_def.h>
 #else
 #include <linux/slab_def.h>
 #endif
@@ -220,7 +218,7 @@ static inline void *kcalloc(size_t n, si
 	return __kmalloc(n * size, flags | __GFP_ZERO);
 }
 
-#if !defined(CONFIG_NUMA) && !defined(CONFIG_SLOB)
+#if !defined(CONFIG_NUMA)
 /**
  * kmalloc_node - allocate memory from a specific node
  * @size: how many bytes of memory are required.
@@ -248,7 +246,7 @@ static inline void *kmem_cache_alloc_nod
 {
 	return kmem_cache_alloc(cachep, flags);
 }
-#endif /* !CONFIG_NUMA && !CONFIG_SLOB */
+#endif /* !CONFIG_NUMA */
 
 /*
  * kmalloc_track_caller is a special version of kmalloc that records the
Index: linux-2.6.22-rc6-mm1/mm/Makefile
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/Makefile	2007-07-03 17:19:28.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/Makefile	2007-07-05 22:17:20.000000000 -0700
@@ -22,7 +22,6 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SHMEM) += shmem.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_TINY_SHMEM) += tiny-shmem.o
-obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o

-- 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [patch 10/10] Remove slab in 2.6.24
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (8 preceding siblings ...)
  2007-07-08  3:50 ` [patch 09/10] Remove the SLOB allocator for 2.6.23 Christoph Lameter
@ 2007-07-08  3:50 ` Christoph Lameter
  2007-07-08  4:37 ` [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance David Miller
  2007-07-08 11:20 ` Andi Kleen
  11 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-08  3:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg, akpm

[-- Attachment #1: rm_slab --]
[-- Type: text/plain, Size: 130348 bytes --]

The SLAB functionality has been supplanted by SLUB. Benefits of SLUB:

- More compact data storage. Less cache footprint
- Reporting functionality and tools
- Higher speed
- Eliminates SLAB bitrot

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/proc/proc_misc.c  |   47 
 include/linux/slab.h |   18 
 init/Kconfig         |   26 
 lib/Kconfig.debug    |   17 
 mm/Makefile          |    4 
 mm/slab.c            | 4448 ---------------------------------------------------
 6 files changed, 2 insertions(+), 4558 deletions(-)

Index: linux-2.6.22-rc6-mm1/include/linux/slab.h
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/slab.h	2007-07-05 23:28:12.000000000 -0700
+++ linux-2.6.22-rc6-mm1/include/linux/slab.h	2007-07-05 23:36:12.000000000 -0700
@@ -16,7 +16,6 @@
 
 /*
  * Flags to pass to kmem_cache_create().
- * The ones marked DEBUG are only valid if CONFIG_SLAB_DEBUG is set.
  */
 #define SLAB_DEBUG_FREE		0x00000100UL	/* DEBUG: Perform (expensive) checks on free */
 #define SLAB_RED_ZONE		0x00000400UL	/* DEBUG: Red zone objs in a cache */
@@ -154,11 +153,7 @@ size_t ksize(const void *);
  * See each allocator definition file for additional comments and
  * implementation notes.
  */
-#ifdef CONFIG_SLUB
 #include <linux/slub_def.h>
-#else
-#include <linux/slab_def.h>
-#endif
 
 /**
  * kcalloc - allocate memory for an array. The memory is set to zero.
@@ -256,14 +251,9 @@ static inline void *kmem_cache_alloc_nod
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB)
 extern void *__kmalloc_track_caller(size_t, gfp_t, void*);
 #define kmalloc_track_caller(size, flags) \
 	__kmalloc_track_caller(size, flags, __builtin_return_address(0))
-#else
-#define kmalloc_track_caller(size, flags) \
-	__kmalloc(size, flags)
-#endif /* DEBUG_SLAB */
 
 #ifdef CONFIG_NUMA
 /*
@@ -274,22 +264,16 @@ extern void *__kmalloc_track_caller(size
  * standard allocator where we care about the real place the memory
  * allocation request comes from.
  */
-#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB)
 extern void *__kmalloc_node_track_caller(size_t, gfp_t, int, void *);
 #define kmalloc_node_track_caller(size, flags, node) \
 	__kmalloc_node_track_caller(size, flags, node, \
 			__builtin_return_address(0))
-#else
-#define kmalloc_node_track_caller(size, flags, node) \
-	__kmalloc_node(size, flags, node)
-#endif
-
 #else /* CONFIG_NUMA */
 
 #define kmalloc_node_track_caller(size, flags, node) \
 	kmalloc_track_caller(size, flags)
 
-#endif /* DEBUG_SLAB */
+#endif /* CONFIG_NUMA */
 
 /*
  * Shortcuts
Index: linux-2.6.22-rc6-mm1/init/Kconfig
===================================================================
--- linux-2.6.22-rc6-mm1.orig/init/Kconfig	2007-07-05 23:30:11.000000000 -0700
+++ linux-2.6.22-rc6-mm1/init/Kconfig	2007-07-06 09:14:48.000000000 -0700
@@ -594,38 +594,12 @@ config VM_EVENT_COUNTERS
 config SLUB_DEBUG
 	default y
 	bool "Enable SLUB debugging support" if EMBEDDED
-	depends on SLUB
 	help
 	  SLUB has extensive debug support features. Disabling these can
 	  result in significant savings in code size. This also disables
 	  SLUB sysfs support. /sys/slab will not exist and there will be
 	  no support for cache validation etc.
 
-choice
-	prompt "Choose SLAB allocator"
-	default SLUB
-	help
-	   This option allows to select a slab allocator.
-
-config SLAB
-	bool "SLAB"
-	help
-	  The regular slab allocator that is established and known to work
-	  well in all environments. It organizes cache hot objects in
-	  per cpu and per node queues. SLAB is the default choice for
-	  a slab allocator.
-
-config SLUB
-	bool "SLUB (Unqueued Allocator)"
-	help
-	   SLUB is a slab allocator that minimizes cache line usage
-	   instead of managing queues of cached objects (SLAB approach).
-	   Per cpu caching is realized using slabs of objects instead
-	   of queues of objects. SLUB can use memory efficiently
-	   and has enhanced diagnostics.
-
-endchoice
-
 config PROC_SMAPS
 	default y
 	bool "Enable /proc/pid/smaps support" if EMBEDDED && PROC_FS && MMU
Index: linux-2.6.22-rc6-mm1/mm/Makefile
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/Makefile	2007-07-05 23:29:47.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/Makefile	2007-07-05 23:30:07.000000000 -0700
@@ -11,7 +11,7 @@ obj-y			:= bootmem.o filemap.o mempool.o
 			   page_alloc.o page-writeback.o pdflush.o \
 			   readahead.o swap.o truncate.o vmscan.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
-			   $(mmu-y)
+			   slub.o $(mmu-y)
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
@@ -22,8 +22,6 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SHMEM) += shmem.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_TINY_SHMEM) += tiny-shmem.o
-obj-$(CONFIG_SLAB) += slab.o
-obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
Index: linux-2.6.22-rc6-mm1/mm/slab.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slab.c	2007-07-05 23:30:53.000000000 -0700
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,4448 +0,0 @@
-/*
- * linux/mm/slab.c
- * Written by Mark Hemment, 1996/97.
- * (markhe@nextd.demon.co.uk)
- *
- * kmem_cache_destroy() + some cleanup - 1999 Andrea Arcangeli
- *
- * Major cleanup, different bufctl logic, per-cpu arrays
- *	(c) 2000 Manfred Spraul
- *
- * Cleanup, make the head arrays unconditional, preparation for NUMA
- * 	(c) 2002 Manfred Spraul
- *
- * An implementation of the Slab Allocator as described in outline in;
- *	UNIX Internals: The New Frontiers by Uresh Vahalia
- *	Pub: Prentice Hall	ISBN 0-13-101908-2
- * or with a little more detail in;
- *	The Slab Allocator: An Object-Caching Kernel Memory Allocator
- *	Jeff Bonwick (Sun Microsystems).
- *	Presented at: USENIX Summer 1994 Technical Conference
- *
- * The memory is organized in caches, one cache for each object type.
- * (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct)
- * Each cache consists out of many slabs (they are small (usually one
- * page long) and always contiguous), and each slab contains multiple
- * initialized objects.
- *
- * This means, that your constructor is used only for newly allocated
- * slabs and you must pass objects with the same intializations to
- * kmem_cache_free.
- *
- * Each cache can only support one memory type (GFP_DMA, GFP_HIGHMEM,
- * normal). If you need a special memory type, then must create a new
- * cache for that memory type.
- *
- * In order to reduce fragmentation, the slabs are sorted in 3 groups:
- *   full slabs with 0 free objects
- *   partial slabs
- *   empty slabs with no allocated objects
- *
- * If partial slabs exist, then new allocations come from these slabs,
- * otherwise from empty slabs or new slabs are allocated.
- *
- * kmem_cache_destroy() CAN CRASH if you try to allocate from the cache
- * during kmem_cache_destroy(). The caller must prevent concurrent allocs.
- *
- * Each cache has a short per-cpu head array, most allocs
- * and frees go into that array, and if that array overflows, then 1/2
- * of the entries in the array are given back into the global cache.
- * The head array is strictly LIFO and should improve the cache hit rates.
- * On SMP, it additionally reduces the spinlock operations.
- *
- * The c_cpuarray may not be read with enabled local interrupts -
- * it's changed with a smp_call_function().
- *
- * SMP synchronization:
- *  constructors and destructors are called without any locking.
- *  Several members in struct kmem_cache and struct slab never change, they
- *	are accessed without any locking.
- *  The per-cpu arrays are never accessed from the wrong cpu, no locking,
- *  	and local interrupts are disabled so slab code is preempt-safe.
- *  The non-constant members are protected with a per-cache irq spinlock.
- *
- * Many thanks to Mark Hemment, who wrote another per-cpu slab patch
- * in 2000 - many ideas in the current implementation are derived from
- * his patch.
- *
- * Further notes from the original documentation:
- *
- * 11 April '97.  Started multi-threading - markhe
- *	The global cache-chain is protected by the mutex 'cache_chain_mutex'.
- *	The sem is only needed when accessing/extending the cache-chain, which
- *	can never happen inside an interrupt (kmem_cache_create(),
- *	kmem_cache_shrink() and kmem_cache_reap()).
- *
- *	At present, each engine can be growing a cache.  This should be blocked.
- *
- * 15 March 2005. NUMA slab allocator.
- *	Shai Fultheim <shai@scalex86.org>.
- *	Shobhit Dayal <shobhit@calsoftinc.com>
- *	Alok N Kataria <alokk@calsoftinc.com>
- *	Christoph Lameter <christoph@lameter.com>
- *
- *	Modified the slab allocator to be node aware on NUMA systems.
- *	Each node has its own list of partial, free and full slabs.
- *	All object allocations for a node occur from node specific slab lists.
- */
-
-#include	<linux/slab.h>
-#include	<linux/mm.h>
-#include	<linux/poison.h>
-#include	<linux/swap.h>
-#include	<linux/cache.h>
-#include	<linux/interrupt.h>
-#include	<linux/init.h>
-#include	<linux/compiler.h>
-#include	<linux/cpuset.h>
-#include	<linux/seq_file.h>
-#include	<linux/notifier.h>
-#include	<linux/kallsyms.h>
-#include	<linux/cpu.h>
-#include	<linux/sysctl.h>
-#include	<linux/module.h>
-#include	<linux/rcupdate.h>
-#include	<linux/string.h>
-#include	<linux/uaccess.h>
-#include	<linux/nodemask.h>
-#include	<linux/mempolicy.h>
-#include	<linux/mutex.h>
-#include	<linux/fault-inject.h>
-#include	<linux/rtmutex.h>
-#include	<linux/reciprocal_div.h>
-
-#include	<asm/cacheflush.h>
-#include	<asm/tlbflush.h>
-#include	<asm/page.h>
-
-/*
- * DEBUG	- 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & SLAB_POISON.
- *		  0 for faster, smaller code (especially in the critical paths).
- *
- * STATS	- 1 to collect stats for /proc/slabinfo.
- *		  0 for faster, smaller code (especially in the critical paths).
- *
- * FORCED_DEBUG	- 1 enables SLAB_RED_ZONE and SLAB_POISON (if possible)
- */
-
-#ifdef CONFIG_DEBUG_SLAB
-#define	DEBUG		1
-#define	STATS		1
-#define	FORCED_DEBUG	1
-#else
-#define	DEBUG		0
-#define	STATS		0
-#define	FORCED_DEBUG	0
-#endif
-
-/* Shouldn't this be in a header file somewhere? */
-#define	BYTES_PER_WORD		sizeof(void *)
-
-#ifndef cache_line_size
-#define cache_line_size()	L1_CACHE_BYTES
-#endif
-
-#ifndef ARCH_KMALLOC_MINALIGN
-/*
- * Enforce a minimum alignment for the kmalloc caches.
- * Usually, the kmalloc caches are cache_line_size() aligned, except when
- * DEBUG and FORCED_DEBUG are enabled, then they are BYTES_PER_WORD aligned.
- * Some archs want to perform DMA into kmalloc caches and need a guaranteed
- * alignment larger than the alignment of a 64-bit integer.
- * ARCH_KMALLOC_MINALIGN allows that.
- * Note that increasing this value may disable some debug features.
- */
-#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
-#endif
-
-#ifndef ARCH_SLAB_MINALIGN
-/*
- * Enforce a minimum alignment for all caches.
- * Intended for archs that get misalignment faults even for BYTES_PER_WORD
- * aligned buffers. Includes ARCH_KMALLOC_MINALIGN.
- * If possible: Do not enable this flag for CONFIG_DEBUG_SLAB, it disables
- * some debug features.
- */
-#define ARCH_SLAB_MINALIGN 0
-#endif
-
-#ifndef ARCH_KMALLOC_FLAGS
-#define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
-#endif
-
-/* Legal flag mask for kmem_cache_create(). */
-#if DEBUG
-# define CREATE_MASK	(SLAB_RED_ZONE | \
-			 SLAB_POISON | SLAB_HWCACHE_ALIGN | \
-			 SLAB_CACHE_DMA | \
-			 SLAB_STORE_USER | \
-			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
-			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
-#else
-# define CREATE_MASK	(SLAB_HWCACHE_ALIGN | \
-			 SLAB_CACHE_DMA | \
-			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
-			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
-#endif
-
-/*
- * kmem_bufctl_t:
- *
- * Bufctl's are used for linking objs within a slab
- * linked offsets.
- *
- * This implementation relies on "struct page" for locating the cache &
- * slab an object belongs to.
- * This allows the bufctl structure to be small (one int), but limits
- * the number of objects a slab (not a cache) can contain when off-slab
- * bufctls are used. The limit is the size of the largest general cache
- * that does not use off-slab slabs.
- * For 32bit archs with 4 kB pages, is this 56.
- * This is not serious, as it is only for large objects, when it is unwise
- * to have too many per slab.
- * Note: This limit can be raised by introducing a general cache whose size
- * is less than 512 (PAGE_SIZE<<3), but greater than 256.
- */
-
-typedef unsigned int kmem_bufctl_t;
-#define BUFCTL_END	(((kmem_bufctl_t)(~0U))-0)
-#define BUFCTL_FREE	(((kmem_bufctl_t)(~0U))-1)
-#define	BUFCTL_ACTIVE	(((kmem_bufctl_t)(~0U))-2)
-#define	SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-3)
-
-/*
- * struct slab
- *
- * Manages the objs in a slab. Placed either at the beginning of mem allocated
- * for a slab, or allocated from an general cache.
- * Slabs are chained into three list: fully used, partial, fully free slabs.
- */
-struct slab {
-	struct list_head list;
-	unsigned long colouroff;
-	void *s_mem;		/* including colour offset */
-	unsigned int inuse;	/* num of objs active in slab */
-	kmem_bufctl_t free;
-	unsigned short nodeid;
-};
-
-/*
- * struct slab_rcu
- *
- * slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
- * arrange for kmem_freepages to be called via RCU.  This is useful if
- * we need to approach a kernel structure obliquely, from its address
- * obtained without the usual locking.  We can lock the structure to
- * stabilize it and check it's still at the given address, only if we
- * can be sure that the memory has not been meanwhile reused for some
- * other kind of object (which our subsystem's lock might corrupt).
- *
- * rcu_read_lock before reading the address, then rcu_read_unlock after
- * taking the spinlock within the structure expected at that address.
- *
- * We assume struct slab_rcu can overlay struct slab when destroying.
- */
-struct slab_rcu {
-	struct rcu_head head;
-	struct kmem_cache *cachep;
-	void *addr;
-};
-
-/*
- * struct array_cache
- *
- * Purpose:
- * - LIFO ordering, to hand out cache-warm objects from _alloc
- * - reduce the number of linked list operations
- * - reduce spinlock operations
- *
- * The limit is stored in the per-cpu structure to reduce the data cache
- * footprint.
- *
- */
-struct array_cache {
-	unsigned int avail;
-	unsigned int limit;
-	unsigned int batchcount;
-	unsigned int touched;
-	spinlock_t lock;
-	void *entry[0];	/*
-			 * Must have this definition in here for the proper
-			 * alignment of array_cache. Also simplifies accessing
-			 * the entries.
-			 * [0] is for gcc 2.95. It should really be [].
-			 */
-};
-
-/*
- * bootstrap: The caches do not work without cpuarrays anymore, but the
- * cpuarrays are allocated from the generic caches...
- */
-#define BOOT_CPUCACHE_ENTRIES	1
-struct arraycache_init {
-	struct array_cache cache;
-	void *entries[BOOT_CPUCACHE_ENTRIES];
-};
-
-/*
- * The slab lists for all objects.
- */
-struct kmem_list3 {
-	struct list_head slabs_partial;	/* partial list first, better asm code */
-	struct list_head slabs_full;
-	struct list_head slabs_free;
-	unsigned long free_objects;
-	unsigned int free_limit;
-	unsigned int colour_next;	/* Per-node cache coloring */
-	spinlock_t list_lock;
-	struct array_cache *shared;	/* shared per node */
-	struct array_cache **alien;	/* on other nodes */
-	unsigned long next_reap;	/* updated without locking */
-	int free_touched;		/* updated without locking */
-};
-
-/*
- * Need this for bootstrapping a per node allocator.
- */
-#define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1)
-struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
-#define	CACHE_CACHE 0
-#define	SIZE_AC 1
-#define	SIZE_L3 (1 + MAX_NUMNODES)
-
-static int drain_freelist(struct kmem_cache *cache,
-			struct kmem_list3 *l3, int tofree);
-static void free_block(struct kmem_cache *cachep, void **objpp, int len,
-			int node);
-static int enable_cpucache(struct kmem_cache *cachep);
-static void cache_reap(struct work_struct *unused);
-
-/*
- * This function must be completely optimized away if a constant is passed to
- * it.  Mostly the same as what is in linux/slab.h except it returns an index.
- */
-static __always_inline int index_of(const size_t size)
-{
-	extern void __bad_size(void);
-
-	if (__builtin_constant_p(size)) {
-		int i = 0;
-
-#define CACHE(x) \
-	if (size <=x) \
-		return i; \
-	else \
-		i++;
-#include "linux/kmalloc_sizes.h"
-#undef CACHE
-		__bad_size();
-	} else
-		__bad_size();
-	return 0;
-}
-
-static int slab_early_init = 1;
-
-#define INDEX_AC index_of(sizeof(struct arraycache_init))
-#define INDEX_L3 index_of(sizeof(struct kmem_list3))
-
-static void kmem_list3_init(struct kmem_list3 *parent)
-{
-	INIT_LIST_HEAD(&parent->slabs_full);
-	INIT_LIST_HEAD(&parent->slabs_partial);
-	INIT_LIST_HEAD(&parent->slabs_free);
-	parent->shared = NULL;
-	parent->alien = NULL;
-	parent->colour_next = 0;
-	spin_lock_init(&parent->list_lock);
-	parent->free_objects = 0;
-	parent->free_touched = 0;
-}
-
-#define MAKE_LIST(cachep, listp, slab, nodeid)				\
-	do {								\
-		INIT_LIST_HEAD(listp);					\
-		list_splice(&(cachep->nodelists[nodeid]->slab), listp);	\
-	} while (0)
-
-#define	MAKE_ALL_LISTS(cachep, ptr, nodeid)				\
-	do {								\
-	MAKE_LIST((cachep), (&(ptr)->slabs_full), slabs_full, nodeid);	\
-	MAKE_LIST((cachep), (&(ptr)->slabs_partial), slabs_partial, nodeid); \
-	MAKE_LIST((cachep), (&(ptr)->slabs_free), slabs_free, nodeid);	\
-	} while (0)
-
-/*
- * struct kmem_cache
- *
- * manages a cache.
- */
-
-struct kmem_cache {
-/* 1) per-cpu data, touched during every alloc/free */
-	struct array_cache *array[NR_CPUS];
-/* 2) Cache tunables. Protected by cache_chain_mutex */
-	unsigned int batchcount;
-	unsigned int limit;
-	unsigned int shared;
-
-	unsigned int buffer_size;
-	u32 reciprocal_buffer_size;
-/* 3) touched by every alloc & free from the backend */
-
-	unsigned int flags;		/* constant flags */
-	unsigned int num;		/* # of objs per slab */
-
-/* 4) cache_grow/shrink */
-	/* order of pgs per slab (2^n) */
-	unsigned int gfporder;
-
-	/* force GFP flags, e.g. GFP_DMA */
-	gfp_t gfpflags;
-
-	size_t colour;			/* cache colouring range */
-	unsigned int colour_off;	/* colour offset */
-	struct kmem_cache *slabp_cache;
-	unsigned int slab_size;
-	unsigned int dflags;		/* dynamic flags */
-
-	/* constructor func */
-	void (*ctor) (void *, struct kmem_cache *, unsigned long);
-
-/* 5) cache creation/removal */
-	const char *name;
-	struct list_head next;
-
-/* 6) statistics */
-#if STATS
-	unsigned long num_active;
-	unsigned long num_allocations;
-	unsigned long high_mark;
-	unsigned long grown;
-	unsigned long reaped;
-	unsigned long errors;
-	unsigned long max_freeable;
-	unsigned long node_allocs;
-	unsigned long node_frees;
-	unsigned long node_overflow;
-	atomic_t allochit;
-	atomic_t allocmiss;
-	atomic_t freehit;
-	atomic_t freemiss;
-#endif
-#if DEBUG
-	/*
-	 * If debugging is enabled, then the allocator can add additional
-	 * fields and/or padding to every object. buffer_size contains the total
-	 * object size including these internal fields, the following two
-	 * variables contain the offset to the user object and its size.
-	 */
-	int obj_offset;
-	int obj_size;
-#endif
-	/*
-	 * We put nodelists[] at the end of kmem_cache, because we want to size
-	 * this array to nr_node_ids slots instead of MAX_NUMNODES
-	 * (see kmem_cache_init())
-	 * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
-	 * is statically defined, so we reserve the max number of nodes.
-	 */
-	struct kmem_list3 *nodelists[MAX_NUMNODES];
-	/*
-	 * Do not add fields after nodelists[]
-	 */
-};
-
-#define CFLGS_OFF_SLAB		(0x80000000UL)
-#define	OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
-
-#define BATCHREFILL_LIMIT	16
-/*
- * Optimization question: fewer reaps means less probability for unnessary
- * cpucache drain/refill cycles.
- *
- * OTOH the cpuarrays can contain lots of objects,
- * which could lock up otherwise freeable slabs.
- */
-#define REAPTIMEOUT_CPUC	(2*HZ)
-#define REAPTIMEOUT_LIST3	(4*HZ)
-
-#if STATS
-#define	STATS_INC_ACTIVE(x)	((x)->num_active++)
-#define	STATS_DEC_ACTIVE(x)	((x)->num_active--)
-#define	STATS_INC_ALLOCED(x)	((x)->num_allocations++)
-#define	STATS_INC_GROWN(x)	((x)->grown++)
-#define	STATS_ADD_REAPED(x,y)	((x)->reaped += (y))
-#define	STATS_SET_HIGH(x)						\
-	do {								\
-		if ((x)->num_active > (x)->high_mark)			\
-			(x)->high_mark = (x)->num_active;		\
-	} while (0)
-#define	STATS_INC_ERR(x)	((x)->errors++)
-#define	STATS_INC_NODEALLOCS(x)	((x)->node_allocs++)
-#define	STATS_INC_NODEFREES(x)	((x)->node_frees++)
-#define STATS_INC_ACOVERFLOW(x)   ((x)->node_overflow++)
-#define	STATS_SET_FREEABLE(x, i)					\
-	do {								\
-		if ((x)->max_freeable < i)				\
-			(x)->max_freeable = i;				\
-	} while (0)
-#define STATS_INC_ALLOCHIT(x)	atomic_inc(&(x)->allochit)
-#define STATS_INC_ALLOCMISS(x)	atomic_inc(&(x)->allocmiss)
-#define STATS_INC_FREEHIT(x)	atomic_inc(&(x)->freehit)
-#define STATS_INC_FREEMISS(x)	atomic_inc(&(x)->freemiss)
-#else
-#define	STATS_INC_ACTIVE(x)	do { } while (0)
-#define	STATS_DEC_ACTIVE(x)	do { } while (0)
-#define	STATS_INC_ALLOCED(x)	do { } while (0)
-#define	STATS_INC_GROWN(x)	do { } while (0)
-#define	STATS_ADD_REAPED(x,y)	do { } while (0)
-#define	STATS_SET_HIGH(x)	do { } while (0)
-#define	STATS_INC_ERR(x)	do { } while (0)
-#define	STATS_INC_NODEALLOCS(x)	do { } while (0)
-#define	STATS_INC_NODEFREES(x)	do { } while (0)
-#define STATS_INC_ACOVERFLOW(x)   do { } while (0)
-#define	STATS_SET_FREEABLE(x, i) do { } while (0)
-#define STATS_INC_ALLOCHIT(x)	do { } while (0)
-#define STATS_INC_ALLOCMISS(x)	do { } while (0)
-#define STATS_INC_FREEHIT(x)	do { } while (0)
-#define STATS_INC_FREEMISS(x)	do { } while (0)
-#endif
-
-#if DEBUG
-
-/*
- * memory layout of objects:
- * 0		: objp
- * 0 .. cachep->obj_offset - BYTES_PER_WORD - 1: padding. This ensures that
- * 		the end of an object is aligned with the end of the real
- * 		allocation. Catches writes behind the end of the allocation.
- * cachep->obj_offset - BYTES_PER_WORD .. cachep->obj_offset - 1:
- * 		redzone word.
- * cachep->obj_offset: The real object.
- * cachep->buffer_size - 2* BYTES_PER_WORD: redzone word [BYTES_PER_WORD long]
- * cachep->buffer_size - 1* BYTES_PER_WORD: last caller address
- *					[BYTES_PER_WORD long]
- */
-static int obj_offset(struct kmem_cache *cachep)
-{
-	return cachep->obj_offset;
-}
-
-static int obj_size(struct kmem_cache *cachep)
-{
-	return cachep->obj_size;
-}
-
-static unsigned long long *dbg_redzone1(struct kmem_cache *cachep, void *objp)
-{
-	BUG_ON(!(cachep->flags & SLAB_RED_ZONE));
-	return (unsigned long long*) (objp + obj_offset(cachep) -
-				      sizeof(unsigned long long));
-}
-
-static unsigned long long *dbg_redzone2(struct kmem_cache *cachep, void *objp)
-{
-	BUG_ON(!(cachep->flags & SLAB_RED_ZONE));
-	if (cachep->flags & SLAB_STORE_USER)
-		return (unsigned long long *)(objp + cachep->buffer_size -
-					      sizeof(unsigned long long) -
-					      BYTES_PER_WORD);
-	return (unsigned long long *) (objp + cachep->buffer_size -
-				       sizeof(unsigned long long));
-}
-
-static void **dbg_userword(struct kmem_cache *cachep, void *objp)
-{
-	BUG_ON(!(cachep->flags & SLAB_STORE_USER));
-	return (void **)(objp + cachep->buffer_size - BYTES_PER_WORD);
-}
-
-#else
-
-#define obj_offset(x)			0
-#define obj_size(cachep)		(cachep->buffer_size)
-#define dbg_redzone1(cachep, objp)	({BUG(); (unsigned long long *)NULL;})
-#define dbg_redzone2(cachep, objp)	({BUG(); (unsigned long long *)NULL;})
-#define dbg_userword(cachep, objp)	({BUG(); (void **)NULL;})
-
-#endif
-
-/*
- * Do not go above this order unless 0 objects fit into the slab.
- */
-#define	BREAK_GFP_ORDER_HI	1
-#define	BREAK_GFP_ORDER_LO	0
-static int slab_break_gfp_order = BREAK_GFP_ORDER_LO;
-
-/*
- * Functions for storing/retrieving the cachep and or slab from the page
- * allocator.  These are used to find the slab an obj belongs to.  With kfree(),
- * these are used to find the cache which an obj belongs to.
- */
-static inline void page_set_cache(struct page *page, struct kmem_cache *cache)
-{
-	page->lru.next = (struct list_head *)cache;
-}
-
-static inline struct kmem_cache *page_get_cache(struct page *page)
-{
-	page = compound_head(page);
-	BUG_ON(!PageSlab(page));
-	return (struct kmem_cache *)page->lru.next;
-}
-
-static inline void page_set_slab(struct page *page, struct slab *slab)
-{
-	page->lru.prev = (struct list_head *)slab;
-}
-
-static inline struct slab *page_get_slab(struct page *page)
-{
-	BUG_ON(!PageSlab(page));
-	return (struct slab *)page->lru.prev;
-}
-
-static inline struct kmem_cache *virt_to_cache(const void *obj)
-{
-	struct page *page = virt_to_head_page(obj);
-	return page_get_cache(page);
-}
-
-static inline struct slab *virt_to_slab(const void *obj)
-{
-	struct page *page = virt_to_head_page(obj);
-	return page_get_slab(page);
-}
-
-static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
-				 unsigned int idx)
-{
-	return slab->s_mem + cache->buffer_size * idx;
-}
-
-/*
- * We want to avoid an expensive divide : (offset / cache->buffer_size)
- *   Using the fact that buffer_size is a constant for a particular cache,
- *   we can replace (offset / cache->buffer_size) by
- *   reciprocal_divide(offset, cache->reciprocal_buffer_size)
- */
-static inline unsigned int obj_to_index(const struct kmem_cache *cache,
-					const struct slab *slab, void *obj)
-{
-	u32 offset = (obj - slab->s_mem);
-	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
-}
-
-/*
- * These are the default caches for kmalloc. Custom caches can have other sizes.
- */
-struct cache_sizes malloc_sizes[] = {
-#define CACHE(x) { .cs_size = (x) },
-#include <linux/kmalloc_sizes.h>
-	CACHE(ULONG_MAX)
-#undef CACHE
-};
-EXPORT_SYMBOL(malloc_sizes);
-
-/* Must match cache_sizes above. Out of line to keep cache footprint low. */
-struct cache_names {
-	char *name;
-	char *name_dma;
-};
-
-static struct cache_names __initdata cache_names[] = {
-#define CACHE(x) { .name = "size-" #x, .name_dma = "size-" #x "(DMA)" },
-#include <linux/kmalloc_sizes.h>
-	{NULL,}
-#undef CACHE
-};
-
-static struct arraycache_init initarray_cache __initdata =
-    { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
-static struct arraycache_init initarray_generic =
-    { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
-
-/* internal cache of cache description objs */
-static struct kmem_cache cache_cache = {
-	.batchcount = 1,
-	.limit = BOOT_CPUCACHE_ENTRIES,
-	.shared = 1,
-	.buffer_size = sizeof(struct kmem_cache),
-	.name = "kmem_cache",
-};
-
-#define BAD_ALIEN_MAGIC 0x01020304ul
-
-#ifdef CONFIG_LOCKDEP
-
-/*
- * Slab sometimes uses the kmalloc slabs to store the slab headers
- * for other slabs "off slab".
- * The locking for this is tricky in that it nests within the locks
- * of all other slabs in a few places; to deal with this special
- * locking we put on-slab caches into a separate lock-class.
- *
- * We set lock class for alien array caches which are up during init.
- * The lock annotation will be lost if all cpus of a node goes down and
- * then comes back up during hotplug
- */
-static struct lock_class_key on_slab_l3_key;
-static struct lock_class_key on_slab_alc_key;
-
-static inline void init_lock_keys(void)
-
-{
-	int q;
-	struct cache_sizes *s = malloc_sizes;
-
-	while (s->cs_size != ULONG_MAX) {
-		for_each_node(q) {
-			struct array_cache **alc;
-			int r;
-			struct kmem_list3 *l3 = s->cs_cachep->nodelists[q];
-			if (!l3 || OFF_SLAB(s->cs_cachep))
-				continue;
-			lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
-			alc = l3->alien;
-			/*
-			 * FIXME: This check for BAD_ALIEN_MAGIC
-			 * should go away when common slab code is taught to
-			 * work even without alien caches.
-			 * Currently, non-NUMA code returns BAD_ALIEN_MAGIC
-			 * from alloc_alien_cache().
-			 */
-			if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
-				continue;
-			for_each_node(r) {
-				if (alc[r])
-					lockdep_set_class(&alc[r]->lock,
-					     &on_slab_alc_key);
-			}
-		}
-		s++;
-	}
-}
-#else
-static inline void init_lock_keys(void)
-{
-}
-#endif
-
-/*
- * 1. Guard access to the cache-chain.
- * 2. Protect sanity of cpu_online_map against cpu hotplug events
- */
-static DEFINE_MUTEX(cache_chain_mutex);
-static struct list_head cache_chain;
-
-/*
- * chicken and egg problem: delay the per-cpu array allocation
- * until the general caches are up.
- */
-static enum {
-	NONE,
-	PARTIAL_AC,
-	PARTIAL_L3,
-	FULL
-} g_cpucache_up;
-
-/*
- * used by boot code to determine if it can use slab based allocator
- */
-int slab_is_available(void)
-{
-	return g_cpucache_up == FULL;
-}
-
-static DEFINE_PER_CPU(struct delayed_work, reap_work);
-
-static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
-{
-	return cachep->array[smp_processor_id()];
-}
-
-static inline struct kmem_cache *__find_general_cachep(size_t size,
-							gfp_t gfpflags)
-{
-	struct cache_sizes *csizep = malloc_sizes;
-
-#if DEBUG
-	/* This happens if someone tries to call
-	 * kmem_cache_create(), or __kmalloc(), before
-	 * the generic caches are initialized.
-	 */
-	BUG_ON(malloc_sizes[INDEX_AC].cs_cachep == NULL);
-#endif
-	if (!size)
-		return ZERO_SIZE_PTR;
-
-	while (size > csizep->cs_size)
-		csizep++;
-
-	/*
-	 * Really subtle: The last entry with csizep->cs_size==ULONG_MAX
-	 * has cs_{dma,}cachep==NULL. Thus no special case
-	 * for large kmalloc calls required.
-	 */
-#ifdef CONFIG_ZONE_DMA
-	if (unlikely(gfpflags & GFP_DMA))
-		return csizep->cs_dmacachep;
-#endif
-	return csizep->cs_cachep;
-}
-
-static struct kmem_cache *kmem_find_general_cachep(size_t size, gfp_t gfpflags)
-{
-	return __find_general_cachep(size, gfpflags);
-}
-
-static size_t slab_mgmt_size(size_t nr_objs, size_t align)
-{
-	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(kmem_bufctl_t), align);
-}
-
-/*
- * Calculate the number of objects and left-over bytes for a given buffer size.
- */
-static void cache_estimate(unsigned long gfporder, size_t buffer_size,
-			   size_t align, int flags, size_t *left_over,
-			   unsigned int *num)
-{
-	int nr_objs;
-	size_t mgmt_size;
-	size_t slab_size = PAGE_SIZE << gfporder;
-
-	/*
-	 * The slab management structure can be either off the slab or
-	 * on it. For the latter case, the memory allocated for a
-	 * slab is used for:
-	 *
-	 * - The struct slab
-	 * - One kmem_bufctl_t for each object
-	 * - Padding to respect alignment of @align
-	 * - @buffer_size bytes for each object
-	 *
-	 * If the slab management structure is off the slab, then the
-	 * alignment will already be calculated into the size. Because
-	 * the slabs are all pages aligned, the objects will be at the
-	 * correct alignment when allocated.
-	 */
-	if (flags & CFLGS_OFF_SLAB) {
-		mgmt_size = 0;
-		nr_objs = slab_size / buffer_size;
-
-		if (nr_objs > SLAB_LIMIT)
-			nr_objs = SLAB_LIMIT;
-	} else {
-		/*
-		 * Ignore padding for the initial guess. The padding
-		 * is at most @align-1 bytes, and @buffer_size is at
-		 * least @align. In the worst case, this result will
-		 * be one greater than the number of objects that fit
-		 * into the memory allocation when taking the padding
-		 * into account.
-		 */
-		nr_objs = (slab_size - sizeof(struct slab)) /
-			  (buffer_size + sizeof(kmem_bufctl_t));
-
-		/*
-		 * This calculated number will be either the right
-		 * amount, or one greater than what we want.
-		 */
-		if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
-		       > slab_size)
-			nr_objs--;
-
-		if (nr_objs > SLAB_LIMIT)
-			nr_objs = SLAB_LIMIT;
-
-		mgmt_size = slab_mgmt_size(nr_objs, align);
-	}
-	*num = nr_objs;
-	*left_over = slab_size - nr_objs*buffer_size - mgmt_size;
-}
-
-#define slab_error(cachep, msg) __slab_error(__FUNCTION__, cachep, msg)
-
-static void __slab_error(const char *function, struct kmem_cache *cachep,
-			char *msg)
-{
-	printk(KERN_ERR "slab error in %s(): cache `%s': %s\n",
-	       function, cachep->name, msg);
-	dump_stack();
-}
-
-/*
- * By default on NUMA we use alien caches to stage the freeing of
- * objects allocated from other nodes. This causes massive memory
- * inefficiencies when using a fake NUMA setup to split memory into a
- * large number of small nodes, so it can be disabled on the command
- * line.
- */
-
-static int use_alien_caches __read_mostly = 1;
-static int __init noaliencache_setup(char *s)
-{
-	use_alien_caches = 0;
-	return 1;
-}
-__setup("noaliencache", noaliencache_setup);
-
-#ifdef CONFIG_NUMA
-/*
- * Special reaping functions for NUMA systems called from cache_reap().
- * These take care of doing round robin flushing of alien caches (containing
- * objects freed on different nodes from which they were allocated) and the
- * flushing of remote pcps by calling drain_node_pages.
- */
-static DEFINE_PER_CPU(unsigned long, reap_node);
-
-static void init_reap_node(int cpu)
-{
-	int node;
-
-	node = next_node(cpu_to_node(cpu), node_online_map);
-	if (node == MAX_NUMNODES)
-		node = first_node(node_online_map);
-
-	per_cpu(reap_node, cpu) = node;
-}
-
-static void next_reap_node(void)
-{
-	int node = __get_cpu_var(reap_node);
-
-	node = next_node(node, node_online_map);
-	if (unlikely(node >= MAX_NUMNODES))
-		node = first_node(node_online_map);
-	__get_cpu_var(reap_node) = node;
-}
-
-#else
-#define init_reap_node(cpu) do { } while (0)
-#define next_reap_node(void) do { } while (0)
-#endif
-
-/*
- * Initiate the reap timer running on the target CPU.  We run at around 1 to 2Hz
- * via the workqueue/eventd.
- * Add the CPU number into the expiration time to minimize the possibility of
- * the CPUs getting into lockstep and contending for the global cache chain
- * lock.
- */
-static void __cpuinit start_cpu_timer(int cpu)
-{
-	struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
-
-	/*
-	 * When this gets called from do_initcalls via cpucache_init(),
-	 * init_workqueues() has already run, so keventd will be setup
-	 * at that time.
-	 */
-	if (keventd_up() && reap_work->work.func == NULL) {
-		init_reap_node(cpu);
-		INIT_DELAYED_WORK(reap_work, cache_reap);
-		schedule_delayed_work_on(cpu, reap_work,
-					__round_jiffies_relative(HZ, cpu));
-	}
-}
-
-static struct array_cache *alloc_arraycache(int node, int entries,
-					    int batchcount)
-{
-	int memsize = sizeof(void *) * entries + sizeof(struct array_cache);
-	struct array_cache *nc = NULL;
-
-	nc = kmalloc_node(memsize, GFP_KERNEL, node);
-	if (nc) {
-		nc->avail = 0;
-		nc->limit = entries;
-		nc->batchcount = batchcount;
-		nc->touched = 0;
-		spin_lock_init(&nc->lock);
-	}
-	return nc;
-}
-
-/*
- * Transfer objects in one arraycache to another.
- * Locking must be handled by the caller.
- *
- * Return the number of entries transferred.
- */
-static int transfer_objects(struct array_cache *to,
-		struct array_cache *from, unsigned int max)
-{
-	/* Figure out how many entries to transfer */
-	int nr = min(min(from->avail, max), to->limit - to->avail);
-
-	if (!nr)
-		return 0;
-
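-	/* Pop the topmost 'nr' object pointers off 'from' and append them to 'to'. */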
-	memcpy(to->entry + to->avail, from->entry + from->avail - nr,
-			sizeof(void *) * nr);
-
-	from->avail -= nr;
-	to->avail += nr;
-	to->touched = 1;
-	return nr;
-}
-
-#ifndef CONFIG_NUMA
-
-#define drain_alien_cache(cachep, alien) do { } while (0)
-#define reap_alien(cachep, l3) do { } while (0)
-
-static inline struct array_cache **alloc_alien_cache(int node, int limit)
-{
-	return (struct array_cache **)BAD_ALIEN_MAGIC;
-}
-
-static inline void free_alien_cache(struct array_cache **ac_ptr)
-{
-}
-
-static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
-{
-	return 0;
-}
-
-static inline void *alternate_node_alloc(struct kmem_cache *cachep,
-		gfp_t flags)
-{
-	return NULL;
-}
-
-static inline void *____cache_alloc_node(struct kmem_cache *cachep,
-		 gfp_t flags, int nodeid)
-{
-	return NULL;
-}
-
-#else	/* CONFIG_NUMA */
-
-static void *____cache_alloc_node(struct kmem_cache *, gfp_t, int);
-static void *alternate_node_alloc(struct kmem_cache *, gfp_t);
-
-static struct array_cache **alloc_alien_cache(int node, int limit)
-{
-	struct array_cache **ac_ptr;
-	int memsize = sizeof(void *) * nr_node_ids;
-	int i;
-
-	if (limit > 1)
-		limit = 12;
-	ac_ptr = kmalloc_node(memsize, GFP_KERNEL, node);
-	if (ac_ptr) {
-		for_each_node(i) {
-			if (i == node || !node_online(i)) {
-				ac_ptr[i] = NULL;
-				continue;
-			}
-			ac_ptr[i] = alloc_arraycache(node, limit, 0xbaadf00d);
-			if (!ac_ptr[i]) {
-				for (i--; i >= 0; i--)
-					kfree(ac_ptr[i]);
-				kfree(ac_ptr);
-				return NULL;
-			}
-		}
-	}
-	return ac_ptr;
-}
-
-static void free_alien_cache(struct array_cache **ac_ptr)
-{
-	int i;
-
-	if (!ac_ptr)
-		return;
-	for_each_node(i)
-	    kfree(ac_ptr[i]);
-	kfree(ac_ptr);
-}
-
-static void __drain_alien_cache(struct kmem_cache *cachep,
-				struct array_cache *ac, int node)
-{
-	struct kmem_list3 *rl3 = cachep->nodelists[node];
-
-	if (ac->avail) {
-		spin_lock(&rl3->list_lock);
-		/*
-		 * Stuff objects into the remote nodes shared array first.
-		 * That way we could avoid the overhead of putting the objects
-		 * into the free lists and getting them back later.
-		 */
-		if (rl3->shared)
-			transfer_objects(rl3->shared, ac, ac->limit);
-
-		free_block(cachep, ac->entry, ac->avail, node);
-		ac->avail = 0;
-		spin_unlock(&rl3->list_lock);
-	}
-}
-
-/*
- * Called from cache_reap() to regularly drain alien caches round robin.
- */
-static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
-{
-	int node = __get_cpu_var(reap_node);
-
-	if (l3->alien) {
-		struct array_cache *ac = l3->alien[node];
-
-		if (ac && ac->avail && spin_trylock_irq(&ac->lock)) {
-			__drain_alien_cache(cachep, ac, node);
-			spin_unlock_irq(&ac->lock);
-		}
-	}
-}
-
-static void drain_alien_cache(struct kmem_cache *cachep,
-				struct array_cache **alien)
-{
-	int i = 0;
-	struct array_cache *ac;
-	unsigned long flags;
-
-	for_each_online_node(i) {
-		ac = alien[i];
-		if (ac) {
-			spin_lock_irqsave(&ac->lock, flags);
-			__drain_alien_cache(cachep, ac, i);
-			spin_unlock_irqrestore(&ac->lock, flags);
-		}
-	}
-}
-
-static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
-{
-	struct slab *slabp = virt_to_slab(objp);
-	int nodeid = slabp->nodeid;
-	struct kmem_list3 *l3;
-	struct array_cache *alien = NULL;
-	int node;
-
-	node = numa_node_id();
-
-	/*
-	 * Make sure we are not freeing an object from another node to the
-	 * array cache on this cpu.
-	 */
-	if (likely(slabp->nodeid == node))
-		return 0;
-
-	l3 = cachep->nodelists[node];
-	STATS_INC_NODEFREES(cachep);
-	if (l3->alien && l3->alien[nodeid]) {
-		alien = l3->alien[nodeid];
-		spin_lock(&alien->lock);
-		if (unlikely(alien->avail == alien->limit)) {
-			STATS_INC_ACOVERFLOW(cachep);
-			__drain_alien_cache(cachep, alien, nodeid);
-		}
-		alien->entry[alien->avail++] = objp;
-		spin_unlock(&alien->lock);
-	} else {
-		spin_lock(&(cachep->nodelists[nodeid])->list_lock);
-		free_block(cachep, &objp, 1, nodeid);
-		spin_unlock(&(cachep->nodelists[nodeid])->list_lock);
-	}
-	return 1;
-}
-#endif
-
-static int __cpuinit cpuup_callback(struct notifier_block *nfb,
-				    unsigned long action, void *hcpu)
-{
-	long cpu = (long)hcpu;
-	struct kmem_cache *cachep;
-	struct kmem_list3 *l3 = NULL;
-	int node = cpu_to_node(cpu);
-	int memsize = sizeof(struct kmem_list3);
-
-	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&cache_chain_mutex);
-		break;
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		/*
-		 * We need to do this right in the beginning since
-		 * the alloc_arraycache() calls are going to use this list.
-		 * kmalloc_node allows us to add the slab to the right
-		 * kmem_list3 and not this cpu's kmem_list3.
-		 */
-
-		list_for_each_entry(cachep, &cache_chain, next) {
-			/*
-			 * Set up the kmem_list3 for this node before we can
-			 * begin anything. Make sure some other cpu on this
-			 * node has not already allocated it.
-			 */
-			if (!cachep->nodelists[node]) {
-				l3 = kmalloc_node(memsize, GFP_KERNEL, node);
-				if (!l3)
-					goto bad;
-				kmem_list3_init(l3);
-				l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
-				    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-
-				/*
-				 * The l3s don't come and go as CPUs come and
-				 * go.  cache_chain_mutex is sufficient
-				 * protection here.
-				 */
-				cachep->nodelists[node] = l3;
-			}
-
-			spin_lock_irq(&cachep->nodelists[node]->list_lock);
-			cachep->nodelists[node]->free_limit =
-				(1 + nr_cpus_node(node)) *
-				cachep->batchcount + cachep->num;
-			spin_unlock_irq(&cachep->nodelists[node]->list_lock);
-		}
-
-		/*
-		 * Now we can go ahead with allocating the shared arrays and
-		 * array caches
-		 */
-		list_for_each_entry(cachep, &cache_chain, next) {
-			struct array_cache *nc;
-			struct array_cache *shared = NULL;
-			struct array_cache **alien = NULL;
-
-			nc = alloc_arraycache(node, cachep->limit,
-						cachep->batchcount);
-			if (!nc)
-				goto bad;
-			if (cachep->shared) {
-				shared = alloc_arraycache(node,
-					cachep->shared * cachep->batchcount,
-					0xbaadf00d);
-				if (!shared)
-					goto bad;
-			}
-			if (use_alien_caches) {
-				alien = alloc_alien_cache(node, cachep->limit);
-				if (!alien)
-					goto bad;
-			}
-			cachep->array[cpu] = nc;
-			l3 = cachep->nodelists[node];
-			BUG_ON(!l3);
-
-			spin_lock_irq(&l3->list_lock);
-			if (!l3->shared) {
-				/*
-				 * We are serialised from CPU_DEAD or
-				 * CPU_UP_CANCELLED by the cpucontrol lock
-				 */
-				l3->shared = shared;
-				shared = NULL;
-			}
-#ifdef CONFIG_NUMA
-			if (!l3->alien) {
-				l3->alien = alien;
-				alien = NULL;
-			}
-#endif
-			spin_unlock_irq(&l3->list_lock);
-			kfree(shared);
-			free_alien_cache(alien);
-		}
-		break;
-	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
-		start_cpu_timer(cpu);
-		break;
-#ifdef CONFIG_HOTPLUG_CPU
-	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
-		/*
-		 * Shutdown cache reaper. Note that the cache_chain_mutex is
-		 * held so that if cache_reap() is invoked it cannot do
-		 * anything expensive but will only modify reap_work
-		 * and reschedule the timer.
-		 */
-		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
-		/* Now the cache_reaper is guaranteed to be not running. */
-		per_cpu(reap_work, cpu).work.func = NULL;
-		break;
-	case CPU_DOWN_FAILED:
-	case CPU_DOWN_FAILED_FROZEN:
-		start_cpu_timer(cpu);
-		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		/*
-		 * Even if all the cpus of a node are down, we don't free the
-		 * kmem_list3 of any cache. This is to avoid a race between
-		 * cpu_down, and a kmalloc allocation from another cpu for
-		 * memory from the node of the cpu going down.  The list3
-		 * structure is usually allocated from kmem_cache_create() and
-		 * gets destroyed at kmem_cache_destroy().
-		 */
-		/* fall thru */
-#endif
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		list_for_each_entry(cachep, &cache_chain, next) {
-			struct array_cache *nc;
-			struct array_cache *shared;
-			struct array_cache **alien;
-			cpumask_t mask;
-
-			mask = node_to_cpumask(node);
-			/* cpu is dead; no one can alloc from it. */
-			nc = cachep->array[cpu];
-			cachep->array[cpu] = NULL;
-			l3 = cachep->nodelists[node];
-
-			if (!l3)
-				goto free_array_cache;
-
-			spin_lock_irq(&l3->list_lock);
-
-			/* Free limit for this kmem_list3 */
-			l3->free_limit -= cachep->batchcount;
-			if (nc)
-				free_block(cachep, nc->entry, nc->avail, node);
-
-			if (!cpus_empty(mask)) {
-				spin_unlock_irq(&l3->list_lock);
-				goto free_array_cache;
-			}
-
-			shared = l3->shared;
-			if (shared) {
-				free_block(cachep, shared->entry,
-					   shared->avail, node);
-				l3->shared = NULL;
-			}
-
-			alien = l3->alien;
-			l3->alien = NULL;
-
-			spin_unlock_irq(&l3->list_lock);
-
-			kfree(shared);
-			if (alien) {
-				drain_alien_cache(cachep, alien);
-				free_alien_cache(alien);
-			}
-free_array_cache:
-			kfree(nc);
-		}
-		/*
-		 * In the previous loop, all the objects were freed to
-		 * the respective cache's slabs,  now we can go ahead and
-		 * shrink each nodelist to its limit.
-		 */
-		list_for_each_entry(cachep, &cache_chain, next) {
-			l3 = cachep->nodelists[node];
-			if (!l3)
-				continue;
-			drain_freelist(cachep, l3, l3->free_objects);
-		}
-		break;
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&cache_chain_mutex);
-		break;
-	}
-	return NOTIFY_OK;
-bad:
-	return NOTIFY_BAD;
-}
-
-static struct notifier_block __cpuinitdata cpucache_notifier = {
-	&cpuup_callback, NULL, 0
-};
-
-/*
- * swap the static kmem_list3 with kmalloced memory
- */
-static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
-			int nodeid)
-{
-	struct kmem_list3 *ptr;
-
-	ptr = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, nodeid);
-	BUG_ON(!ptr);
-
-	local_irq_disable();
-	memcpy(ptr, list, sizeof(struct kmem_list3));
-	/*
-	 * Do not assume that spinlocks can be initialized via memcpy:
-	 */
-	spin_lock_init(&ptr->list_lock);
-
-	MAKE_ALL_LISTS(cachep, ptr, nodeid);
-	cachep->nodelists[nodeid] = ptr;
-	local_irq_enable();
-}
-
-/*
- * Initialisation.  Called after the page allocator has been initialised and
- * before smp_init().
- */
-void __init kmem_cache_init(void)
-{
-	size_t left_over;
-	struct cache_sizes *sizes;
-	struct cache_names *names;
-	int i;
-	int order;
-	int node;
-
-	if (num_possible_nodes() == 1)
-		use_alien_caches = 0;
-
-	for (i = 0; i < NUM_INIT_LISTS; i++) {
-		kmem_list3_init(&initkmem_list3[i]);
-		if (i < MAX_NUMNODES)
-			cache_cache.nodelists[i] = NULL;
-	}
-
-	/*
-	 * Fragmentation resistance on low memory - only use bigger
-	 * page orders on machines with more than 32MB of memory.
-	 */
-	if (num_physpages > (32 << 20) >> PAGE_SHIFT)
-		slab_break_gfp_order = BREAK_GFP_ORDER_HI;
-
-	/* Bootstrap is tricky, because several objects are allocated
-	 * from caches that do not exist yet:
-	 * 1) initialize the cache_cache cache: it contains the struct
-	 *    kmem_cache structures of all caches, except cache_cache itself:
-	 *    cache_cache is statically allocated.
-	 *    Initially an __init data area is used for the head array and the
-	 *    kmem_list3 structures, it's replaced with a kmalloc allocated
-	 *    array at the end of the bootstrap.
-	 * 2) Create the first kmalloc cache.
-	 *    The struct kmem_cache for the new cache is allocated normally.
-	 *    An __init data area is used for the head array.
-	 * 3) Create the remaining kmalloc caches, with minimally sized
-	 *    head arrays.
-	 * 4) Replace the __init data head arrays for cache_cache and the first
-	 *    kmalloc cache with kmalloc allocated arrays.
-	 * 5) Replace the __init data for kmem_list3 for cache_cache and
-	 *    the other caches with kmalloc allocated memory.
-	 * 6) Resize the head arrays of the kmalloc caches to their final sizes.
-	 */
-
-	node = numa_node_id();
-
-	/* 1) create the cache_cache */
-	INIT_LIST_HEAD(&cache_chain);
-	list_add(&cache_cache.next, &cache_chain);
-	cache_cache.colour_off = cache_line_size();
-	cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
-	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
-
-	/*
-	 * struct kmem_cache size depends on nr_node_ids, which
-	 * can be less than MAX_NUMNODES.
-	 */
-	cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
-				 nr_node_ids * sizeof(struct kmem_list3 *);
-#if DEBUG
-	cache_cache.obj_size = cache_cache.buffer_size;
-#endif
-	cache_cache.buffer_size = ALIGN(cache_cache.buffer_size,
-					cache_line_size());
-	cache_cache.reciprocal_buffer_size =
-		reciprocal_value(cache_cache.buffer_size);
-
-	for (order = 0; order < MAX_ORDER; order++) {
-		cache_estimate(order, cache_cache.buffer_size,
-			cache_line_size(), 0, &left_over, &cache_cache.num);
-		if (cache_cache.num)
-			break;
-	}
-	BUG_ON(!cache_cache.num);
-	cache_cache.gfporder = order;
-	cache_cache.colour = left_over / cache_cache.colour_off;
-	cache_cache.slab_size = ALIGN(cache_cache.num * sizeof(kmem_bufctl_t) +
-				      sizeof(struct slab), cache_line_size());
-
-	/* 2+3) create the kmalloc caches */
-	sizes = malloc_sizes;
-	names = cache_names;
-
-	/*
-	 * Initialize the caches that provide memory for the array cache and the
-	 * kmem_list3 structures first.  Without this, further allocations will
-	 * bug.
-	 */
-
-	sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,
-					sizes[INDEX_AC].cs_size,
-					ARCH_KMALLOC_MINALIGN,
-					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
-					NULL, NULL);
-
-	if (INDEX_AC != INDEX_L3) {
-		sizes[INDEX_L3].cs_cachep =
-			kmem_cache_create(names[INDEX_L3].name,
-				sizes[INDEX_L3].cs_size,
-				ARCH_KMALLOC_MINALIGN,
-				ARCH_KMALLOC_FLAGS|SLAB_PANIC,
-				NULL, NULL);
-	}
-
-	slab_early_init = 0;
-
-	while (sizes->cs_size != ULONG_MAX) {
-		/*
-		 * For performance, all the general caches are L1 aligned.
-		 * This should be particularly beneficial on SMP boxes, as it
-		 * eliminates "false sharing".
-		 * Note: for systems short on memory, removing the alignment
-		 * will allow tighter packing of the smaller caches.
-		 */
-		if (!sizes->cs_cachep) {
-			sizes->cs_cachep = kmem_cache_create(names->name,
-					sizes->cs_size,
-					ARCH_KMALLOC_MINALIGN,
-					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
-					NULL, NULL);
-		}
-#ifdef CONFIG_ZONE_DMA
-		sizes->cs_dmacachep = kmem_cache_create(
-					names->name_dma,
-					sizes->cs_size,
-					ARCH_KMALLOC_MINALIGN,
-					ARCH_KMALLOC_FLAGS|SLAB_CACHE_DMA|
-						SLAB_PANIC,
-					NULL, NULL);
-#endif
-		sizes++;
-		names++;
-	}
-	/* 4) Replace the bootstrap head arrays */
-	{
-		struct array_cache *ptr;
-
-		ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
-
-		local_irq_disable();
-		BUG_ON(cpu_cache_get(&cache_cache) != &initarray_cache.cache);
-		memcpy(ptr, cpu_cache_get(&cache_cache),
-		       sizeof(struct arraycache_init));
-		/*
-		 * Do not assume that spinlocks can be initialized via memcpy:
-		 */
-		spin_lock_init(&ptr->lock);
-
-		cache_cache.array[smp_processor_id()] = ptr;
-		local_irq_enable();
-
-		ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
-
-		local_irq_disable();
-		BUG_ON(cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep)
-		       != &initarray_generic.cache);
-		memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),
-		       sizeof(struct arraycache_init));
-		/*
-		 * Do not assume that spinlocks can be initialized via memcpy:
-		 */
-		spin_lock_init(&ptr->lock);
-
-		malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =
-		    ptr;
-		local_irq_enable();
-	}
-	/* 5) Replace the bootstrap kmem_list3's */
-	{
-		int nid;
-
-		/* Replace the static kmem_list3 structures for the boot cpu */
-		init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
-
-		for_each_online_node(nid) {
-			init_list(malloc_sizes[INDEX_AC].cs_cachep,
-				  &initkmem_list3[SIZE_AC + nid], nid);
-
-			if (INDEX_AC != INDEX_L3) {
-				init_list(malloc_sizes[INDEX_L3].cs_cachep,
-					  &initkmem_list3[SIZE_L3 + nid], nid);
-			}
-		}
-	}
-
-	/* 6) resize the head arrays to their final sizes */
-	{
-		struct kmem_cache *cachep;
-		mutex_lock(&cache_chain_mutex);
-		list_for_each_entry(cachep, &cache_chain, next)
-			if (enable_cpucache(cachep))
-				BUG();
-		mutex_unlock(&cache_chain_mutex);
-	}
-
-	/* Annotate slab for lockdep -- annotate the malloc caches */
-	init_lock_keys();
-
-	/* Done! */
-	g_cpucache_up = FULL;
-
-	/*
-	 * Register a cpu startup notifier callback that initializes
-	 * cpu_cache_get for all new cpus
-	 */
-	register_cpu_notifier(&cpucache_notifier);
-
-	/*
-	 * The reap timers are started later, with a module init call: That part
-	 * of the kernel is not yet operational.
-	 */
-}
-
-static int __init cpucache_init(void)
-{
-	int cpu;
-
-	/*
-	 * Register the timers that return unneeded pages to the page allocator.
-	 */
-	for_each_online_cpu(cpu)
-		start_cpu_timer(cpu);
-	return 0;
-}
-__initcall(cpucache_init);
-
-/*
- * Interface to system's page allocator. No need to hold the cache-lock.
- *
- * If we requested dmaable memory, we will get it. Even if we
- * did not request dmaable memory, we might get it, but that
- * would be relatively rare and ignorable.
- */
-static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
-{
-	struct page *page;
-	int nr_pages;
-	int i;
-
-#ifndef CONFIG_MMU
-	/*
-	 * Nommu uses slabs for process anonymous memory allocations, and thus
-	 * requires __GFP_COMP to properly refcount higher order allocations.
-	 */
-	flags |= __GFP_COMP;
-#endif
-
-	flags |= cachep->gfpflags;
-	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		flags |= __GFP_RECLAIMABLE;
-
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
-	if (!page)
-		return NULL;
-
-	nr_pages = (1 << cachep->gfporder);
-	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		add_zone_page_state(page_zone(page),
-			NR_SLAB_RECLAIMABLE, nr_pages);
-	else
-		add_zone_page_state(page_zone(page),
-			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++)
-		__SetPageSlab(page + i);
-	return page_address(page);
-}
-
-/*
- * Interface to system's page release.
- */
-static void kmem_freepages(struct kmem_cache *cachep, void *addr)
-{
-	unsigned long i = (1 << cachep->gfporder);
-	struct page *page = virt_to_page(addr);
-	const unsigned long nr_freed = i;
-
-	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		sub_zone_page_state(page_zone(page),
-				NR_SLAB_RECLAIMABLE, nr_freed);
-	else
-		sub_zone_page_state(page_zone(page),
-				NR_SLAB_UNRECLAIMABLE, nr_freed);
-	while (i--) {
-		BUG_ON(!PageSlab(page));
-		__ClearPageSlab(page);
-		page++;
-	}
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += nr_freed;
-	free_pages((unsigned long)addr, cachep->gfporder);
-}
-
-static void kmem_rcu_free(struct rcu_head *head)
-{
-	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
-	struct kmem_cache *cachep = slab_rcu->cachep;
-
-	kmem_freepages(cachep, slab_rcu->addr);
-	if (OFF_SLAB(cachep))
-		kmem_cache_free(cachep->slabp_cache, slab_rcu);
-}
-
-#if DEBUG
-
-#ifdef CONFIG_DEBUG_PAGEALLOC
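-/*
- * Store a marker, the caller, the cpu and as many kernel text addresses
- * from the current stack as fit into the (poisoned) object.
- */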
-static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
-			    unsigned long caller)
-{
-	int size = obj_size(cachep);
-
-	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
-
-	if (size < 5 * sizeof(unsigned long))
-		return;
-
-	*addr++ = 0x12345678;
-	*addr++ = caller;
-	*addr++ = smp_processor_id();
-	size -= 3 * sizeof(unsigned long);
-	{
-		unsigned long *sptr = &caller;
-		unsigned long svalue;
-
-		while (!kstack_end(sptr)) {
-			svalue = *sptr++;
-			if (kernel_text_address(svalue)) {
-				*addr++ = svalue;
-				size -= sizeof(unsigned long);
-				if (size <= sizeof(unsigned long))
-					break;
-			}
-		}
-
-	}
-	*addr++ = 0x87654321;
-}
-#endif
-
-static void poison_obj(struct kmem_cache *cachep, void *addr, unsigned char val)
-{
-	int size = obj_size(cachep);
-	addr = &((char *)addr)[obj_offset(cachep)];
-
-	memset(addr, val, size);
-	*(unsigned char *)(addr + size - 1) = POISON_END;
-}
-
-static void dump_line(char *data, int offset, int limit)
-{
-	int i;
-	unsigned char error = 0;
-	int bad_count = 0;
-
-	printk(KERN_ERR "%03x:", offset);
-	for (i = 0; i < limit; i++) {
-		if (data[offset + i] != POISON_FREE) {
-			error = data[offset + i];
-			bad_count++;
-		}
-		printk(" %02x", (unsigned char)data[offset + i]);
-	}
-	printk("\n");
-
-	if (bad_count == 1) {
-		error ^= POISON_FREE;
-		if (!(error & (error - 1))) {
-			printk(KERN_ERR "Single bit error detected. Probably "
-					"bad RAM.\n");
-#ifdef CONFIG_X86
-			printk(KERN_ERR "Run memtest86+ or a similar memory "
-					"test tool.\n");
-#else
-			printk(KERN_ERR "Run a memory test tool.\n");
-#endif
-		}
-	}
-}
-#endif
-
-#if DEBUG
-
-static void print_objinfo(struct kmem_cache *cachep, void *objp, int lines)
-{
-	int i, size;
-	char *realobj;
-
-	if (cachep->flags & SLAB_RED_ZONE) {
-		printk(KERN_ERR "Redzone: 0x%llx/0x%llx.\n",
-			*dbg_redzone1(cachep, objp),
-			*dbg_redzone2(cachep, objp));
-	}
-
-	if (cachep->flags & SLAB_STORE_USER) {
-		printk(KERN_ERR "Last user: [<%p>]",
-			*dbg_userword(cachep, objp));
-		print_symbol("(%s)",
-				(unsigned long)*dbg_userword(cachep, objp));
-		printk("\n");
-	}
-	realobj = (char *)objp + obj_offset(cachep);
-	size = obj_size(cachep);
-	for (i = 0; i < size && lines; i += 16, lines--) {
-		int limit;
-		limit = 16;
-		if (i + limit > size)
-			limit = size - i;
-		dump_line(realobj, i, limit);
-	}
-}
-
-static void check_poison_obj(struct kmem_cache *cachep, void *objp)
-{
-	char *realobj;
-	int size, i;
-	int lines = 0;
-
-	realobj = (char *)objp + obj_offset(cachep);
-	size = obj_size(cachep);
-
-	for (i = 0; i < size; i++) {
-		char exp = POISON_FREE;
-		if (i == size - 1)
-			exp = POISON_END;
-		if (realobj[i] != exp) {
-			int limit;
-			/* Mismatch ! */
-			/* Print header */
-			if (lines == 0) {
-				printk(KERN_ERR
-					"Slab corruption: %s start=%p, len=%d\n",
-					cachep->name, realobj, size);
-				print_objinfo(cachep, objp, 0);
-			}
-			/* Hexdump the affected line */
-			i = (i / 16) * 16;
-			limit = 16;
-			if (i + limit > size)
-				limit = size - i;
-			dump_line(realobj, i, limit);
-			i += 16;
-			lines++;
-			/* Limit to 5 lines */
-			if (lines > 5)
-				break;
-		}
-	}
-	if (lines != 0) {
-		/* Print some data about the neighboring objects, if they
-		 * exist:
-		 */
-		struct slab *slabp = virt_to_slab(objp);
-		unsigned int objnr;
-
-		objnr = obj_to_index(cachep, slabp, objp);
-		if (objnr) {
-			objp = index_to_obj(cachep, slabp, objnr - 1);
-			realobj = (char *)objp + obj_offset(cachep);
-			printk(KERN_ERR "Prev obj: start=%p, len=%d\n",
-			       realobj, size);
-			print_objinfo(cachep, objp, 2);
-		}
-		if (objnr + 1 < cachep->num) {
-			objp = index_to_obj(cachep, slabp, objnr + 1);
-			realobj = (char *)objp + obj_offset(cachep);
-			printk(KERN_ERR "Next obj: start=%p, len=%d\n",
-			       realobj, size);
-			print_objinfo(cachep, objp, 2);
-		}
-	}
-}
-#endif
-
-#if DEBUG
-/**
- * slab_destroy_objs - check the debug state of a slab's objects
- * @cachep: cache pointer being destroyed
- * @slabp: slab pointer being destroyed
- *
- * Check the poisoning and red zones of each object in a slab that is
- * being destroyed.
- */
-static void slab_destroy_objs(struct kmem_cache *cachep, struct slab *slabp)
-{
-	int i;
-	for (i = 0; i < cachep->num; i++) {
-		void *objp = index_to_obj(cachep, slabp, i);
-
-		if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-			if (cachep->buffer_size % PAGE_SIZE == 0 &&
-					OFF_SLAB(cachep))
-				kernel_map_pages(virt_to_page(objp),
-					cachep->buffer_size / PAGE_SIZE, 1);
-			else
-				check_poison_obj(cachep, objp);
-#else
-			check_poison_obj(cachep, objp);
-#endif
-		}
-		if (cachep->flags & SLAB_RED_ZONE) {
-			if (*dbg_redzone1(cachep, objp) != RED_INACTIVE)
-				slab_error(cachep, "start of a freed object "
-					   "was overwritten");
-			if (*dbg_redzone2(cachep, objp) != RED_INACTIVE)
-				slab_error(cachep, "end of a freed object "
-					   "was overwritten");
-		}
-	}
-}
-#else
-static void slab_destroy_objs(struct kmem_cache *cachep, struct slab *slabp)
-{
-}
-#endif
-
-/**
- * slab_destroy - destroy and release all objects in a slab
- * @cachep: cache pointer being destroyed
- * @slabp: slab pointer being destroyed
- *
- * Destroy all the objs in a slab, and release the mem back to the system.
- * Before calling the slab must have been unlinked from the cache.  The
- * cache-lock is not held/needed.
- */
-static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
-{
-	void *addr = slabp->s_mem - slabp->colouroff;
-
-	slab_destroy_objs(cachep, slabp);
-	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
-		struct slab_rcu *slab_rcu;
-
-		slab_rcu = (struct slab_rcu *)slabp;
-		slab_rcu->cachep = cachep;
-		slab_rcu->addr = addr;
-		call_rcu(&slab_rcu->head, kmem_rcu_free);
-	} else {
-		kmem_freepages(cachep, addr);
-		if (OFF_SLAB(cachep))
-			kmem_cache_free(cachep->slabp_cache, slabp);
-	}
-}
-
-/*
- * Set up all the kmem_list3s for a cache whose buffer_size is the same as
- * the size of a kmem_list3.
- */
-static void __init set_up_list3s(struct kmem_cache *cachep, int index)
-{
-	int node;
-
-	for_each_online_node(node) {
-		cachep->nodelists[node] = &initkmem_list3[index + node];
-		cachep->nodelists[node]->next_reap = jiffies +
-		    REAPTIMEOUT_LIST3 +
-		    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-	}
-}
-
-static void __kmem_cache_destroy(struct kmem_cache *cachep)
-{
-	int i;
-	struct kmem_list3 *l3;
-
-	for_each_online_cpu(i)
-	    kfree(cachep->array[i]);
-
-	/* NUMA: free the list3 structures */
-	for_each_online_node(i) {
-		l3 = cachep->nodelists[i];
-		if (l3) {
-			kfree(l3->shared);
-			free_alien_cache(l3->alien);
-			kfree(l3);
-		}
-	}
-	kmem_cache_free(&cache_cache, cachep);
-}
-
-/**
- * calculate_slab_order - calculate size (page order) of slabs
- * @cachep: pointer to the cache that is being created
- * @size: size of objects to be created in this cache.
- * @align: required alignment for the objects.
- * @flags: slab allocation flags
- *
- * Also calculates the number of objects per slab.
- *
- * This could be made much more intelligent.  For now, try to avoid using
- * high order pages for slabs.  When the gfp() functions are more friendly
- * towards high-order requests, this should be changed.
- */
-static size_t calculate_slab_order(struct kmem_cache *cachep,
-			size_t size, size_t align, unsigned long flags)
-{
-	unsigned long offslab_limit;
-	size_t left_over = 0;
-	int gfporder;
-
-	for (gfporder = 0; gfporder <= KMALLOC_MAX_ORDER; gfporder++) {
-		unsigned int num;
-		size_t remainder;
-
-		cache_estimate(gfporder, size, align, flags, &remainder, &num);
-		if (!num)
-			continue;
-
-		if (flags & CFLGS_OFF_SLAB) {
-			/*
-			 * Max number of objs-per-slab for caches which
-			 * use off-slab slabs. Needed to avoid a possible
-			 * looping condition in cache_grow().
-			 */
-			offslab_limit = size - sizeof(struct slab);
-			offslab_limit /= sizeof(kmem_bufctl_t);
-
-			if (num > offslab_limit)
-				break;
-		}
-
-		/* Found something acceptable - save it away */
-		cachep->num = num;
-		cachep->gfporder = gfporder;
-		left_over = remainder;
-
-		/*
-		 * A VFS-reclaimable slab tends to have most allocations
-		 * as GFP_NOFS and we really don't want to have to be allocating
-		 * higher-order pages when we are unable to shrink dcache.
-		 */
-		if (flags & SLAB_RECLAIM_ACCOUNT)
-			break;
-
-		/*
-		 * Large number of objects is good, but very large slabs are
-		 * currently bad for the gfp()s.
-		 */
-		if (gfporder >= slab_break_gfp_order)
-			break;
-
-		/*
-		 * Acceptable internal fragmentation?
-		 */
-		if (left_over * 8 <= (PAGE_SIZE << gfporder))
-			break;
-	}
-	return left_over;
-}
-
-static int __init_refok setup_cpu_cache(struct kmem_cache *cachep)
-{
-	if (g_cpucache_up == FULL)
-		return enable_cpucache(cachep);
-
-	if (g_cpucache_up == NONE) {
-		/*
-		 * Note: the first kmem_cache_create must create the cache
-		 * that's used by kmalloc(24), otherwise the creation of
-		 * further caches will BUG().
-		 */
-		cachep->array[smp_processor_id()] = &initarray_generic.cache;
-
-		/*
-		 * If the cache that's used by kmalloc(sizeof(kmem_list3)) is
-		 * the first cache, then we need to set up all its list3s,
-		 * otherwise the creation of further caches will BUG().
-		 */
-		set_up_list3s(cachep, SIZE_AC);
-		if (INDEX_AC == INDEX_L3)
-			g_cpucache_up = PARTIAL_L3;
-		else
-			g_cpucache_up = PARTIAL_AC;
-	} else {
-		cachep->array[smp_processor_id()] =
-			kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
-
-		if (g_cpucache_up == PARTIAL_AC) {
-			set_up_list3s(cachep, SIZE_L3);
-			g_cpucache_up = PARTIAL_L3;
-		} else {
-			int node;
-			for_each_online_node(node) {
-				cachep->nodelists[node] =
-				    kmalloc_node(sizeof(struct kmem_list3),
-						GFP_KERNEL, node);
-				BUG_ON(!cachep->nodelists[node]);
-				kmem_list3_init(cachep->nodelists[node]);
-			}
-		}
-	}
-	cachep->nodelists[numa_node_id()]->next_reap =
-			jiffies + REAPTIMEOUT_LIST3 +
-			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-
-	cpu_cache_get(cachep)->avail = 0;
-	cpu_cache_get(cachep)->limit = BOOT_CPUCACHE_ENTRIES;
-	cpu_cache_get(cachep)->batchcount = 1;
-	cpu_cache_get(cachep)->touched = 0;
-	cachep->batchcount = 1;
-	cachep->limit = BOOT_CPUCACHE_ENTRIES;
-	return 0;
-}
-
-/**
- * kmem_cache_create - Create a cache.
- * @name: A string which is used in /proc/slabinfo to identify this cache.
- * @size: The size of objects to be created in this cache.
- * @align: The required alignment for the objects.
- * @flags: SLAB flags
- * @ctor: A constructor for the objects.
- * @ops: A kmem_cache_ops structure (ignored).
- *
- * Returns a ptr to the cache on success, NULL on failure.
- * Cannot be called within an interrupt, but can be interrupted.
- * The @ctor is run when new pages are allocated by the cache.
- *
- * @name must be valid until the cache is destroyed. This implies that
- * the module calling this has to destroy the cache before getting unloaded.
- *
- * The flags are
- *
- * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
- * to catch references to uninitialised memory.
- *
- * %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check
- * for buffer overruns.
- *
- * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
- * cacheline.  This can be beneficial if you're counting cycles as closely
- * as davem.
- */
-struct kmem_cache *
-kmem_cache_create (const char *name, size_t size, size_t align,
-	unsigned long flags,
-	void (*ctor)(void*, struct kmem_cache *, unsigned long),
-	const struct kmem_cache_ops *ops)
-{
-	size_t left_over, slab_size, ralign;
-	struct kmem_cache *cachep = NULL, *pc;
-
-	/*
-	 * Sanity checks... these are all serious usage bugs.
-	 */
-	if (!name || in_interrupt() || (size < BYTES_PER_WORD) ||
-	    size > KMALLOC_MAX_SIZE) {
-		printk(KERN_ERR "%s: Early error in slab %s\n", __FUNCTION__,
-				name);
-		BUG();
-	}
-
-	/*
-	 * We use cache_chain_mutex to ensure a consistent view of
-	 * cpu_online_map as well.  Please see cpuup_callback().
-	 */
-	mutex_lock(&cache_chain_mutex);
-
-	list_for_each_entry(pc, &cache_chain, next) {
-		char tmp;
-		int res;
-
-		/*
-		 * This happens when the module gets unloaded and doesn't
-		 * destroy its slab cache and no-one else reuses the vmalloc
-		 * area of the module.  Print a warning.
-		 */
-		res = probe_kernel_address(pc->name, tmp);
-		if (res) {
-			printk(KERN_ERR
-			       "SLAB: cache with size %d has lost its name\n",
-			       pc->buffer_size);
-			continue;
-		}
-
-		if (!strcmp(pc->name, name)) {
-			printk(KERN_ERR
-			       "kmem_cache_create: duplicate cache %s\n", name);
-			dump_stack();
-			goto oops;
-		}
-	}
-
-#if DEBUG
-	WARN_ON(strchr(name, ' '));	/* It confuses parsers */
-#if FORCED_DEBUG
-	/*
-	 * Enable redzoning and last user accounting, except for caches with
-	 * large objects, if the increased size would increase the object size
-	 * above the next power of two: caches with object sizes just above a
-	 * power of two have a significant amount of internal fragmentation.
-	 */
-	if (size < 4096 || fls(size - 1) == fls(size-1 + 3 * BYTES_PER_WORD))
-		flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
-	if (!(flags & SLAB_DESTROY_BY_RCU))
-		flags |= SLAB_POISON;
-#endif
-	if (flags & SLAB_DESTROY_BY_RCU)
-		BUG_ON(flags & SLAB_POISON);
-#endif
-	/*
-	 * Always check flags; a caller might be expecting debug support which
-	 * isn't available.
-	 */
-	BUG_ON(flags & ~CREATE_MASK);
-
-	/*
-	 * Check that size is in terms of words.  This is needed to avoid
-	 * unaligned accesses for some archs when redzoning is used, and makes
-	 * sure any on-slab bufctl's are also correctly aligned.
-	 */
-	if (size & (BYTES_PER_WORD - 1)) {
-		size += (BYTES_PER_WORD - 1);
-		size &= ~(BYTES_PER_WORD - 1);
-	}
-
-	/* calculate the final buffer alignment: */
-
-	/* 1) arch recommendation: can be overridden for debug */
-	if (flags & SLAB_HWCACHE_ALIGN) {
-		/*
-		 * Default alignment: as specified by the arch code.  Except if
-		 * an object is really small, then squeeze multiple objects into
-		 * one cacheline.
-		 */
-		ralign = cache_line_size();
-		while (size <= ralign / 2)
-			ralign /= 2;
-	} else {
-		ralign = BYTES_PER_WORD;
-	}
-
-	/*
-	 * Redzoning and user store require word alignment. Note this will be
-	 * overridden by architecture or caller mandated alignment if either
-	 * is greater than BYTES_PER_WORD.
-	 */
-	if (flags & SLAB_RED_ZONE || flags & SLAB_STORE_USER)
-		ralign = __alignof__(unsigned long long);
-
-	/* 2) arch mandated alignment */
-	if (ralign < ARCH_SLAB_MINALIGN) {
-		ralign = ARCH_SLAB_MINALIGN;
-	}
-	/* 3) caller mandated alignment */
-	if (ralign < align) {
-		ralign = align;
-	}
-	/* disable debug if necessary */
-	if (ralign > __alignof__(unsigned long long))
-		flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
-	/*
-	 * 4) Store it.
-	 */
-	align = ralign;
-
-	/* Get cache's description obj. */
-	cachep = kmem_cache_zalloc(&cache_cache, GFP_KERNEL);
-	if (!cachep)
-		goto oops;
-
-#if DEBUG
-	cachep->obj_size = size;
-
-	/*
-	 * Both debugging options require word-alignment which is calculated
-	 * into align above.
-	 */
-	if (flags & SLAB_RED_ZONE) {
-		/* add space for red zone words */
-		cachep->obj_offset += sizeof(unsigned long long);
-		size += 2 * sizeof(unsigned long long);
-	}
-	if (flags & SLAB_STORE_USER) {
-		/* user store requires one word storage behind the end of
-		 * the real object.
-		 */
-		size += BYTES_PER_WORD;
-	}
-#if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
-	if (size >= malloc_sizes[INDEX_L3 + 1].cs_size
-	    && cachep->obj_size > cache_line_size() && size < PAGE_SIZE) {
-		cachep->obj_offset += PAGE_SIZE - size;
-		size = PAGE_SIZE;
-	}
-#endif
-#endif
-
-	/*
-	 * Determine if the slab management is 'on' or 'off' slab.
-	 * (bootstrapping cannot cope with offslab caches so don't do
-	 * it too early on.)
-	 */
-	if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init)
-		/*
-		 * Size is large, assume best to place the slab management obj
-		 * off-slab (should allow better packing of objs).
-		 */
-		flags |= CFLGS_OFF_SLAB;
-
-	size = ALIGN(size, align);
-
-	left_over = calculate_slab_order(cachep, size, align, flags);
-
-	if (!cachep->num) {
-		printk(KERN_ERR
-		       "kmem_cache_create: couldn't create cache %s.\n", name);
-		kmem_cache_free(&cache_cache, cachep);
-		cachep = NULL;
-		goto oops;
-	}
-	slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
-			  + sizeof(struct slab), align);
-
-	/*
-	 * If the slab has been placed off-slab, and we have enough space then
-	 * move it on-slab. This is at the expense of any extra colouring.
-	 */
-	if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
-		flags &= ~CFLGS_OFF_SLAB;
-		left_over -= slab_size;
-	}
-
-	if (flags & CFLGS_OFF_SLAB) {
-		/* really off slab. No need for manual alignment */
-		slab_size =
-		    cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
-	}
-
-	cachep->colour_off = cache_line_size();
-	/* Offset must be a multiple of the alignment. */
-	if (cachep->colour_off < align)
-		cachep->colour_off = align;
-	cachep->colour = left_over / cachep->colour_off;
-	cachep->slab_size = slab_size;
-	cachep->flags = flags;
-	cachep->gfpflags = 0;
-	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
-		cachep->gfpflags |= GFP_DMA;
-	cachep->buffer_size = size;
-	cachep->reciprocal_buffer_size = reciprocal_value(size);
-
-	if (flags & CFLGS_OFF_SLAB) {
-		cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0u);
-		/*
-		 * This is a possibility for one of the malloc_sizes caches.
-		 * But since we go off slab only for object size greater than
-		 * PAGE_SIZE/8, and malloc_sizes gets created in ascending order,
-		 * this should not happen at all.
-		 * But leave a BUG_ON for some lucky dude.
-		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
-	}
-	cachep->ctor = ctor;
-	cachep->name = name;
-
-	if (setup_cpu_cache(cachep)) {
-		__kmem_cache_destroy(cachep);
-		cachep = NULL;
-		goto oops;
-	}
-
-	/* cache setup completed, link it into the list */
-	list_add(&cachep->next, &cache_chain);
-oops:
-	if (!cachep && (flags & SLAB_PANIC))
-		panic("kmem_cache_create(): failed to create slab `%s'\n",
-		      name);
-	mutex_unlock(&cache_chain_mutex);
-	return cachep;
-}
-EXPORT_SYMBOL(kmem_cache_create);
-
-#if DEBUG
-static void check_irq_off(void)
-{
-	BUG_ON(!irqs_disabled());
-}
-
-static void check_irq_on(void)
-{
-	BUG_ON(irqs_disabled());
-}
-
-static void check_spinlock_acquired(struct kmem_cache *cachep)
-{
-#ifdef CONFIG_SMP
-	check_irq_off();
-	assert_spin_locked(&cachep->nodelists[numa_node_id()]->list_lock);
-#endif
-}
-
-static void check_spinlock_acquired_node(struct kmem_cache *cachep, int node)
-{
-#ifdef CONFIG_SMP
-	check_irq_off();
-	assert_spin_locked(&cachep->nodelists[node]->list_lock);
-#endif
-}
-
-#else
-#define check_irq_off()	do { } while(0)
-#define check_irq_on()	do { } while(0)
-#define check_spinlock_acquired(x) do { } while(0)
-#define check_spinlock_acquired_node(x, y) do { } while(0)
-#endif
-
-static void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
-			struct array_cache *ac,
-			int force, int node);
-
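-/*
- * Flush this cpu's array cache for the given cache back to the node lists.
- * Runs with interrupts disabled (called via on_each_cpu()).
- */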
-static void do_drain(void *arg)
-{
-	struct kmem_cache *cachep = arg;
-	struct array_cache *ac;
-	int node = numa_node_id();
-
-	check_irq_off();
-	ac = cpu_cache_get(cachep);
-	spin_lock(&cachep->nodelists[node]->list_lock);
-	free_block(cachep, ac->entry, ac->avail, node);
-	spin_unlock(&cachep->nodelists[node]->list_lock);
-	ac->avail = 0;
-}
-
-static void drain_cpu_caches(struct kmem_cache *cachep)
-{
-	struct kmem_list3 *l3;
-	int node;
-
-	on_each_cpu(do_drain, cachep, 1, 1);
-	check_irq_on();
-	for_each_online_node(node) {
-		l3 = cachep->nodelists[node];
-		if (l3 && l3->alien)
-			drain_alien_cache(cachep, l3->alien);
-	}
-
-	for_each_online_node(node) {
-		l3 = cachep->nodelists[node];
-		if (l3)
-			drain_array(cachep, l3, l3->shared, 1, node);
-	}
-}
-
-/*
- * Remove slabs from the list of free slabs.
- * Specify the number of slabs to drain in tofree.
- *
- * Returns the actual number of slabs released.
- */
-static int drain_freelist(struct kmem_cache *cache,
-			struct kmem_list3 *l3, int tofree)
-{
-	struct list_head *p;
-	int nr_freed;
-	struct slab *slabp;
-
-	nr_freed = 0;
-	while (nr_freed < tofree && !list_empty(&l3->slabs_free)) {
-
-		spin_lock_irq(&l3->list_lock);
-		p = l3->slabs_free.prev;
-		if (p == &l3->slabs_free) {
-			spin_unlock_irq(&l3->list_lock);
-			goto out;
-		}
-
-		slabp = list_entry(p, struct slab, list);
-#if DEBUG
-		BUG_ON(slabp->inuse);
-#endif
-		list_del(&slabp->list);
-		/*
-		 * Safe to drop the lock. The slab is no longer linked
-		 * to the cache.
-		 */
-		l3->free_objects -= cache->num;
-		spin_unlock_irq(&l3->list_lock);
-		slab_destroy(cache, slabp);
-		nr_freed++;
-	}
-out:
-	return nr_freed;
-}
-
-/* Called with cache_chain_mutex held to protect against cpu hotplug */
-static int __cache_shrink(struct kmem_cache *cachep)
-{
-	int ret = 0, i = 0;
-	struct kmem_list3 *l3;
-
-	drain_cpu_caches(cachep);
-
-	check_irq_on();
-	for_each_online_node(i) {
-		l3 = cachep->nodelists[i];
-		if (!l3)
-			continue;
-
-		drain_freelist(cachep, l3, l3->free_objects);
-
-		ret += !list_empty(&l3->slabs_full) ||
-			!list_empty(&l3->slabs_partial);
-	}
-	return (ret ? 1 : 0);
-}
-
-/**
- * kmem_cache_shrink - Shrink a cache.
- * @cachep: The cache to shrink.
- *
- * Releases as many slabs as possible for a cache.
- * To help debugging, a zero exit status indicates all slabs were released.
- */
-int kmem_cache_shrink(struct kmem_cache *cachep)
-{
-	int ret;
-	BUG_ON(!cachep || in_interrupt());
-
-	mutex_lock(&cache_chain_mutex);
-	ret = __cache_shrink(cachep);
-	mutex_unlock(&cache_chain_mutex);
-	return ret;
-}
-EXPORT_SYMBOL(kmem_cache_shrink);
-
-int kmem_cache_defrag(int percent, int node)
-{
-	return 0;
-}
-
-/*
- * SLAB does not support slab defragmentation
- */
-int kmem_cache_vacate(struct page *page)
-{
-	return 0;
-}
-EXPORT_SYMBOL(kmem_cache_vacate);
-
-/**
- * kmem_cache_destroy - delete a cache
- * @cachep: the cache to destroy
- *
- * Remove a &struct kmem_cache object from the slab cache.
- *
- * It is expected this function will be called by a module when it is
- * unloaded.  This will remove the cache completely, and avoid a duplicate
- * cache being allocated each time a module is loaded and unloaded, if the
- * module doesn't have persistent in-kernel storage across loads and unloads.
- *
- * The cache must be empty before calling this function.
- *
- * The caller must guarantee that no one will allocate memory from the cache
- * during the kmem_cache_destroy().
- */
-void kmem_cache_destroy(struct kmem_cache *cachep)
-{
-	BUG_ON(!cachep || in_interrupt());
-
-	/* Find the cache in the chain of caches. */
-	mutex_lock(&cache_chain_mutex);
-	/*
-	 * the chain is never empty, cache_cache is never destroyed
-	 */
-	list_del(&cachep->next);
-	if (__cache_shrink(cachep)) {
-		slab_error(cachep, "Can't free all objects");
-		list_add(&cachep->next, &cache_chain);
-		mutex_unlock(&cache_chain_mutex);
-		return;
-	}
-
-	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
-		synchronize_rcu();
-
-	__kmem_cache_destroy(cachep);
-	mutex_unlock(&cache_chain_mutex);
-}
-EXPORT_SYMBOL(kmem_cache_destroy);
-
-/*
- * Get the memory for a slab management obj.
- * For a cache whose slab descriptor is kept off-slab, the descriptors
- * always come from the malloc_sizes caches.  The slab descriptor cannot
- * come from the cache that is currently being created because, when we
- * search for an appropriate cache for these descriptors in
- * kmem_cache_create(), we only look through the malloc_sizes array.
- * If we are creating a malloc_sizes cache here, it would not be visible to
- * kmem_find_general_cachep() until the initialization is complete.
- * Hence slabp_cache cannot be the same as the cache being created.
- */
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
-				   int colour_off, gfp_t local_flags,
-				   int nodeid)
-{
-	struct slab *slabp;
-
-	if (OFF_SLAB(cachep)) {
-		/* Slab management obj is off-slab. */
-		slabp = kmem_cache_alloc_node(cachep->slabp_cache,
-					      local_flags & ~GFP_THISNODE, nodeid);
-		if (!slabp)
-			return NULL;
-	} else {
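-		/*
-		 * On-slab: the descriptor sits at the colour offset inside
-		 * the slab itself and the objects follow it.
-		 */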
-		slabp = objp + colour_off;
-		colour_off += cachep->slab_size;
-	}
-	slabp->inuse = 0;
-	slabp->colouroff = colour_off;
-	slabp->s_mem = objp + colour_off;
-	slabp->nodeid = nodeid;
-	return slabp;
-}
-
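-/*
- * The kmem_bufctl_t array immediately follows the slab descriptor.  Each
- * entry holds the index of the next free object, forming the per-slab
- * free list that starts at slabp->free.
- */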
-static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
-{
-	return (kmem_bufctl_t *) (slabp + 1);
-}
-
-static void cache_init_objs(struct kmem_cache *cachep,
-			    struct slab *slabp)
-{
-	int i;
-
-	for (i = 0; i < cachep->num; i++) {
-		void *objp = index_to_obj(cachep, slabp, i);
-#if DEBUG
-		/* need to poison the objs? */
-		if (cachep->flags & SLAB_POISON)
-			poison_obj(cachep, objp, POISON_FREE);
-		if (cachep->flags & SLAB_STORE_USER)
-			*dbg_userword(cachep, objp) = NULL;
-
-		if (cachep->flags & SLAB_RED_ZONE) {
-			*dbg_redzone1(cachep, objp) = RED_INACTIVE;
-			*dbg_redzone2(cachep, objp) = RED_INACTIVE;
-		}
-		/*
-		 * Constructors are not allowed to allocate memory from the same
-		 * cache which they are a constructor for.  Otherwise, deadlock.
-		 * They must also be threaded.
-		 */
-		if (cachep->ctor && !(cachep->flags & SLAB_POISON))
-			cachep->ctor(objp + obj_offset(cachep), cachep,
-				     0);
-
-		if (cachep->flags & SLAB_RED_ZONE) {
-			if (*dbg_redzone2(cachep, objp) != RED_INACTIVE)
-				slab_error(cachep, "constructor overwrote the"
-					   " end of an object");
-			if (*dbg_redzone1(cachep, objp) != RED_INACTIVE)
-				slab_error(cachep, "constructor overwrote the"
-					   " start of an object");
-		}
-		if ((cachep->buffer_size % PAGE_SIZE) == 0 &&
-			    OFF_SLAB(cachep) && cachep->flags & SLAB_POISON)
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->buffer_size / PAGE_SIZE, 0);
-#else
-		if (cachep->ctor)
-			cachep->ctor(objp, cachep, 0);
-#endif
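-		/* Link each object to its successor to build the free list. */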
-		slab_bufctl(slabp)[i] = i + 1;
-	}
-	slab_bufctl(slabp)[i - 1] = BUFCTL_END;
-	slabp->free = 0;
-}
-
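-/*
- * Sanity check: a GFP_DMA allocation must be served from a DMA cache and
- * a non-DMA allocation must not be.
- */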
-static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
-{
-	if (CONFIG_ZONE_DMA_FLAG) {
-		if (flags & GFP_DMA)
-			BUG_ON(!(cachep->gfpflags & GFP_DMA));
-		else
-			BUG_ON(cachep->gfpflags & GFP_DMA);
-	}
-}
-
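-/* Take the next free object off the slab's bufctl free list. */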
-static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
-				int nodeid)
-{
-	void *objp = index_to_obj(cachep, slabp, slabp->free);
-	kmem_bufctl_t next;
-
-	slabp->inuse++;
-	next = slab_bufctl(slabp)[slabp->free];
-#if DEBUG
-	slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
-	WARN_ON(slabp->nodeid != nodeid);
-#endif
-	slabp->free = next;
-
-	return objp;
-}
-
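-/* Return an object to the slab by pushing it onto the bufctl free list. */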
-static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
-				void *objp, int nodeid)
-{
-	unsigned int objnr = obj_to_index(cachep, slabp, objp);
-
-#if DEBUG
-	/* Verify that the slab belongs to the intended node */
-	WARN_ON(slabp->nodeid != nodeid);
-
-	if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
-		printk(KERN_ERR "slab: double free detected in cache "
-				"'%s', objp %p\n", cachep->name, objp);
-		BUG();
-	}
-#endif
-	slab_bufctl(slabp)[objnr] = slabp->free;
-	slabp->free = objnr;
-	slabp->inuse--;
-}
-
-/*
- * Map pages beginning at addr to the given cache and slab. This is required
- * for the slab allocator to be able to lookup the cache and slab of a
- * virtual address for kfree, ksize, kmem_ptr_validate, and slab debugging.
- */
-static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
-			   void *addr)
-{
-	int nr_pages;
-	struct page *page;
-
-	page = virt_to_page(addr);
-
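-	/*
-	 * Compound pages are resolved via their head page, so tagging the
-	 * head is enough; otherwise every page of the slab must be tagged.
-	 */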
-	nr_pages = 1;
-	if (likely(!PageCompound(page)))
-		nr_pages <<= cache->gfporder;
-
-	do {
-		page_set_cache(page, cache);
-		page_set_slab(page, slab);
-		page++;
-	} while (--nr_pages);
-}
-
-/*
- * Grow (by 1) the number of slabs within a cache.  This is called by
- * kmem_cache_alloc() when there are no active objs left in a cache.
- */
-static int cache_grow(struct kmem_cache *cachep,
-		gfp_t flags, int nodeid, void *objp)
-{
-	struct slab *slabp;
-	size_t offset;
-	gfp_t local_flags;
-	struct kmem_list3 *l3;
-
-	/*
-	 * Be lazy and only check for valid flags here, keeping it out of the
-	 * critical path in kmem_cache_alloc().
-	 */
-	BUG_ON(flags & ~(GFP_DMA | __GFP_ZERO | GFP_LEVEL_MASK));
-
-	local_flags = (flags & GFP_LEVEL_MASK);
-	/* Take the l3 list lock to change the colour_next on this node */
-	check_irq_off();
-	l3 = cachep->nodelists[nodeid];
-	spin_lock(&l3->list_lock);
-
-	/* Get colour for the slab, and calculate the next value. */
-	offset = l3->colour_next;
-	l3->colour_next++;
-	if (l3->colour_next >= cachep->colour)
-		l3->colour_next = 0;
-	spin_unlock(&l3->list_lock);
-
-	offset *= cachep->colour_off;
-
-	if (local_flags & __GFP_WAIT)
-		local_irq_enable();
-
-	/*
-	 * The test for missing atomic flag is performed here, rather than
-	 * the more obvious place, simply to reduce the critical path length
-	 * in kmem_cache_alloc(). If a caller is seriously mis-behaving they
-	 * will eventually be caught here (where it matters).
-	 */
-	kmem_flagcheck(cachep, flags);
-
-	/*
-	 * Get mem for the objs.  Attempt to allocate a physical page from
-	 * 'nodeid'.
-	 */
-	if (!objp)
-		objp = kmem_getpages(cachep, flags, nodeid);
-	if (!objp)
-		goto failed;
-
-	/* Get slab management. */
-	slabp = alloc_slabmgmt(cachep, objp, offset,
-			local_flags & ~GFP_THISNODE, nodeid);
-	if (!slabp)
-		goto opps1;
-
-	slabp->nodeid = nodeid;
-	slab_map_pages(cachep, slabp, objp);
-
-	cache_init_objs(cachep, slabp);
-
-	if (local_flags & __GFP_WAIT)
-		local_irq_disable();
-	check_irq_off();
-	spin_lock(&l3->list_lock);
-
-	/* Make slab active. */
-	list_add_tail(&slabp->list, &(l3->slabs_free));
-	STATS_INC_GROWN(cachep);
-	l3->free_objects += cachep->num;
-	spin_unlock(&l3->list_lock);
-	return 1;
-opps1:
-	kmem_freepages(cachep, objp);
-failed:
-	if (local_flags & __GFP_WAIT)
-		local_irq_disable();
-	return 0;
-}
-
-#if DEBUG
-
-/*
- * Perform extra freeing checks:
- * - detect bad pointers.
- * - POISON/RED_ZONE checking
- */
-static void kfree_debugcheck(const void *objp)
-{
-	if (!virt_addr_valid(objp)) {
-		printk(KERN_ERR "kfree_debugcheck: out of range ptr %lxh.\n",
-		       (unsigned long)objp);
-		BUG();
-	}
-}
-
-static inline void verify_redzone_free(struct kmem_cache *cache, void *obj)
-{
-	unsigned long long redzone1, redzone2;
-
-	redzone1 = *dbg_redzone1(cache, obj);
-	redzone2 = *dbg_redzone2(cache, obj);
-
-	/*
-	 * Redzone is ok.
-	 */
-	if (redzone1 == RED_ACTIVE && redzone2 == RED_ACTIVE)
-		return;
-
-	if (redzone1 == RED_INACTIVE && redzone2 == RED_INACTIVE)
-		slab_error(cache, "double free detected");
-	else
-		slab_error(cache, "memory outside object was overwritten");
-
-	printk(KERN_ERR "%p: redzone 1:0x%llx, redzone 2:0x%llx.\n",
-			obj, redzone1, redzone2);
-}
-
-static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
-				   void *caller)
-{
-	struct page *page;
-	unsigned int objnr;
-	struct slab *slabp;
-
-	objp -= obj_offset(cachep);
-	kfree_debugcheck(objp);
-	page = virt_to_head_page(objp);
-
-	slabp = page_get_slab(page);
-
-	if (cachep->flags & SLAB_RED_ZONE) {
-		verify_redzone_free(cachep, objp);
-		*dbg_redzone1(cachep, objp) = RED_INACTIVE;
-		*dbg_redzone2(cachep, objp) = RED_INACTIVE;
-	}
-	if (cachep->flags & SLAB_STORE_USER)
-		*dbg_userword(cachep, objp) = caller;
-
-	objnr = obj_to_index(cachep, slabp, objp);
-
-	BUG_ON(objnr >= cachep->num);
-	BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
-
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	slab_bufctl(slabp)[objnr] = BUFCTL_FREE;
-#endif
-	if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-		if ((cachep->buffer_size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
-			store_stackinfo(cachep, objp, (unsigned long)caller);
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->buffer_size / PAGE_SIZE, 0);
-		} else {
-			poison_obj(cachep, objp, POISON_FREE);
-		}
-#else
-		poison_obj(cachep, objp, POISON_FREE);
-#endif
-	}
-	return objp;
-}
-
-static void check_slabp(struct kmem_cache *cachep, struct slab *slabp)
-{
-	kmem_bufctl_t i;
-	int entries = 0;
-
-	/* Check slab's freelist to see if this obj is there. */
-	for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) {
-		entries++;
-		if (entries > cachep->num || i >= cachep->num)
-			goto bad;
-	}
-	if (entries != cachep->num - slabp->inuse) {
-bad:
-		printk(KERN_ERR "slab: Internal list corruption detected in "
-				"cache '%s'(%d), slabp %p(%d). Hexdump:\n",
-			cachep->name, cachep->num, slabp, slabp->inuse);
-		for (i = 0;
-		     i < sizeof(*slabp) + cachep->num * sizeof(kmem_bufctl_t);
-		     i++) {
-			if (i % 16 == 0)
-				printk("\n%03x:", i);
-			printk(" %02x", ((unsigned char *)slabp)[i]);
-		}
-		printk("\n");
-		BUG();
-	}
-}
-#else
-#define kfree_debugcheck(x) do { } while(0)
-#define cache_free_debugcheck(x,objp,z) (objp)
-#define check_slabp(x,y) do { } while(0)
-#endif
-
-static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
-{
-	int batchcount;
-	struct kmem_list3 *l3;
-	struct array_cache *ac;
-	int node;
-
-	node = numa_node_id();
-
-	check_irq_off();
-	ac = cpu_cache_get(cachep);
-retry:
-	batchcount = ac->batchcount;
-	if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
-		/*
-		 * If there was little recent activity on this cache, then
-		 * perform only a partial refill.  Otherwise we could generate
-		 * refill bouncing.
-		 */
-		batchcount = BATCHREFILL_LIMIT;
-	}
-	l3 = cachep->nodelists[node];
-
-	BUG_ON(ac->avail > 0 || !l3);
-	spin_lock(&l3->list_lock);
-
-	/* See if we can refill from the shared array */
-	if (l3->shared && transfer_objects(ac, l3->shared, batchcount))
-		goto alloc_done;
-
-	while (batchcount > 0) {
-		struct list_head *entry;
-		struct slab *slabp;
-		/* Get slab alloc is to come from. */
-		entry = l3->slabs_partial.next;
-		if (entry == &l3->slabs_partial) {
-			l3->free_touched = 1;
-			entry = l3->slabs_free.next;
-			if (entry == &l3->slabs_free)
-				goto must_grow;
-		}
-
-		slabp = list_entry(entry, struct slab, list);
-		check_slabp(cachep, slabp);
-		check_spinlock_acquired(cachep);
-
-		/*
-		 * The slab was either on partial or free list so
-		 * there must be at least one object available for
-		 * allocation.
-		 */
-		BUG_ON(slabp->inuse < 0 || slabp->inuse >= cachep->num);
-
-		while (slabp->inuse < cachep->num && batchcount--) {
-			STATS_INC_ALLOCED(cachep);
-			STATS_INC_ACTIVE(cachep);
-			STATS_SET_HIGH(cachep);
-
-			ac->entry[ac->avail++] = slab_get_obj(cachep, slabp,
-							    node);
-		}
-		check_slabp(cachep, slabp);
-
-		/* move slabp to correct slabp list: */
-		list_del(&slabp->list);
-		if (slabp->free == BUFCTL_END)
-			list_add(&slabp->list, &l3->slabs_full);
-		else
-			list_add(&slabp->list, &l3->slabs_partial);
-	}
-
-must_grow:
-	l3->free_objects -= ac->avail;
-alloc_done:
-	spin_unlock(&l3->list_lock);
-
-	if (unlikely(!ac->avail)) {
-		int x;
-		x = cache_grow(cachep, flags | GFP_THISNODE, node, NULL);
-
-		/* cache_grow can reenable interrupts, then ac could change. */
-		ac = cpu_cache_get(cachep);
-		if (!x && ac->avail == 0)	/* no objects in sight? abort */
-			return NULL;
-
-		if (!ac->avail)		/* objects refilled by interrupt? */
-			goto retry;
-	}
-	ac->touched = 1;
-	return ac->entry[--ac->avail];
-}
-
-static inline void cache_alloc_debugcheck_before(struct kmem_cache *cachep,
-						gfp_t flags)
-{
-	might_sleep_if(flags & __GFP_WAIT);
-#if DEBUG
-	kmem_flagcheck(cachep, flags);
-#endif
-}
-
-#if DEBUG
-static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
-				gfp_t flags, void *objp, void *caller)
-{
-	if (!objp)
-		return objp;
-	if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-		if ((cachep->buffer_size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->buffer_size / PAGE_SIZE, 1);
-		else
-			check_poison_obj(cachep, objp);
-#else
-		check_poison_obj(cachep, objp);
-#endif
-		poison_obj(cachep, objp, POISON_INUSE);
-	}
-	if (cachep->flags & SLAB_STORE_USER)
-		*dbg_userword(cachep, objp) = caller;
-
-	if (cachep->flags & SLAB_RED_ZONE) {
-		if (*dbg_redzone1(cachep, objp) != RED_INACTIVE ||
-				*dbg_redzone2(cachep, objp) != RED_INACTIVE) {
-			slab_error(cachep, "double free, or memory outside"
-						" object was overwritten");
-			printk(KERN_ERR
-				"%p: redzone 1:0x%llx, redzone 2:0x%llx\n",
-				objp, *dbg_redzone1(cachep, objp),
-				*dbg_redzone2(cachep, objp));
-		}
-		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
-		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
-	}
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	{
-		struct slab *slabp;
-		unsigned objnr;
-
-		slabp = page_get_slab(virt_to_head_page(objp));
-		objnr = (unsigned)(objp - slabp->s_mem) / cachep->buffer_size;
-		slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
-	}
-#endif
-	objp += obj_offset(cachep);
-	if (cachep->ctor && cachep->flags & SLAB_POISON)
-		cachep->ctor(objp, cachep, 0);
-#if ARCH_SLAB_MINALIGN
-	if ((u32)objp & (ARCH_SLAB_MINALIGN-1)) {
-		printk(KERN_ERR "0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
-		       objp, ARCH_SLAB_MINALIGN);
-	}
-#endif
-	return objp;
-}
-#else
-#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
-#endif
-
-#ifdef CONFIG_FAILSLAB
-
-static struct failslab_attr {
-
-	struct fault_attr attr;
-
-	u32 ignore_gfp_wait;
-#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
-	struct dentry *ignore_gfp_wait_file;
-#endif
-
-} failslab = {
-	.attr = FAULT_ATTR_INITIALIZER,
-	.ignore_gfp_wait = 1,
-};
-
-static int __init setup_failslab(char *str)
-{
-	return setup_fault_attr(&failslab.attr, str);
-}
-__setup("failslab=", setup_failslab);
-
-static int should_failslab(struct kmem_cache *cachep, gfp_t flags)
-{
-	if (cachep == &cache_cache)
-		return 0;
-	if (flags & __GFP_NOFAIL)
-		return 0;
-	if (failslab.ignore_gfp_wait && (flags & __GFP_WAIT))
-		return 0;
-
-	return should_fail(&failslab.attr, obj_size(cachep));
-}
-
-#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
-
-static int __init failslab_debugfs(void)
-{
-	mode_t mode = S_IFREG | S_IRUSR | S_IWUSR;
-	struct dentry *dir;
-	int err;
-
-	err = init_fault_attr_dentries(&failslab.attr, "failslab");
-	if (err)
-		return err;
-	dir = failslab.attr.dentries.dir;
-
-	failslab.ignore_gfp_wait_file =
-		debugfs_create_bool("ignore-gfp-wait", mode, dir,
-				      &failslab.ignore_gfp_wait);
-
-	if (!failslab.ignore_gfp_wait_file) {
-		err = -ENOMEM;
-		debugfs_remove(failslab.ignore_gfp_wait_file);
-		cleanup_fault_attr_dentries(&failslab.attr);
-	}
-
-	return err;
-}
-
-late_initcall(failslab_debugfs);
-
-#endif /* CONFIG_FAULT_INJECTION_DEBUG_FS */
-
-#else /* CONFIG_FAILSLAB */
-
-static inline int should_failslab(struct kmem_cache *cachep, gfp_t flags)
-{
-	return 0;
-}
-
-#endif /* CONFIG_FAILSLAB */
-
-static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
-	void *objp;
-	struct array_cache *ac;
-
-	check_irq_off();
-
-	ac = cpu_cache_get(cachep);
-	if (likely(ac->avail)) {
-		STATS_INC_ALLOCHIT(cachep);
-		ac->touched = 1;
-		objp = ac->entry[--ac->avail];
-	} else {
-		STATS_INC_ALLOCMISS(cachep);
-		objp = cache_alloc_refill(cachep, flags);
-	}
-	return objp;
-}
-
-#ifdef CONFIG_NUMA
-/*
- * Try allocating on another node if PF_SPREAD_SLAB|PF_MEMPOLICY.
- *
- * If we are in_interrupt, then process context, including cpusets and
- * mempolicy, may not apply and should not be used for allocation policy.
- */
-static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
-	int nid_alloc, nid_here;
-
-	if (in_interrupt() || (flags & __GFP_THISNODE))
-		return NULL;
-	nid_alloc = nid_here = numa_node_id();
-	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
-		nid_alloc = cpuset_mem_spread_node();
-	else if (current->mempolicy)
-		nid_alloc = slab_node(current->mempolicy);
-	if (nid_alloc != nid_here)
-		return ____cache_alloc_node(cachep, flags, nid_alloc);
-	return NULL;
-}
-
-/*
- * Fallback function if there was no memory available and no objects on a
- * certain node and fall back is permitted. First we scan all the
- * available nodelists for available objects. If that fails then we
- * perform an allocation without specifying a node. This allows the page
- * allocator to do its reclaim / fallback magic. We then insert the
- * slab into the proper nodelist and then allocate from it.
- */
-static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
-{
-	struct zonelist *zonelist;
-	gfp_t local_flags;
-	struct zone **z;
-	void *obj = NULL;
-	int nid;
-
-	if (flags & __GFP_THISNODE)
-		return NULL;
-
-	zonelist = &NODE_DATA(slab_node(current->mempolicy))
-			->node_zonelists[gfp_zone(flags)];
-	local_flags = (flags & GFP_LEVEL_MASK);
-
-retry:
-	/*
-	 * Look through allowed nodes for objects available
-	 * from existing per node queues.
-	 */
-	for (z = zonelist->zones; *z && !obj; z++) {
-		nid = zone_to_nid(*z);
-
-		if (cpuset_zone_allowed_hardwall(*z, flags) &&
-			cache->nodelists[nid] &&
-			cache->nodelists[nid]->free_objects)
-				obj = ____cache_alloc_node(cache,
-					flags | GFP_THISNODE, nid);
-	}
-
-	if (!obj) {
-		/*
-		 * This allocation will be performed within the constraints
-		 * of the current cpuset / memory policy requirements.
-		 * We may trigger various forms of reclaim on the allowed
-		 * set and go into memory reserves if necessary.
-		 */
-		if (local_flags & __GFP_WAIT)
-			local_irq_enable();
-		kmem_flagcheck(cache, flags);
-		obj = kmem_getpages(cache, flags, -1);
-		if (local_flags & __GFP_WAIT)
-			local_irq_disable();
-		if (obj) {
-			/*
-			 * Insert into the appropriate per node queues
-			 */
-			nid = page_to_nid(virt_to_page(obj));
-			if (cache_grow(cache, flags, nid, obj)) {
-				obj = ____cache_alloc_node(cache,
-					flags | GFP_THISNODE, nid);
-				if (!obj)
-					/*
-					 * Another processor may allocate the
-					 * objects in the slab since we are
-					 * not holding any locks.
-					 */
-					goto retry;
-			} else {
-				/* cache_grow already freed obj */
-				obj = NULL;
-			}
-		}
-	}
-	return obj;
-}
-
-/*
- * A interface to enable slab creation on nodeid
- */
-static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
-				int nodeid)
-{
-	struct list_head *entry;
-	struct slab *slabp;
-	struct kmem_list3 *l3;
-	void *obj;
-	int x;
-
-	l3 = cachep->nodelists[nodeid];
-	BUG_ON(!l3);
-
-retry:
-	check_irq_off();
-	spin_lock(&l3->list_lock);
-	entry = l3->slabs_partial.next;
-	if (entry == &l3->slabs_partial) {
-		l3->free_touched = 1;
-		entry = l3->slabs_free.next;
-		if (entry == &l3->slabs_free)
-			goto must_grow;
-	}
-
-	slabp = list_entry(entry, struct slab, list);
-	check_spinlock_acquired_node(cachep, nodeid);
-	check_slabp(cachep, slabp);
-
-	STATS_INC_NODEALLOCS(cachep);
-	STATS_INC_ACTIVE(cachep);
-	STATS_SET_HIGH(cachep);
-
-	BUG_ON(slabp->inuse == cachep->num);
-
-	obj = slab_get_obj(cachep, slabp, nodeid);
-	check_slabp(cachep, slabp);
-	l3->free_objects--;
-	/* move slabp to correct slabp list: */
-	list_del(&slabp->list);
-
-	if (slabp->free == BUFCTL_END)
-		list_add(&slabp->list, &l3->slabs_full);
-	else
-		list_add(&slabp->list, &l3->slabs_partial);
-
-	spin_unlock(&l3->list_lock);
-	goto done;
-
-must_grow:
-	spin_unlock(&l3->list_lock);
-	x = cache_grow(cachep, flags | GFP_THISNODE, nodeid, NULL);
-	if (x)
-		goto retry;
-
-	return fallback_alloc(cachep, flags);
-
-done:
-	return obj;
-}
-
-/**
- * kmem_cache_alloc_node - Allocate an object on the specified node
- * @cachep: The cache to allocate from.
- * @flags: See kmalloc().
- * @nodeid: node number of the target node.
- * @caller: return address of caller, used for debug information
- *
- * Identical to kmem_cache_alloc but it will allocate memory on the given
- * node, which can improve the performance for cpu bound structures.
- *
- * Fallback to other node is possible if __GFP_THISNODE is not set.
- */
-static __always_inline void *
-__cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
-		   void *caller)
-{
-	unsigned long save_flags;
-	void *ptr;
-
-	if (should_failslab(cachep, flags))
-		return NULL;
-
-	cache_alloc_debugcheck_before(cachep, flags);
-	local_irq_save(save_flags);
-
-	if (unlikely(nodeid == -1))
-		nodeid = numa_node_id();
-
-	if (unlikely(!cachep->nodelists[nodeid])) {
-		/* Node not bootstrapped yet */
-		ptr = fallback_alloc(cachep, flags);
-		goto out;
-	}
-
-	if (nodeid == numa_node_id()) {
-		/*
-		 * Use the locally cached objects if possible.
-		 * However ____cache_alloc does not allow fallback
-		 * to other nodes. It may fail while we still have
-		 * objects on other nodes available.
-		 */
-		ptr = ____cache_alloc(cachep, flags);
-		if (ptr)
-			goto out;
-	}
-	/* ___cache_alloc_node can fall back to other nodes */
-	ptr = ____cache_alloc_node(cachep, flags, nodeid);
-  out:
-	local_irq_restore(save_flags);
-	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
-
-	if (unlikely((flags & __GFP_ZERO) && ptr))
-		memset(ptr, 0, obj_size(cachep));
-
-	return ptr;
-}
-
-static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
-{
-	void *objp;
-
-	if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
-		objp = alternate_node_alloc(cache, flags);
-		if (objp)
-			goto out;
-	}
-	objp = ____cache_alloc(cache, flags);
-
-	/*
-	 * We may just have run out of memory on the local node.
-	 * ____cache_alloc_node() knows how to locate memory on other nodes
-	 */
- 	if (!objp)
- 		objp = ____cache_alloc_node(cache, flags, numa_node_id());
-
-  out:
-	return objp;
-}
-#else
-
-static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
-	return ____cache_alloc(cachep, flags);
-}
-
-#endif /* CONFIG_NUMA */
-
-static __always_inline void *
-__cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
-{
-	unsigned long save_flags;
-	void *objp;
-
-	if (should_failslab(cachep, flags))
-		return NULL;
-
-	cache_alloc_debugcheck_before(cachep, flags);
-	local_irq_save(save_flags);
-	objp = __do_cache_alloc(cachep, flags);
-	local_irq_restore(save_flags);
-	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
-	prefetchw(objp);
-
-	if (unlikely((flags & __GFP_ZERO) && objp))
-		memset(objp, 0, obj_size(cachep));
-
-	return objp;
-}
-
-/*
- * Caller needs to acquire correct kmem_list's list_lock
- */
-static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
-		       int node)
-{
-	int i;
-	struct kmem_list3 *l3;
-
-	for (i = 0; i < nr_objects; i++) {
-		void *objp = objpp[i];
-		struct slab *slabp;
-
-		slabp = virt_to_slab(objp);
-		l3 = cachep->nodelists[node];
-		list_del(&slabp->list);
-		check_spinlock_acquired_node(cachep, node);
-		check_slabp(cachep, slabp);
-		slab_put_obj(cachep, slabp, objp, node);
-		STATS_DEC_ACTIVE(cachep);
-		l3->free_objects++;
-		check_slabp(cachep, slabp);
-
-		/* fixup slab chains */
-		if (slabp->inuse == 0) {
-			if (l3->free_objects > l3->free_limit) {
-				l3->free_objects -= cachep->num;
-				/* No need to drop any previously held
-				 * lock here, even if we have a off-slab slab
-				 * descriptor it is guaranteed to come from
-				 * a different cache, refer to comments before
-				 * alloc_slabmgmt.
-				 */
-				slab_destroy(cachep, slabp);
-			} else {
-				list_add(&slabp->list, &l3->slabs_free);
-			}
-		} else {
-			/* Unconditionally move a slab to the end of the
-			 * partial list on free - maximum time for the
-			 * other objects to be freed, too.
-			 */
-			list_add_tail(&slabp->list, &l3->slabs_partial);
-		}
-	}
-}
-
-static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
-{
-	int batchcount;
-	struct kmem_list3 *l3;
-	int node = numa_node_id();
-
-	batchcount = ac->batchcount;
-#if DEBUG
-	BUG_ON(!batchcount || batchcount > ac->avail);
-#endif
-	check_irq_off();
-	l3 = cachep->nodelists[node];
-	spin_lock(&l3->list_lock);
-	if (l3->shared) {
-		struct array_cache *shared_array = l3->shared;
-		int max = shared_array->limit - shared_array->avail;
-		if (max) {
-			if (batchcount > max)
-				batchcount = max;
-			memcpy(&(shared_array->entry[shared_array->avail]),
-			       ac->entry, sizeof(void *) * batchcount);
-			shared_array->avail += batchcount;
-			goto free_done;
-		}
-	}
-
-	free_block(cachep, ac->entry, batchcount, node);
-free_done:
-#if STATS
-	{
-		int i = 0;
-		struct list_head *p;
-
-		p = l3->slabs_free.next;
-		while (p != &(l3->slabs_free)) {
-			struct slab *slabp;
-
-			slabp = list_entry(p, struct slab, list);
-			BUG_ON(slabp->inuse);
-
-			i++;
-			p = p->next;
-		}
-		STATS_SET_FREEABLE(cachep, i);
-	}
-#endif
-	spin_unlock(&l3->list_lock);
-	ac->avail -= batchcount;
-	memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
-}
-
-/*
- * Release an obj back to its cache. If the obj has a constructed state, it must
- * be in this state _before_ it is released.  Called with disabled ints.
- */
-static inline void __cache_free(struct kmem_cache *cachep, void *objp)
-{
-	struct array_cache *ac = cpu_cache_get(cachep);
-
-	check_irq_off();
-	objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
-
-	if (cache_free_alien(cachep, objp))
-		return;
-
-	if (likely(ac->avail < ac->limit)) {
-		STATS_INC_FREEHIT(cachep);
-		ac->entry[ac->avail++] = objp;
-		return;
-	} else {
-		STATS_INC_FREEMISS(cachep);
-		cache_flusharray(cachep, ac);
-		ac->entry[ac->avail++] = objp;
-	}
-}
-
-/**
- * kmem_cache_alloc - Allocate an object
- * @cachep: The cache to allocate from.
- * @flags: See kmalloc().
- *
- * Allocate an object from this cache.  The flags are only relevant
- * if the cache has no available objects.
- */
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
-	return __cache_alloc(cachep, flags, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(kmem_cache_alloc);
-
-/**
- * kmem_ptr_validate - check if an untrusted pointer might
- *	be a slab entry.
- * @cachep: the cache we're checking against
- * @ptr: pointer to validate
- *
- * This verifies that the untrusted pointer looks sane:
- * it is _not_ a guarantee that the pointer is actually
- * part of the slab cache in question, but it at least
- * validates that the pointer can be dereferenced and
- * looks half-way sane.
- *
- * Currently only used for dentry validation.
- */
-int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr)
-{
-	unsigned long addr = (unsigned long)ptr;
-	unsigned long min_addr = PAGE_OFFSET;
-	unsigned long align_mask = BYTES_PER_WORD - 1;
-	unsigned long size = cachep->buffer_size;
-	struct page *page;
-
-	if (unlikely(addr < min_addr))
-		goto out;
-	if (unlikely(addr > (unsigned long)high_memory - size))
-		goto out;
-	if (unlikely(addr & align_mask))
-		goto out;
-	if (unlikely(!kern_addr_valid(addr)))
-		goto out;
-	if (unlikely(!kern_addr_valid(addr + size - 1)))
-		goto out;
-	page = virt_to_page(ptr);
-	if (unlikely(!PageSlab(page)))
-		goto out;
-	if (unlikely(page_get_cache(page) != cachep))
-		goto out;
-	return 1;
-out:
-	return 0;
-}
-
-#ifdef CONFIG_NUMA
-void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
-{
-	return __cache_alloc_node(cachep, flags, nodeid,
-			__builtin_return_address(0));
-}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
-
-static __always_inline void *
-__do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller)
-{
-	struct kmem_cache *cachep;
-
-	cachep = kmem_find_general_cachep(size, flags);
-	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
-		return cachep;
-	return kmem_cache_alloc_node(cachep, flags, node);
-}
-
-#ifdef CONFIG_DEBUG_SLAB
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __do_kmalloc_node(size, flags, node,
-			__builtin_return_address(0));
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
-		int node, void *caller)
-{
-	return __do_kmalloc_node(size, flags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#else
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __do_kmalloc_node(size, flags, node, NULL);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-#endif /* CONFIG_DEBUG_SLAB */
-#endif /* CONFIG_NUMA */
-
-/**
- * __do_kmalloc - allocate memory
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @caller: function caller for debug tracking of the caller
- */
-static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
-					  void *caller)
-{
-	struct kmem_cache *cachep;
-
-	/* If you want to save a few bytes .text space: replace
-	 * __ with kmem_.
-	 * Then kmalloc uses the uninlined functions instead of the inline
-	 * functions.
-	 */
-	cachep = __find_general_cachep(size, flags);
-	if (unlikely(cachep == NULL))
-		return NULL;
-	return __cache_alloc(cachep, flags, caller);
-}
-
-
-#ifdef CONFIG_DEBUG_SLAB
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	return __do_kmalloc(size, flags, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(__kmalloc);
-
-void *__kmalloc_track_caller(size_t size, gfp_t flags, void *caller)
-{
-	return __do_kmalloc(size, flags, caller);
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
-#else
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	return __do_kmalloc(size, flags, NULL);
-}
-EXPORT_SYMBOL(__kmalloc);
-#endif
-
-/**
- * kmem_cache_free - Deallocate an object
- * @cachep: The cache the allocation was from.
- * @objp: The previously allocated object.
- *
- * Free an object which was previously allocated from this
- * cache.
- */
-void kmem_cache_free(struct kmem_cache *cachep, void *objp)
-{
-	unsigned long flags;
-
-	BUG_ON(virt_to_cache(objp) != cachep);
-
-	local_irq_save(flags);
-	debug_check_no_locks_freed(objp, obj_size(cachep));
-	__cache_free(cachep, objp);
-	local_irq_restore(flags);
-}
-EXPORT_SYMBOL(kmem_cache_free);
-
-/**
- * kfree - free previously allocated memory
- * @objp: pointer returned by kmalloc.
- *
- * If @objp is NULL, no operation is performed.
- *
- * Don't free memory not originally allocated by kmalloc()
- * or you will run into trouble.
- */
-void kfree(const void *objp)
-{
-	struct kmem_cache *c;
-	unsigned long flags;
-
-	if (unlikely(ZERO_OR_NULL_PTR(objp)))
-		return;
-	local_irq_save(flags);
-	kfree_debugcheck(objp);
-	c = virt_to_cache(objp);
-	debug_check_no_locks_freed(objp, obj_size(c));
-	__cache_free(c, (void *)objp);
-	local_irq_restore(flags);
-}
-EXPORT_SYMBOL(kfree);
-
-unsigned int kmem_cache_size(struct kmem_cache *cachep)
-{
-	return obj_size(cachep);
-}
-EXPORT_SYMBOL(kmem_cache_size);
-
-const char *kmem_cache_name(struct kmem_cache *cachep)
-{
-	return cachep->name;
-}
-EXPORT_SYMBOL_GPL(kmem_cache_name);
-
-/*
- * This initializes kmem_list3 or resizes varioius caches for all nodes.
- */
-static int alloc_kmemlist(struct kmem_cache *cachep)
-{
-	int node;
-	struct kmem_list3 *l3;
-	struct array_cache *new_shared;
-	struct array_cache **new_alien = NULL;
-
-	for_each_online_node(node) {
-
-                if (use_alien_caches) {
-                        new_alien = alloc_alien_cache(node, cachep->limit);
-                        if (!new_alien)
-                                goto fail;
-                }
-
-		new_shared = NULL;
-		if (cachep->shared) {
-			new_shared = alloc_arraycache(node,
-				cachep->shared*cachep->batchcount,
-					0xbaadf00d);
-			if (!new_shared) {
-				free_alien_cache(new_alien);
-				goto fail;
-			}
-		}
-
-		l3 = cachep->nodelists[node];
-		if (l3) {
-			struct array_cache *shared = l3->shared;
-
-			spin_lock_irq(&l3->list_lock);
-
-			if (shared)
-				free_block(cachep, shared->entry,
-						shared->avail, node);
-
-			l3->shared = new_shared;
-			if (!l3->alien) {
-				l3->alien = new_alien;
-				new_alien = NULL;
-			}
-			l3->free_limit = (1 + nr_cpus_node(node)) *
-					cachep->batchcount + cachep->num;
-			spin_unlock_irq(&l3->list_lock);
-			kfree(shared);
-			free_alien_cache(new_alien);
-			continue;
-		}
-		l3 = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node);
-		if (!l3) {
-			free_alien_cache(new_alien);
-			kfree(new_shared);
-			goto fail;
-		}
-
-		kmem_list3_init(l3);
-		l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
-				((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-		l3->shared = new_shared;
-		l3->alien = new_alien;
-		l3->free_limit = (1 + nr_cpus_node(node)) *
-					cachep->batchcount + cachep->num;
-		cachep->nodelists[node] = l3;
-	}
-	return 0;
-
-fail:
-	if (!cachep->next.next) {
-		/* Cache is not active yet. Roll back what we did */
-		node--;
-		while (node >= 0) {
-			if (cachep->nodelists[node]) {
-				l3 = cachep->nodelists[node];
-
-				kfree(l3->shared);
-				free_alien_cache(l3->alien);
-				kfree(l3);
-				cachep->nodelists[node] = NULL;
-			}
-			node--;
-		}
-	}
-	return -ENOMEM;
-}
-
-struct ccupdate_struct {
-	struct kmem_cache *cachep;
-	struct array_cache *new[NR_CPUS];
-};
-
-static void do_ccupdate_local(void *info)
-{
-	struct ccupdate_struct *new = info;
-	struct array_cache *old;
-
-	check_irq_off();
-	old = cpu_cache_get(new->cachep);
-
-	new->cachep->array[smp_processor_id()] = new->new[smp_processor_id()];
-	new->new[smp_processor_id()] = old;
-}
-
-/* Always called with the cache_chain_mutex held */
-static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
-				int batchcount, int shared)
-{
-	struct ccupdate_struct *new;
-	int i;
-
-	new = kzalloc(sizeof(*new), GFP_KERNEL);
-	if (!new)
-		return -ENOMEM;
-
-	for_each_online_cpu(i) {
-		new->new[i] = alloc_arraycache(cpu_to_node(i), limit,
-						batchcount);
-		if (!new->new[i]) {
-			for (i--; i >= 0; i--)
-				kfree(new->new[i]);
-			kfree(new);
-			return -ENOMEM;
-		}
-	}
-	new->cachep = cachep;
-
-	on_each_cpu(do_ccupdate_local, (void *)new, 1, 1);
-
-	check_irq_on();
-	cachep->batchcount = batchcount;
-	cachep->limit = limit;
-	cachep->shared = shared;
-
-	for_each_online_cpu(i) {
-		struct array_cache *ccold = new->new[i];
-		if (!ccold)
-			continue;
-		spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
-		free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
-		spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
-		kfree(ccold);
-	}
-	kfree(new);
-	return alloc_kmemlist(cachep);
-}
-
-/* Called with cache_chain_mutex held always */
-static int enable_cpucache(struct kmem_cache *cachep)
-{
-	int err;
-	int limit, shared;
-
-	/*
-	 * The head array serves three purposes:
-	 * - create a LIFO ordering, i.e. return objects that are cache-warm
-	 * - reduce the number of spinlock operations.
-	 * - reduce the number of linked list operations on the slab and
-	 *   bufctl chains: array operations are cheaper.
-	 * The numbers are guessed, we should auto-tune as described by
-	 * Bonwick.
-	 */
-	if (cachep->buffer_size > 131072)
-		limit = 1;
-	else if (cachep->buffer_size > PAGE_SIZE)
-		limit = 8;
-	else if (cachep->buffer_size > 1024)
-		limit = 24;
-	else if (cachep->buffer_size > 256)
-		limit = 54;
-	else
-		limit = 120;
-
-	/*
-	 * CPU bound tasks (e.g. network routing) can exhibit cpu bound
-	 * allocation behaviour: Most allocs on one cpu, most free operations
-	 * on another cpu. For these cases, an efficient object passing between
-	 * cpus is necessary. This is provided by a shared array. The array
-	 * replaces Bonwick's magazine layer.
-	 * On uniprocessor, it's functionally equivalent (but less efficient)
-	 * to a larger limit. Thus disabled by default.
-	 */
-	shared = 0;
-	if (cachep->buffer_size <= PAGE_SIZE && num_possible_cpus() > 1)
-		shared = 8;
-
-#if DEBUG
-	/*
-	 * With debugging enabled, large batchcount lead to excessively long
-	 * periods with disabled local interrupts. Limit the batchcount
-	 */
-	if (limit > 32)
-		limit = 32;
-#endif
-	err = do_tune_cpucache(cachep, limit, (limit + 1) / 2, shared);
-	if (err)
-		printk(KERN_ERR "enable_cpucache failed for %s, error %d.\n",
-		       cachep->name, -err);
-	return err;
-}
-
-/*
- * Drain an array if it contains any elements taking the l3 lock only if
- * necessary. Note that the l3 listlock also protects the array_cache
- * if drain_array() is used on the shared array.
- */
-void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
-			 struct array_cache *ac, int force, int node)
-{
-	int tofree;
-
-	if (!ac || !ac->avail)
-		return;
-	if (ac->touched && !force) {
-		ac->touched = 0;
-	} else {
-		spin_lock_irq(&l3->list_lock);
-		if (ac->avail) {
-			tofree = force ? ac->avail : (ac->limit + 4) / 5;
-			if (tofree > ac->avail)
-				tofree = (ac->avail + 1) / 2;
-			free_block(cachep, ac->entry, tofree, node);
-			ac->avail -= tofree;
-			memmove(ac->entry, &(ac->entry[tofree]),
-				sizeof(void *) * ac->avail);
-		}
-		spin_unlock_irq(&l3->list_lock);
-	}
-}
-
-/**
- * cache_reap - Reclaim memory from caches.
- * @w: work descriptor
- *
- * Called from workqueue/eventd every few seconds.
- * Purpose:
- * - clear the per-cpu caches for this CPU.
- * - return freeable pages to the main free memory pool.
- *
- * If we cannot acquire the cache chain mutex then just give up - we'll try
- * again on the next iteration.
- */
-static void cache_reap(struct work_struct *w)
-{
-	struct kmem_cache *searchp;
-	struct kmem_list3 *l3;
-	int node = numa_node_id();
-	struct delayed_work *work =
-		container_of(w, struct delayed_work, work);
-
-	if (!mutex_trylock(&cache_chain_mutex))
-		/* Give up. Setup the next iteration. */
-		goto out;
-
-	list_for_each_entry(searchp, &cache_chain, next) {
-		check_irq_on();
-
-		/*
-		 * We only take the l3 lock if absolutely necessary and we
-		 * have established with reasonable certainty that
-		 * we can do some work if the lock was obtained.
-		 */
-		l3 = searchp->nodelists[node];
-
-		reap_alien(searchp, l3);
-
-		drain_array(searchp, l3, cpu_cache_get(searchp), 0, node);
-
-		/*
-		 * These are racy checks but it does not matter
-		 * if we skip one check or scan twice.
-		 */
-		if (time_after(l3->next_reap, jiffies))
-			goto next;
-
-		l3->next_reap = jiffies + REAPTIMEOUT_LIST3;
-
-		drain_array(searchp, l3, l3->shared, 0, node);
-
-		if (l3->free_touched)
-			l3->free_touched = 0;
-		else {
-			int freed;
-
-			freed = drain_freelist(searchp, l3, (l3->free_limit +
-				5 * searchp->num - 1) / (5 * searchp->num));
-			STATS_ADD_REAPED(searchp, freed);
-		}
-next:
-		cond_resched();
-	}
-	check_irq_on();
-	mutex_unlock(&cache_chain_mutex);
-	next_reap_node();
-out:
-	/* Set up the next iteration */
-	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));
-}
-
-#ifdef CONFIG_PROC_FS
-
-static void print_slabinfo_header(struct seq_file *m)
-{
-	/*
-	 * Output format version, so at least we can change it
-	 * without _too_ many complaints.
-	 */
-#if STATS
-	seq_puts(m, "slabinfo - version: 2.1 (statistics)\n");
-#else
-	seq_puts(m, "slabinfo - version: 2.1\n");
-#endif
-	seq_puts(m, "# name            <active_objs> <num_objs> <objsize> "
-		 "<objperslab> <pagesperslab>");
-	seq_puts(m, " : tunables <limit> <batchcount> <sharedfactor>");
-	seq_puts(m, " : slabdata <active_slabs> <num_slabs> <sharedavail>");
-#if STATS
-	seq_puts(m, " : globalstat <listallocs> <maxobjs> <grown> <reaped> "
-		 "<error> <maxfreeable> <nodeallocs> <remotefrees> <alienoverflow>");
-	seq_puts(m, " : cpustat <allochit> <allocmiss> <freehit> <freemiss>");
-#endif
-	seq_putc(m, '\n');
-}
-
-static void *s_start(struct seq_file *m, loff_t *pos)
-{
-	loff_t n = *pos;
-
-	mutex_lock(&cache_chain_mutex);
-	if (!n)
-		print_slabinfo_header(m);
-
-	return seq_list_start(&cache_chain, *pos);
-}
-
-static void *s_next(struct seq_file *m, void *p, loff_t *pos)
-{
-	return seq_list_next(p, &cache_chain, pos);
-}
-
-static void s_stop(struct seq_file *m, void *p)
-{
-	mutex_unlock(&cache_chain_mutex);
-}
-
-static int s_show(struct seq_file *m, void *p)
-{
-	struct kmem_cache *cachep = list_entry(p, struct kmem_cache, next);
-	struct slab *slabp;
-	unsigned long active_objs;
-	unsigned long num_objs;
-	unsigned long active_slabs = 0;
-	unsigned long num_slabs, free_objects = 0, shared_avail = 0;
-	const char *name;
-	char *error = NULL;
-	int node;
-	struct kmem_list3 *l3;
-
-	active_objs = 0;
-	num_slabs = 0;
-	for_each_online_node(node) {
-		l3 = cachep->nodelists[node];
-		if (!l3)
-			continue;
-
-		check_irq_on();
-		spin_lock_irq(&l3->list_lock);
-
-		list_for_each_entry(slabp, &l3->slabs_full, list) {
-			if (slabp->inuse != cachep->num && !error)
-				error = "slabs_full accounting error";
-			active_objs += cachep->num;
-			active_slabs++;
-		}
-		list_for_each_entry(slabp, &l3->slabs_partial, list) {
-			if (slabp->inuse == cachep->num && !error)
-				error = "slabs_partial inuse accounting error";
-			if (!slabp->inuse && !error)
-				error = "slabs_partial/inuse accounting error";
-			active_objs += slabp->inuse;
-			active_slabs++;
-		}
-		list_for_each_entry(slabp, &l3->slabs_free, list) {
-			if (slabp->inuse && !error)
-				error = "slabs_free/inuse accounting error";
-			num_slabs++;
-		}
-		free_objects += l3->free_objects;
-		if (l3->shared)
-			shared_avail += l3->shared->avail;
-
-		spin_unlock_irq(&l3->list_lock);
-	}
-	num_slabs += active_slabs;
-	num_objs = num_slabs * cachep->num;
-	if (num_objs - active_objs != free_objects && !error)
-		error = "free_objects accounting error";
-
-	name = cachep->name;
-	if (error)
-		printk(KERN_ERR "slab: cache %s error: %s\n", name, error);
-
-	seq_printf(m, "%-17s %6lu %6lu %6u %4u %4d",
-		   name, active_objs, num_objs, cachep->buffer_size,
-		   cachep->num, (1 << cachep->gfporder));
-	seq_printf(m, " : tunables %4u %4u %4u",
-		   cachep->limit, cachep->batchcount, cachep->shared);
-	seq_printf(m, " : slabdata %6lu %6lu %6lu",
-		   active_slabs, num_slabs, shared_avail);
-#if STATS
-	{			/* list3 stats */
-		unsigned long high = cachep->high_mark;
-		unsigned long allocs = cachep->num_allocations;
-		unsigned long grown = cachep->grown;
-		unsigned long reaped = cachep->reaped;
-		unsigned long errors = cachep->errors;
-		unsigned long max_freeable = cachep->max_freeable;
-		unsigned long node_allocs = cachep->node_allocs;
-		unsigned long node_frees = cachep->node_frees;
-		unsigned long overflows = cachep->node_overflow;
-
-		seq_printf(m, " : globalstat %7lu %6lu %5lu %4lu \
-				%4lu %4lu %4lu %4lu %4lu", allocs, high, grown,
-				reaped, errors, max_freeable, node_allocs,
-				node_frees, overflows);
-	}
-	/* cpu stats */
-	{
-		unsigned long allochit = atomic_read(&cachep->allochit);
-		unsigned long allocmiss = atomic_read(&cachep->allocmiss);
-		unsigned long freehit = atomic_read(&cachep->freehit);
-		unsigned long freemiss = atomic_read(&cachep->freemiss);
-
-		seq_printf(m, " : cpustat %6lu %6lu %6lu %6lu",
-			   allochit, allocmiss, freehit, freemiss);
-	}
-#endif
-	seq_putc(m, '\n');
-	return 0;
-}
-
-/*
- * slabinfo_op - iterator that generates /proc/slabinfo
- *
- * Output layout:
- * cache-name
- * num-active-objs
- * total-objs
- * object size
- * num-active-slabs
- * total-slabs
- * num-pages-per-slab
- * + further values on SMP and with statistics enabled
- */
-
-const struct seq_operations slabinfo_op = {
-	.start = s_start,
-	.next = s_next,
-	.stop = s_stop,
-	.show = s_show,
-};
-
-#define MAX_SLABINFO_WRITE 128
-/**
- * slabinfo_write - Tuning for the slab allocator
- * @file: unused
- * @buffer: user buffer
- * @count: data length
- * @ppos: unused
- */
-ssize_t slabinfo_write(struct file *file, const char __user * buffer,
-		       size_t count, loff_t *ppos)
-{
-	char kbuf[MAX_SLABINFO_WRITE + 1], *tmp;
-	int limit, batchcount, shared, res;
-	struct kmem_cache *cachep;
-
-	if (count > MAX_SLABINFO_WRITE)
-		return -EINVAL;
-	if (copy_from_user(&kbuf, buffer, count))
-		return -EFAULT;
-	kbuf[MAX_SLABINFO_WRITE] = '\0';
-
-	tmp = strchr(kbuf, ' ');
-	if (!tmp)
-		return -EINVAL;
-	*tmp = '\0';
-	tmp++;
-	if (sscanf(tmp, " %d %d %d", &limit, &batchcount, &shared) != 3)
-		return -EINVAL;
-
-	/* Find the cache in the chain of caches. */
-	mutex_lock(&cache_chain_mutex);
-	res = -EINVAL;
-	list_for_each_entry(cachep, &cache_chain, next) {
-		if (!strcmp(cachep->name, kbuf)) {
-			if (limit < 1 || batchcount < 1 ||
-					batchcount > limit || shared < 0) {
-				res = 0;
-			} else {
-				res = do_tune_cpucache(cachep, limit,
-						       batchcount, shared);
-			}
-			break;
-		}
-	}
-	mutex_unlock(&cache_chain_mutex);
-	if (res >= 0)
-		res = count;
-	return res;
-}
-
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-
-static void *leaks_start(struct seq_file *m, loff_t *pos)
-{
-	mutex_lock(&cache_chain_mutex);
-	return seq_list_start(&cache_chain, *pos);
-}
-
-static inline int add_caller(unsigned long *n, unsigned long v)
-{
-	unsigned long *p;
-	int l;
-	if (!v)
-		return 1;
-	l = n[1];
-	p = n + 2;
-	while (l) {
-		int i = l/2;
-		unsigned long *q = p + 2 * i;
-		if (*q == v) {
-			q[1]++;
-			return 1;
-		}
-		if (*q > v) {
-			l = i;
-		} else {
-			p = q + 2;
-			l -= i + 1;
-		}
-	}
-	if (++n[1] == n[0])
-		return 0;
-	memmove(p + 2, p, n[1] * 2 * sizeof(unsigned long) - ((void *)p - (void *)n));
-	p[0] = v;
-	p[1] = 1;
-	return 1;
-}
-
-static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
-{
-	void *p;
-	int i;
-	if (n[0] == n[1])
-		return;
-	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->buffer_size) {
-		if (slab_bufctl(s)[i] != BUFCTL_ACTIVE)
-			continue;
-		if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))
-			return;
-	}
-}
-
-static void show_symbol(struct seq_file *m, unsigned long address)
-{
-#ifdef CONFIG_KALLSYMS
-	unsigned long offset, size;
-	char modname[MODULE_NAME_LEN + 1], name[KSYM_NAME_LEN + 1];
-
-	if (lookup_symbol_attrs(address, &size, &offset, modname, name) == 0) {
-		seq_printf(m, "%s+%#lx/%#lx", name, offset, size);
-		if (modname[0])
-			seq_printf(m, " [%s]", modname);
-		return;
-	}
-#endif
-	seq_printf(m, "%p", (void *)address);
-}
-
-static int leaks_show(struct seq_file *m, void *p)
-{
-	struct kmem_cache *cachep = list_entry(p, struct kmem_cache, next);
-	struct slab *slabp;
-	struct kmem_list3 *l3;
-	const char *name;
-	unsigned long *n = m->private;
-	int node;
-	int i;
-
-	if (!(cachep->flags & SLAB_STORE_USER))
-		return 0;
-	if (!(cachep->flags & SLAB_RED_ZONE))
-		return 0;
-
-	/* OK, we can do it */
-
-	n[1] = 0;
-
-	for_each_online_node(node) {
-		l3 = cachep->nodelists[node];
-		if (!l3)
-			continue;
-
-		check_irq_on();
-		spin_lock_irq(&l3->list_lock);
-
-		list_for_each_entry(slabp, &l3->slabs_full, list)
-			handle_slab(n, cachep, slabp);
-		list_for_each_entry(slabp, &l3->slabs_partial, list)
-			handle_slab(n, cachep, slabp);
-		spin_unlock_irq(&l3->list_lock);
-	}
-	name = cachep->name;
-	if (n[0] == n[1]) {
-		/* Increase the buffer size */
-		mutex_unlock(&cache_chain_mutex);
-		m->private = kzalloc(n[0] * 4 * sizeof(unsigned long), GFP_KERNEL);
-		if (!m->private) {
-			/* Too bad, we are really out */
-			m->private = n;
-			mutex_lock(&cache_chain_mutex);
-			return -ENOMEM;
-		}
-		*(unsigned long *)m->private = n[0] * 2;
-		kfree(n);
-		mutex_lock(&cache_chain_mutex);
-		/* Now make sure this entry will be retried */
-		m->count = m->size;
-		return 0;
-	}
-	for (i = 0; i < n[1]; i++) {
-		seq_printf(m, "%s: %lu ", name, n[2*i+3]);
-		show_symbol(m, n[2*i+2]);
-		seq_putc(m, '\n');
-	}
-
-	return 0;
-}
-
-const struct seq_operations slabstats_op = {
-	.start = leaks_start,
-	.next = s_next,
-	.stop = s_stop,
-	.show = leaks_show,
-};
-#endif
-#endif
-
-/**
- * ksize - get the actual amount of memory allocated for a given object
- * @objp: Pointer to the object
- *
- * kmalloc may internally round up allocations and return more memory
- * than requested. ksize() can be used to determine the actual amount of
- * memory allocated. The caller may use this additional memory, even though
- * a smaller amount of memory was initially specified with the kmalloc call.
- * The caller must guarantee that objp points to a valid object previously
- * allocated with either kmalloc() or kmem_cache_alloc(). The object
- * must not be freed during the duration of the call.
- */
-size_t ksize(const void *objp)
-{
-	if (unlikely(ZERO_OR_NULL_PTR(objp)))
-		return 0;
-
-	return obj_size(virt_to_cache(objp));
-}
Index: linux-2.6.22-rc6-mm1/lib/Kconfig.debug
===================================================================
--- linux-2.6.22-rc6-mm1.orig/lib/Kconfig.debug	2007-07-05 23:31:31.000000000 -0700
+++ linux-2.6.22-rc6-mm1/lib/Kconfig.debug	2007-07-05 23:31:45.000000000 -0700
@@ -141,23 +141,6 @@ config TIMER_STATS
 	  (it defaults to deactivated on bootup and will only be activated
 	  if some application like powertop activates it explicitly).
 
-config DEBUG_SLAB
-	bool "Debug slab memory allocations"
-	depends on DEBUG_KERNEL && SLAB
-	help
-	  Say Y here to have the kernel do limited verification on memory
-	  allocation as well as poisoning memory on free to catch use of freed
-	  memory. This can make kmalloc/kfree-intensive workloads much slower.
-
-config DEBUG_SLAB_LEAK
-	bool "Slab memory leak debugging"
-	depends on DEBUG_SLAB
-	default y
-	help
-	  Enable /proc/slab_allocators - provides detailed information about
-	  which parts of the kernel are using slab objects.  May be used for
-	  tracking memory leaks and for instrumenting memory usage.
-
 config SLUB_DEBUG_ON
 	bool "SLUB debugging on by default"
 	depends on SLUB && SLUB_DEBUG
Index: linux-2.6.22-rc6-mm1/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/fs/proc/proc_misc.c	2007-07-05 23:37:36.000000000 -0700
+++ linux-2.6.22-rc6-mm1/fs/proc/proc_misc.c	2007-07-05 23:38:01.000000000 -0700
@@ -412,47 +412,6 @@ static const struct file_operations proc
 };
 #endif
 
-#ifdef CONFIG_SLAB
-static int slabinfo_open(struct inode *inode, struct file *file)
-{
-	return seq_open(file, &slabinfo_op);
-}
-static const struct file_operations proc_slabinfo_operations = {
-	.open		= slabinfo_open,
-	.read		= seq_read,
-	.write		= slabinfo_write,
-	.llseek		= seq_lseek,
-	.release	= seq_release,
-};
-
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-extern struct seq_operations slabstats_op;
-static int slabstats_open(struct inode *inode, struct file *file)
-{
-	unsigned long *n = kzalloc(PAGE_SIZE, GFP_KERNEL);
-	int ret = -ENOMEM;
-	if (n) {
-		ret = seq_open(file, &slabstats_op);
-		if (!ret) {
-			struct seq_file *m = file->private_data;
-			*n = PAGE_SIZE / (2 * sizeof(unsigned long));
-			m->private = n;
-			n = NULL;
-		}
-		kfree(n);
-	}
-	return ret;
-}
-
-static const struct file_operations proc_slabstats_operations = {
-	.open		= slabstats_open,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= seq_release_private,
-};
-#endif
-#endif
-
 static int show_stat(struct seq_file *p, void *v)
 {
 	int i;
@@ -933,12 +892,6 @@ void __init proc_misc_init(void)
 #endif
 	create_seq_entry("stat", 0, &proc_stat_operations);
 	create_seq_entry("interrupts", 0, &proc_interrupts_operations);
-#ifdef CONFIG_SLAB
-	create_seq_entry("slabinfo",S_IWUSR|S_IRUGO,&proc_slabinfo_operations);
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	create_seq_entry("slab_allocators", 0 ,&proc_slabstats_operations);
-#endif
-#endif
 	create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
 	create_seq_entry("pagetypeinfo", S_IRUGO, &pagetypeinfo_file_ops);
 	create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);

-- 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (9 preceding siblings ...)
  2007-07-08  3:50 ` [patch 10/10] Remove slab in 2.6.24 Christoph Lameter
@ 2007-07-08  4:37 ` David Miller
  2007-07-09 15:45   ` Christoph Lameter
  2007-07-08 11:20 ` Andi Kleen
  11 siblings, 1 reply; 111+ messages in thread
From: David Miller @ 2007-07-08  4:37 UTC (permalink / raw)
  To: clameter
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough, penberg, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Sat, 07 Jul 2007 20:49:52 -0700

> A cmpxchg is less costly than interrupt enabe/disable

This is cpu dependent, and in fact not true at all on Niagara
and several of the cpus in the UltraSPARC family.
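
For reference, a minimal userspace sketch of the kind of cmpxchg-based
freelist pop such a fastpath relies on (the struct layout, sizes and names
here are illustrative only, not taken from the actual patches; kernel code
would use cmpxchg() rather than the GCC __sync builtin):

/*
 * Toy lock-free freelist: each free object stores the pointer to the
 * next free object in its first word.  A real allocator fastpath must
 * additionally cope with preemption and ABA hazards, which this
 * single-threaded example ignores.
 */
#include <stdio.h>
#include <stdlib.h>

struct object {
        struct object *next;            /* freelist link kept in the free object */
        char payload[56];
};

static struct object *freelist;

/* pop the head, retrying until the head did not change under us */
static struct object *pop_cmpxchg(void)
{
        struct object *old, *new;

        do {
                old = freelist;
                if (!old)
                        return NULL;
                new = old->next;
        } while (__sync_val_compare_and_swap(&freelist, old, new) != old);
        return old;
}

/* push a free object back onto the list head */
static void push_cmpxchg(struct object *obj)
{
        struct object *old;

        do {
                old = freelist;
                obj->next = old;
        } while (__sync_val_compare_and_swap(&freelist, old, obj) != old);
}

int main(void)
{
        struct object *a = calloc(1, sizeof(*a));
        struct object *b = calloc(1, sizeof(*b));
        struct object *x, *y;

        push_cmpxchg(a);
        push_cmpxchg(b);
        x = pop_cmpxchg();
        y = pop_cmpxchg();
        printf("popped %p then %p\n", (void *)x, (void *)y);
        free(a);
        free(b);
        return 0;
}

The alternative is doing the same pop inside a
local_irq_save()/local_irq_restore() pair; which of the two is cheaper
comes down to the relative cost of atomic operations versus interrupt
flag manipulation on the CPU in question.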

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  3:50 ` [patch 09/10] Remove the SLOB allocator for 2.6.23 Christoph Lameter
@ 2007-07-08  7:51   ` Ingo Molnar
  2007-07-08  9:43     ` Nick Piggin
                       ` (3 more replies)
  2007-07-09 20:52   ` Matt Mackall
  1 sibling, 4 replies; 111+ messages in thread
From: Ingo Molnar @ 2007-07-08  7:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Pekka Enberg, akpm, Matt Mackall


(added Matt to the Cc: list)

* Christoph Lameter <clameter@sgi.com> wrote:

> Maintenance of slab allocators becomes a problem as new features for 
> allocators are developed. The SLOB allocator in particular has been 
> lagging behind in many ways in the past:
> 
> - Had no support for SLAB_DESTROY_BY_RCU for years (but no one 
>   noticed)
> 
> - Still has no support for slab reclaim counters. This may currently 
>   not be necessary if one would restrict the supported configurations 
>   for functionality relying on these. But even that has not been done.
> 
> The only current advantage over SLUB in terms of memory savings is 
> through SLOB's kmalloc layout, which is not power-of-two based like SLAB 
> and SLUB and therefore eliminates some memory waste.
> 
> Through that, SLOB still has a slight memory advantage over SLUB of 
> ~350k for a standard server configuration. It is likely that the 
> savings are smaller for real embedded configurations that have less 
> functionality.

actually, one real advantage of the SLOB is that it is a minimal, really 
simple allocator. Its text and data size is so small as well.

here's the size comparison:

   text    data     bss     dec     hex filename
  10788     837      16   11641    2d79 mm/slab.o
   6205    4207     124   10536    2928 mm/slub.o
   1640      44       4    1688     698 mm/slob.o

slab/slub have roughly the same footprint, but slob is about 15% of that 
size. It would be a waste to throw this away.

A year ago the -rt kernel defaulted to the SLOB for a few releases, and 
barring some initial scalability issues (which were solved in -rt) it 
worked pretty well on generic PCs, so i dont buy the 'it doesnt work' 
argument either.

	Ingo

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  7:51   ` Ingo Molnar
@ 2007-07-08  9:43     ` Nick Piggin
  2007-07-08  9:54       ` Ingo Molnar
  2007-07-08 18:02     ` Andrew Morton
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-08  9:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, linux-kernel, linux-mm, suresh.b.siddha,
	corey.d.gough, Pekka Enberg, akpm, Matt Mackall

Ingo Molnar wrote:
> (added Matt to the Cc: list)
> 
> * Christoph Lameter <clameter@sgi.com> wrote:
> 
> 
>>Maintenance of slab allocators becomes a problem as new features for 
>>allocators are developed. The SLOB allocator in particular has been 
>>lagging behind in many ways in the past:
>>
>>- Had no support for SLAB_DESTROY_BY_RCU for years (but no one 
>>  noticed)

It likely was not frequently used on SMP, I guess.


>>- Still has no support for slab reclaim counters. This may currently 
>>  not be necessary if one would restrict the supported configurations 
>>  for functionality relying on these. But even that has not been done.

SLOB has so far run fine without any of these, hasn't it?


>>The only current advantage over SLUB in terms of memory savings is
>>through SLOB's kmalloc layout, which is not power-of-two based like SLAB
>>and SLUB and therefore eliminates some memory waste.

Wrong. All "slabs" allocate out of the same pool of memory in SLOB, so
you also wind up with less waste via _external_ fragmentation, which is
especially important on small memory machines (the kmalloc layout issue
is a problem of internal fragmentation). SLOB is also smaller and simpler
code as Ingo pointed out.
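
A toy userspace calculation of the internal-fragmentation half of that
argument (the request sizes, the 32-byte smallest kmalloc cache and the
8-byte SLOB unit are assumptions for illustration; it does not model
SLOB's per-object header or the external-fragmentation effect of the
shared pool):

#include <stdio.h>

/* hypothetical request sizes, for illustration only */
static const unsigned int requests[] = { 33, 100, 300, 700, 1500 };

/*
 * Power-of-two rounding as done by SLAB/SLUB kmalloc caches (a 32-byte
 * smallest cache is assumed here).
 */
static unsigned int pow2_roundup(unsigned int n)
{
        unsigned int size = 32;

        while (size < n)
                size <<= 1;
        return size;
}

/* SLOB-style rounding to a small allocation unit (8 bytes assumed) */
static unsigned int unit_roundup(unsigned int n)
{
        const unsigned int unit = 8;

        return (n + unit - 1) / unit * unit;
}

int main(void)
{
        unsigned int i;

        for (i = 0; i < sizeof(requests) / sizeof(requests[0]); i++) {
                unsigned int n = requests[i];

                printf("request %4u: power-of-two waste %4u bytes, "
                       "unit-rounding waste %2u bytes\n",
                       n, pow2_roundup(n) - n, unit_roundup(n) - n);
        }
        return 0;
}

For odd-sized requests the power-of-two rounding wastes tens to hundreds
of bytes per object, while unit rounding wastes at most unit-1 bytes; the
shared-pool (external fragmentation) point comes on top of that.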


>>Through that, SLOB still has a slight memory advantage over SLUB of
>>~350k for a standard server configuration. It is likely that the
>>savings are smaller for real embedded configurations that have less
>>functionality.

When I last tested, I got similar savings with a pretty stripped-down
kernel and a small amount of RAM made available via mem=. I.e. to the
point where those 350K
saved were a very significant chunk of remaining free memory after init
comes up.

I said exactly the same thing last time this came up. I would love to
remove code if its functionality can be adequately replaced by existing
code, but I think your reasons for removing SLOB aren't that good, and
just handwaving away the significant memory savings doesn't work.

People run 2.6 kernels with several MB of RAM, don't they? So losing
several hundred K is as bad to them as a patch that causes an Altix to
waste several hundred GB is to you.


> A year ago the -rt kernel defaulted to the SLOB for a few releases, and 
> barring some initial scalability issues (which were solved in -rt) it 
> worked pretty well on generic PCs, so i dont buy the 'it doesnt work' 
> argument either.

It has actually recently been made to work on SMP, it is much more
scalable to large memories, and some initial NUMA work that some embedded
guys are interested in is happening, all without increasing the static
footprint too much; the dynamic footprint has actually decreased too.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  9:43     ` Nick Piggin
@ 2007-07-08  9:54       ` Ingo Molnar
  2007-07-08 10:23         ` Nick Piggin
  0 siblings, 1 reply; 111+ messages in thread
From: Ingo Molnar @ 2007-07-08  9:54 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Christoph Lameter, linux-kernel, linux-mm, suresh.b.siddha,
	corey.d.gough, Pekka Enberg, akpm, Matt Mackall, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 1788 bytes --]


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> I said exactly the same thing last time this came up. I would love to 
> remove code if its functionality can be adequately replaced by 
> existing code, but I think your reasons for removing SLOB aren't that 
> good, and just handwaving away the significant memory savings doesn't 
> work.

yeah. Also, the decision here is pretty easy: the behavior of the 
allocator is not really visible to applications. So this isnt like 
having a parallel IO scheduler or a parallel process scheduler (which 
cause problems to us by fragmenting the application space) - so the 
long-term cost to us kernel maintainers should be relatively low.

> > A year ago the -rt kernel defaulted to the SLOB for a few releases, 
> > and barring some initial scalability issues (which were solved in 
> > -rt) it worked pretty well on generic PCs, so i dont buy the 'it 
> > doesnt work' argument either.
> 
> It's actually recently been made to work on SMP, it is much more 
> scalable to large memories, and some initial NUMA work is happening 
> that some embedded guys are interested in, all without increasing 
> static footprint too much, and it has actually decreased dynamic 
> footprint too.

cool. I was referring to something else: people were running -rt on 
their beefy desktop boxes with several gigs of RAM, and they complained 
about the slowdown caused by SLOB's linear list walking.

Steve Rostedt did two nice changes to fix those scalability problems. 
I've attached Steve's two patches. With these in place SLOB was very 
usable for large systems as well, with no measurable overhead. 
(obviously the lack of per-cpu caching can still be measured, but it's a 
lot less of a problem in practice than the linear list walking was.)

	Ingo

[-- Attachment #2: slob-scale-no-bigblock-list.patch --]
[-- Type: text/plain, Size: 3673 bytes --]

This patch uses the mem_map pages to find the bigblock descriptor for
large allocations.

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

 mm/slob.c |   74 ++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 41 insertions(+), 33 deletions(-)

Index: linux/mm/slob.c
===================================================================
--- linux.orig/mm/slob.c
+++ linux/mm/slob.c
@@ -49,15 +49,42 @@ typedef struct slob_block slob_t;
 struct bigblock {
 	int order;
 	void *pages;
-	struct bigblock *next;
 };
 typedef struct bigblock bigblock_t;
 
 static slob_t arena = { .next = &arena, .units = 1 };
 static slob_t *slobfree = &arena;
-static bigblock_t *bigblocks;
 static DEFINE_SPINLOCK(slob_lock);
-static DEFINE_SPINLOCK(block_lock);
+
+#define __get_slob_block(b) ((unsigned long)(b) & ~(PAGE_SIZE-1))
+
+static inline struct page *get_slob_page(const void *mem)
+{
+	void *virt = (void*)__get_slob_block(mem);
+
+	return virt_to_page(virt);
+}
+
+static inline void zero_slob_block(const void *b)
+{
+	struct page *page;
+	page = get_slob_page(b);
+	memset(&page->lru, 0, sizeof(page->lru));
+}
+
+static inline void *get_slob_block(const void *b)
+{
+	struct page *page;
+	page = get_slob_page(b);
+	return page->lru.next;
+}
+
+static inline void set_slob_block(const void *b, void *data)
+{
+	struct page *page;
+	page = get_slob_page(b);
+	page->lru.next = data;
+}
 
 static void slob_free(void *b, int size);
 static void slob_timer_cbk(void);
@@ -109,6 +136,7 @@ static void *slob_alloc(size_t size, gfp
 			if (!cur)
 				return 0;
 
+			zero_slob_block(cur);
 			slob_free(cur, PAGE_SIZE);
 			spin_lock_irqsave(&slob_lock, flags);
 			cur = slobfree;
@@ -163,7 +191,6 @@ void *__kmalloc(size_t size, gfp_t gfp)
 {
 	slob_t *m;
 	bigblock_t *bb;
-	unsigned long flags;
 
 	if (size < PAGE_SIZE - SLOB_UNIT) {
 		m = slob_alloc(size + SLOB_UNIT, gfp, 0);
@@ -178,10 +205,7 @@ void *__kmalloc(size_t size, gfp_t gfp)
 	bb->pages = (void *)__get_free_pages(gfp, bb->order);
 
 	if (bb->pages) {
-		spin_lock_irqsave(&block_lock, flags);
-		bb->next = bigblocks;
-		bigblocks = bb;
-		spin_unlock_irqrestore(&block_lock, flags);
+		set_slob_block(bb->pages, bb);
 		return bb->pages;
 	}
 
@@ -192,25 +216,16 @@ EXPORT_SYMBOL(__kmalloc);
 
 void kfree(const void *block)
 {
-	bigblock_t *bb, **last = &bigblocks;
-	unsigned long flags;
+	bigblock_t *bb;
 
 	if (!block)
 		return;
 
-	if (!((unsigned long)block & (PAGE_SIZE-1))) {
-		/* might be on the big block list */
-		spin_lock_irqsave(&block_lock, flags);
-		for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
-			if (bb->pages == block) {
-				*last = bb->next;
-				spin_unlock_irqrestore(&block_lock, flags);
-				free_pages((unsigned long)block, bb->order);
-				slob_free(bb, sizeof(bigblock_t));
-				return;
-			}
-		}
-		spin_unlock_irqrestore(&block_lock, flags);
+	bb = get_slob_block(block);
+	if (bb) {
+		free_pages((unsigned long)block, bb->order);
+		slob_free(bb, sizeof(bigblock_t));
+		return;
 	}
 
 	slob_free((slob_t *)block - 1, 0);
@@ -222,20 +237,13 @@ EXPORT_SYMBOL(kfree);
 unsigned int ksize(const void *block)
 {
 	bigblock_t *bb;
-	unsigned long flags;
 
 	if (!block)
 		return 0;
 
-	if (!((unsigned long)block & (PAGE_SIZE-1))) {
-		spin_lock_irqsave(&block_lock, flags);
-		for (bb = bigblocks; bb; bb = bb->next)
-			if (bb->pages == block) {
-				spin_unlock_irqrestore(&slob_lock, flags);
-				return PAGE_SIZE << bb->order;
-			}
-		spin_unlock_irqrestore(&block_lock, flags);
-	}
+	bb = get_slob_block(block);
+	if (bb)
+		return PAGE_SIZE << bb->order;
 
 	return ((slob_t *)block - 1)->units * SLOB_UNIT;
 }

[-- Attachment #3: slob-scale-break-out-caches.patch --]
[-- Type: text/plain, Size: 12621 bytes --]

---
 mm/slob.c |  290 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 235 insertions(+), 55 deletions(-)

Index: linux/mm/slob.c
===================================================================
--- linux.orig/mm/slob.c
+++ linux/mm/slob.c
@@ -27,6 +27,20 @@
  * are allocated by calling __get_free_pages. As SLAB objects know
  * their size, no separate size bookkeeping is necessary and there is
  * essentially no allocation space overhead.
+ *
+ * Modified by: Steven Rostedt <rostedt@goodmis.org> 12/20/05
+ *
+ * Now we take advantage of the kmem_cache usage.  I've removed
+ * the global slobfree, and created one for every cache.
+ *
+ * For kmalloc/kfree I've reintroduced the usage of cache_sizes,
+ * but only for sizes 32 through PAGE_SIZE >> 1 by order of 2.
+ *
+ * Having the SLOB alloc per size of the cache should speed things up
+ * greatly, not only by making the search paths smaller, but also by
+ * keeping all the caches of similar units.  This way the fragmentation
+ * should not be as big of a problem.
+ *
  */
 
 #include <linux/slab.h>
@@ -36,6 +50,8 @@
 #include <linux/module.h>
 #include <linux/timer.h>
 
+#undef DEBUG_CACHE
+
 struct slob_block {
 	int units;
 	struct slob_block *next;
@@ -52,17 +68,66 @@ struct bigblock {
 };
 typedef struct bigblock bigblock_t;
 
-static slob_t arena = { .next = &arena, .units = 1 };
-static slob_t *slobfree = &arena;
-static DEFINE_SPINLOCK(slob_lock);
+struct kmem_cache {
+	unsigned int size, align;
+	const char *name;
+	slob_t *slobfree;
+	slob_t arena;
+	spinlock_t lock;
+	void (*ctor)(void *, struct kmem_cache *, unsigned long);
+	void (*dtor)(void *, struct kmem_cache *, unsigned long);
+	atomic_t items;
+	unsigned int free;
+	struct list_head list;
+};
+
+#define NR_SLOB_CACHES ((PAGE_SHIFT) - 5) /* 32 to PAGE_SIZE-1 by order of 2 */
+#define MAX_SLOB_CACHE_SIZE (PAGE_SIZE >> 1)
+
+static struct kmem_cache *cache_sizes[NR_SLOB_CACHES];
+static struct kmem_cache *bb_cache;
 
-#define __get_slob_block(b) ((unsigned long)(b) & ~(PAGE_SIZE-1))
+static struct semaphore	cache_chain_sem;
+static struct list_head cache_chain;
 
+#ifdef DEBUG_CACHE
+static void test_cache(kmem_cache_t *c)
+{
+	slob_t *cur = c->slobfree;
+	unsigned int x = -1 >> 2;
+
+	do {
+		BUG_ON(!cur->next);
+		cur = cur->next;
+	} while (cur != c->slobfree && --x);
+	BUG_ON(!x);
+}
+#else
+#define test_cache(x) do {} while(0)
+#endif
+
+/*
+ * Here we take advantage of the lru field of the pages that
+ * map to the pages we use in the SLOB.  This is done similar
+ * to what is done with SLAB.
+ *
+ * The lru.next field is used to get the bigblock descriptor
+ *    for large blocks larger than PAGE_SIZE >> 1.
+ *
+ * Set and retrieved by set_slob_block and get_slob_block
+ * respectively.
+ *
+ * The lru.prev field is used to find the cache descriptor
+ *   for small blocks smaller than or equal to PAGE_SIZE >> 1.
+ *
+ * Set and retrieved by set_slob_ptr and get_slob_ptr
+ * respectively.
+ *
+ * The use of lru.next tells us in kmalloc that the page is large.
+ */
 static inline struct page *get_slob_page(const void *mem)
 {
-	void *virt = (void*)__get_slob_block(mem);
-
-	return virt_to_page(virt);
+	return virt_to_page(mem);
 }
 
 static inline void zero_slob_block(const void *b)
@@ -86,20 +151,39 @@ static inline void set_slob_block(const 
 	page->lru.next = data;
 }
 
-static void slob_free(void *b, int size);
-static void slob_timer_cbk(void);
+static inline void *get_slob_ptr(const void *b)
+{
+	struct page *page;
+	page = get_slob_page(b);
+	return page->lru.prev;
+}
+
+static inline void set_slob_ptr(const void *b, void *data)
+{
+	struct page *page;
+	page = get_slob_page(b);
+	page->lru.prev = data;
+}
 
+static void slob_free(kmem_cache_t *cachep, void *b, int size);
 
-static void *slob_alloc(size_t size, gfp_t gfp, int align)
+static void *slob_alloc(kmem_cache_t *cachep, gfp_t gfp, int align)
 {
+	size_t size;
 	slob_t *prev, *cur, *aligned = 0;
-	int delta = 0, units = SLOB_UNITS(size);
+	int delta = 0, units;
 	unsigned long flags;
 
-	spin_lock_irqsave(&slob_lock, flags);
-	prev = slobfree;
+	size = cachep->size;
+	units = SLOB_UNITS(size);
+	BUG_ON(!units);
+
+	spin_lock_irqsave(&cachep->lock, flags);
+	prev = cachep->slobfree;
 	for (cur = prev->next; ; prev = cur, cur = cur->next) {
 		if (align) {
+			while (align < SLOB_UNIT)
+				align <<= 1;
 			aligned = (slob_t *)ALIGN((unsigned long)cur, align);
 			delta = aligned - cur;
 		}
@@ -122,12 +206,16 @@ static void *slob_alloc(size_t size, gfp
 				cur->units = units;
 			}
 
-			slobfree = prev;
-			spin_unlock_irqrestore(&slob_lock, flags);
+			cachep->slobfree = prev;
+			test_cache(cachep);
+			if (prev < prev->next)
+				BUG_ON(cur + cur->units > prev->next);
+			spin_unlock_irqrestore(&cachep->lock, flags);
 			return cur;
 		}
-		if (cur == slobfree) {
-			spin_unlock_irqrestore(&slob_lock, flags);
+		if (cur == cachep->slobfree) {
+			test_cache(cachep);
+			spin_unlock_irqrestore(&cachep->lock, flags);
 
 			if (size == PAGE_SIZE) /* trying to shrink arena? */
 				return 0;
@@ -137,14 +225,15 @@ static void *slob_alloc(size_t size, gfp
 				return 0;
 
 			zero_slob_block(cur);
-			slob_free(cur, PAGE_SIZE);
-			spin_lock_irqsave(&slob_lock, flags);
-			cur = slobfree;
+			set_slob_ptr(cur, cachep);
+			slob_free(cachep, cur, PAGE_SIZE);
+			spin_lock_irqsave(&cachep->lock, flags);
+			cur = cachep->slobfree;
 		}
 	}
 }
 
-static void slob_free(void *block, int size)
+static void slob_free(kmem_cache_t *cachep, void *block, int size)
 {
 	slob_t *cur, *b = (slob_t *)block;
 	unsigned long flags;
@@ -156,26 +245,29 @@ static void slob_free(void *block, int s
 		b->units = SLOB_UNITS(size);
 
 	/* Find reinsertion point */
-	spin_lock_irqsave(&slob_lock, flags);
-	for (cur = slobfree; !(b > cur && b < cur->next); cur = cur->next)
+	spin_lock_irqsave(&cachep->lock, flags);
+	for (cur = cachep->slobfree; !(b > cur && b < cur->next); cur = cur->next)
 		if (cur >= cur->next && (b > cur || b < cur->next))
 			break;
 
 	if (b + b->units == cur->next) {
 		b->units += cur->next->units;
 		b->next = cur->next->next;
+		BUG_ON(cur->next == &cachep->arena);
 	} else
 		b->next = cur->next;
 
 	if (cur + cur->units == b) {
 		cur->units += b->units;
 		cur->next = b->next;
+		BUG_ON(b == &cachep->arena);
 	} else
 		cur->next = b;
 
-	slobfree = cur;
+	cachep->slobfree = cur;
 
-	spin_unlock_irqrestore(&slob_lock, flags);
+	test_cache(cachep);
+	spin_unlock_irqrestore(&cachep->lock, flags);
 }
 
 static int FASTCALL(find_order(int size));
@@ -189,15 +281,24 @@ static int fastcall find_order(int size)
 
 void *__kmalloc(size_t size, gfp_t gfp)
 {
-	slob_t *m;
 	bigblock_t *bb;
 
-	if (size < PAGE_SIZE - SLOB_UNIT) {
-		m = slob_alloc(size + SLOB_UNIT, gfp, 0);
-		return m ? (void *)(m + 1) : 0;
+	/*
+	 * If the size is less than PAGE_SIZE >> 1 then
+	 * we use the generic caches.  Otherwise, we
+	 * just allocate the necessary pages.
+	 */
+	if (size <= MAX_SLOB_CACHE_SIZE) {
+		int i;
+		int order;
+		for (i=0, order=32; i < NR_SLOB_CACHES; i++, order <<= 1)
+			if (size <= order)
+				break;
+		BUG_ON(i == NR_SLOB_CACHES);
+		return kmem_cache_alloc(cache_sizes[i], gfp);
 	}
 
-	bb = slob_alloc(sizeof(bigblock_t), gfp, 0);
+	bb = slob_alloc(bb_cache, gfp, 0);
 	if (!bb)
 		return 0;
 
@@ -209,26 +310,33 @@ void *__kmalloc(size_t size, gfp_t gfp)
 		return bb->pages;
 	}
 
-	slob_free(bb, sizeof(bigblock_t));
+	slob_free(bb_cache, bb, sizeof(bigblock_t));
 	return 0;
 }
 EXPORT_SYMBOL(__kmalloc);
 
 void kfree(const void *block)
 {
+	kmem_cache_t *c;
 	bigblock_t *bb;
 
 	if (!block)
 		return;
 
+	/*
+	 * look into the page of the allocated block to
+	 * see if this is a big allocation or not.
+	 */
 	bb = get_slob_block(block);
 	if (bb) {
 		free_pages((unsigned long)block, bb->order);
-		slob_free(bb, sizeof(bigblock_t));
+		slob_free(bb_cache, bb, sizeof(bigblock_t));
 		return;
 	}
 
-	slob_free((slob_t *)block - 1, 0);
+	c = get_slob_ptr(block);
+	kmem_cache_free(c, (void *)block);
+
 	return;
 }
 
@@ -237,6 +345,7 @@ EXPORT_SYMBOL(kfree);
 unsigned int ksize(const void *block)
 {
 	bigblock_t *bb;
+	kmem_cache_t *c;
 
 	if (!block)
 		return 0;
@@ -245,14 +354,16 @@ unsigned int ksize(const void *block)
 	if (bb)
 		return PAGE_SIZE << bb->order;
 
-	return ((slob_t *)block - 1)->units * SLOB_UNIT;
+	c = get_slob_ptr(block);
+	return c->size;
 }
 
-struct kmem_cache {
-	unsigned int size, align;
-	const char *name;
-	void (*ctor)(void *, struct kmem_cache *, unsigned long);
-	void (*dtor)(void *, struct kmem_cache *, unsigned long);
+static slob_t cache_arena = { .next = &cache_arena, .units = 0 };
+struct kmem_cache cache_cache = {
+	.name = "cache",
+	.slobfree = &cache_cache.arena,
+	.arena = { .next = &cache_cache.arena, .units = 0 },
+	.lock = SPIN_LOCK_UNLOCKED
 };
 
 struct kmem_cache *kmem_cache_create(const char *name, size_t size,
@@ -261,8 +372,22 @@ struct kmem_cache *kmem_cache_create(con
 	void (*dtor)(void*, struct kmem_cache *, unsigned long))
 {
 	struct kmem_cache *c;
+	void *p;
+
+	c = slob_alloc(&cache_cache, flags, 0);
+
+	memset(c, 0, sizeof(*c));
 
-	c = slob_alloc(sizeof(struct kmem_cache), flags, 0);
+	c->size = PAGE_SIZE;
+	c->arena.next = &c->arena;
+	c->arena.units = 0;
+	c->slobfree = &c->arena;
+	atomic_set(&c->items, 0);
+	spin_lock_init(&c->lock);
+
+	p = slob_alloc(c, 0, PAGE_SIZE-1);
+	if (p)
+		free_page((unsigned long)p);
 
 	if (c) {
 		c->name = name;
@@ -274,6 +399,9 @@ struct kmem_cache *kmem_cache_create(con
 		if (c->align < align)
 			c->align = align;
 	}
+	down(&cache_chain_sem);
+	list_add_tail(&c->list, &cache_chain);
+	up(&cache_chain_sem);
 
 	return c;
 }
@@ -281,7 +409,17 @@ EXPORT_SYMBOL(kmem_cache_create);
 
 void kmem_cache_destroy(struct kmem_cache *c)
 {
-	slob_free(c, sizeof(struct kmem_cache));
+	down(&cache_chain_sem);
+	list_del(&c->list);
+	up(&cache_chain_sem);
+
+	BUG_ON(atomic_read(&c->items));
+
+	/*
+	 * WARNING!!! Memory leak!
+	 */
+	printk("FIX ME: need to free memory\n");
+	slob_free(&cache_cache, c, sizeof(struct kmem_cache));
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
@@ -289,11 +427,16 @@ void *kmem_cache_alloc(struct kmem_cache
 {
 	void *b;
 
-	if (c->size < PAGE_SIZE)
-		b = slob_alloc(c->size, flags, c->align);
+	atomic_inc(&c->items);
+
+	if (c->size <= MAX_SLOB_CACHE_SIZE)
+		b = slob_alloc(c, flags, c->align);
 	else
 		b = (void *)__get_free_pages(flags, find_order(c->size));
 
+	if (!b)
+		return 0;
+
 	if (c->ctor)
 		c->ctor(b, c, SLAB_CTOR_CONSTRUCTOR);
 
@@ -313,11 +456,13 @@ EXPORT_SYMBOL(kmem_cache_zalloc);
 
 void kmem_cache_free(struct kmem_cache *c, void *b)
 {
+	atomic_dec(&c->items);
+
 	if (c->dtor)
 		c->dtor(b, c, 0);
 
-	if (c->size < PAGE_SIZE)
-		slob_free(b, c->size);
+	if (c->size <= MAX_SLOB_CACHE_SIZE)
+		slob_free(c, b, c->size);
 	else
 		free_pages((unsigned long)b, find_order(c->size));
 }
@@ -335,9 +480,6 @@ const char *kmem_cache_name(struct kmem_
 }
 EXPORT_SYMBOL(kmem_cache_name);
 
-static struct timer_list slob_timer = TIMER_INITIALIZER(
-	(void (*)(unsigned long))slob_timer_cbk, 0, 0);
-
 int kmem_cache_shrink(struct kmem_cache *d)
 {
 	return 0;
@@ -349,17 +491,55 @@ int kmem_ptr_validate(struct kmem_cache 
 	return 0;
 }
 
-void __init kmem_cache_init(void)
+static char cache_names[NR_SLOB_CACHES][15];
+
+void kmem_cache_init(void)
 {
-	slob_timer_cbk();
+	static int done;
+	void *p;
+
+	if (!done) {
+		int i;
+		int size = 32;
+		done = 1;
+
+		init_MUTEX(&cache_chain_sem);
+		INIT_LIST_HEAD(&cache_chain);
+
+		cache_cache.size = PAGE_SIZE;
+		p = slob_alloc(&cache_cache, 0, PAGE_SIZE-1);
+		if (p)
+			free_page((unsigned long)p);
+		cache_cache.size = sizeof(struct kmem_cache);
+
+		bb_cache = kmem_cache_create("bb_cache",sizeof(bigblock_t), 0,
+					     GFP_KERNEL, NULL, NULL);
+		for (i=0; i < NR_SLOB_CACHES; i++, size <<= 1)
+			cache_sizes[i] = kmem_cache_create(cache_names[i], size, 0,
+							   GFP_KERNEL, NULL, NULL);
+	}
 }
 
-static void slob_timer_cbk(void)
+static void test_slob(slob_t *s)
 {
-	void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
+	slob_t *p;
+	long x = 0;
 
-	if (p)
-		free_page((unsigned long)p);
+	for (p=s->next; p != s && x < 10000; p = p->next, x++)
+		printk(".");
+}
+
+void print_slobs(void)
+{
+	struct list_head *curr;
+
+	list_for_each(curr, &cache_chain) {
+		kmem_cache_t *c = list_entry(curr, struct kmem_cache, list);
 
-	mod_timer(&slob_timer, jiffies + HZ);
+		printk("%s items:%d",
+		       c->name?:"<none>",
+		       atomic_read(&c->items));
+		test_slob(&c->arena);
+		printk("\n");
+	}
 }

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  9:54       ` Ingo Molnar
@ 2007-07-08 10:23         ` Nick Piggin
  2007-07-08 10:42           ` Ingo Molnar
  0 siblings, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-08 10:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, linux-kernel, linux-mm, suresh.b.siddha,
	corey.d.gough, Pekka Enberg, akpm, Matt Mackall, Steven Rostedt

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>I said exactly the same thing last time this came up. I would love to 
>>remove code if its functionality can be adequately replaced by 
>>existing code, but I think your reasons for removing SLOB aren't that 
>>good, and just handwaving away the significant memory savings doesn't 
>>work.
> 
> 
> yeah. Also, the decision here is pretty easy: the behavior of the 
> allocator is not really visible to applications. So this isnt like 
> having a parallel IO scheduler or a parallel process scheduler (which 
> cause problems to us by fragmenting the application space) - so the 
> long-term cost to us kernel maintainers should be relatively low.

Yep.


>>>A year ago the -rt kernel defaulted to the SLOB for a few releases, 
>>>and barring some initial scalability issues (which were solved in 
>>>-rt) it worked pretty well on generic PCs, so i dont buy the 'it 
>>>doesnt work' argument either.
>>
>>It's actually recently been made to work on SMP, it is much more 
>>scalable to large memories, and some initial NUMA work is happening 
>>that some embedded guys are interested in, all without increasing 
>>static footprint too much, and it has actually decreased dynamic 
>>footprint too.
> 
> 
> cool. I was referring to something else: people were running -rt on 
> their beefy desktop boxes with several gigs of RAM they complained about 
> the slowdown that is caused by SLOB's linear list walking.

That is what I meant by scalable to large memories. It is not perfect,
but it is much better now. I noticed huge slowdowns too when test booting
the slob RCU patch on my 4GB desktop, so I did a few things to improve
freelist walking as well (the patches are in -mm, prefixed with slob-).

Afterwards, performance seems to be fairly good (obviously not as good
as SLAB or SLUB on such a configuration, but definitely usable and the
desktop was not noticably slower).


> Steve Rostedt did two nice changes to fix those scalability problems. 
> I've attached Steve's two patches. With these in place SLOB was very 
> usable for large systems as well, with no measurable overhead. 
> (obviously the lack of per-cpu caching can still be measured, but it's a 
> lot less of a problem in practice than the linear list walking was.)

Thanks for sending those. One is actually obsolete because we removed
the bigblock list completely; however, I had not seen the other one. Such
an approach could be used. OTOH, having all allocations come from the
same pool does have its advantages in terms of memory usage.

I don't think the next step to take with SLOB has quite been decided
yet; however, I have an idea that if we had per-cpu freelists (where
other lists could be used as a fallback), then that would go a long way
to improving the SMP scalability, CPU cache hotness, and long list
walking issues all at once.
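
A rough userspace sketch of that per-cpu-freelist-with-fallback idea
(the names, the batch size and the mutex-protected shared pool are
illustrative assumptions for the sketch, not existing SLOB code):

#include <pthread.h>
#include <stdlib.h>

#define NR_CPUS 4
#define BATCH   16                      /* objects pulled per refill */

struct obj { struct obj *next; };

static struct obj *pcpu_free[NR_CPUS];  /* only touched by the owning cpu */
static struct obj *global_free;         /* fallback pool, lock protected */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static void *obj_alloc(int cpu)
{
	struct obj *o = pcpu_free[cpu];

	if (o) {                        /* fast path: no lock, cache hot */
		pcpu_free[cpu] = o->next;
		return o;
	}

	/* slow path: refill a small batch from the shared pool */
	pthread_mutex_lock(&global_lock);
	for (int i = 0; i < BATCH && global_free; i++) {
		o = global_free;
		global_free = o->next;
		o->next = pcpu_free[cpu];
		pcpu_free[cpu] = o;
	}
	pthread_mutex_unlock(&global_lock);

	o = pcpu_free[cpu];
	if (o)
		pcpu_free[cpu] = o->next;
	return o;                       /* NULL: nothing left anywhere */
}

static void obj_free(int cpu, void *p)
{
	struct obj *o = p;

	o->next = pcpu_free[cpu];       /* frees always go to the local list */
	pcpu_free[cpu] = o;
}

int main(void)
{
	/* seed the shared pool, then exercise "cpu 0" a little */
	for (int i = 0; i < 64; i++) {
		struct obj *o = malloc(sizeof(*o));
		o->next = global_free;
		global_free = o;
	}
	void *a = obj_alloc(0), *b = obj_alloc(0);
	obj_free(0, a);
	obj_free(0, b);
	return 0;
}

The common case touches only cpu-local data and takes no lock; the
shared pool is only visited when the local list runs dry.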

However I like the fact that there is no need for a big rush to improve
it, and so patches and ideas can be brewed up slowly :)

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08 10:23         ` Nick Piggin
@ 2007-07-08 10:42           ` Ingo Molnar
  0 siblings, 0 replies; 111+ messages in thread
From: Ingo Molnar @ 2007-07-08 10:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Christoph Lameter, linux-kernel, linux-mm, suresh.b.siddha,
	corey.d.gough, Pekka Enberg, akpm, Matt Mackall, Steven Rostedt


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> >cool. I was referring to something else: people were running -rt on 
> >their beefy desktop boxes with several gigs of RAM they complained 
> >about the slowdown that is caused by SLOB's linear list walking.
> 
> That is what I meant by scalable to large memories. It is not perfect, 
> but it is much better now. I noticed huge slowdowns too when test 
> booting the slob RCU patch on my 4GB desktop, so I did a few things to 
> improve freelist walking as well (the patches are in -mm, prefixed 
> with slob-).

ah, good - i only looked at the upstream mm/slob.c git-log. I like those 
4 slob-* patches in -mm: in particular slob-remove-bigblock-tracking.patch
and slob-rework-freelist-handling.patch are really elegant. Simplicity of 
slob does not seem to suffer either.

	Ingo

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
                   ` (10 preceding siblings ...)
  2007-07-08  4:37 ` [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance David Miller
@ 2007-07-08 11:20 ` Andi Kleen
  2007-07-09 15:50     ` Christoph Lameter
  11 siblings, 1 reply; 111+ messages in thread
From: Andi Kleen @ 2007-07-08 11:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, linux-mm

Christoph Lameter <clameter@sgi.com> writes:

> A cmpxchg is less costly than interrupt enable/disable

That sounds wrong.

-Andi

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  7:51   ` Ingo Molnar
  2007-07-08  9:43     ` Nick Piggin
@ 2007-07-08 18:02     ` Andrew Morton
  2007-07-09  2:57       ` Nick Piggin
  2007-07-09 21:57       ` Matt Mackall
  2007-07-09 12:31     ` Matthieu CASTET
  2007-07-09 16:00     ` Christoph Lameter
  3 siblings, 2 replies; 111+ messages in thread
From: Andrew Morton @ 2007-07-08 18:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, linux-kernel, linux-mm, suresh.b.siddha,
	corey.d.gough, Pekka Enberg, Matt Mackall

On Sun, 8 Jul 2007 09:51:19 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> (added Matt to the Cc: list)
> 
> * Christoph Lameter <clameter@sgi.com> wrote:
> 
> > Maintenance of slab allocators becomes a problem as new features for 
> > allocators are developed. The SLOB allocator in particular has been 
> > lagging behind in many ways in the past:
> > 
> > - Had no support for SLAB_DESTROY_BY_RCU for years (but no one 
> >   noticed)
> > 
> > - Still has no support for slab reclaim counters. This may currently 
> >   not be necessary if one would restrict the supported configurations 
> >   for functionality relying on these. But even that has not been done.
> > 
> > The only current advantage over SLUB in terms of memory savings is 
> > through SLOBs kmalloc layout that is not power of two based like SLAB 
> > and SLUB which allows to eliminate some memory waste.
> > 
> > Through that SLOB has still a slight memory advantage over SLUB of 
> > ~350k for a standard server configuration. It is likely that the 
> > savings are smaller for real embedded configurations that have less 
> > functionality.
> 
> actually, one real advantage of the SLOB is that it is a minimal, really 
> simple allocator. Its text and data size is so small as well.
> 
> here's the size comparison:
> 
>    text    data     bss     dec     hex filename
>   10788     837      16   11641    2d79 mm/slab.o
>    6205    4207     124   10536    2928 mm/slub.o
>    1640      44       4    1688     698 mm/slob.o
> 
> slab/slub have roughly the same footprint, but slob is 10% of that size. 
> Would be a waste to throw this away.
> 
> A year ago the -rt kernel defaulted to the SLOB for a few releases, and 
> barring some initial scalability issues (which were solved in -rt) it 
> worked pretty well on generic PCs, so i dont buy the 'it doesnt work' 
> argument either.
> 

I don't think a saving of a few k of text would justify slob's retention.

A reason for retaining slob would be that it has some O(n) memory saving
due to better packing, etc.  Indeed that was the reason for merging it in
the first place.  If slob no longer retains that advantage (wrt slub) then
we no longer need it.


Guys, look at this the other way.  Suppose we only had slub, and someone
came along and said "here's a whole new allocator which saves 4.5k of
text", would we merge it on that basis?  Hell no, it's not worth it.  What
we might do is to get motivated to see if we can make slub less porky under
appropriate config settings.


Let's not get sentimental about these things: in general, if there's any
reasonable way in which we can rid ourselves of any code at all, we should
do so, no?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08 18:02     ` Andrew Morton
@ 2007-07-09  2:57       ` Nick Piggin
  2007-07-09 11:04         ` Pekka Enberg
  2007-07-09 16:06         ` Christoph Lameter
  2007-07-09 21:57       ` Matt Mackall
  1 sibling, 2 replies; 111+ messages in thread
From: Nick Piggin @ 2007-07-09  2:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Christoph Lameter, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Andrew Morton wrote:
> On Sun, 8 Jul 2007 09:51:19 +0200 Ingo Molnar <mingo@elte.hu> wrote:

>>A year ago the -rt kernel defaulted to the SLOB for a few releases, and 
>>barring some initial scalability issues (which were solved in -rt) it 
>>worked pretty well on generic PCs, so i dont buy the 'it doesnt work' 
>>argument either.
>>
> 
> 
> I don't think a saving of a few k of text would justify slob's retention.

Probably not.


> A reason for retaining slob would be that it has some O(n) memory saving
> due to better packing, etc.  Indeed that was the reason for merging it in
> the first place.  If slob no longer retains that advantage (wrt slub) then
> we no longer need it.

SLOB contains several significant O(1) and also O(n) memory savings that
are so far impossible-by-design for SLUB. They are: slab external
fragmentation is significantly reduced; kmalloc internal fragmentation is
significantly reduced; order of magnitude smaller kmem_cache data type;
order of magnitude less code...
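
(As a rough worked example of the kmalloc point, using the cache sizes
of the time: a kmalloc(700) lands in SLUB's 1024-byte cache and wastes
roughly 300 bytes per object to rounding, whereas SLOB hands back
approximately the requested size plus a small per-object header.)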

Actually with an unscientific test boot of a semi-stripped down kernel and
mem=16MB, SLOB (the version in -mm) uses 400K less than SLUB (or a full 50%
more RAM free after booting into bash as the init).

Now it's not for me to say that this is significant enough to make SLOB
worth keeping, or what sort of results it yields in the field, so I cc'ed
Denis who is the busybox maintainer, and Erik who is ulibc maintainer in
case they have anything to add.


> Guys, look at this the other way.  Suppose we only had slub, and someone
> came along and said "here's a whole new allocator which saves 4.5k of
> text", would we merge it on that basis?  Hell no, it's not worth it.  What
> we might do is to get motivated to see if we can make slub less porky under
> appropriate config settings.


In light of Denis's recent statement that I saw, "In busybox project
people can kill for 1.7K", there might be a mass killing of kernel
developers in Cambridge this year if SLOB gets removed ;)

Joking aside, the last time this came up, I thought we concluded that
removal of SLOB would be a severe regression for a significant userbase.


> Let's not get sentimental about these things: in general, if there's any
> reasonable way in which we can rid ourselves of any code at all, we should
> do so, no?

Definitely. And this is exactly what we said last time as well. If the
small memory embedded guys are happy for SLOB to go, then I'm happy too.
So, the relevant question is -- are most/all current SLOB users now
happy to switch over to SLUB, in light of the recent advances to both
allocators?

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09  2:57       ` Nick Piggin
@ 2007-07-09 11:04         ` Pekka Enberg
  2007-07-09 11:16           ` Nick Piggin
  2007-07-09 16:08             ` Christoph Lameter
  2007-07-09 16:06         ` Christoph Lameter
  1 sibling, 2 replies; 111+ messages in thread
From: Pekka Enberg @ 2007-07-09 11:04 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, Christoph Lameter, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Hi Nick,

On 7/9/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> SLOB contains several significant O(1) and also O(n) memory savings that
> are so far impossible-by-design for SLUB. They are: slab external
> fragmentation is significantly reduced; kmalloc internal fragmentation is
> significantly reduced; order of magnitude smaller kmem_cache data type;
> order of magnitude less code...

I assume with "slab external fragmentation" you mean allocating a
whole page for a slab when there are not enough objects to fill the
whole thing thus wasting memory? We could try to combat that by
packing multiple variable-sized slabs within a single page. Also,
adding some non-power-of-two kmalloc caches might help with internal
fragmentation.

In any case, SLUB needs some serious tuning for smaller machines
before we can get rid of SLOB.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 11:04         ` Pekka Enberg
@ 2007-07-09 11:16           ` Nick Piggin
  2007-07-09 12:47             ` Pekka Enberg
  2007-07-09 13:46             ` Pekka J Enberg
  2007-07-09 16:08             ` Christoph Lameter
  1 sibling, 2 replies; 111+ messages in thread
From: Nick Piggin @ 2007-07-09 11:16 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Ingo Molnar, Christoph Lameter, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Pekka Enberg wrote:
> Hi Nick,
> 
> On 7/9/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> SLOB contains several significant O(1) and also O(n) memory savings that
>> are so far impossible-by-design for SLUB. They are: slab external
>> fragmentation is significantly reduced; kmalloc internal fragmentation is
>> significantly reduced; order of magnitude smaller kmem_cache data type;
>> order of magnitude less code...
> 
> 
> I assume with "slab external fragmentation" you mean allocating a
> whole page for a slab when there are not enough objects to fill the
> whole thing thus wasting memory?

Yep. Without really analysing it, I guess SLOB's savings here are
O(1) over SLUB and will relatively diminish as the machine size gets
larger; however, with the number of slabs even a small kernel creates,
this is likely to be significant on small memory systems.


> We could try to combat that by
> packing multiple variable-sized slabs within a single page. Also,

Yeah, that could help.


> adding some non-power-of-two kmalloc caches might help with internal
> fragmentation.

That too, although of course it will work against the external
fragmentation problem. This is more of an O(n) problem and may not
be responsible for as much waste as the first issue on small memory
machines (I haven't done detailed profiling so I don't know).


> In any case, SLUB needs some serious tuning for smaller machines
> before we can get rid of SLOB.

I would always be happy to see the default allocator become more
space efficient... I think it would be most productive to get
detailed profiles of exactly where the memory is being wasted on
small mem= boots. Although I don't think matching SLOB would be
an easy task at all.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  7:51   ` Ingo Molnar
  2007-07-08  9:43     ` Nick Piggin
  2007-07-08 18:02     ` Andrew Morton
@ 2007-07-09 12:31     ` Matthieu CASTET
  2007-07-09 16:00     ` Christoph Lameter
  3 siblings, 0 replies; 111+ messages in thread
From: Matthieu CASTET @ 2007-07-09 12:31 UTC (permalink / raw)
  To: linux-kernel

Ingo Molnar <mingo <at> elte.hu> writes:


> A year ago the -rt kernel defaulted to the SLOB for a few releases, and 
> barring some initial scalability issues (which were solved in -rt) it 
> worked pretty well on generic PCs, so i dont buy the 'it doesnt work' 
> argument either.
> 
Last time I tried it, I could not mount (or could only mount very slowly) a 
jffs2 flash with slob.
Changing slob to slab fixed the issue.


Matthieu


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 11:16           ` Nick Piggin
@ 2007-07-09 12:47             ` Pekka Enberg
  2007-07-09 13:46             ` Pekka J Enberg
  1 sibling, 0 replies; 111+ messages in thread
From: Pekka Enberg @ 2007-07-09 12:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, Christoph Lameter, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Hi Nick,

Pekka Enberg wrote:
> > I assume with "slab external fragmentation" you mean allocating a
> > whole page for a slab when there are not enough objects to fill the
> > whole thing thus wasting memory?

On 7/9/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Yep. Without really analysing it, I guess SLOB's savings here are
> O(1) over SLUB and will relatively diminish as the machine size gets
> larger, however with the number of slabs even a small kernel creates,
> this is likely to be significant on small memory systems.

Running the included script on my little Debian on UML with 32 MB of
RAM shows anywhere from 20 KB up to 100 KB of wasted space on light
load. What's interesting is that the wasted amount seems to stabilize
around 70 KB and never goes below that.

                                               Pekka

#!/bin/bash

total_wasted=0

for i in $(find /sys/slab -type d -mindepth 1 -maxdepth 1 | sort)
do
  nr_objs=$(cat $i/objects)
  slabs=$(cat $i/slabs)
  objs_per_slab=$(cat $i/objs_per_slab)
  let "max_objs=$objs_per_slab*$slabs"

  object_size=$(cat $i/object_size)
  let "wasted=($max_objs-$nr_objs)*$object_size"

  if [ "$wasted" -ne "0" ]; then
    echo "$i: max_objs=$max_objs, nr_objs=$nr_objs, $wasted bytes wasted"
  fi
  let "total_wasted=$total_wasted+$wasted"
done

echo "Total wasted: $total_wasted"

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 11:16           ` Nick Piggin
  2007-07-09 12:47             ` Pekka Enberg
@ 2007-07-09 13:46             ` Pekka J Enberg
  1 sibling, 0 replies; 111+ messages in thread
From: Pekka J Enberg @ 2007-07-09 13:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, Christoph Lameter, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Hi Nick,

Pekka Enberg wrote:
> > adding some non-power-of-two kmalloc caches might help with internal
> > fragmentation.

On Mon, 9 Jul 2007, Nick Piggin wrote:
> That too, although of course it will work against the external
> fragmentation problem. This is more of an O(n) problem and may not
> be responsible for as much waste as the first issue on small memory
> machines (I haven't done detailed profiling so I don't know).

I would have thought so too, but my crude hack to approximate internal 
fragmentation says otherwise. On the same Debian on UML virtual machine I 
see up to 190 KB of wasted space due to internal fragmentation (average 
allocation size being considerably smaller than object size for cache) 
with the biggest offender being kmalloc-512.

But, what we really need is some real workloads on small machines using 
something resembling my scripts to figure out the memory profile for SLUB.

			Pekka

#!/bin/bash

total_wasted=0

for i in $(find /sys/slab -type d -mindepth 1 -maxdepth 1 | sort)
do
  slabs=$(cat $i/slabs)
  objs_per_slab=$(cat $i/objs_per_slab)
  let "max_objs=$objs_per_slab*$slabs"

  object_size=$(cat $i/object_size)
  average_alloc_size=$(cat $i/average_alloc_size)

  if [ "0" -ne "$average_alloc_size" ]; then
    let "wasted=($object_size-$average_alloc_size)*max_objs"
    echo "$i: object_size=$object_size, 
average_alloc_size=$average_alloc_size, $wasted bytes wasted"
    let "total_wasted=$total_wasted+$wasted"
  fi
done
echo "Total internal fragmentation: $total_wasted bytes"

---
 include/linux/slub_def.h |    1 +
 mm/slub.c                |   20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

Index: 2.6/include/linux/slub_def.h
===================================================================
--- 2.6.orig/include/linux/slub_def.h	2007-07-09 16:09:24.000000000 +0300
+++ 2.6/include/linux/slub_def.h	2007-07-09 16:18:42.000000000 +0300
@@ -29,6 +29,7 @@ struct kmem_cache {
 	int objsize;		/* The size of an object without meta data */
 	int offset;		/* Free pointer offset. */
 	int order;
+	int average_alloc_size;
 
 	/*
 	 * Avoid an extra cache line for UP, SMP and for the node local to
Index: 2.6/mm/slub.c
===================================================================
--- 2.6.orig/mm/slub.c	2007-07-09 16:09:24.000000000 +0300
+++ 2.6/mm/slub.c	2007-07-09 16:35:53.000000000 +0300
@@ -2238,12 +2238,22 @@ 	BUG_ON(index < 0);
 	return &kmalloc_caches[index];
 }
 
+static void update_avg(struct kmem_cache *s, size_t size)
+{
+	if (s->average_alloc_size)
+		s->average_alloc_size = (s->average_alloc_size + size) / 2;
+	else
+		s->average_alloc_size = size;
+}
+
 void *__kmalloc(size_t size, gfp_t flags)
 {
 	struct kmem_cache *s = get_slab(size, flags);
 
-	if (s)
+	if (s) {
+		update_avg(s, size);
 		return slab_alloc(s, flags, -1, __builtin_return_address(0));
+	}
 	return ZERO_SIZE_PTR;
 }
 EXPORT_SYMBOL(__kmalloc);
@@ -2253,8 +2263,11 @@ void *__kmalloc_node(size_t size, gfp_t 
 {
 	struct kmem_cache *s = get_slab(size, flags);
 
-	if (s)
+	if (s) {
+		update_avg(s, size);
 		return slab_alloc(s, flags, node, __builtin_return_address(0));
+	}
+
 	return ZERO_SIZE_PTR;
 }
 EXPORT_SYMBOL(__kmalloc_node);
@@ -2677,6 +2690,8 @@ void *__kmalloc_track_caller(size_t size
 	if (!s)
 		return ZERO_SIZE_PTR;
 
+	update_avg(s, size);
+
 	return slab_alloc(s, gfpflags, -1, caller);
 }
 
@@ -2688,6 +2703,8 @@ void *__kmalloc_node_track_caller(size_t
 	if (!s)
 		return ZERO_SIZE_PTR;
 
+	update_avg(s, size);
+
 	return slab_alloc(s, gfpflags, node, caller);
 }
 
@@ -3268,6 +3285,12 @@ static ssize_t objects_show(struct kmem_
 }
 SLAB_ATTR_RO(objects);
 
+static ssize_t average_alloc_size_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n", s->average_alloc_size);
+}
+SLAB_ATTR_RO(average_alloc_size);
+
 static ssize_t sanity_checks_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_DEBUG_FREE));
@@ -3466,6 +3489,7 @@ static struct attribute * slab_attrs[] =
 	&order_attr.attr,
 	&objects_attr.attr,
 	&slabs_attr.attr,
+	&average_alloc_size_attr.attr,
 	&partial_attr.attr,
 	&cpu_slabs_attr.attr,
 	&ctor_attr.attr,

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-08  4:37 ` [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance David Miller
@ 2007-07-09 15:45   ` Christoph Lameter
  2007-07-09 19:43     ` David Miller
  0 siblings, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 15:45 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough, penberg, akpm

On Sat, 7 Jul 2007, David Miller wrote:

> From: Christoph Lameter <clameter@sgi.com>
> Date: Sat, 07 Jul 2007 20:49:52 -0700
> 
> > > A cmpxchg is less costly than interrupt enable/disable
> 
> This is cpu dependant, and in fact not true at all on Niagara
> and several of the cpus in the UltraSPARC family.

Hmmm... So have alternate alloc/free paths depending on the cpu?


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-08 11:20 ` Andi Kleen
@ 2007-07-09 15:50     ` Christoph Lameter
  0 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 15:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, linux-mm, mbligh

On Sun, 8 Jul 2007, Andi Kleen wrote:

> Christoph Lameter <clameter@sgi.com> writes:
> 
> > A cmpxchg is less costly than interrupt enable/disable
> 
> That sounds wrong.

Martin Bligh was able to significantly increase his LTTng performance 
by using cmpxchg. See his article in the 2007 proceedings of the OLS 
Volume 1, page 39.

His numbers were:

interrupts enable disable : 210.6ns
local cmpxchg             : 9.0ns

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 15:50     ` Christoph Lameter
@ 2007-07-09 15:59       ` Martin Bligh
  -1 siblings, 0 replies; 111+ messages in thread
From: Martin Bligh @ 2007-07-09 15:59 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, linux-kernel, linux-mm

Christoph Lameter wrote:
> On Sun, 8 Jul 2007, Andi Kleen wrote:
> 
>> Christoph Lameter <clameter@sgi.com> writes:
>>
>>> A cmpxchg is less costly than interrupt enable/disable
>> That sounds wrong.
> 
> Martin Bligh was able to significantly increase his LTTng performance 
> by using cmpxchg. See his article in the 2007 proceedings of the OLS 
> Volume 1, page 39.
> 
> His numbers were:
> 
> interrupts enable disable : 210.6ns
> local cmpxchg             : 9.0ns

Those numbers came from Mathieu Desnoyers (LTTng) if you
want more details.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  7:51   ` Ingo Molnar
                       ` (2 preceding siblings ...)
  2007-07-09 12:31     ` Matthieu CASTET
@ 2007-07-09 16:00     ` Christoph Lameter
  3 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 16:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Pekka Enberg, akpm, Matt Mackall

On Sun, 8 Jul 2007, Ingo Molnar wrote:

> actually, one real advantage of the SLOB is that it is a minimal, really 
> simple allocator. Its text and data size is so small as well.
> 
> here's the size comparison:
> 
>    text    data     bss     dec     hex filename
>   10788     837      16   11641    2d79 mm/slab.o
>    6205    4207     124   10536    2928 mm/slub.o
>    1640      44       4    1688     698 mm/slob.o
> 
> slab/slub have roughly the same footprint, but slob is 10% of that size. 
> Would be a waste to throw this away.

The last of my tests showed that SLOB is at about 50% of the size of 
SLUB. You need to compile SLUB in embedded mode with !CONFIG_SLUB_DEBUG to 
get a reduced code size.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09  2:57       ` Nick Piggin
  2007-07-09 11:04         ` Pekka Enberg
@ 2007-07-09 16:06         ` Christoph Lameter
  2007-07-09 16:51           ` Andrew Morton
  2007-07-10  1:41           ` Nick Piggin
  1 sibling, 2 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 16:06 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Mon, 9 Jul 2007, Nick Piggin wrote:

> > A reason for retaining slob would be that it has some O(n) memory saving
> > due to better packing, etc.  Indeed that was the reason for merging it in
> > the first place.  If slob no longer retains that advantage (wrt slub) then
> > we no longer need it.
> 
> SLOB contains several significant O(1) and also O(n) memory savings that
> are so far impossible-by-design for SLUB. They are: slab external
> fragmentation is significantly reduced; kmalloc internal fragmentation is
> significantly reduced; order of magnitude smaller kmem_cache data type;
> order of magnitude less code...

Well, that is only true for kmalloc objects < PAGE_SIZE, and it is to some 
extent offset by the need to keep per-object data in SLUB. But yes, the 
power-of-two caches are a necessary design feature of SLAB/SLUB that allows 
O(1) operation of kmalloc slabs, which in turn causes memory wastage because 
the allocation is rounded up to the next power of two. SLUB has less wastage
there than SLAB since it can fit power-of-two objects tightly into a slab 
instead of having to place additional control information there like SLAB 
does.
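
(As a rough illustration, with the exact numbers depending on the
configuration: for a 256-byte kmalloc cache on 4K pages, SLAB keeps its
slab descriptor and bufctl array on-slab and typically fits about 15
objects per page, while SLUB keeps that metadata in the page struct and
packs the full 16.)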

O(n) memory savings? What is that?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 11:04         ` Pekka Enberg
@ 2007-07-09 16:08             ` Christoph Lameter
  2007-07-09 16:08             ` Christoph Lameter
  1 sibling, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 16:08 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Matt Mackall, Denis Vlasenko,
	Erik Andersen

On Mon, 9 Jul 2007, Pekka Enberg wrote:

> I assume with "slab external fragmentation" you mean allocating a
> whole page for a slab when there are not enough objects to fill the
> whole thing thus wasting memory? We could try to combat that by
> packing multiple variable-sized slabs within a single page. Also,
> adding some non-power-of-two kmalloc caches might help with internal
> fragmentation.

There are already non-power-of-two kmalloc caches for the 96 and 192 byte 
sizes.
> 
> In any case, SLUB needs some serious tuning for smaller machines
> before we can get rid of SLOB.

Switch off CONFIG_SLUB_DEBUG to get memory savings.
 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 16:06         ` Christoph Lameter
@ 2007-07-09 16:51           ` Andrew Morton
  2007-07-09 17:26             ` Christoph Lameter
  2007-07-09 23:09             ` Matt Mackall
  2007-07-10  1:41           ` Nick Piggin
  1 sibling, 2 replies; 111+ messages in thread
From: Andrew Morton @ 2007-07-09 16:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Mon, 9 Jul 2007 09:06:46 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> But yes the power of 
> two caches are a necessary design feature of SLAB/SLUB that allows O(1) 
> operations of kmalloc slabs which in turns causes memory wastage because 
> of rounding of the alloc to the next power of two.

I've frequently wondered why we don't just create more caches for kmalloc:
make it denser than each-power-of-2-plus-a-few-others-in-between.

I assume the tradeoff here is better packing versus having a ridiculous
number of caches.  Is there any other cost?

Because even having 1024 caches wouldn't consume a terrible amount of
memory and I bet it would result in aggregate savings.

Of course, a scheme which creates kmalloc caches on-demand would be better,
but that would kill our compile-time cache selection, I suspect.
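
A toy model of that packing-versus-number-of-caches question; the dense
class table below is hypothetical, and only the power-of-two list (plus
the 96/192 caches) mirrors the real kmalloc caches:

#include <stdio.h>
#include <stddef.h>

/* the existing kmalloc classes: powers of two plus 96 and 192 */
static const size_t pow2_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
};

/* hypothetical denser table: every 64 bytes up to a page */
static size_t dense_classes[64];

static size_t round_to_class(size_t size, const size_t *classes, int n)
{
	for (int i = 0; i < n; i++)
		if (size <= classes[i])
			return classes[i];
	return 0;       /* larger requests would go to the page allocator */
}

int main(void)
{
	const int np = (int)(sizeof(pow2_classes) / sizeof(pow2_classes[0]));
	int nd = 0;

	for (size_t s = 64; s <= 4096; s += 64)
		dense_classes[nd++] = s;

	/* a handful of odd allocation sizes; real profiles differ, of course */
	const size_t sizes[] = { 72, 200, 300, 700, 1100, 3000 };
	size_t waste_pow2 = 0, waste_dense = 0;

	for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		waste_pow2  += round_to_class(sizes[i], pow2_classes, np) - sizes[i];
		waste_dense += round_to_class(sizes[i], dense_classes, nd) - sizes[i];
	}
	printf("rounding waste: power-of-two %zu bytes, dense %zu bytes\n",
	       waste_pow2, waste_dense);
	return 0;
}

More classes cut the rounding waste, but every extra cache carries its
own partially filled pages and control structures, which is exactly the
tradeoff being weighed here.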

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 16:51           ` Andrew Morton
@ 2007-07-09 17:26             ` Christoph Lameter
  2007-07-09 18:00               ` Andrew Morton
  2007-07-10  1:43               ` Nick Piggin
  2007-07-09 23:09             ` Matt Mackall
  1 sibling, 2 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 17:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Mon, 9 Jul 2007, Andrew Morton wrote:

> On Mon, 9 Jul 2007 09:06:46 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> 
> > But yes the power of 
> > two caches are a necessary design feature of SLAB/SLUB that allows O(1) 
> > operations of kmalloc slabs which in turns causes memory wastage because 
> > of rounding of the alloc to the next power of two.
> 
> I've frequently wondered why we don't just create more caches for kmalloc:
> make it denser than each-power-of-2-plus-a-few-others-in-between.
> 
> I assume the tradeoff here is better packing versus having a ridiculous
> number of caches.  Is there any other cost?
> Because even having 1024 caches wouldn't consume a terrible amount of
> memory and I bet it would result in aggregate savings.

I have tried any number of approaches without too much success, even one 
slab cache for every 8 bytes. This creates additional admin overhead 
through more control structures (which are pretty minimal but nevertheless 
exist).

The main issue is that kmallocs of different sizes must use different 
pages. If one allocates one 64 byte item and one 256 byte item and both the 
64 byte and the 256 byte caches are empty, then SLAB/SLUB will have to 
allocate 2 pages; SLOB can fit them into one. This is basically only 
relevant early after boot. The advantage goes away as the system starts to 
work and as more objects are allocated in the slabs, but the power-of-two 
slab will always have to extend its size in page-size chunks, which leads 
to some overhead that SLOB can avoid by placing entities of multiple sizes 
in one slab. The tradeoff in SLOB is that it cannot be an O(1) allocator 
because it has to manage these variable-sized objects by traversing the 
lists.

I think the advantage that SLOB generates here is pretty minimal and is 
easily offset by the problems of maintaining SLOB.
 
> Of course, a scheme which creates kmalloc caches on-demand would be better,
> but that would kill our compile-time cache selection, I suspect.

SLUB creates kmalloc caches on-demand for DMA caches already. But then we 
are not allowing compile time cache selection for DMA caches.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 17:26             ` Christoph Lameter
@ 2007-07-09 18:00               ` Andrew Morton
  2007-07-10  1:43               ` Nick Piggin
  1 sibling, 0 replies; 111+ messages in thread
From: Andrew Morton @ 2007-07-09 18:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Mon, 9 Jul 2007 10:26:08 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > I assume the tradeoff here is better packing versus having a ridiculous
> > number of caches.  Is there any other cost?
> > Because even having 1024 caches wouldn't consume a terrible amount of
> > memory and I bet it would result in aggregate savings.
> 
> I have tried any number of approaches without too much success. Even one 
> slab cache for every 8 bytes. This creates additional admin overhead 
> through more control structure (that is pretty minimal but nevertheless 
> exists)
> 
> The main issue is that kmallocs of different size must use different 
> pages. If one allocates one 64 byte item and one 256 byte item and both 64 
> byte and 256 byte are empty then SLAB/SLUB will have to allocate 2 pages. 
> SLUB can fit them into one. This is basically only relevant early after 
> boot. The advantage goes away as the system starts to work and as more 
> objects are allocated in the slabs but the power-of-two slab will always
> have to extend its size in page size chunks which leads to some overhead 
> that SLOB can avoid by placing entities of multiple size in one slab. 
> The tradeoff in SLOB is that is cannot be an O(1) allocator because it 
> has to manage these variable sized objects by traversing the lists.
> 
> I think the advantage that SLOB generates here is pretty minimal and is 
> easily offset by the problems of maintaining SLOB.

Sure.  But I wasn't proposing this as a way to make slub cover slob's advantage.
I was wondering what effect it would have on a more typical medium to large sized
system.

Not much, really: if any particular subsystem is using a "lot" of slab memory then
it should create its own cache rather than using kmalloc anyway, so forget it ;)

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 15:59       ` Martin Bligh
@ 2007-07-09 18:11         ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 18:11 UTC (permalink / raw)
  To: Martin Bligh; +Cc: Andi Kleen, linux-kernel, linux-mm

On Mon, 9 Jul 2007, Martin Bligh wrote:

> Those numbers came from Mathieu Desnoyers (LTTng) if you
> want more details.

Okay the source for these numbers is in his paper for the OLS 2006: Volume 
1 page 208-209? I do not see the exact number that you referred to there.

He seems to be comparing spinlock acquire / release vs. cmpxchg. So I 
guess you got your material from somewhere else?

Also the cmpxchg used there is the lockless variant. cmpxchg 29 cycles w/o 
lock prefix and 112 with lock prefix.

I see you reference another paper by Desnoyers: 
http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=celf2006-desnoyers.pdf

I do not see anything relevant there. Where did those numbers come from?

The lockless cmpxchg is certainly an interesting idea. Certainly for some 
platforms I could disable preempt and then do a lockless cmpxchg.
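
A minimal userspace model of such a fast path, with C11 atomics standing
in for a cpu-local cmpxchg (the names are illustrative, ABA and remote
frees are ignored, and a kernel version would sit under preempt_disable()
on a per-cpu freelist rather than rely on these memory orderings):

#include <stdatomic.h>
#include <stddef.h>

struct object { struct object *next; };

struct cpu_slab {
	_Atomic(struct object *) freelist;      /* per-cpu list of free objects */
};

static void *fastpath_alloc(struct cpu_slab *c)
{
	struct object *old, *new_head;

	do {
		old = atomic_load_explicit(&c->freelist, memory_order_acquire);
		if (!old)
			return NULL;    /* would fall back to the slow path */
		new_head = old->next;
		/* retry if the list head changed under us */
	} while (!atomic_compare_exchange_weak_explicit(&c->freelist,
					&old, new_head,
					memory_order_acq_rel,
					memory_order_acquire));
	return old;
}

static void fastpath_free(struct cpu_slab *c, void *p)
{
	struct object *obj = p, *old;

	do {
		old = atomic_load_explicit(&c->freelist, memory_order_relaxed);
		obj->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&c->freelist,
					&old, obj,
					memory_order_release,
					memory_order_relaxed));
}

int main(void)
{
	struct object node = { .next = NULL };
	struct cpu_slab slab;

	atomic_init(&slab.freelist, NULL);
	fastpath_free(&slab, &node);
	return fastpath_alloc(&slab) == &node ? 0 : 1;
}

Whether this actually beats interrupt enable/disable is, as discussed
above, very much cpu dependent.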

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 15:45   ` Christoph Lameter
@ 2007-07-09 19:43     ` David Miller
  2007-07-09 21:21       ` Christoph Lameter
  0 siblings, 1 reply; 111+ messages in thread
From: David Miller @ 2007-07-09 19:43 UTC (permalink / raw)
  To: clameter
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough, penberg, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Mon, 9 Jul 2007 08:45:42 -0700 (PDT)

> On Sat, 7 Jul 2007, David Miller wrote:
> 
> > From: Christoph Lameter <clameter@sgi.com>
> > Date: Sat, 07 Jul 2007 20:49:52 -0700
> > 
> > > A cmpxchg is less costly than interrupt enabe/disable
> > 
> > This is cpu dependant, and in fact not true at all on Niagara
> > and several of the cpus in the UltraSPARC family.
> 
> Hmmm... So have alternate aloc/free paths depending on the cpu?

As Andi seemed to imply I don't even think cmpxchg is faster than
interrupt enable/disable on current generation AMD x86_64 chips, so
are you targeting this optimization solely at Intel x86 Core Duo
32-bit chips?  That's the only one I can see which will benefit from
this.  Are you going to probe the cpu sub-type and patch SLUB?

I really don't think this optimization is wise as even if you
could decide at build time, it's going to be a maintenance and
debugging nightmare to have to field bug reports given two different
locking schemes.

Please reconsider this change, thanks.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08  3:50 ` [patch 09/10] Remove the SLOB allocator for 2.6.23 Christoph Lameter
  2007-07-08  7:51   ` Ingo Molnar
@ 2007-07-09 20:52   ` Matt Mackall
  1 sibling, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-09 20:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Pekka Enberg, akpm

First, WTF wasn't I cc:ed on this? Are you actually trying to make me
fuming mad?

On Sat, Jul 07, 2007 at 08:50:01PM -0700, Christoph Lameter wrote:
> Maintenance of slab allocators becomes a problem as new features for
> allocators are developed. The SLOB allocator in particular has been lagging
> behind in many ways in the past:
> 
> - Had no support for SLAB_DESTROY_BY_RCU for years (but no one noticed)

We've been over this 50 times. The target users were never affected.
And it's fixed. So why the HELL are you mentioning this again? 
 
> - Still has no support for slab reclaim counters. This may currently not
>   be necessary if one would restrict the supported configurations for
>   functionality relying on these. But even that has not been done.

We've been over this 50 times. Last time around, I inspected all the
code paths and demonstrated that despite your handwaving, IT DIDN'T
MATTER.

> The only current advantage over SLUB in terms of memory savings is through
> SLOB's kmalloc layout, which is not power-of-two based like SLAB and SLUB and
> which allows it to eliminate some memory waste.
> 
> Through that, SLOB still has a slight memory advantage over SLUB of ~350k
> for a standard server configuration. It is likely that the savings are
> smaller for real embedded configurations that have less functionality.

Sometimes I do not think there is a cluebat large enough for you. 350K
is FREAKING HUGE on a cell phone. That's most of a kernel!

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 18:11         ` Christoph Lameter
@ 2007-07-09 21:00           ` Martin Bligh
  -1 siblings, 0 replies; 111+ messages in thread
From: Martin Bligh @ 2007-07-09 21:00 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, linux-kernel, linux-mm, Mathieu Desnoyers

Christoph Lameter wrote:
> On Mon, 9 Jul 2007, Martin Bligh wrote:
> 
>> Those numbers came from Mathieu Desnoyers (LTTng) if you
>> want more details.
> 
> Okay the source for these numbers is in his paper for the OLS 2006: Volume 
> 1 page 208-209? I do not see the exact number that you referred to there.

Nope, he was a direct co-author on the paper, was
working here, and measured it.

> He seems to be comparing spinlock acquire / release vs. cmpxchg. So I 
> guess you got your material from somewhere else?
> 
> Also the cmpxchg used there is the lockless variant. cmpxchg 29 cycles w/o 
> lock prefix and 112 with lock prefix.
> 
> I see you reference another paper by Desnoyers: 
> http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=celf2006-desnoyers.pdf
> 
> I do not see anything relevant there. Where did those numbers come from?
> 
> >The lockless cmpxchg is certainly an interesting idea. Certainly for some
> platforms I could disable preempt and then do a lockless cmpxchg.

Mathieu, can you give some more details? Obviously the exact numbers
will vary by architecture, machine size, etc., but it's a good point
for discussion.

M.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 19:43     ` David Miller
@ 2007-07-09 21:21       ` Christoph Lameter
  0 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 21:21 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough, penberg, akpm

On Mon, 9 Jul 2007, David Miller wrote:

> As Andi seemed to imply I don't even think cmpxchg is faster than
> interrupt enable/disable on current generation AMD x86_64 chips, so
> are you targeting this optimization solely at Intel x86 Core Duo
> 32-bit chips?  That's the only one I can see which will benefit from
> this.  Are you going to probe the cpu sub-type and patch SLUB?

Not sure. The numbers I have seen in the papers indicate a potential 
speedup by a factor of 10 on x86 hardware. I have not done any 
benchmarking myself yet.

> Please reconsider this change, thanks.

This is an RFC after all so nothing is fixed yet. Certainly I will keep 
this in mind.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 21:00           ` Martin Bligh
@ 2007-07-09 21:44             ` Mathieu Desnoyers
  -1 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-07-09 21:44 UTC (permalink / raw)
  To: Martin Bligh; +Cc: Christoph Lameter, Andi Kleen, linux-kernel, linux-mm

Hi,

* Martin Bligh (mbligh@mbligh.org) wrote:
> Christoph Lameter wrote:
> >On Mon, 9 Jul 2007, Martin Bligh wrote:
> >
> >>Those numbers came from Mathieu Desnoyers (LTTng) if you
> >>want more details.
> >
> >Okay the source for these numbers is in his paper for the OLS 2006: Volume 
> >1 page 208-209? I do not see the exact number that you referred to there.
> 

Hrm, the reference page number is wrong: it is in OLS 2006, Vol. 1 page
216 (section 4.5.2 Scalability). I originally pulled out the page number
from my local paper copy. oops.


> Nope, he was a direct co-author on the paper, was
> working here, and measured it.
> 
> >He seems to be comparing spinlock acquire / release vs. cmpxchg. So I 
> >guess you got your material from somewhere else?
> >

I ran a test specifically for this paper where I got this result
comparing the local irq enable/disable to local cmpxchg.

> >Also the cmpxchg used there is the lockless variant. cmpxchg 29 cycles w/o 
> >lock prefix and 112 with lock prefix.

Yep, I voluntarily used the variant without lock prefix because the
data is per cpu and I disable preemption.

> >
> >I see you reference another paper by Desnoyers: 
> >http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=celf2006-desnoyers.pdf
> >
> >I do not see anything relevant there. Where did those numbers come from?
> >
> >The lockless cmpxchg is certainly an interesting idea. Certainly for some
> >platforms I could disable preempt and then do a lockless cmpxchg.
> 

Yes, preempt disabling or, eventually, the new thread migration
disabling I just proposed as an RFC on LKML. (that would make -rt people
happier)

> Mathieu, can you give some more details? Obviously the exact numbers
> will vary by architecture, machine size, etc., but it's a good point
> for discussion.
> 

Sure, also note that the UP cmpxchg (see asm-$ARCH/local.h in 2.6.22) is
faster on architectures like powerpc and MIPS where it is possible to
remove some memory barriers.

See 2.6.22 Documentation/local_ops.txt for a thorough discussion. Don't
hesitate to ping me if you have more questions.

Regards,

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 21:44             ` Mathieu Desnoyers
@ 2007-07-09 21:55               ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 21:55 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm

On Mon, 9 Jul 2007, Mathieu Desnoyers wrote:

> > >Okay the source for these numbers is in his paper for the OLS 2006: Volume 
> > >1 page 208-209? I do not see the exact number that you referred to there.
> > 
> 
> Hrm, the reference page number is wrong: it is in OLS 2006, Vol. 1 page
> 216 (section 4.5.2 Scalability). I originally pulled out the page number
> from my local paper copy. oops.

4.5.2 is on page 208 in my copy of the proceedings.


> > >He seems to be comparing spinlock acquire / release vs. cmpxchg. So I 
> > >guess you got your material from somewhere else?
> > >
> 
> I ran a test specifically for this paper where I got this result
> comparing the local irq enable/disable to local cmpxchg.


The numbers are pretty important and suggest that we can obtain
a significant speed increase by avoiding local irq disable/enable in the slab
allocator fast paths. Do you have some more numbers? Any other publication that
mentions these?


> Yep, I voluntarily used the variant without lock prefix because the
> data is per cpu and I disable preemption.

local_cmpxchg generates this?

> Yes, preempt disabling or, eventually, the new thread migration
> disabling I just proposed as an RFC on LKML. (that would make -rt people
> happier)

Right.

> Sure, also note that the UP cmpxchg (see asm-$ARCH/local.h in 2.6.22) is
> faster on architectures like powerpc and MIPS where it is possible to
> remove some memory barriers.

UP cmpxchg meaning local_cmpxchg?

> See 2.6.22 Documentation/local_ops.txt for a thorough discussion. Don't
> hesitate to ping me if you have more questions.

That is pretty thin and does not mention atomic_cmpxchg. You may want to
expand on your ideas a bit.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-08 18:02     ` Andrew Morton
  2007-07-09  2:57       ` Nick Piggin
@ 2007-07-09 21:57       ` Matt Mackall
  1 sibling, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-09 21:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Christoph Lameter, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg

On Sun, Jul 08, 2007 at 11:02:24AM -0700, Andrew Morton wrote:
> Guys, look at this the other way.  Suppose we only had slub, and someone
> came along and said "here's a whole new allocator which saves 4.5k of
> text", would we merge it on that basis?  Hell no, it's not worth it.  What
> we might do is to get motivated to see if we can make slub less porky under
> appropriate config settings.

Well I think we would obviously throw out SLAB and SLUB if they
weren't somewhat faster than SLOB. They're much more problematic and
one of the big features that Christoph's pushing is a fix for a
problem that SLOB simply doesn't have: huge numbers of SLAB/SLUB pages
being held down by small numbers of objects. 

> Let's not get sentimental about these things: in general, if there's any
> reasonable way in which we can rid ourselves of any code at all, we should
> do so, no?

I keep suggesting a Voyager Replacement Fund, but James isn't
interested.

But seriously, I don't think it should be at all surprising that the
allocator that's most appropriate for machines with < 32MB of RAM is
different than the one for machines with > 1TB of RAM.

The maintenance overhead of SLOB is fairly minimal. The biggest
outstanding SLOB problem is nommu's rather broken memory size
reporting.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 21:55               ` Christoph Lameter
@ 2007-07-09 22:58                 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-07-09 22:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm

* Christoph Lameter (clameter@sgi.com) wrote:
> On Mon, 9 Jul 2007, Mathieu Desnoyers wrote:
> 
> > > >He seems to be comparing spinlock acquire / release vs. cmpxchg. So I 
> > > >guess you got your material from somewhere else?
> > > >
> > 
> > I ran a test specifically for this paper where I got this result
> > comparing the local irq enable/disable to local cmpxchg.
> 
> 
> The numbers are pretty important and suggest that we can obtain 
> a significant speed increase by avoiding local irq disable/enable in the slab
> allocator fast paths. Do you have some more numbers? Any other publication that
> mentions these?
> 

The original publication in which I released the idea was my LTTng paper
at OLS 2006. Outside this, I have not found any other paper that talks about
this idea.

The test code is basically just disabling interrupts, reading the TSC
at the beginning and end, and doing 20000 loops of local_cmpxchg. I can
send you the code if you want it.
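
For reference, a minimal sketch of the kind of harness described above (this
is an assumption about the setup, not the code that produced the published
numbers; it uses get_cycles() and the local_t ops from asm/local.h):

#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/irqflags.h>
#include <asm/local.h>
#include <asm/timex.h>		/* get_cycles() */

#define LOOPS	20000

static local_t test_var = LOCAL_INIT(0);

/* Time LOOPS iterations of a local_cmpxchg based increment. */
static void time_local_cmpxchg(void)
{
	unsigned long flags;
	cycles_t t0, t1;
	long old;
	int i;

	local_irq_save(flags);		/* keep the measurement undisturbed */
	t0 = get_cycles();
	for (i = 0; i < LOOPS; i++) {
		old = local_read(&test_var);
		local_cmpxchg(&test_var, old, old + 1);
	}
	t1 = get_cycles();
	local_irq_restore(flags);
	printk(KERN_INFO "local_cmpxchg: %llu cycles for %d iterations\n",
		(unsigned long long)(t1 - t0), LOOPS);
}

/* Time LOOPS iterations of interrupt disable/enable for comparison. */
static void time_irq_save_restore(void)
{
	unsigned long flags;
	cycles_t t0, t1;
	int i;

	t0 = get_cycles();
	for (i = 0; i < LOOPS; i++) {
		local_irq_save(flags);
		local_irq_restore(flags);
	}
	t1 = get_cycles();
	printk(KERN_INFO "irq save/restore: %llu cycles for %d iterations\n",
		(unsigned long long)(t1 - t0), LOOPS);
}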

> 
> > Yep, I voluntarily used the variant without lock prefix because the
> > data is per cpu and I disable preemption.
> 
> local_cmpxchg generates this?
> 

Yes.

> > Yes, preempt disabling or, eventually, the new thread migration
> > disabling I just proposed as an RFC on LKML. (that would make -rt people
> > happier)
> 
> Right.
> 
> > Sure, also note that the UP cmpxchg (see asm-$ARCH/local.h in 2.6.22) is
> > faster on architectures like powerpc and MIPS where it is possible to
> > remove some memory barriers.
> 
> UP cmpxchg meaning local_cmpxchg?
> 

Yes.

> > See 2.6.22 Documentation/local_ops.txt for a thorough discussion. Don't
> > hesitate to ping me if you have more questions.
> 
> That is pretty thin and does not mention atomic_cmpxchg. You may want to
> expand on your ideas a bit.

Sure, the idea goes as follows: if you have a per cpu variable that needs
to be concurrently modified in a coherent manner by any context (NMI,
irq, bh, process) running on the given CPU, you only need to use an
operation that is atomic with respect to the given CPU. You just have to
make sure that
only this CPU will modify the variable (therefore, you must disable
preemption around modification) and you have to make sure that the
read-side, which can come from any CPU, is accessing this variable
atomically. Also, you have to be aware that the read-side might see an
older version of the other cpu's value because there is no SMP write
memory barrier involved. The value, however, will always be up to date
if the variable is read from the "local" CPU.

What applies to local_inc, given as example in the local_ops.txt
document, applies integrally to local_cmpxchg. And I would say that
local_cmpxchg is by far the cheapest locking mechanism I have found, and
use today, for my kernel tracer. The idea emerged from my need to trace
every execution context, including NMIs, while still providing good
performance. local_cmpxchg was the perfect fit; that's why I deployed
it in local.h in each and every architecture.
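
A minimal sketch of the pattern, assuming the 2.6.22 local_t API (this is
illustrative only, not LTTng's actual buffer code): a per cpu counter that
can be updated from process, irq and NMI context on the local CPU without
disabling interrupts.

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/cpumask.h>
#include <asm/local.h>

static DEFINE_PER_CPU(local_t, nr_events);

/*
 * Update path: only ever modifies this CPU's copy, so a cmpxchg that is
 * atomic only with respect to the local CPU is sufficient. Preemption is
 * disabled so we cannot migrate between the read and the cmpxchg.
 */
static void count_event(void)
{
	local_t *l;
	long old;

	preempt_disable();
	l = &__get_cpu_var(nr_events);
	do {
		old = local_read(l);
	} while (local_cmpxchg(l, old, old + 1) != old);
	preempt_enable();
}

/*
 * Read side: may run on any CPU. Remote readers can see a slightly stale
 * value since no SMP write barrier is involved, but the local CPU always
 * sees its own latest value.
 */
static long total_events(void)
{
	long sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += local_read(&per_cpu(nr_events, cpu));
	return sum;
}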

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 22:58                 ` Mathieu Desnoyers
@ 2007-07-09 23:08                   ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-09 23:08 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm

On Mon, 9 Jul 2007, Mathieu Desnoyers wrote:

> > > Yep, I volountarily used the variant without lock prefix because the
> > > data is per cpu and I disable preemption.
> > 
> > local_cmpxchg generates this?
> > 
> 
> Yes.

Does not work here. If I use

static void __always_inline *slab_alloc(struct kmem_cache *s,
                gfp_t gfpflags, int node, void *addr)
{
        void **object;
        struct kmem_cache_cpu *c;

        preempt_disable();
        c = get_cpu_slab(s, smp_processor_id());
redo:
        object = c->freelist;
        if (unlikely(!object || !node_match(c, node)))
                return __slab_alloc(s, gfpflags, node, addr, c);

        if (cmpxchg_local(&c->freelist, object, object[c->offset]) != object)
                goto redo;

        preempt_enable();
        if (unlikely((gfpflags & __GFP_ZERO)))
                memset(object, 0, c->objsize);

        return object;
}

Then the code will include a lock prefix:

    3270:       48 8b 1a                mov    (%rdx),%rbx
    3273:       48 85 db                test   %rbx,%rbx
    3276:       74 23                   je     329b <kmem_cache_alloc+0x4b>
    3278:       8b 42 14                mov    0x14(%rdx),%eax
    327b:       4c 8b 0c c3             mov    (%rbx,%rax,8),%r9
    327f:       48 89 d8                mov    %rbx,%rax
    3282:       f0 4c 0f b1 0a          lock cmpxchg %r9,(%rdx)
    3287:       48 39 c3                cmp    %rax,%rbx
    328a:       75 e4                   jne    3270 <kmem_cache_alloc+0x20>
    328c:       66 85 f6                test   %si,%si
    328f:       78 19                   js     32aa <kmem_cache_alloc+0x5a>
    3291:       48 89 d8                mov    %rbx,%rax
    3294:       48 83 c4 08             add    $0x8,%rsp
    3298:       5b                      pop    %rbx
    3299:       c9                      leaveq
    329a:       c3                      retq
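
For comparison, a lock-prefix-free compare-and-exchange on x86-64 would look
roughly like this (a hand-written sketch of what cmpxchg_local ought to emit
when the data is per cpu; not the kernel's actual definition):

static inline unsigned long cmpxchg_local_sketch(volatile unsigned long *ptr,
		unsigned long old, unsigned long new)
{
	unsigned long prev;

	/*
	 * No "lock" prefix: the operation is atomic only with respect to
	 * this CPU, which is enough when preemption is disabled and the
	 * data is only ever modified by its owning CPU.
	 */
	asm volatile("cmpxchgq %2,%1"
		     : "=a" (prev), "+m" (*ptr)
		     : "r" (new), "0" (old)
		     : "memory");
	return prev;
}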


> What applies to local_inc, given as example in the local_ops.txt
> document, applies integrally to local_cmpxchg. And I would say that
> local_cmpxchg is by far the cheapest locking mechanism I have found, and
> use today, for my kernel tracer. The idea emerged from my need to trace
> every execution context, including NMIs, while still providing good
> performance. local_cmpxchg was the perfect fit; that's why I deployed
> it in local.h in each and every architecture.

Great idea. The SLUB allocator may be able to use your idea to improve 
both the alloc and free paths.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 16:51           ` Andrew Morton
  2007-07-09 17:26             ` Christoph Lameter
@ 2007-07-09 23:09             ` Matt Mackall
  1 sibling, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-09 23:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Nick Piggin, Ingo Molnar, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg,
	Denis Vlasenko, Erik Andersen

On Mon, Jul 09, 2007 at 09:51:16AM -0700, Andrew Morton wrote:
> On Mon, 9 Jul 2007 09:06:46 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> 
> > But yes the power of 
> > two caches are a necessary design feature of SLAB/SLUB that allows O(1) 
> > operations of kmalloc slabs which in turn causes memory wastage because
> > of rounding of the alloc to the next power of two.
> 
> I've frequently wondered why we don't just create more caches for kmalloc:
> make it denser than each-power-of-2-plus-a-few-others-in-between.
> 
> I assume the tradeoff here is better packing versus having a ridiculous
> number of caches.  Is there any other cost?

It magnifies the fragmentation problem.

SLAB (and SLUB) makes the optimistic assumption that objects of the
same type/size have similar lifetimes. But for some objects, it's not
uncommon to do many temporary allocations but have some objects with
indefinite lifespans. dcache is a very frequently encountered example,
but there's no reason we couldn't see it with sockets and many other
object types. 

Every new arena introduces further opportunity for this sort of
fragmentation. If we had, say, separate pools for 48 byte and 64 byte
objects, an unfortunate usage pattern for 48-byte kmallocs could DoS
requests for 64 byte objects that would work just fine if they both
came out of the same pool. If we have 10 pools with long-lived
objects, we're much worse off than if we had 1 or 2.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-09 22:58                 ` Mathieu Desnoyers
@ 2007-07-10  0:55                   ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10  0:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

Ok, here is a replacement patch for the cmpxchg patch. Problems:

1. cmpxchg_local is not available on all arches. If we wanted to do
   this then it needs to be universally available.

2. cmpxchg_local does generate the "lock" prefix. It should not do that.
   Without fixes to cmpxchg_local we cannot expect maximum performance.

3. The approach is x86 centric. It relies on a cmpxchg that does not
   synchronize with memory used by other cpus and therefore is more
   lightweight. As far as I know the IA64 cmpxchg cannot do that.
   Neither can several other processors. I am not sure how cmpxchg-less
   platforms would use that. We need a detailed comparison of
   interrupt enable/disable vs. cmpxchg cycle counts for cachelines in
   the cpu cache to evaluate the impact that such a change would have.

   The cmpxchg (or its emulation) does not need any barriers since the
   accesses can only come from a single processor. 

Mathieu measured a significant performance benefit coming from not using
interrupt enable / disable.

Some rough processor cycle counts (anyone have better numbers?)

	STI	CLI	CMPXCHG
IA32	36	26	1 (assume XCHG == CMPXCHG, sti/cli also need stack pushes/pulls)
IA64	12	12	1 (but ar.ccv needs 11 cycles to set comparator,
			need register moves to preserve processor flags)

Looks like STI/CLI is pretty expensive and it seems that we may be able to
optimize the alloc / free hotpath quite a bit if we could drop the 
interrupt enable / disable. But we need some measurements.


Draft of a new patch:

SLUB: Single atomic instruction alloc/free using cmpxchg_local

A cmpxchg allows us to avoid disabling and enabling interrupts. The cmpxchg
is well suited to operations on the per cpu freelist. We can stay on one
processor by disabling preemption and allowing concurrent interrupts,
thus avoiding the overhead of disabling and enabling interrupts.

Pro:
	- No need to disable interrupts.
	- Preempt disable/enable vanishes on non-preempt kernels
Con:
	- Slightly more complex handling.
	- Updates to atomic instructions needed

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/slub.c |   72 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 23 deletions(-)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-09 15:04:46.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-09 17:09:00.000000000 -0700
@@ -1467,12 +1467,14 @@ static void *__slab_alloc(struct kmem_ca
 {
 	void **object;
 	struct page *new;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	if (!c->page)
 		goto new_slab;
 
 	slab_lock(c->page);
-	if (unlikely(!node_match(c, node)))
+	if (unlikely(!node_match(c, node) || c->freelist))
 		goto another_slab;
 load_freelist:
 	object = c->page->freelist;
@@ -1486,7 +1488,14 @@ load_freelist:
 	c->page->inuse = s->objects;
 	c->page->freelist = NULL;
 	c->node = page_to_nid(c->page);
+out:
 	slab_unlock(c->page);
+	local_irq_restore(flags);
+	preempt_enable();
+
+	if (unlikely((gfpflags & __GFP_ZERO)))
+		memset(object, 0, c->objsize);
+
 	return object;
 
 another_slab:
@@ -1527,6 +1536,8 @@ new_slab:
 		c->page = new;
 		goto load_freelist;
 	}
+	local_irq_restore(flags);
+	preempt_enable();
 	return NULL;
 debug:
 	c->freelist = NULL;
@@ -1536,8 +1547,7 @@ debug:
 
 	c->page->inuse++;
 	c->page->freelist = object[c->offset];
-	slab_unlock(c->page);
-	return object;
+	goto out;
 }
 
 /*
@@ -1554,23 +1564,20 @@ static void __always_inline *slab_alloc(
 		gfp_t gfpflags, int node, void *addr)
 {
 	void **object;
-	unsigned long flags;
 	struct kmem_cache_cpu *c;
 
-	local_irq_save(flags);
+	preempt_disable();
 	c = get_cpu_slab(s, smp_processor_id());
-	if (unlikely(!c->page || !c->freelist ||
-					!node_match(c, node)))
+redo:
+	object = c->freelist;
+	if (unlikely(!object || !node_match(c, node)))
+		return __slab_alloc(s, gfpflags, node, addr, c);
 
-		object = __slab_alloc(s, gfpflags, node, addr, c);
+	if (cmpxchg_local(&c->freelist, object, object[c->offset]) != object)
+		goto redo;
 
-	else {
-		object = c->freelist;
-		c->freelist = object[c->offset];
-	}
-	local_irq_restore(flags);
-
-	if (unlikely((gfpflags & __GFP_ZERO) && object))
+	preempt_enable();
+	if (unlikely((gfpflags & __GFP_ZERO)))
 		memset(object, 0, c->objsize);
 
 	return object;
@@ -1603,7 +1610,9 @@ static void __slab_free(struct kmem_cach
 {
 	void *prior;
 	void **object = (void *)x;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	slab_lock(page);
 
 	if (unlikely(SlabDebug(page)))
@@ -1629,6 +1638,8 @@ checks_ok:
 
 out_unlock:
 	slab_unlock(page);
+	local_irq_restore(flags);
+	preempt_enable();
 	return;
 
 slab_empty:
@@ -1639,6 +1650,8 @@ slab_empty:
 		remove_partial(s, page);
 
 	slab_unlock(page);
+	local_irq_restore(flags);
+	preempt_enable();
 	discard_slab(s, page);
 	return;
 
@@ -1663,18 +1676,31 @@ static void __always_inline slab_free(st
 			struct page *page, void *x, void *addr)
 {
 	void **object = (void *)x;
-	unsigned long flags;
 	struct kmem_cache_cpu *c;
+	void **freelist;
 
-	local_irq_save(flags);
+	preempt_disable();
 	c = get_cpu_slab(s, smp_processor_id());
-	if (likely(page == c->page && c->freelist)) {
-		object[c->offset] = c->freelist;
-		c->freelist = object;
-	} else
-		__slab_free(s, page, x, addr, c->offset);
+redo:
+	freelist = c->freelist;
+	/*
+	 * Must read freelist before c->page. If an interrupt occurs and
+	 * changes c->page after we have read it here then it
+	 * will also have changed c->freelist and the cmpxchg will fail.
+	 *
+	 * If we had checked c->page first then the freelist could
+	 * have been changed under us before we read c->freelist and we
+	 * would not be able to detect that situation.
+	 */
+	smp_rmb();
+	if (unlikely(page != c->page || !freelist))
+		return __slab_free(s, page, x, addr, c->offset);
+
+	object[c->offset] = freelist;
+	if (cmpxchg_local(&c->freelist, freelist, object) != freelist)
+		goto redo;
 
-	local_irq_restore(flags);
+	preempt_enable();
 }
 
 void kmem_cache_free(struct kmem_cache *s, void *x)


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 16:06         ` Christoph Lameter
  2007-07-09 16:51           ` Andrew Morton
@ 2007-07-10  1:41           ` Nick Piggin
  2007-07-10  1:51             ` Christoph Lameter
  1 sibling, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  1:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Christoph Lameter wrote:
> On Mon, 9 Jul 2007, Nick Piggin wrote:
> 
> 
>>>A reason for retaining slob would be that it has some O(n) memory saving
>>>due to better packing, etc.  Indeed that was the reason for merging it in
>>>the first place.  If slob no longer retains that advantage (wrt slub) then
>>>we no longer need it.
>>
>>SLOB contains several significant O(1) and also O(n) memory savings that
>>are so far impossible-by-design for SLUB. They are: slab external
>>fragmentation is significantly reduced; kmalloc internal fragmentation is
>>significantly reduced; order of magnitude smaller kmem_cache data type;
>>order of magnitude less code...
> 
> 
> Well that is only true for kmalloc objects < PAGE_SIZE and to some extent
> offset by the need to keep per object data in SLUB. But yes the power of
> two caches are a necessary design feature of SLAB/SLUB that allows O(1)
> operations of kmalloc slabs which in turn causes memory wastage because
> of rounding of the alloc to the next power of two. SLUB has less wastage
> there than SLAB since it can fit power-of-two objects tightly into a slab
> instead of having to place additional control information there like SLAB.

OK but we're talking about SLOB. And the number that matters is the amount
of memory used, which is higher with SLUB than with SLOB in our tests.


> O(n) memory savings? What is that?

Allocate n things and your memory waste is proportional to n (well that's
O(n) waste, so I guess by savings I mean that SLOB's memory saving compared
to SLUB are proportional to n).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 17:26             ` Christoph Lameter
  2007-07-09 18:00               ` Andrew Morton
@ 2007-07-10  1:43               ` Nick Piggin
  2007-07-10  1:56                 ` Christoph Lameter
  1 sibling, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  1:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Christoph Lameter wrote:
> On Mon, 9 Jul 2007, Andrew Morton wrote:

> I think the advantage that SLOB generates here is pretty minimal and is 
> easily offset by the problems of maintaining SLOB.

I don't get it. Have you got agreement from the small memory people
that the advantages of SLOB are pretty minimal, or did you just
decide that? If the latter, did you completely miss reading my email?
What happens to the people who jump through hoops to save 1 or 2 K?

I don't see any problems with maintaining SLOB. It is simple enough
that I was able to write a userspace test harness for it and hack
away at it after reading the code the first time for half an hour or
so. It is nothing even slightly comparable to the problems of SLAB,
for example. And you don't have to maintain it at all anyway!

I like removing code as much as the next person, but I don't
understand why you are so intent on removing SLOB and willing to
dismiss its advantages so quickly.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  1:41           ` Nick Piggin
@ 2007-07-10  1:51             ` Christoph Lameter
  2007-07-10  1:58               ` Nick Piggin
  2007-07-10  2:32               ` Matt Mackall
  0 siblings, 2 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10  1:51 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Tue, 10 Jul 2007, Nick Piggin wrote:

> > O(n) memory savings? What is that?
> 
> Allocate n things and your memory waste is proportional to n (well that's
> O(n) waste, so I guess by savings I mean that SLOB's memory saving compared
> to SLUB are proportional to n).

n is the size of the object?

It's linearly correlated to the object size. It does not grow
exponentially as object size grows. Waste in the kmalloc array in the 
worst case is < objsize.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  1:43               ` Nick Piggin
@ 2007-07-10  1:56                 ` Christoph Lameter
  2007-07-10  2:02                   ` Nick Piggin
  0 siblings, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10  1:56 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Tue, 10 Jul 2007, Nick Piggin wrote:

> I don't see any problems with maintaining SLOB. It is simple enough
> that I was able to write a userspace test harness for it and hack
> away at it after reading the code the first time for half an hour or
> so. It is nothing even slightly comparable to the problems of SLAB,
> for example. And you don't have to maintain it at all anyway!

I have to maintain it because I have to keep the slab APIs consistent 
(recently I added GFP_ZERO support and had to provide shims for slab 
defrag). It is not in a good state as described in the patch and has a
history of not being maintained properly. Everyone that modifies the 
behavior of the slab allocator has to do something to avoid breaking SLOB. 
It's certainly fun to hack on but is that a criterion for keeping it in the
tree?

> I like removing code as much as the next person, but I don't
> understand why you are so intent on removing SLOB and willing to
> dismiss its advantages so quickly.

Quickly? We have considered this for months now.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  1:51             ` Christoph Lameter
@ 2007-07-10  1:58               ` Nick Piggin
  2007-07-10  6:22                 ` Matt Mackall
  2007-07-10  2:32               ` Matt Mackall
  1 sibling, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  1:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> 
>>>O(n) memory savings? What is that?
>>
>>Allocate n things and your memory waste is proportional to n (well that's
>>O(n) waste, so I guess by savings I mean that SLOB's memory saving compared
>>to SLUB are proportional to n).
> 
> 
> n is the size of the object?

n things -- n number of things (n calls to kmem_cache_alloc()).

Just a fancy way of saying roughly that memory waste will increase as
the size of the system increases. But that aspect of it I think is
not really a problem for non-tiny systems anyway because the waste
tends not to be too bad (and possibly the number of active allocations
does not increase O(n) with the size of RAM either).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  1:56                 ` Christoph Lameter
@ 2007-07-10  2:02                   ` Nick Piggin
  2007-07-10  2:11                     ` Christoph Lameter
  0 siblings, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  2:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> 
>>I don't see any problems with maintaining SLOB. It is simple enough
>>that I was able to write a userspace test harness for it and hack
>>away at it after reading the code the first time for half an hour or
>>so. It is nothing even slightly comparable to the problems of SLAB,
>>for example. And you don't have to maintain it at all anyway!
> 
> 
> I have to maintain it because I have to keep the slab APIs consistent 
> (recently I added GFP_ZERO support and had to provide shims for slab 
> defrag). It is not in a good state, as described in the patch, and has a 
> history of not being maintained properly. Everyone who modifies the 
> behavior of the slab allocator has to do something to avoid breaking SLOB. 
> It's certainly fun to hack on, but is that a criterion for keeping it in the 
> tree?

Pretty standard fare that when you add something or change APIs, most
of the burden is on you to not break the kernel. I'd love nothing better
than to remove all but about 3 filesystems :)

It is reasonable to expect some help from maintainers, but I notice you
didn't even CC the SLOB maintainer in the patch to remove SLOB! So maybe
if you tried working a bit closer with him you could get better results?


>>I like removing code as much as the next person, but I don't
>>understand why you are so intent on removing SLOB and willing to
>>dismiss its advantages so quickly.
> 
> 
> Quickly? We have considered this for months now.

Quickly -- as in you quickly sweep the savings of 100s of K under the
rug and just declare that it is insignificant :)

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  2:02                   ` Nick Piggin
@ 2007-07-10  2:11                     ` Christoph Lameter
  2007-07-10  7:09                       ` Nick Piggin
  2007-07-10  8:32                       ` Matt Mackall
  0 siblings, 2 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10  2:11 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Tue, 10 Jul 2007, Nick Piggin wrote:

> It is reasonable to expect some help from maintainers, but I notice you
> didn't even CC the SLOB maintainer in the patch to remove SLOB! So maybe
> if you tried working a bit closer with him you could get better results?

The maintainer's last patch to SLOB was the initial submission of the 
allocator. Then he acked subsequent patches. Most of the modifications to 
SLOB are my work. Attempts to talk to the maintainer result in inventive 
explanations why SLOB does not have to conform to kernel standards. There 
is no reasonable expectation that this will change.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  1:51             ` Christoph Lameter
  2007-07-10  1:58               ` Nick Piggin
@ 2007-07-10  2:32               ` Matt Mackall
  1 sibling, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10  2:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Denis Vlasenko,
	Erik Andersen

On Mon, Jul 09, 2007 at 06:51:51PM -0700, Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> > > O(n) memory savings? What is that?
> > 
> > Allocate n things and your memory waste is proportional to n (well that's
> > O(n) waste, so I guess by savings I mean that SLOB's memory saving compared
> > to SLUB are proportional to n).
> 
> n is the size of the object?
> 
> > > It's linearly correlated with the object size. It does not grow 
> > > exponentially as the object size grows. Waste in the kmalloc array is, 
> > > in the worst case, < objsize.

N is the number of objects.

So, for large N, the overhead of SLUB for allocations of size m, where
m % alignment == 0 (for simplicity):

 kmem_cache_alloc case:

  (PAGE_SIZE % m) / m

 kmalloc case:

  nextpowerof2(m) - m

And for SLOB (with Nick's improvements):

 kmem_cache_alloc case:

  0 (the remainder of the page is usable for other allocs)

 kmalloc case:

  2 bytes (or whatever the minimal arch alignment is)

SLUB wins by two bytes on kmallocs that happen to be a power of two,
ties on kmem_cache_allocs that happen to evenly divide PAGE_SIZE and
kmallocs 2 bytes smaller than a power of 2, and loses everywhere else.
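
To put rough numbers on that, here is a quick userspace sketch of the
formulas above (PAGE_SIZE and the 280-byte object size are just picked
for illustration, not measured from a real workload):

/* waste.c -- rough per-object overhead comparison, userspace only */
#include <stdio.h>

#define PAGE_SIZE 4096

static unsigned long next_pow2(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

int main(void)
{
	unsigned long m = 280;			/* example object size */
	unsigned long per_page = PAGE_SIZE / m;	/* objects per page */
	unsigned long left = PAGE_SIZE % m;	/* unusable tail of the page */

	printf("SLUB kmem_cache_alloc: %lu bytes left per page, ~%lu per object\n",
	       left, left / per_page);
	printf("SLUB kmalloc: %lu bytes wasted per object\n",
	       next_pow2(m) - m);
	printf("SLOB kmem_cache_alloc: 0 bytes (rest of the page stays usable)\n");
	printf("SLOB kmalloc: ~2 bytes of header per object\n");
	return 0;
}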

It also loses whenever a particular type/size has both short-lived and
long-lived objects because of pinning. This effect is worse on small
machines as it's easier to pin all of memory.

The downsides of current SLOB are:

 - single lock
 - potentially long walks to find free blocks

The average case walk has been shortened quite a bit by Nick's
patches. The combinatorics here are enough to make Knuth hand-wave,
but we can in theory get close to O(1) performance when we don't have
much memory pressure. Needs some serious benchmarking. When we DO have
memory pressure, both SLOB and SLUB will degrade to forcing the kernel
to walk a bunch of pages.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH] x86_64 - Use non locked version for local_cmpxchg()
  2007-07-09 23:08                   ` Christoph Lameter
@ 2007-07-10  5:16                     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-07-10  5:16 UTC (permalink / raw)
  To: akpm; +Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm

You are completely right: on x86_64, a bit got lost in the move to
cmpxchg.h, here is the fix. It applies on 2.6.22-rc6-mm1.

x86_64 - Use non locked version for local_cmpxchg()

local_cmpxchg() should not use any LOCK prefix. This change probably got lost in
the move to cmpxchg.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-x86_64/cmpxchg.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-x86_64/cmpxchg.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86_64/cmpxchg.h	2007-07-10 01:10:10.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86_64/cmpxchg.h	2007-07-10 01:11:03.000000000 -0400
@@ -128,7 +128,7 @@
 	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
 #define cmpxchg_local(ptr,o,n)\
-	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
+	((__typeof__(*(ptr)))__cmpxchg_local((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
 
 #endif


* Christoph Lameter (clameter@sgi.com) wrote:
> On Mon, 9 Jul 2007, Mathieu Desnoyers wrote:
> 
> > > > Yep, I volountarily used the variant without lock prefix because the
> > > > data is per cpu and I disable preemption.
> > > 
> > > local_cmpxchg generates this?
> > > 
> > 
> > Yes.
> 
> Does not work here. If I use
> 
> static void __always_inline *slab_alloc(struct kmem_cache *s,
>                 gfp_t gfpflags, int node, void *addr)
> {
>         void **object;
>         struct kmem_cache_cpu *c;
> 
>         preempt_disable();
>         c = get_cpu_slab(s, smp_processor_id());
> redo:
>         object = c->freelist;
>         if (unlikely(!object || !node_match(c, node)))
>                 return __slab_alloc(s, gfpflags, node, addr, c);
> 
>         if (cmpxchg_local(&c->freelist, object, object[c->offset]) != object)
>                 goto redo;
> 
>         preempt_enable();
>         if (unlikely((gfpflags & __GFP_ZERO)))
>                 memset(object, 0, c->objsize);
> 
>         return object;
> }
> 
> Then the code will include a lock prefix:
> 
>     3270:       48 8b 1a                mov    (%rdx),%rbx
>     3273:       48 85 db                test   %rbx,%rbx




>     3276:       74 23                   je     329b <kmem_cache_alloc+0x4b>
>     3278:       8b 42 14                mov    0x14(%rdx),%eax
>     327b:       4c 8b 0c c3             mov    (%rbx,%rax,8),%r9
>     327f:       48 89 d8                mov    %rbx,%rax
>     3282:       f0 4c 0f b1 0a          lock cmpxchg %r9,(%rdx)
>     3287:       48 39 c3                cmp    %rax,%rbx
>     328a:       75 e4                   jne    3270 <kmem_cache_alloc+0x20>
>     328c:       66 85 f6                test   %si,%si
>     328f:       78 19                   js     32aa <kmem_cache_alloc+0x5a>
>     3291:       48 89 d8                mov    %rbx,%rax
>     3294:       48 83 c4 08             add    $0x8,%rsp
>     3298:       5b                      pop    %rbx
>     3299:       c9                      leaveq
>     329a:       c3                      retq
> 
> 
> > What applies to local_inc, given as example in the local_ops.txt
> > document, applies integrally to local_cmpxchg. And I would say that
> > local_cmpxchg is by far the cheapest locking mechanism I have found, and
> > use today, for my kernel tracer. The idea emerged from my need to trace
> > every execution context, including NMIs, while still providing good
> > performances. local_cmpxchg was the perfect fit; that's why I deployed
> > it in local.h in each and every architecture.
> 
> Great idea. The SLUB allocator may be able to use your idea to improve 
> both the alloc and free path.
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  1:58               ` Nick Piggin
@ 2007-07-10  6:22                 ` Matt Mackall
  2007-07-10  7:03                   ` Nick Piggin
  0 siblings, 1 reply; 111+ messages in thread
From: Matt Mackall @ 2007-07-10  6:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg,
	Denis Vlasenko, Erik Andersen

On Tue, Jul 10, 2007 at 11:58:44AM +1000, Nick Piggin wrote:
> Christoph Lameter wrote:
> >On Tue, 10 Jul 2007, Nick Piggin wrote:
> >
> >
> >>>O(n) memory savings? What is that?
> >>
> >>Allocate n things and your memory waste is proportional to n (well that's
> >>O(n) waste, so I guess by savings I mean that SLOB's memory saving 
> >>compared
> >>to SLUB are proportional to n).
> >
> >
> >n is the size of the object?
> 
> n things -- n number of things (n calls to kmem_cache_alloc()).
> 
> Just a fancy way of saying roughly that memory waste will increase as
> the size of the system increases. But that aspect of it I think is
> not really a problem for non-tiny systems anyway because the waste
> tends not to be too bad (and possibly the number of active allocations
> does not increase O(n) with the size of RAM either).

If active allocations doesn't increase O(n) with the size of RAM,
what's all that RAM for?

If your memory isn't getting used for large VMAs or large amounts of
page cache, that means it's getting used by task structs,
radix_tree_nodes, sockets, dentries, inodes, etc.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  6:22                 ` Matt Mackall
@ 2007-07-10  7:03                   ` Nick Piggin
  0 siblings, 0 replies; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  7:03 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Pekka Enberg,
	Denis Vlasenko, Erik Andersen

Matt Mackall wrote:
> On Tue, Jul 10, 2007 at 11:58:44AM +1000, Nick Piggin wrote:

>>Just a fancy way of saying roughly that memory waste will increase as
>>the size of the system increases. But that aspect of it I think is
>>not really a problem for non-tiny systems anyway because the waste
>>tends not to be too bad (and possibly the number of active allocations
>>does not increase O(n) with the size of RAM either).
> 
> 
> If active allocations doesn't increase O(n) with the size of RAM,
> what's all that RAM for?
> 
> If your memory isn't getting used for large VMAs or large amounts of
> page cache, that means it's getting used by task structs,
> radix_tree_nodes, sockets, dentries, inodes, etc.

Yeah you could be right. Actually you most likey _are_ right for many
workloads.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  2:11                     ` Christoph Lameter
@ 2007-07-10  7:09                       ` Nick Piggin
  2007-07-10 22:09                         ` Christoph Lameter
  2007-07-10  8:32                       ` Matt Mackall
  1 sibling, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  7:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> 
>>It is reasonable to expect some help from maintainers, but I notice you
>>didn't even CC the SLOB maintainer in the patch to remove SLOB! So maybe
>>if you tried working a bit closer with him you could get better results?
> 
> 
> The maintainer's last patch to SLOB was the initial submission of the 
> allocator. Then he acked subsequent patches. Most of the modifications to 
> SLOB are my work. Attempts to talk to the maintainer result in inventive 
> explanations why SLOB does not have to conform to kernel standards. There 
> is no reasonable expectation that this will change.

Well, I really don't want to mediate, but even in the case of a
completely MIA maintainer, it isn't really a good idea to throw out
working and useful code.

But last time this discussion came up, IIRC you ended up handwaving
about all the ways in which SLOB was broken but didn't actually come
up with any real problems. Matt seemed willing to add those counters
or whatever it was if/when doing so solved a real problem. And remember
that SLOB doesn't have to have feature parity with SLUB, so long as it
implements the slab API such that the kernel *works*.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-09 16:08             ` Christoph Lameter
@ 2007-07-10  8:17               ` Pekka J Enberg
  -1 siblings, 0 replies; 111+ messages in thread
From: Pekka J Enberg @ 2007-07-10  8:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Matt Mackall, Denis Vlasenko,
	Erik Andersen

Hi Christoph,

On Mon, 9 Jul 2007, Pekka Enberg wrote:
> > I assume with "slab external fragmentation" you mean allocating a
> > whole page for a slab when there are not enough objects to fill the
> > whole thing thus wasting memory? We could try to combat that by
> > packing multiple variable-sized slabs within a single page. Also,
> > adding some non-power-of-two kmalloc caches might help with internal
> > fragmentation.

On Mon, 9 Jul 2007, Christoph Lameter wrote:
> There are already non-power-of-two kmalloc caches for the 96 and 192 byte 
> sizes.

I know that, but for my setup at least, there seems to be a need for a 
non-power-of-two cache between 512 and 1024. What I am seeing is an average 
allocation size for kmalloc-512 of around 270-280 bytes, which wastes a total 
of 10 KB of memory due to internal fragmentation. It might also be a buggy 
caller that could be fixed with its own cache.
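
For illustration, giving such a caller its own cache would look roughly
like this (the name and the 280-byte size are made up, and the exact
kmem_cache_create() prototype varies between kernel versions):

#include <linux/init.h>
#include <linux/slab.h>

static struct kmem_cache *foo_cachep;	/* hypothetical example cache */

static int __init foo_cache_init(void)
{
	/* objects packed at ~280 bytes instead of rounding up to 512 */
	foo_cachep = kmem_cache_create("foo_cache", 280, 0, 0, NULL);
	return foo_cachep ? 0 : -ENOMEM;
}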

On Mon, 9 Jul 2007, Pekka Enberg wrote:
> > In any case, SLUB needs some serious tuning for smaller machines
> > before we can get rid of SLOB.

On Mon, 9 Jul 2007, Christoph Lameter wrote:
> Switch off CONFIG_SLUB_DEBUG to get memory savings.

Curious, /proc/meminfo immediately after boot shows:

SLUB (debugging enabled):

(none):~# cat /proc/meminfo 
MemTotal:        30260 kB
MemFree:         22096 kB

SLUB (debugging disabled):

(none):~# cat /proc/meminfo 
MemTotal:        30276 kB
MemFree:         22244 kB

SLOB:

(none):~# cat /proc/meminfo 
MemTotal:        30280 kB
MemFree:         22004 kB

That's 92 KB advantage for SLUB with debugging enabled and 240 KB when 
debugging is disabled.

Nick, Matt, care to retest SLUB and SLOB for your setups?

				Pekka

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  8:17               ` Pekka J Enberg
@ 2007-07-10  8:27                 ` Nick Piggin
  -1 siblings, 0 replies; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  8:27 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Pekka J Enberg wrote:

> Curious, /proc/meminfo immediately after boot shows:
> 
> SLUB (debugging enabled):
> 
> (none):~# cat /proc/meminfo 
> MemTotal:        30260 kB
> MemFree:         22096 kB
> 
> SLUB (debugging disabled):
> 
> (none):~# cat /proc/meminfo 
> MemTotal:        30276 kB
> MemFree:         22244 kB
> 
> SLOB:
> 
> (none):~# cat /proc/meminfo 
> MemTotal:        30280 kB
> MemFree:         22004 kB
> 
> That's 92 KB advantage for SLUB with debugging enabled and 240 KB when 
> debugging is disabled.

Interesting. What kernel version are you using?


> Nick, Matt, care to retest SLUB and SLOB for your setups?

I don't think there has been a significant change in the area of
memory efficiency in either since I last tested, and Christoph and
I both produced the same result.

I can't say where SLOB is losing its memory, but there are a few
places that can still be improved, so I might get keen and take
another look at it once all the improvements to both allocators
get upstream.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-10  0:55                   ` Christoph Lameter
@ 2007-07-10  8:27                     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-07-10  8:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

* Christoph Lameter (clameter@sgi.com) wrote:
> Ok here is a replacement patch for the cmpxchg patch. Problems
> 
> 1. cmpxchg_local is not available on all arches. If we wanted to do
>    this then it needs to be universally available.
> 

cmpxchg_local is not available on all archs, but local_cmpxchg is. It
expects a local_t type, which is nothing more than a long. When the local
atomic operation is not more efficient or not implemented on a given
architecture, asm-generic/local.h falls back on atomic_long_t. If you
want, you could work on the local_t type, which you could cast from a
long to a pointer when you need to, since their sizes are, AFAIK, always
the same (and some VM code even assumes this is always the case).
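
Roughly, the idea would look like this (just a sketch, not the actual
SLUB code -- the structure and names here are made up):

#include <asm/local.h>

struct my_cpu_freelist {
	local_t head;		/* really a void **, stored as a long */
};

/* Pop the first object off a strictly per-cpu freelist. */
static void *pop_object(struct my_cpu_freelist *c, unsigned int offset)
{
	long old, new;

	do {
		old = local_read(&c->head);
		if (!old)
			return NULL;
		/* the next pointer lives inside the free object itself */
		new = (long)((void **)old)[offset];
	} while (local_cmpxchg(&c->head, old, new) != old);

	return (void *)old;
}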

> 2. cmpxchg_local does generate the "lock" prefix. It should not do that.
>    Without fixes to cmpxchg_local we cannot expect maximum performance.
> 

Yup, see the patch I just posted for this.

> 3. The approach is x86 centric. It relies on a cmpxchg that does not
>    synchronize with memory used by other cpus and therefore is more
>    lightweight. As far as I know the IA64 cmpxchg cannot do that.
>    Neither several other processors. I am not sure how cmpxchgless
>    platforms would use that. We need a detailed comparison of
>    interrupt enable /disable vs. cmpxchg cycle counts for cachelines in
>    the cpu cache to evaluate the impact that such a change would have.
> 
>    The cmpxchg (or its emulation) does not need any barriers since the
>    accesses can only come from a single processor. 
> 

Yes, the expected improvements go as follows:
x86, x86_64: much faster due to the non-LOCKed cmpxchg
alpha: should be faster due to memory barrier removal
mips: memory barriers removed
powerpc 32/64: memory barriers removed

On other architectures, either there is no better implementation than
the standard atomic cmpxchg or it just has not been implemented.

I guess a test series showing how much improvement is seen on the
optimized architectures (local cmpxchg vs. interrupt enable/disable),
and also what effect the standard cmpxchg has compared to interrupt
disable/enable on the architectures where we can't do better than the
standard cmpxchg, would tell us whether this is an interesting way to
go.  I would be happy to do these tests, but I don't have the hardware
handy. I am providing a test module in this email to gather these
characteristics on various architectures.

> Mathieu measured a significant performance benefit coming from not using
> interrupt enable / disable.
> 
> Some rough processor cycle counts (anyone have better numbers?)
> 
> 	STI	CLI	CMPXCHG
> IA32	36	26	1 (assume XCHG == CMPXCHG, sti/cli also need stack pushes/pulls)
> IA64	12	12	1 (but ar.ccv needs 11 cycles to set comparator,
> 			need register moves to preserve processors flags)
> 

The measurements I get (in cycles):

             enable interrupts (STI)   disable interrupts (CLI)   local CMPXCHG
IA32 (P4)    112                        82                         26
x86_64 AMD64 125                       102                         19

> Looks like STI/CLI is pretty expensive and it seems that we may be able to
> optimize the alloc / free hotpath quite a bit if we could drop the 
> interrupt enable / disable. But we need some measurements.
> 
> 
> Draft of a new patch:
> 
> SLUB: Single atomic instruction alloc/free using cmpxchg_local
> 
> A cmpxchg allows us to avoid disabling and enabling interrupts. The cmpxchg
> is optimal to allow operations on per cpu freelist. We can stay on one
> processor by disabling preemption() and allowing concurrent interrupts
> thus avoiding the overhead of disabling and enabling interrupts.
> 
> Pro:
> 	- No need to disable interrupts.
> 	- Preempt disable /enable vanishes on non preempt kernels
> Con:
>         - Slightly more complex handling.
> 	- Updates to atomic instructions needed
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 

Test local cmpxchg vs int disable/enable. Please run on a 2.6.22 kernel
(or recent 2.6.21-rcX-mmX) (with my cmpxchg local fix patch for x86_64).
Make sure the TSC reads (get_cycles()) are reliable on your platform.

Mathieu

/* test-cmpxchg-nolock.c
 *
 * Compare local cmpxchg with irq disable / enable.
 */

#include <linux/jiffies.h>
#include <linux/compiler.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/calc64.h>
#include <asm/timex.h>
#include <asm/system.h>

#define NR_LOOPS 20000

int test_val = 0;

static void do_test_cmpxchg(void)
{
	int ret;
	unsigned long flags;
	unsigned int i;
	cycles_t time1, time2, time;
	long rem;

	local_irq_save(flags);
	preempt_disable();
	time1 = get_cycles();
	for (i = 0; i < NR_LOOPS; i++) {
		ret = cmpxchg_local(&test_val, 0, 0);
	}
	time2 = get_cycles();
	local_irq_restore(flags);
	preempt_enable();
	time = time2 - time1;

	printk(KERN_ALERT "test results: time for non locked cmpxchg\n");
	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
	printk(KERN_ALERT "total time: %llu\n", time);
	time = div_long_long_rem(time, NR_LOOPS, &rem);
	printk(KERN_ALERT "-> non locked cmpxchg takes %llu cycles\n", time);
	printk(KERN_ALERT "test end\n");
}

/*
 * This test will have a higher standard deviation due to incoming interrupts.
 */
static void do_test_enable_int(void)
{
	unsigned long flags;
	unsigned int i;
	cycles_t time1, time2, time;
	long rem;

	local_irq_save(flags);
	preempt_disable();
	time1 = get_cycles();
	for (i = 0; i < NR_LOOPS; i++) {
		local_irq_restore(flags);
	}
	time2 = get_cycles();
	local_irq_restore(flags);
	preempt_enable();
	time = time2 - time1;

	printk(KERN_ALERT "test results: time for enabling interrupts (STI)\n");
	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
	printk(KERN_ALERT "total time: %llu\n", time);
	time = div_long_long_rem(time, NR_LOOPS, &rem);
	printk(KERN_ALERT "-> enabling interrupts (STI) takes %llu cycles\n",
					time);
	printk(KERN_ALERT "test end\n");
}

static void do_test_disable_int(void)
{
	unsigned long flags, flags2;
	unsigned int i;
	cycles_t time1, time2, time;
	long rem;

	local_irq_save(flags);
	preempt_disable();
	time1 = get_cycles();
	for (i = 0; i < NR_LOOPS; i++) {
		local_irq_save(flags2);
	}
	time2 = get_cycles();
	local_irq_restore(flags);
	preempt_enable();
	time = time2 - time1;

	printk(KERN_ALERT "test results: time for disabling interrupts (CLI)\n");
	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
	printk(KERN_ALERT "total time: %llu\n", time);
	time = div_long_long_rem(time, NR_LOOPS, &rem);
	printk(KERN_ALERT "-> disabling interrupts (CLI) takes %llu cycles\n",
				time);
	printk(KERN_ALERT "test end\n");
}



static int ltt_test_init(void)
{
	printk(KERN_ALERT "test init\n");
	
	do_test_cmpxchg();
	do_test_enable_int();
	do_test_disable_int();
	return -EAGAIN; /* Fail will directly unload the module */
}

static void ltt_test_exit(void)
{
	printk(KERN_ALERT "test exit\n");
}

module_init(ltt_test_init)
module_exit(ltt_test_exit)

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("Cmpxchg local test");

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  2:11                     ` Christoph Lameter
  2007-07-10  7:09                       ` Nick Piggin
@ 2007-07-10  8:32                       ` Matt Mackall
  2007-07-10  9:01                         ` Håvard Skinnemoen
  2007-07-11  1:37                         ` Christoph Lameter
  1 sibling, 2 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10  8:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Denis Vlasenko,
	Erik Andersen

On Mon, Jul 09, 2007 at 07:11:03PM -0700, Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> > It is reasonable to expect some help from maintainers, but I notice you
> > didn't even CC the SLOB maintainer in the patch to remove SLOB! So maybe
> > if you tried working a bit closer with him you could get better results?
> 
> The maintainer's last patch to SLOB was the initial submission of the 
> allocator. Then he acked subsequent patches. Most of the modifications to 
> SLOB are my work.

You're delusional.

http://www.kernel.org/hg/linux-2.6/annotate/tip/mm/slob.c

A grand total of 15 lines accounted to you, all of which are
completely trivial. Most of them are turning no-op macros into no-op
but non-inline functions. Not an improvement. Reverting that is now on my
todo list; thanks for drawing my attention to it. The remainder is
just churn from your SLUB work.

While you're at it, note the lines from Dimitri, quickly fixing the
breakage you introduced.

Count many more lines from Nick, Pekka, and Akinobu making useful
changes. And note that all three of the real bugs fixed were fairly
hard to hit. It's unlikely that anyone would have ever hit the RCU
bug. The find_order bug only hit allocations of unlikely sizes like
8193 bytes. And SLAB_PANIC triggering is fairly unheard of, and simply
makes the kernel crash slightly sooner anyway.

The only remaining known bug is arguably a problem in nommu that SLOB
shouldn't be papering over.

That's pretty damn stable compared to the other allocators in the
kernel.

> Attempts to talk to the maintainer result in inventive explanations
> why SLOB does not have to conform to kernel standards.

Um.. would those be the same kernel standards you pulled out of your
ass to argue SLOB shouldn't exist?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  8:32                       ` Matt Mackall
@ 2007-07-10  9:01                         ` Håvard Skinnemoen
  2007-07-10  9:11                           ` Nick Piggin
  2007-07-11  1:37                         ` Christoph Lameter
  1 sibling, 1 reply; 111+ messages in thread
From: Håvard Skinnemoen @ 2007-07-10  9:01 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Christoph Lameter, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Pekka Enberg, Denis Vlasenko, Erik Andersen

On 7/10/07, Matt Mackall <mpm@selenic.com> wrote:
> The only remaining known bug is arguably a problem in nommu that SLOB
> shouldn't be papering over.

I've got another one for you: SLOB ignores ARCH_KMALLOC_MINALIGN so
using SLOB in combination with DMA and non-coherent architectures
causes data corruption.

That said, there are currently very few architectures that define
ARCH_KMALLOC_MINALIGN, so I guess it might be something I'm doing
wrong on avr32. But I'd really like to know how other non-coherent
architectures handle DMA to buffers sharing cachelines with unrelated
data...
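
For reference, what I have in mind is the usual arch-side definition,
something like the following (illustrative only; the exact header and
value are arch-specific):

/* e.g. in the architecture's asm/cache.h */
#define L1_CACHE_SHIFT		5
#define L1_CACHE_BYTES		(1 << L1_CACHE_SHIFT)

/*
 * Force every kmalloc() object onto its own cacheline so a DMA to it
 * never shares a line with unrelated data.
 */
#define ARCH_KMALLOC_MINALIGN	L1_CACHE_BYTES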

Håvard

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  9:01                         ` Håvard Skinnemoen
@ 2007-07-10  9:11                           ` Nick Piggin
  2007-07-10  9:21                             ` Håvard Skinnemoen
  0 siblings, 1 reply; 111+ messages in thread
From: Nick Piggin @ 2007-07-10  9:11 UTC (permalink / raw)
  To: Håvard Skinnemoen
  Cc: Matt Mackall, Christoph Lameter, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Pekka Enberg, Denis Vlasenko, Erik Andersen

Håvard Skinnemoen wrote:
> On 7/10/07, Matt Mackall <mpm@selenic.com> wrote:
> 
>> The only remaining known bug is arguably a problem in nommu that SLOB
>> shouldn't be papering over.
> 
> 
> I've got another one for you: SLOB ignores ARCH_KMALLOC_MINALIGN so
> using SLOB in combination with DMA and non-coherent architectures
> causes data corruption.

Should be fixed in mm, I believe: slob-improved-alignment-handling.patch

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  9:11                           ` Nick Piggin
@ 2007-07-10  9:21                             ` Håvard Skinnemoen
  0 siblings, 0 replies; 111+ messages in thread
From: Håvard Skinnemoen @ 2007-07-10  9:21 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matt Mackall, Christoph Lameter, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Pekka Enberg, Denis Vlasenko, Erik Andersen

On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Håvard Skinnemoen wrote:
> > On 7/10/07, Matt Mackall <mpm@selenic.com> wrote:
> >
> >> The only remaining known bug is arguably a problem in nommu that SLOB
> >> shouldn't be papering over.
> >
> >
> > I've got another one for you: SLOB ignores ARCH_KMALLOC_MINALIGN so
> > using SLOB in combination with DMA and non-coherent architectures
> > causes data corruption.
>
> Should be fixed in mm, I believe: slob-improved-alignment-handling.patch

Indeed. Thanks, I'll give it a try later today.

Håvard

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  8:27                 ` Nick Piggin
@ 2007-07-10  9:31                   ` Pekka Enberg
  -1 siblings, 0 replies; 111+ messages in thread
From: Pekka Enberg @ 2007-07-10  9:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Hi Nick,

Pekka J Enberg wrote:
> > That's 92 KB advantage for SLUB with debugging enabled and 240 KB when
> > debugging is disabled.

On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Interesting. What kernel version are you using?

Linus' git head from yesterday so the results are likely to be
sensitive to workload and mine doesn't represent real embedded use.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  9:31                   ` Pekka Enberg
@ 2007-07-10 10:09                     ` Nick Piggin
  -1 siblings, 0 replies; 111+ messages in thread
From: Nick Piggin @ 2007-07-10 10:09 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, linux-kernel,
	linux-mm, suresh.b.siddha, corey.d.gough, Matt Mackall,
	Denis Vlasenko, Erik Andersen

Pekka Enberg wrote:
> Hi Nick,
> 
> Pekka J Enberg wrote:
> 
>> > That's 92 KB advantage for SLUB with debugging enabled and 240 KB when
>> > debugging is disabled.
> 
> 
> On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> Interesting. What kernel version are you using?
> 
> 
> Linus' git head from yesterday so the results are likely to be
> sensitive to workload and mine doesn't represent real embedded use.

Hi Pekka,

There is one thing that the SLOB patches in -mm do besides resulting in
slightly better packing and memory efficiency (which is probably not
enough to explain the difference you are seeing): they do away with the
delayed freeing of unused SLOB pages back to the page allocator.

In git head, these pages are freed via a timer, so they can take a
while to make their way back to the buddy allocator and don't register
as free memory in the meantime.

Anyway, I would be very interested to see any situation where the
SLOB in -mm uses more memory than SLUB, even on test configs like
yours.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  9:31                   ` Pekka Enberg
@ 2007-07-10 12:02                     ` Matt Mackall
  -1 siblings, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10 12:02 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nick Piggin, Christoph Lameter, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Tue, Jul 10, 2007 at 12:31:40PM +0300, Pekka Enberg wrote:
> Hi Nick,
> 
> Pekka J Enberg wrote:
> >> That's 92 KB advantage for SLUB with debugging enabled and 240 KB when
> >> debugging is disabled.
> 
> On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >Interesting. What kernel version are you using?
> 
> Linus' git head from yesterday so the results are likely to be
> sensitive to workload and mine doesn't represent real embedded use.

Using 2.6.22-rc6-mm1 with a 64MB lguest and busybox, I'm seeing the
following as the best MemFree numbers after several boots each:

SLAB: 54796
SLOB: 55044
SLUB: 53944
SLUB: 54788 (debug turned off)

These numbers bounce around a lot more from boot to boot than I
remember, so take these numbers with a grain of salt.

Disabling the debug code in the build gives this, by the way:

mm/slub.c: In function ‘init_kmem_cache_node’:
mm/slub.c:1873: error: ‘struct kmem_cache_node’ has no member named
‘full’
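
Presumably the 'full' list only exists when CONFIG_SLUB_DEBUG is set, so
the initialization needs the same guard, roughly (sketch only, guessing
at the surrounding code):

	INIT_LIST_HEAD(&n->partial);
#ifdef CONFIG_SLUB_DEBUG
	INIT_LIST_HEAD(&n->full);	/* only present with debugging built in */
#endif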

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
@ 2007-07-10 12:02                     ` Matt Mackall
  0 siblings, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10 12:02 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nick Piggin, Christoph Lameter, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Tue, Jul 10, 2007 at 12:31:40PM +0300, Pekka Enberg wrote:
> Hi Nick,
> 
> Pekka J Enberg wrote:
> >> That's 92 KB advantage for SLUB with debugging enabled and 240 KB when
> >> debugging is disabled.
> 
> On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >Interesting. What kernel version are you using?
> 
> Linus' git head from yesterday so the results are likely to be
> sensitive to workload and mine doesn't represent real embedded use.

Using 2.6.22-rc6-mm1 with a 64MB lguest and busybox, I'm seeing the
following as the best MemFree numbers after several boots each:

SLAB: 54796
SLOB: 55044
SLUB: 53944
SLUB: 54788 (debug turned off)

These numbers bounce around a lot more from boot to boot than I
remember, so take these numbers with a grain of salt.

Disabling the debug code in the build gives this, by the way:

mm/slub.c: In function ‘init_kmem_cache_node’:
mm/slub.c:1873: error: ‘struct kmem_cache_node’ has no member named
‘full’

-- 
Mathematics is the supreme nostalgia of our time.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10 12:02                     ` Matt Mackall
@ 2007-07-10 12:57                       ` Pekka J Enberg
  -1 siblings, 0 replies; 111+ messages in thread
From: Pekka J Enberg @ 2007-07-10 12:57 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Nick Piggin, Christoph Lameter, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

Hi Matt,

On Tue, 10 Jul 2007, Matt Mackall wrote:
> Using 2.6.22-rc6-mm1 with a 64MB lguest and busybox, I'm seeing the
> following as the best MemFree numbers after several boots each:
> 
> SLAB: 54796
> SLOB: 55044
> SLUB: 53944
> SLUB: 54788 (debug turned off)
> 
> These numbers bounce around a lot more from boot to boot than I
> remember, so take these numbers with a grain of salt.

To rule out userland, 2.6.22 with 32 MB defconfig UML and busybox [1] on 
i386:

SLOB: 26708
SLUB: 27212 (no debug)

Unfortunately UML is broken in 2.6.22-rc6-mm1, so I don't know if SLOB 
patches help there.

  1. http://uml.nagafix.co.uk/BusyBox-1.5.0/BusyBox-1.5.0-x86-root_fs.bz2

			Pekka

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
@ 2007-07-10 12:57                       ` Pekka J Enberg
  0 siblings, 0 replies; 111+ messages in thread
From: Pekka J Enberg @ 2007-07-10 12:57 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Nick Piggin, Christoph Lameter, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

Hi Matt,

On Tue, 10 Jul 2007, Matt Mackall wrote:
> Using 2.6.22-rc6-mm1 with a 64MB lguest and busybox, I'm seeing the
> following as the best MemFree numbers after several boots each:
> 
> SLAB: 54796
> SLOB: 55044
> SLUB: 53944
> SLUB: 54788 (debug turned off)
> 
> These numbers bounce around a lot more from boot to boot than I
> remember, so take these numbers with a grain of salt.

To rule out userland, 2.6.22 with 32 MB defconfig UML and busybox [1] on 
i386:

SLOB: 26708
SLUB: 27212 (no debug)

Unfortunately UML is broken in 2.6.22-rc6-mm1, so I don't know if SLOB 
patches help there.

  1. http://uml.nagafix.co.uk/BusyBox-1.5.0/BusyBox-1.5.0-x86-root_fs.bz2

			Pekka


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-10  8:27                     ` Mathieu Desnoyers
@ 2007-07-10 18:38                       ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 18:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

On Tue, 10 Jul 2007, Mathieu Desnoyers wrote:

> cmpxchg_local is not available on all archs, but local_cmpxchg is. It
> expects a local_t type which is nothing else than a long. When the local
> atomic operation is not more efficient or not implemented on a given
> architecture, asm-generic/local.h falls back on atomic_long_t. If you
> want, you could work on the local_t type, which you could cast from a
> long to a pointer when you need so, since their size are, AFAIK, always
> the same (and some VM code even assume this is always the case).

It would be cleaner to have cmpxchg_local on all arches. The type 
conversion is hacky. If this is really working then we should also use the 
mechanism for other things like the vm statistics.
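
To make the objection concrete, here is a minimal sketch of the kind of type
conversion under discussion: storing a pointer in a local_t (a long
underneath) so that local_cmpxchg() can stand in where no cmpxchg_local()
exists. This is an illustration only, not code from the patch set.

#include <asm/local.h>

/* illustration: a per cpu freelist pointer squeezed into a local_t */
struct cpu_freelist {
	local_t head;			/* really holds a void ** */
};

static inline void *freelist_head(struct cpu_freelist *f)
{
	return (void *)local_read(&f->head);
}

/* returns nonzero if the swap succeeded */
static inline int freelist_swap(struct cpu_freelist *f, void *old, void *new)
{
	return local_cmpxchg(&f->head, (long)old, (long)new) == (long)old;
}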

> The measurements I get (in cycles):
> 
>              enable interrupts (STI)   disable interrupts (CLI)   local CMPXCHG
> IA32 (P4)    112                        82                         26
> x86_64 AMD64 125                       102                         19


Looks good and seems to indicate that we can at least double the speed of 
slab allocation.
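
To put rough numbers on that, assuming the existing fast path pays one CLI
plus one STI per allocation or free: that is about 82 + 112 = 194 cycles of
interrupt toggling on the P4 (227 on the AMD64) against roughly 26 (19)
cycles for the local cmpxchg. The remaining work in slab_alloc/slab_free is
what keeps the end-to-end gain closer to a doubling than to the raw ratio.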

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
@ 2007-07-10 18:38                       ` Christoph Lameter
  0 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 18:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

On Tue, 10 Jul 2007, Mathieu Desnoyers wrote:

> cmpxchg_local is not available on all archs, but local_cmpxchg is. It
> expects a local_t type which is nothing else than a long. When the local
> atomic operation is not more efficient or not implemented on a given
> architecture, asm-generic/local.h falls back on atomic_long_t. If you
> want, you could work on the local_t type, which you could cast from a
> long to a pointer when you need so, since their size are, AFAIK, always
> the same (and some VM code even assume this is always the case).

It would be cleaner to have cmpxchg_local on all arches. The type 
conversion is hacky. If this is really working then we should also use the 
mechanism for other things like the vm statistics.

> The measurements I get (in cycles):
> 
>              enable interrupts (STI)   disable interrupts (CLI)   local CMPXCHG
> IA32 (P4)    112                        82                         26
> x86_64 AMD64 125                       102                         19


Looks good and seems to indicate that we can at least double the speed of 
slab allocation.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH] x86_64 - Use non locked version for local_cmpxchg()
  2007-07-10  5:16                     ` Mathieu Desnoyers
@ 2007-07-10 20:46                       ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 20:46 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Martin Bligh, Andi Kleen, linux-kernel, linux-mm

On Tue, 10 Jul 2007, Mathieu Desnoyers wrote:

> You are completely right: on x86_64, a bit got lost in the move to
> cmpxchg.h, here is the fix. It applies on 2.6.22-rc6-mm1.

A trivial fix. Make sure that it gets merged soon.

Acked-by: Christoph Lameter <clameter@sgi.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH] x86_64 - Use non locked version for local_cmpxchg()
@ 2007-07-10 20:46                       ` Christoph Lameter
  0 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 20:46 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Martin Bligh, Andi Kleen, linux-kernel, linux-mm

On Tue, 10 Jul 2007, Mathieu Desnoyers wrote:

> You are completely right: on x86_64, a bit got lost in the move to
> cmpxchg.h, here is the fix. It applies on 2.6.22-rc6-mm1.

A trivial fix. Make sure that it gets merged soon.

Acked-by: Christoph Lameter <clameter@sgi.com>


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-10  8:27                     ` Mathieu Desnoyers
@ 2007-07-10 20:59                       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-07-10 20:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller,
	Alexandre Guédon

Another architecture tested

Comparison: irq enable/disable vs local CMPXCHG
             enable interrupts (STI)   disable interrupts (CLI)    local CMPXCHG
Tested-by: Mathieu Desnoyers <compudj@krystal.dyndns.org>
IA32 (P4)               112                        82                       26
x86_64 AMD64            125                       102                       19
Tested-by: Alexandre Guédon <totalworlddomination@gmail.com>
x86_64 Intel Core2 Quad  21                        19                        7


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
@ 2007-07-10 20:59                       ` Mathieu Desnoyers
  0 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-07-10 20:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller,
	Alexandre Guédon

Another architecture tested

Comparison: irq enable/disable vs local CMPXCHG
             enable interrupts (STI)   disable interrupts (CLI)    local CMPXCHG
Tested-by: Mathieu Desnoyers <compudj@krystal.dyndns.org>
IA32 (P4)               112                        82                       26
x86_64 AMD64            125                       102                       19
Tested-by: Alexandre Guédon <totalworlddomination@gmail.com>
x86_64 Intel Core2 Quad  21                        19                        7


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  7:09                       ` Nick Piggin
@ 2007-07-10 22:09                         ` Christoph Lameter
  2007-07-10 23:12                           ` Matt Mackall
  0 siblings, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 22:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Matt Mackall,
	Denis Vlasenko, Erik Andersen

On Tue, 10 Jul 2007, Nick Piggin wrote:

> But last time this discussion came up, IIRC you ended up handwaving
> about all the ways in which SLOB was broken but didn't actually come
> up with any real problems. Matt seemed willing to add those counters
> or whatever it was if/when doing so solved a real problem. And remember
> that SLOB doesn't have to have feature parity with SLUB, so long as it
> implements the slab API such that the kernel *works*.

No it does not have to have feature parity. And yes we identified areas in 
which SLOB may cause problems due to not implementing things (f.e. 
suspend resume). The counters are still missing and thus core development 
cannot rely on those being there.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10 12:02                     ` Matt Mackall
  (?)
  (?)
@ 2007-07-10 22:12                     ` Christoph Lameter
  2007-07-10 22:40                         ` Matt Mackall
  -1 siblings, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 22:12 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Pekka Enberg, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

[-- Attachment #1: Type: TEXT/PLAIN, Size: 797 bytes --]

On Tue, 10 Jul 2007, Matt Mackall wrote:

> following as the best MemFree numbers after several boots each:
> 
> SLAB: 54796
> SLOB: 55044
> SLUB: 53944
> SLUB: 54788 (debug turned off)

That was without "slub_debug" as a parameter or with !CONFIG_SLUB_DEBUG?

Data size and code size will decrease if you compile with 
!CONFIG_SLUB_DEBUG. slub_debug on the command line governs whether debug 
information is used.
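
For concreteness, the two knobs being contrasted are roughly the following
(option names as assumed for this kernel generation, so double-check the
details against Documentation/vm/slub.txt):

	# compile time: leave the debug code out entirely (smaller text/data)
	CONFIG_SLUB_DEBUG=n

	# run time: debug code is compiled in but only activated on request,
	# e.g. on the kernel command line
	slub_debug			enable full debugging for all caches
	slub_debug=P,kmalloc-64		poisoning for a single cache only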

> These numbers bounce around a lot more from boot to boot than I
> remember, so take these numbers with a grain of salt.
> 
> Disabling the debug code in the build gives this, by the way:
> 
> mm/slub.c: In function ‘init_kmem_cache_node’:
> mm/slub.c:1873: error: ‘struct kmem_cache_node’ has no member named
> ‘full’

A fix for that is in Andrew's tree.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10 22:12                     ` Christoph Lameter
@ 2007-07-10 22:40                         ` Matt Mackall
  0 siblings, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10 22:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Tue, Jul 10, 2007 at 03:12:38PM -0700, Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Matt Mackall wrote:
> 
> > following as the best MemFree numbers after several boots each:
> > 
> > SLAB: 54796
> > SLOB: 55044
> > SLUB: 53944
> > SLUB: 54788 (debug turned off)
> 
> That was without "slub_debug" as a parameter or with !CONFIG_SLUB_DEBUG?

Without the parameter, as the other way doesn't compile in -mm1.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
@ 2007-07-10 22:40                         ` Matt Mackall
  0 siblings, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10 22:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Tue, Jul 10, 2007 at 03:12:38PM -0700, Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Matt Mackall wrote:
> 
> > following as the best MemFree numbers after several boots each:
> > 
> > SLAB: 54796
> > SLOB: 55044
> > SLUB: 53944
> > SLUB: 54788 (debug turned off)
> 
> That was without "slub_debug" as a parameter or with !CONFIG_SLUB_DEBUG?

Without the parameter, as the other way doesn't compile in -mm1.

-- 
Mathematics is the supreme nostalgia of our time.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10 22:40                         ` Matt Mackall
@ 2007-07-10 22:50                           ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 22:50 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Pekka Enberg, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Tue, 10 Jul 2007, Matt Mackall wrote:

> Without the parameter, as the other way doesn't compile in -mm1.

here is the patch that went into mm after mm1 was released.

---
 mm/slub.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-06 13:28:57.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-06 13:29:01.000000000 -0700
@@ -1868,7 +1868,9 @@ static void init_kmem_cache_node(struct 
 	atomic_long_set(&n->nr_slabs, 0);
 	spin_lock_init(&n->list_lock);
 	INIT_LIST_HEAD(&n->partial);
+#ifdef CONFIG_SLUB_DEBUG
 	INIT_LIST_HEAD(&n->full);
+#endif
 }
 
 #ifdef CONFIG_NUMA
@@ -1898,8 +1900,10 @@ static struct kmem_cache_node * __init e
 	page->freelist = get_freepointer(kmalloc_caches, n);
 	page->inuse++;
 	kmalloc_caches->node[node] = n;
+#ifdef CONFIG_SLUB_DEBUG
 	init_object(kmalloc_caches, n, 1);
 	init_tracking(kmalloc_caches, n);
+#endif
 	init_kmem_cache_node(n);
 	atomic_long_inc(&n->nr_slabs);
 	add_partial(n, page);

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
@ 2007-07-10 22:50                           ` Christoph Lameter
  0 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-07-10 22:50 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Pekka Enberg, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Tue, 10 Jul 2007, Matt Mackall wrote:

> Without the parameter, as the other way doesn't compile in -mm1.

here is the patch that went into mm after mm1 was released.

---
 mm/slub.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-06 13:28:57.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-06 13:29:01.000000000 -0700
@@ -1868,7 +1868,9 @@ static void init_kmem_cache_node(struct 
 	atomic_long_set(&n->nr_slabs, 0);
 	spin_lock_init(&n->list_lock);
 	INIT_LIST_HEAD(&n->partial);
+#ifdef CONFIG_SLUB_DEBUG
 	INIT_LIST_HEAD(&n->full);
+#endif
 }
 
 #ifdef CONFIG_NUMA
@@ -1898,8 +1900,10 @@ static struct kmem_cache_node * __init e
 	page->freelist = get_freepointer(kmalloc_caches, n);
 	page->inuse++;
 	kmalloc_caches->node[node] = n;
+#ifdef CONFIG_SLUB_DEBUG
 	init_object(kmalloc_caches, n, 1);
 	init_tracking(kmalloc_caches, n);
+#endif
 	init_kmem_cache_node(n);
 	atomic_long_inc(&n->nr_slabs);
 	add_partial(n, page);


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10 22:09                         ` Christoph Lameter
@ 2007-07-10 23:12                           ` Matt Mackall
  0 siblings, 0 replies; 111+ messages in thread
From: Matt Mackall @ 2007-07-10 23:12 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Denis Vlasenko,
	Erik Andersen

On Tue, Jul 10, 2007 at 03:09:06PM -0700, Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> > But last time this discussion came up, IIRC you ended up handwaving
> > about all the ways in which SLOB was broken but didn't actually come
> > up with any real problems. Matt seemed willing to add those counters
> > or whatever it was if/when doing so solved a real problem. And remember
> > that SLOB doesn't have to have feature parity with SLUB, so long as it
> > implements the slab API such that the kernel *works*.
> 
> No it does not have to have feature parity. And yes we identified areas in 
> which SLOB may cause problems due to not implementing things (f.e. 
> suspend resume).

Please remind me what these things are. Suspend and hibernate work
fine here.

> The counters are still missing and thus core development 
> cannot rely on those being there.

None of the VM makes real use of the SLAB counters. Nor will they ever
make real use of them because the counters are not usefully defined.
In other words, reclaimable is a lie. If SLAB claims there's 1M of
reclaimable memory, the actual amount may more often than not be zero
because we already tried to shrink the SLAB.

As I've said before, if you come up with a real use for these numbers,
I will sacrifice the appropriate number of puppies to support it in
SLOB. But I'd rather not crush any puppies here prematurely.
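
For reference, the counters in question are the per-zone
NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLE statistics from the ZVC patch
mentioned elsewhere in this thread. Roughly (a sketch from memory, details
may differ), a page-backed allocator accounts them when it obtains a slab
page from the page allocator:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmstat.h>

/* sketch of the accounting SLOB would have to mirror; not exact code */
static void account_slab_page(struct kmem_cache *s, struct page *page,
			      int pages)
{
	mod_zone_page_state(page_zone(page),
		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
			NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
		pages);
}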

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-10  8:32                       ` Matt Mackall
  2007-07-10  9:01                         ` Håvard Skinnemoen
@ 2007-07-11  1:37                         ` Christoph Lameter
  2007-07-11  2:06                           ` Matt Mackall
  1 sibling, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-11  1:37 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Denis Vlasenko,
	Erik Andersen

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1876 bytes --]

On Tue, 10 Jul 2007, Matt Mackall wrote:

> You're delusional.


Git log says otherwise:

 git log --pretty=short mm/slob.c

Author: Christoph Lameter <clameter@sgi.com>
    Remove SLAB_CTOR_CONSTRUCTOR
Author: Christoph Lameter <clameter@sgi.com>
    Slab allocators: Drop support for destructors
Author: Nick Piggin <nickpiggin@yahoo.com.au>
    slob: implement RCU freeing
Author: Akinobu Mita <akinobu.mita@gmail.com>
    slob: fix page order calculation on not 4KB page
Author: Christoph Lameter <clameter@sgi.com>
    slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN
Author: Akinobu Mita <akinobu.mita@gmail.com>
    slob: handle SLAB_PANIC flag
Author: Pekka Enberg <penberg@cs.helsinki.fi>
    slab: introduce krealloc
Author: Dimitri Gorokhovik <dimitri.gorokhovik@free.fr>
    [PATCH] MM: SLOB is broken by recent cleanup of slab.h
Author: Christoph Lameter <clameter@sgi.com>
    [PATCH] More slab.h cleanups
Author: Christoph Lameter <clameter@sgi.com>
    [PATCH] Cleanup slab headers / API to allow easy addition of new slab allocators
Author: Alexey Dobriyan <adobriyan@gmail.com>
    [PATCH] Make kmem_cache_destroy() return void
Author: Christoph Lameter <clameter@sgi.com>
    [PATCH] ZVC: Support NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLE
Author: Christoph Lameter <clameter@sgi.com>
    [PATCH] Extract the allocpercpu functions from the slab allocator
Author: Jörn Engel <joern@wohnheim.fh-wedel.de>
    Remove obsolete #include <linux/config.h>
Author: John Hawkes <hawkes@sgi.com>
    [PATCH] mm/slob.c: for_each_possible_cpu(), not NR_CPUS
Author: Pekka Enberg <penberg@cs.helsinki.fi>
    [PATCH] slab: introduce kmem_cache_zalloc allocator
Author: Ingo Molnar <mingo@elte.hu>
    [PATCH] SLOB=y && SMP=y fix
Author: Matt Mackall <mpm@selenic.com>
    [PATCH] slob: introduce the SLOB allocator

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-11  1:37                         ` Christoph Lameter
@ 2007-07-11  2:06                           ` Matt Mackall
  2007-07-11 18:06                             ` Christoph Lameter
  0 siblings, 1 reply; 111+ messages in thread
From: Matt Mackall @ 2007-07-11  2:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Denis Vlasenko,
	Erik Andersen

On Tue, Jul 10, 2007 at 06:37:42PM -0700, Christoph Lameter wrote:
> On Tue, 10 Jul 2007, Matt Mackall wrote:
> 
> > You're delusional.
> 
> Git log says otherwise:
> 
>  git log --pretty=short mm/slob.c

A dozen trivial cleanups do not make you maintainer. Otherwise we'd
all be sending our patches to Adrian rather than Linus.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-11  2:06                           ` Matt Mackall
@ 2007-07-11 18:06                             ` Christoph Lameter
  2007-07-11 18:25                               ` Pekka J Enberg
  0 siblings, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-11 18:06 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Nick Piggin, Andrew Morton, Ingo Molnar, linux-kernel, linux-mm,
	suresh.b.siddha, corey.d.gough, Pekka Enberg, Denis Vlasenko,
	Erik Andersen

On Tue, 10 Jul 2007, Matt Mackall wrote:

> >  git log --pretty=short mm/slob.c
> 
> A dozen trivial cleanups do not make you maintainer. Otherwise we'd
> all be sending our patches to Adrian rather than Linus.

Of course you are the maintainer but you only authored a single patch 
which was the original submission in all the time that SLOB was in the 
tree. I keep having to clean up the allocator that has--according to 
Pekka--more memory requirements than SLUB. There is no point in keeping it 
around anymore it seems.






^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-11 18:06                             ` Christoph Lameter
@ 2007-07-11 18:25                               ` Pekka J Enberg
  2007-07-11 18:33                                 ` Christoph Lameter
  2007-07-12  0:33                                 ` Nick Piggin
  0 siblings, 2 replies; 111+ messages in thread
From: Pekka J Enberg @ 2007-07-11 18:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matt Mackall, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

Hi Christoph,

On Wed, 11 Jul 2007, Christoph Lameter wrote:
> Of course you are the maintainer but you only authored a single patch 
> which was the original submission in all the time that SLOB was in the 
> tree. I keep having to clean up the allocator that has--according to 
> Pekka--more memory requirements than SLUB. There is no point in keeping it 
> around anymore it seems.

Well, it was a test setup with UML and busybox and didn't have all the 
SLOB optimizations Nick mentioned, so we shouldn't draw any definite 
conclusions from it. I couldn't get 2.6.22-rc6-mm1 to compile so I'll try 
again after Andrew pushes a new release out.

Furthermore, as much as I would like to see SLOB nuked too, we can't do 
that until Matt and Nick are satisfied with SLUB for small devices and, 
from what I can gather, they aren't.

			Pekka

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-11 18:25                               ` Pekka J Enberg
@ 2007-07-11 18:33                                 ` Christoph Lameter
  2007-07-11 18:36                                   ` Pekka J Enberg
  2007-07-12  0:33                                 ` Nick Piggin
  1 sibling, 1 reply; 111+ messages in thread
From: Christoph Lameter @ 2007-07-11 18:33 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Matt Mackall, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Wed, 11 Jul 2007, Pekka J Enberg wrote:

> Hi Christoph,
> 
> On Wed, 11 Jul 2007, Christoph Lameter wrote:
> > Of course you are the maintainer but you only authored a single patch 
> > which was the original submission in all the time that SLOB was in the 
> > tree. I keep having to clean up the allocator that has--according to 
> > Pekka--more memory requirements than SLUB. There is no point in keeping it 
> > around anymore it seems.
> 
> Well, it was a test setup with UML and busybox and didn't have all the 
> SLOB optimizations Nick mentioned, so we shouldn't draw any definite 
> conclusions from it. I couldn't get 2.6.22-rc6-mm1 to compile so I'll try 
> again after Andrew pushes a new release out.

But you did get 2.6.22 to compile it seems.

Here is the fix against 2.6.22-rc6-mm1 again.

---
 mm/slub.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-06 13:28:57.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-06 13:29:01.000000000 -0700
@@ -1868,7 +1868,9 @@ static void init_kmem_cache_node(struct 
 	atomic_long_set(&n->nr_slabs, 0);
 	spin_lock_init(&n->list_lock);
 	INIT_LIST_HEAD(&n->partial);
+#ifdef CONFIG_SLUB_DEBUG
 	INIT_LIST_HEAD(&n->full);
+#endif
 }
 
 #ifdef CONFIG_NUMA
@@ -1898,8 +1900,10 @@ static struct kmem_cache_node * __init e
 	page->freelist = get_freepointer(kmalloc_caches, n);
 	page->inuse++;
 	kmalloc_caches->node[node] = n;
+#ifdef CONFIG_SLUB_DEBUG
 	init_object(kmalloc_caches, n, 1);
 	init_tracking(kmalloc_caches, n);
+#endif
 	init_kmem_cache_node(n);
 	atomic_long_inc(&n->nr_slabs);
 	add_partial(n, page);

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-11 18:33                                 ` Christoph Lameter
@ 2007-07-11 18:36                                   ` Pekka J Enberg
  0 siblings, 0 replies; 111+ messages in thread
From: Pekka J Enberg @ 2007-07-11 18:36 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matt Mackall, Nick Piggin, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

On Wed, 11 Jul 2007, Christoph Lameter wrote:
> But you did get 2.6.22 to compile it seems.
> 
> Here is the fix against 2.6.22-rc6-mm1 again.

I didn't get far enough to compile with SLUB. There's some UML architecture 
problem, and SLOB is missing __kmalloc, which UML also needs.

			Pekka

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 09/10] Remove the SLOB allocator for 2.6.23
  2007-07-11 18:25                               ` Pekka J Enberg
  2007-07-11 18:33                                 ` Christoph Lameter
@ 2007-07-12  0:33                                 ` Nick Piggin
  1 sibling, 0 replies; 111+ messages in thread
From: Nick Piggin @ 2007-07-12  0:33 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Christoph Lameter, Matt Mackall, Andrew Morton, Ingo Molnar,
	linux-kernel, linux-mm, suresh.b.siddha, corey.d.gough,
	Denis Vlasenko, Erik Andersen

Pekka J Enberg wrote:
> Hi Christoph,
> 
> On Wed, 11 Jul 2007, Christoph Lameter wrote:
> 
>>Of course you are the maintainer but you only authored a single patch 
>>which was the original submission in all the time that SLOB was in the 
>>tree. I keep having to clean up the allocator that has--according to 
>>Pekka--more memory requirements than SLUB. There is no point in keeping it 
>>around anymore it seems.
> 
> 
> Well, it was a test setup with UML and busybox and didn't have all the 
> SLOB optimizations Nick mentioned, so we shouldn't draw any definite 
> conclusions from it. I couldn't get 2.6.22-rc6-mm1 to compile so I'll try 
> again after Andrew pushes a new release out.
> 
> Furthermore, as much as I would like to see SLOB nuked too, we can't do 
> that until Matt and Nick are satisfied with SLUB for small devices and, 
> from what I can gather, they aren't.

Just to be clear: I do really like SLUB of course. And if that was able
to get as good or nearly as good (for appropriate values of nearly) memory
efficiency as SLOB in relevant situations, that would be fantastic and
SLOB could go away.

I don't really have a good knowledge of small memory devices being used,
other than apparently they can boot with 2MB (maybe less with nommu?). So
even a few K could be very significant for these.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-07-10  0:55                   ` Christoph Lameter
@ 2007-08-13 22:18                     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-08-13 22:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

Some review here. I think we could do much better..

* Christoph Lameter (clameter@sgi.com) wrote:
 
> Index: linux-2.6.22-rc6-mm1/mm/slub.c
> ===================================================================
> --- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-09 15:04:46.000000000 -0700
> +++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-09 17:09:00.000000000 -0700
> @@ -1467,12 +1467,14 @@ static void *__slab_alloc(struct kmem_ca
>  {
>  	void **object;
>  	struct page *new;
> +	unsigned long flags;
>  
> +	local_irq_save(flags);
>  	if (!c->page)
>  		goto new_slab;
>  
>  	slab_lock(c->page);
> -	if (unlikely(!node_match(c, node)))
> +	if (unlikely(!node_match(c, node) || c->freelist))
>  		goto another_slab;
>  load_freelist:
>  	object = c->page->freelist;
> @@ -1486,7 +1488,14 @@ load_freelist:
>  	c->page->inuse = s->objects;
>  	c->page->freelist = NULL;
>  	c->node = page_to_nid(c->page);
> +out:
>  	slab_unlock(c->page);
> +	local_irq_restore(flags);
> +	preempt_enable();
> +
> +	if (unlikely((gfpflags & __GFP_ZERO)))
> +		memset(object, 0, c->objsize);
> +
>  	return object;
>  
>  another_slab:
> @@ -1527,6 +1536,8 @@ new_slab:
>  		c->page = new;
>  		goto load_freelist;
>  	}
> +	local_irq_restore(flags);
> +	preempt_enable();
>  	return NULL;
>  debug:
>  	c->freelist = NULL;
> @@ -1536,8 +1547,7 @@ debug:
>  
>  	c->page->inuse++;
>  	c->page->freelist = object[c->offset];
> -	slab_unlock(c->page);
> -	return object;
> +	goto out;
>  }
>  
>  /*
> @@ -1554,23 +1564,20 @@ static void __always_inline *slab_alloc(
>  		gfp_t gfpflags, int node, void *addr)
>  {
>  	void **object;
> -	unsigned long flags;
>  	struct kmem_cache_cpu *c;
>  

What if we prefetch c->freelist here ? I see in this diff that the other
code just reads it sooner as a condition for the if().

> -	local_irq_save(flags);
> +	preempt_disable();
>  	c = get_cpu_slab(s, smp_processor_id());
> -	if (unlikely(!c->page || !c->freelist ||
> -					!node_match(c, node)))
> +redo:
> +	object = c->freelist;
> +	if (unlikely(!object || !node_match(c, node)))
> +		return __slab_alloc(s, gfpflags, node, addr, c);
>  
> -		object = __slab_alloc(s, gfpflags, node, addr, c);
> +	if (cmpxchg_local(&c->freelist, object, object[c->offset]) != object)
> +		goto redo;
>  
> -	else {
> -		object = c->freelist;
> -		c->freelist = object[c->offset];
> -	}
> -	local_irq_restore(flags);
> -
> -	if (unlikely((gfpflags & __GFP_ZERO) && object))
> +	preempt_enable();
> +	if (unlikely((gfpflags & __GFP_ZERO)))
>  		memset(object, 0, c->objsize);
>  
>  	return object;
> @@ -1603,7 +1610,9 @@ static void __slab_free(struct kmem_cach
>  {
>  	void *prior;
>  	void **object = (void *)x;
> +	unsigned long flags;
>  
> +	local_irq_save(flags);
>  	slab_lock(page);
>  
>  	if (unlikely(SlabDebug(page)))
> @@ -1629,6 +1638,8 @@ checks_ok:
>  
>  out_unlock:
>  	slab_unlock(page);
> +	local_irq_restore(flags);
> +	preempt_enable();
>  	return;
>  
>  slab_empty:
> @@ -1639,6 +1650,8 @@ slab_empty:
>  		remove_partial(s, page);
>  
>  	slab_unlock(page);
> +	local_irq_restore(flags);
> +	preempt_enable();
>  	discard_slab(s, page);
>  	return;
>  
> @@ -1663,18 +1676,31 @@ static void __always_inline slab_free(st
>  			struct page *page, void *x, void *addr)
>  {
>  	void **object = (void *)x;
> -	unsigned long flags;
>  	struct kmem_cache_cpu *c;
> +	void **freelist;
>  

Prefetching c->freelist would also make sense here.

> -	local_irq_save(flags);
> +	preempt_disable();
>  	c = get_cpu_slab(s, smp_processor_id());
> -	if (likely(page == c->page && c->freelist)) {
> -		object[c->offset] = c->freelist;
> -		c->freelist = object;
> -	} else
> -		__slab_free(s, page, x, addr, c->offset);
> +redo:
> +	freelist = c->freelist;

I suspect this smp_rmb() may be the cause of a major slowdown.
Therefore, I think we should try taking a copy of c->page and simply
check if it has changed right after the cmpxchg_local:

  page = c->page;

> +	/*
> +	 * Must read freelist before c->page. If a interrupt occurs and
> +	 * changes c->page after we have read it here then it
> +	 * will also have changed c->freelist and the cmpxchg will fail.
> +	 *
> +	 * If we would have checked c->page first then the freelist could
> +	 * have been changed under us before we read c->freelist and we
> +	 * would not be able to detect that situation.
> +	 */
> +	smp_rmb();
> +	if (unlikely(page != c->page || !freelist))
> +		return __slab_free(s, page, x, addr, c->offset);
> +
> +	object[c->offset] = freelist;
-> +	if (cmpxchg_local(&c->freelist, freelist, object) != freelist)
+> +	if (cmpxchg_local(&c->freelist, freelist, object) != freelist
        || page != c->page)
> +		goto redo;
>  

Therefore, in the scenario where:
1 - c->page is read
2 - Interrupt comes, changes c->page and c->freelist
3 - c->freelist is read
4 - cmpxchg c->freelist succeeds
5 - Then, page != c->page, so we goto redo.

It also works if 4 and 5 are swapped.

I could test the modification if you point me to the kernel version it 
should apply to. However, I don't have the same hardware you use.

By the way, the smp_rmb() barrier does not make sense with the comment.
If it is _really_ protecting against reordering wrt interrupts, then it
should be a rmb(), not smp_rmb() (because it will be reordered on UP).
But I think the best would just be to work without rmb() at all, as
proposed here.
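
Putting the quoted hunks together with the inline change above, the fast
path being proposed would look roughly like this (helper names taken from
the quoted patch; a sketch of the proposal, not a tested implementation):

static void __always_inline slab_free(struct kmem_cache *s,
			struct page *page, void *x, void *addr)
{
	void **object = (void *)x;
	struct kmem_cache_cpu *c;
	void **freelist;

	preempt_disable();
	c = get_cpu_slab(s, smp_processor_id());
redo:
	freelist = c->freelist;
	if (unlikely(page != c->page || !freelist))
		return __slab_free(s, page, x, addr, c->offset);

	object[c->offset] = freelist;
	/* recheck c->page after the cmpxchg instead of relying on smp_rmb() */
	if (cmpxchg_local(&c->freelist, freelist, object) != freelist
			|| page != c->page)
		goto redo;

	preempt_enable();
}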

Mathieu

> -	local_irq_restore(flags);
> +	preempt_enable();
>  }
>  
>  void kmem_cache_free(struct kmem_cache *s, void *x)
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
@ 2007-08-13 22:18                     ` Mathieu Desnoyers
  0 siblings, 0 replies; 111+ messages in thread
From: Mathieu Desnoyers @ 2007-08-13 22:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

Some review here. I think we could do much better..

* Christoph Lameter (clameter@sgi.com) wrote:
 
> Index: linux-2.6.22-rc6-mm1/mm/slub.c
> ===================================================================
> --- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-09 15:04:46.000000000 -0700
> +++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-09 17:09:00.000000000 -0700
> @@ -1467,12 +1467,14 @@ static void *__slab_alloc(struct kmem_ca
>  {
>  	void **object;
>  	struct page *new;
> +	unsigned long flags;
>  
> +	local_irq_save(flags);
>  	if (!c->page)
>  		goto new_slab;
>  
>  	slab_lock(c->page);
> -	if (unlikely(!node_match(c, node)))
> +	if (unlikely(!node_match(c, node) || c->freelist))
>  		goto another_slab;
>  load_freelist:
>  	object = c->page->freelist;
> @@ -1486,7 +1488,14 @@ load_freelist:
>  	c->page->inuse = s->objects;
>  	c->page->freelist = NULL;
>  	c->node = page_to_nid(c->page);
> +out:
>  	slab_unlock(c->page);
> +	local_irq_restore(flags);
> +	preempt_enable();
> +
> +	if (unlikely((gfpflags & __GFP_ZERO)))
> +		memset(object, 0, c->objsize);
> +
>  	return object;
>  
>  another_slab:
> @@ -1527,6 +1536,8 @@ new_slab:
>  		c->page = new;
>  		goto load_freelist;
>  	}
> +	local_irq_restore(flags);
> +	preempt_enable();
>  	return NULL;
>  debug:
>  	c->freelist = NULL;
> @@ -1536,8 +1547,7 @@ debug:
>  
>  	c->page->inuse++;
>  	c->page->freelist = object[c->offset];
> -	slab_unlock(c->page);
> -	return object;
> +	goto out;
>  }
>  
>  /*
> @@ -1554,23 +1564,20 @@ static void __always_inline *slab_alloc(
>  		gfp_t gfpflags, int node, void *addr)
>  {
>  	void **object;
> -	unsigned long flags;
>  	struct kmem_cache_cpu *c;
>  

What if we prefetch c->freelist here ? I see in this diff that the other
code just reads it sooner as a condition for the if().

> -	local_irq_save(flags);
> +	preempt_disable();
>  	c = get_cpu_slab(s, smp_processor_id());
> -	if (unlikely(!c->page || !c->freelist ||
> -					!node_match(c, node)))
> +redo:
> +	object = c->freelist;
> +	if (unlikely(!object || !node_match(c, node)))
> +		return __slab_alloc(s, gfpflags, node, addr, c);
>  
> -		object = __slab_alloc(s, gfpflags, node, addr, c);
> +	if (cmpxchg_local(&c->freelist, object, object[c->offset]) != object)
> +		goto redo;
>  
> -	else {
> -		object = c->freelist;
> -		c->freelist = object[c->offset];
> -	}
> -	local_irq_restore(flags);
> -
> -	if (unlikely((gfpflags & __GFP_ZERO) && object))
> +	preempt_enable();
> +	if (unlikely((gfpflags & __GFP_ZERO)))
>  		memset(object, 0, c->objsize);
>  
>  	return object;
> @@ -1603,7 +1610,9 @@ static void __slab_free(struct kmem_cach
>  {
>  	void *prior;
>  	void **object = (void *)x;
> +	unsigned long flags;
>  
> +	local_irq_save(flags);
>  	slab_lock(page);
>  
>  	if (unlikely(SlabDebug(page)))
> @@ -1629,6 +1638,8 @@ checks_ok:
>  
>  out_unlock:
>  	slab_unlock(page);
> +	local_irq_restore(flags);
> +	preempt_enable();
>  	return;
>  
>  slab_empty:
> @@ -1639,6 +1650,8 @@ slab_empty:
>  		remove_partial(s, page);
>  
>  	slab_unlock(page);
> +	local_irq_restore(flags);
> +	preempt_enable();
>  	discard_slab(s, page);
>  	return;
>  
> @@ -1663,18 +1676,31 @@ static void __always_inline slab_free(st
>  			struct page *page, void *x, void *addr)
>  {
>  	void **object = (void *)x;
> -	unsigned long flags;
>  	struct kmem_cache_cpu *c;
> +	void **freelist;
>  

Prefetching c->freelist would also make sense here.

> -	local_irq_save(flags);
> +	preempt_disable();
>  	c = get_cpu_slab(s, smp_processor_id());
> -	if (likely(page == c->page && c->freelist)) {
> -		object[c->offset] = c->freelist;
> -		c->freelist = object;
> -	} else
> -		__slab_free(s, page, x, addr, c->offset);
> +redo:
> +	freelist = c->freelist;

I suspect this smp_rmb() may be the cause of a major slowdown.
Therefore, I think we should try taking a copy of c->page and simply
check if it has changed right after the cmpxchg_local:

  page = c->page;

> +	/*
> +	 * Must read freelist before c->page. If a interrupt occurs and
> +	 * changes c->page after we have read it here then it
> +	 * will also have changed c->freelist and the cmpxchg will fail.
> +	 *
> +	 * If we would have checked c->page first then the freelist could
> +	 * have been changed under us before we read c->freelist and we
> +	 * would not be able to detect that situation.
> +	 */
> +	smp_rmb();
> +	if (unlikely(page != c->page || !freelist))
> +		return __slab_free(s, page, x, addr, c->offset);
> +
> +	object[c->offset] = freelist;
-> +	if (cmpxchg_local(&c->freelist, freelist, object) != freelist)
+> +	if (cmpxchg_local(&c->freelist, freelist, object) != freelist
        || page != c->page)
> +		goto redo;
>  

Therefore, in the scenario where:
1 - c->page is read
2 - Interrupt comes, changes c->page and c->freelist
3 - c->freelist is read
4 - cmpxchg c->freelist succeeds
5 - Then, page != c->page, so we goto redo.

It also works if 4 and 5 are swapped.

I could test the modification if you point me to the kernel version it 
should apply to. However, I don't have the same hardware you use.

By the way, the smp_rmb() barrier does not make sense with the comment.
If it is _really_ protecting against reordering wrt interrupts, then it
should be a rmb(), not smp_rmb() (because it will be reordered on UP).
But I think the best would just be to work without rmb() at all, as
proposed here.

Mathieu

> -	local_irq_restore(flags);
> +	preempt_enable();
>  }
>  
>  void kmem_cache_free(struct kmem_cache *s, void *x)
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
  2007-08-13 22:18                     ` Mathieu Desnoyers
@ 2007-08-13 22:28                       ` Christoph Lameter
  -1 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-08-13 22:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

On Mon, 13 Aug 2007, Mathieu Desnoyers wrote:

> > @@ -1554,23 +1564,20 @@ static void __always_inline *slab_alloc(
> >  		gfp_t gfpflags, int node, void *addr)
> >  {
> >  	void **object;
> > -	unsigned long flags;
> >  	struct kmem_cache_cpu *c;
> >  
> 
> What if we prefetch c->freelist here ? I see in this diff that the other
> code just reads it sooner as a condition for the if().

Not sure what this may bring. If you read it earlier then you may 
get the wrong value and then have to refetch the cacheline.

We cannot fetch c->freelist without determining c. I can remove the 
check for c->page == page so that the fetch of c->freelist comes 
immediately after determination of c. But that does not change performance.

> > -		c->freelist = object;
> > -	} else
> > -		__slab_free(s, page, x, addr, c->offset);
> > +redo:
> > +	freelist = c->freelist;
> 
> I suspect this smp_rmb() may be the cause of a major slowdown.
> Therefore, I think we should try taking a copy of c->page and simply
> check if it has changed right after the cmpxchg_local:

Thought so too, and I removed that smp_rmb and tested this modification 
on UP again without any performance gains. I think the cacheline fetches 
dominate the execution here and cmpxchg does not bring us 
anything.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance
@ 2007-08-13 22:28                       ` Christoph Lameter
  0 siblings, 0 replies; 111+ messages in thread
From: Christoph Lameter @ 2007-08-13 22:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Martin Bligh, Andi Kleen, linux-kernel, linux-mm, David Miller

On Mon, 13 Aug 2007, Mathieu Desnoyers wrote:

> > @@ -1554,23 +1564,20 @@ static void __always_inline *slab_alloc(
> >  		gfp_t gfpflags, int node, void *addr)
> >  {
> >  	void **object;
> > -	unsigned long flags;
> >  	struct kmem_cache_cpu *c;
> >  
> 
> What if we prefetch c->freelist here ? I see in this diff that the other
> code just reads it sooner as a condition for the if().

Not sure what this may bring. If you read it earlier then you may 
get the wrong value and then have to refetch the cacheline.

We cannot fetch c->freelist without determining c. I can remove the 
check for c->page == page so that the fetch of c->freelist comes 
immediately after determination of c. But that does not change performance.

> > -		c->freelist = object;
> > -	} else
> > -		__slab_free(s, page, x, addr, c->offset);
> > +redo:
> > +	freelist = c->freelist;
> 
> I suspect this smp_rmb() may be the cause of a major slowdown.
> Therefore, I think we should try taking a copy of c->page and simply
> check if it has changed right after the cmpxchg_local:

Thought so too, and I removed that smp_rmb and tested this modification 
on UP again without any performance gains. I think the cacheline fetches 
dominate the execution here and cmpxchg does not bring us 
anything.


^ permalink raw reply	[flat|nested] 111+ messages in thread

end of thread, other threads:[~2007-08-13 22:28 UTC | newest]

Thread overview: 111+ messages
2007-07-08  3:49 [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
2007-07-08  3:49 ` [patch 01/10] SLUB: Direct pass through of page size or higher kmalloc requests Christoph Lameter
2007-07-08  3:49 ` [patch 02/10] SLUB: Avoid page struct cacheline bouncing due to remote frees to cpu slab Christoph Lameter
2007-07-08  3:49 ` [patch 03/10] SLUB: Do not use page->mapping Christoph Lameter
2007-07-08  3:49 ` [patch 04/10] SLUB: Move page->offset to kmem_cache_cpu->offset Christoph Lameter
2007-07-08  3:49 ` [patch 05/10] SLUB: Avoid touching page struct when freeing to per cpu slab Christoph Lameter
2007-07-08  3:49 ` [patch 06/10] SLUB: Place kmem_cache_cpu structures in a NUMA aware way Christoph Lameter
2007-07-08  3:49 ` [patch 07/10] SLUB: Optimize cacheline use for zeroing Christoph Lameter
2007-07-08  3:50 ` [patch 08/10] SLUB: Single atomic instruction alloc/free using cmpxchg Christoph Lameter
2007-07-08  3:50 ` [patch 09/10] Remove the SLOB allocator for 2.6.23 Christoph Lameter
2007-07-08  7:51   ` Ingo Molnar
2007-07-08  9:43     ` Nick Piggin
2007-07-08  9:54       ` Ingo Molnar
2007-07-08 10:23         ` Nick Piggin
2007-07-08 10:42           ` Ingo Molnar
2007-07-08 18:02     ` Andrew Morton
2007-07-09  2:57       ` Nick Piggin
2007-07-09 11:04         ` Pekka Enberg
2007-07-09 11:16           ` Nick Piggin
2007-07-09 12:47             ` Pekka Enberg
2007-07-09 13:46             ` Pekka J Enberg
2007-07-09 16:08           ` Christoph Lameter
2007-07-09 16:08             ` Christoph Lameter
2007-07-10  8:17             ` Pekka J Enberg
2007-07-10  8:17               ` Pekka J Enberg
2007-07-10  8:27               ` Nick Piggin
2007-07-10  8:27                 ` Nick Piggin
2007-07-10  9:31                 ` Pekka Enberg
2007-07-10  9:31                   ` Pekka Enberg
2007-07-10 10:09                   ` Nick Piggin
2007-07-10 10:09                     ` Nick Piggin
2007-07-10 12:02                   ` Matt Mackall
2007-07-10 12:02                     ` Matt Mackall
2007-07-10 12:57                     ` Pekka J Enberg
2007-07-10 12:57                       ` Pekka J Enberg
2007-07-10 22:12                     ` Christoph Lameter
2007-07-10 22:40                       ` Matt Mackall
2007-07-10 22:40                         ` Matt Mackall
2007-07-10 22:50                         ` Christoph Lameter
2007-07-10 22:50                           ` Christoph Lameter
2007-07-09 16:06         ` Christoph Lameter
2007-07-09 16:51           ` Andrew Morton
2007-07-09 17:26             ` Christoph Lameter
2007-07-09 18:00               ` Andrew Morton
2007-07-10  1:43               ` Nick Piggin
2007-07-10  1:56                 ` Christoph Lameter
2007-07-10  2:02                   ` Nick Piggin
2007-07-10  2:11                     ` Christoph Lameter
2007-07-10  7:09                       ` Nick Piggin
2007-07-10 22:09                         ` Christoph Lameter
2007-07-10 23:12                           ` Matt Mackall
2007-07-10  8:32                       ` Matt Mackall
2007-07-10  9:01                         ` Håvard Skinnemoen
2007-07-10  9:11                           ` Nick Piggin
2007-07-10  9:21                             ` Håvard Skinnemoen
2007-07-11  1:37                         ` Christoph Lameter
2007-07-11  2:06                           ` Matt Mackall
2007-07-11 18:06                             ` Christoph Lameter
2007-07-11 18:25                               ` Pekka J Enberg
2007-07-11 18:33                                 ` Christoph Lameter
2007-07-11 18:36                                   ` Pekka J Enberg
2007-07-12  0:33                                 ` Nick Piggin
2007-07-09 23:09             ` Matt Mackall
2007-07-10  1:41           ` Nick Piggin
2007-07-10  1:51             ` Christoph Lameter
2007-07-10  1:58               ` Nick Piggin
2007-07-10  6:22                 ` Matt Mackall
2007-07-10  7:03                   ` Nick Piggin
2007-07-10  2:32               ` Matt Mackall
2007-07-09 21:57       ` Matt Mackall
2007-07-09 12:31     ` Matthieu CASTET
2007-07-09 16:00     ` Christoph Lameter
2007-07-09 20:52   ` Matt Mackall
2007-07-08  3:50 ` [patch 10/10] Remove slab in 2.6.24 Christoph Lameter
2007-07-08  4:37 ` [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance David Miller
2007-07-09 15:45   ` Christoph Lameter
2007-07-09 19:43     ` David Miller
2007-07-09 21:21       ` Christoph Lameter
2007-07-08 11:20 ` Andi Kleen
2007-07-09 15:50   ` Christoph Lameter
2007-07-09 15:50     ` Christoph Lameter
2007-07-09 15:59     ` Martin Bligh
2007-07-09 15:59       ` Martin Bligh
2007-07-09 18:11       ` Christoph Lameter
2007-07-09 18:11         ` Christoph Lameter
2007-07-09 21:00         ` Martin Bligh
2007-07-09 21:00           ` Martin Bligh
2007-07-09 21:44           ` Mathieu Desnoyers
2007-07-09 21:44             ` Mathieu Desnoyers
2007-07-09 21:55             ` Christoph Lameter
2007-07-09 21:55               ` Christoph Lameter
2007-07-09 22:58               ` Mathieu Desnoyers
2007-07-09 22:58                 ` Mathieu Desnoyers
2007-07-09 23:08                 ` Christoph Lameter
2007-07-09 23:08                   ` Christoph Lameter
2007-07-10  5:16                   ` [PATCH] x86_64 - Use non locked version for local_cmpxchg() Mathieu Desnoyers
2007-07-10  5:16                     ` Mathieu Desnoyers
2007-07-10 20:46                     ` Christoph Lameter
2007-07-10 20:46                       ` Christoph Lameter
2007-07-10  0:55                 ` [patch 00/10] [RFC] SLUB patches for more functionality, performance and maintenance Christoph Lameter
2007-07-10  0:55                   ` Christoph Lameter
2007-07-10  8:27                   ` Mathieu Desnoyers
2007-07-10  8:27                     ` Mathieu Desnoyers
2007-07-10 18:38                     ` Christoph Lameter
2007-07-10 18:38                       ` Christoph Lameter
2007-07-10 20:59                     ` Mathieu Desnoyers
2007-07-10 20:59                       ` Mathieu Desnoyers
2007-08-13 22:18                   ` Mathieu Desnoyers
2007-08-13 22:18                     ` Mathieu Desnoyers
2007-08-13 22:28                     ` Christoph Lameter
2007-08-13 22:28                       ` Christoph Lameter
