* [slubllv7 00/17] SLUB: Lockless freelists for objects V7
@ 2011-06-01 17:25 Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 01/17] slub: Push irq disable into allocate_slab() Christoph Lameter
                   ` (16 more replies)
  0 siblings, 17 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

V6->V7	- Work out issues with the x86 arch specific patch.
	- Add review tags.

V5->V6  - Diffed against current Linus for -next integration.
	- Rework descriptions
	- Patches could use some review.

V4->V5	- More cleanup. Remove gotos from __slab_alloc and __slab_free
	- Some structural changes to alloc and free to clean up the code
	- Statistics modifications folded in other patches.
	- Fixes to patches already in Pekka's slabnext.
	- Include missing upstream fixes

V3->V4	- Diffed against Pekka's slab/next tree.
	- Numerous cleanups in particular as a result of the removal of the
	  #ifdef CMPXCHG_LOCAL stuff.
	- Smaller cleanups wherever I saw something.

V2->V3
	- Provide statistics
	- Fallback logic to page lock if cmpxchg16b is not available.
	- Better counter support
	- More cleanups and clarifications

Well here is another result of my obsession with SLAB allocators. There must be
some way to get an allocator done that is faster without queueing and I hope
that we are now there (maybe only almost...). Any help with cleaning up the
rough edges would be appreciated.

This patchset implements wider lockless operations in slub affecting most of the
slowpaths. In particular, the patch decreases the overhead in the performance
critical section of __slab_free.

One test that I ran was "hackbench 200 process 200" on 2.6.39-rc3 under KVM:

Run	SLAB	SLUB	SLUB LL
1st	35.2	35.9	31.9
2nd	34.6	30.8	27.9
3rd	33.8	29.9	28.8

Note that the SLUB version in 2.6.39-rc1 already has an optimized allocation
and free path using this_cpu_cmpxchg_double(). SLUB LL takes it to new heights
by also using cmpxchg_double() in the slowpaths (especially in the kfree()
case where we frequently cannot use the fastpath because there is no queue).

The patch uses a cmpxchg_double (also introduced here) to do an atomic change
on the state of a slab page that includes the following pieces of information:

1. Freelist pointer
2. Number of objects inuse
3. Frozen state of a slab
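
As a rough user-space illustration of the idea (not the kernel code), the three
pieces of state listed above can be squeezed into one word and changed with a
single compare-and-swap. The sketch below assumes a scaled-down layout that fits
into 64 bits and uses C11 atomics; the names slab_state, pack(), unpack() and
slab_free_sketch() are made up for this example, and the freelist is an array
index rather than a pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NIL 0xffffffffu		/* "empty freelist" marker (assumption) */

/* Hypothetical scaled-down slab state: everything fits in 64 bits. */
struct slab_state {
	uint32_t freelist;	/* index of the first free object, or NIL */
	uint16_t inuse;		/* number of allocated objects */
	uint16_t frozen;	/* 1 if the slab is owned by a cpu */
};

static uint64_t pack(struct slab_state s)
{
	return ((uint64_t)s.freelist << 32) | ((uint64_t)s.inuse << 16) | s.frozen;
}

static struct slab_state unpack(uint64_t v)
{
	struct slab_state s = {
		.freelist = (uint32_t)(v >> 32),
		.inuse = (uint16_t)(v >> 16),
		.frozen = (uint16_t)v,
	};
	return s;
}

/*
 * Free object 'obj': link it in front of the freelist and drop inuse.
 * Retry until the compare-and-swap sees an unchanged state word.
 * next[] holds the per-object free pointers.
 */
static void slab_free_sketch(_Atomic uint64_t *state, uint32_t *next, uint32_t obj)
{
	uint64_t old = atomic_load(state);
	struct slab_state next_state;

	do {
		next_state = unpack(old);
		next[obj] = next_state.freelist;	/* point at old list head */
		next_state.freelist = obj;
		next_state.inuse--;
		/* On failure 'old' is refreshed with the current state. */
	} while (!atomic_compare_exchange_weak(state, &old, pack(next_state)));
}
```

Freelist, inuse and frozen change together or not at all, so no lock and no
interrupt disabling is needed around the update. The patchset itself needs two
full machine words, which is why it relies on cmpxchg_double.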

Disabling of interrupts (which adds significant latency in the
allocator paths) is avoided in the __slab_free case.

There are some concerns with this patch. The use of cmpxchg_double on
fields of the page struct requires alignment of the fields to double
word boundaries. That can only be accomplished by adding some padding
to struct page which blows it up to 64 bytes (on x86_64). Comments
in the source describe these things in more detail.

The cmpxchg_double() operation introduced here could also be used to
update other doublewords in the page struct in a lockless fashion. One
can envision page state changes that involve flags and mappings, or
perhaps list operations done locklessly (but with the current scheme we
would need to update two other words elsewhere at the same time too,
so another scheme would be needed).



* [slubllv7 01/17] slub: Push irq disable into allocate_slab()
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 02/17] slub: Do not use frozen page flag but a bit in the page counters Christoph Lameter
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: push_irq_disable --]
[-- Type: text/plain, Size: 1639 bytes --]

Do the irq handling in allocate_slab() instead of __slab_alloc().

__slab_alloc() is already cluttered and allocate_slab() is already
fiddling around with gfp flags.

v6->v7:
	Only increment ORDER_FALLBACK if we get a page during fallback
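
The resulting control flow can be modelled in plain C roughly as follows. This
is only a sketch under stated assumptions: the irq state and the statistics
counter are ordinary variables, and SKETCH_GFP_WAIT, alloc_pages_sketch() and
allocate_slab_sketch() are made-up stand-ins for __GFP_WAIT, the page
allocator and allocate_slab().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_GFP_WAIT 0x1u		/* stands in for __GFP_WAIT */

static bool irqs_enabled;		/* stands in for the cpu irq state */
static unsigned long order_fallback;	/* stands in for the ORDER_FALLBACK stat */

/* Hypothetical page allocator: fails for any order above max_order. */
static void *alloc_pages_sketch(int order, int max_order)
{
	static char page;

	return order <= max_order ? &page : NULL;
}

/* Entered with "interrupts" off and always returns with them off again. */
static void *allocate_slab_sketch(unsigned flags, int order, int min_order,
				  int max_order)
{
	void *page;

	if (flags & SKETCH_GFP_WAIT)
		irqs_enabled = true;	/* the allocation may sleep */

	page = alloc_pages_sketch(order, max_order);
	if (!page) {
		/* Try the lower order and count only a successful fallback. */
		page = alloc_pages_sketch(min_order, max_order);
		if (page)
			order_fallback++;
	}

	if (flags & SKETCH_GFP_WAIT)
		irqs_enabled = false;	/* restored on success and on failure */

	return page;
}
```

The point of the restructuring is that the irq state is restored on a single
path before the NULL check, so the caller no longer has to bracket the call.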

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-26 16:13:58.085604969 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 09:42:08.102989621 -0500
@@ -1187,6 +1187,11 @@ static struct page *allocate_slab(struct
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
 
+	flags &= gfp_allowed_mask;
+
+	if (flags & __GFP_WAIT)
+		local_irq_enable();
+
 	flags |= s->allocflags;
 
 	/*
@@ -1203,12 +1208,17 @@ static struct page *allocate_slab(struct
 		 * Try a lower order alloc if possible
 		 */
 		page = alloc_slab_page(flags, node, oo);
-		if (!page)
-			return NULL;
 
-		stat(s, ORDER_FALLBACK);
+		if (page)
+			stat(s, ORDER_FALLBACK);
 	}
 
+	if (flags & __GFP_WAIT)
+		local_irq_disable();
+
+	if (!page)
+		return NULL;
+
 	if (kmemcheck_enabled
 		&& !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
 		int pages = 1 << oo_order(oo);
@@ -1849,15 +1859,8 @@ new_slab:
 		goto load_freelist;
 	}
 
-	gfpflags &= gfp_allowed_mask;
-	if (gfpflags & __GFP_WAIT)
-		local_irq_enable();
-
 	page = new_slab(s, gfpflags, node);
 
-	if (gfpflags & __GFP_WAIT)
-		local_irq_disable();
-
 	if (page) {
 		c = __this_cpu_ptr(s->cpu_slab);
 		stat(s, ALLOC_SLAB);



* [slubllv7 02/17] slub: Do not use frozen page flag but a bit in the page counters
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 01/17] slub: Push irq disable into allocate_slab() Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 03/17] slub: Move page->frozen handling near where the page->freelist handling occurs Christoph Lameter
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: frozen_field --]
[-- Type: text/plain, Size: 3338 bytes --]

Do not use a page flag for the frozen bit. It needs to be part
of the state that is handled with cmpxchg_double(). So use a bit
in the counter struct in the page struct for that purpose.
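
A small user-space check of the new layout, assuming a compiler where unsigned
is 32 bits wide (true on the common Linux ABIs). The struct name slub_counters
and the helper max_objs_per_page() exist only in this sketch; the bit widths
are the ones from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Mirrors the bit widths used in the patch; the struct name is made up. */
struct slub_counters {
	unsigned inuse:16;
	unsigned objects:15;
	unsigned frozen:1;
};

/* 16 + 15 + 1 bits are expected to fill one 32-bit unsigned exactly. */
static_assert(sizeof(struct slub_counters) * CHAR_BIT == 32,
	      "counters must fit in 32 bits");

/* Largest object count that the 15-bit field can represent. */
static unsigned max_objs_per_page(void)
{
	struct slub_counters c = { 0 };

	c.objects -= 1;		/* wraps to the largest 15-bit value */
	return c.objects;
}
```

Giving up one bit of objects for frozen is what lowers MAX_OBJS_PER_PAGE from
65535 to 32767.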

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 include/linux/mm_types.h   |    5 +++--
 include/linux/page-flags.h |    5 -----
 mm/slub.c                  |   12 ++++++------
 3 files changed, 9 insertions(+), 13 deletions(-)

Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h	2011-05-31 09:40:57.402990070 -0500
+++ linux-2.6/include/linux/mm_types.h	2011-05-31 09:42:53.632989323 -0500
@@ -41,8 +41,9 @@ struct page {
 					 * & limit reverse map searches.
 					 */
 		struct {		/* SLUB */
-			u16 inuse;
-			u16 objects;
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
 		};
 	};
 	union {
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h	2011-05-31 09:40:57.392990068 -0500
+++ linux-2.6/include/linux/page-flags.h	2011-05-31 09:43:25.402989115 -0500
@@ -124,9 +124,6 @@ enum pageflags {
 
 	/* SLOB */
 	PG_slob_free = PG_private,
-
-	/* SLUB */
-	PG_slub_frozen = PG_active,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -212,8 +209,6 @@ PAGEFLAG(SwapBacked, swapbacked) __CLEAR
 
 __PAGEFLAG(SlobFree, slob_free)
 
-__PAGEFLAG(SlubFrozen, slub_frozen)
-
 /*
  * Private page markings that may be used by the filesystem that owns the page
  * for its own purposes.
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 09:42:08.102989621 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 09:42:53.632989323 -0500
@@ -166,7 +166,7 @@ static inline int kmem_cache_debug(struc
 
 #define OO_SHIFT	16
 #define OO_MASK		((1 << OO_SHIFT) - 1)
-#define MAX_OBJS_PER_PAGE	65535 /* since page.objects is u16 */
+#define MAX_OBJS_PER_PAGE	32767 /* since page.objects is u15 */
 
 /* Internal SLUB flags */
 #define __OBJECT_POISON		0x80000000UL /* Poison object */
@@ -1025,7 +1025,7 @@ static noinline int free_debug_processin
 	}
 
 	/* Special debug activities for freeing objects */
-	if (!PageSlubFrozen(page) && !page->freelist)
+	if (!page->frozen && !page->freelist)
 		remove_full(s, page);
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, object, TRACK_FREE, addr);
@@ -1424,7 +1424,7 @@ static inline int lock_and_freeze_slab(s
 {
 	if (slab_trylock(page)) {
 		__remove_partial(n, page);
-		__SetPageSlubFrozen(page);
+		page->frozen = 1;
 		return 1;
 	}
 	return 0;
@@ -1538,7 +1538,7 @@ static void unfreeze_slab(struct kmem_ca
 {
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 
-	__ClearPageSlubFrozen(page);
+	page->frozen = 0;
 	if (page->inuse) {
 
 		if (page->freelist) {
@@ -1868,7 +1868,7 @@ new_slab:
 			flush_slab(s, c);
 
 		slab_lock(page);
-		__SetPageSlubFrozen(page);
+		page->frozen = 1;
 		c->node = page_to_nid(page);
 		c->page = page;
 		goto load_freelist;
@@ -2048,7 +2048,7 @@ static void __slab_free(struct kmem_cach
 	page->freelist = object;
 	page->inuse--;
 
-	if (unlikely(PageSlubFrozen(page))) {
+	if (unlikely(page->frozen)) {
 		stat(s, FREE_FROZEN);
 		goto out_unlock;
 	}



* [slubllv7 03/17] slub: Move page->frozen handling near where the page->freelist handling occurs
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 01/17] slub: Push irq disable into allocate_slab() Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 02/17] slub: Do not use frozen page flag but a bit in the page counters Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 04/17] x86: Add support for cmpxchg_double Christoph Lameter
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: frozen_move --]
[-- Type: text/plain, Size: 1925 bytes --]

This is necessary because the frozen bit has to be updated in the same
cmpxchg_double as the freelist and the counters.

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>

---
 mm/slub.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 09:42:53.632989323 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 09:43:48.312988970 -0500
@@ -1286,6 +1286,7 @@ static struct page *new_slab(struct kmem
 
 	page->freelist = start;
 	page->inuse = 0;
+	page->frozen = 1;
 out:
 	return page;
 }
@@ -1424,7 +1425,6 @@ static inline int lock_and_freeze_slab(s
 {
 	if (slab_trylock(page)) {
 		__remove_partial(n, page);
-		page->frozen = 1;
 		return 1;
 	}
 	return 0;
@@ -1538,7 +1538,6 @@ static void unfreeze_slab(struct kmem_ca
 {
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 
-	page->frozen = 0;
 	if (page->inuse) {
 
 		if (page->freelist) {
@@ -1671,6 +1670,7 @@ static void deactivate_slab(struct kmem_
 	}
 	c->page = NULL;
 	c->tid = next_tid(c->tid);
+	page->frozen = 0;
 	unfreeze_slab(s, page, tail);
 }
 
@@ -1831,6 +1831,8 @@ static void *__slab_alloc(struct kmem_ca
 	stat(s, ALLOC_REFILL);
 
 load_freelist:
+	VM_BUG_ON(!page->frozen);
+
 	object = page->freelist;
 	if (unlikely(!object))
 		goto another_slab;
@@ -1854,6 +1856,7 @@ new_slab:
 	page = get_partial(s, gfpflags, node);
 	if (page) {
 		stat(s, ALLOC_FROM_PARTIAL);
+		page->frozen = 1;
 		c->node = page_to_nid(page);
 		c->page = page;
 		goto load_freelist;
@@ -2375,6 +2378,7 @@ static void early_kmem_cache_node_alloc(
 	BUG_ON(!n);
 	page->freelist = get_freepointer(kmem_cache_node, n);
 	page->inuse++;
+	page->frozen = 0;
 	kmem_cache_node->node[node] = n;
 #ifdef CONFIG_SLUB_DEBUG
 	init_object(kmem_cache_node, n, SLUB_RED_ACTIVE);



* [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (2 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 03/17] slub: Move page->frozen handling near where the page->freelist handling occurs Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-09  9:53   ` Pekka Enberg
                     ` (2 more replies)
  2011-06-01 17:25 ` [slubllv7 05/17] mm: Rearrange struct page Christoph Lameter
                   ` (12 subsequent siblings)
  16 siblings, 3 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, tj, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: cmpxchg_double_x86 --]
[-- Type: text/plain, Size: 5996 bytes --]

A simple implementation that only supports the word size and does not
have a fallback mode (that would require a spinlock).

Add 32 and 64 bit support for cmpxchg_double. cmpxchg_double uses
the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
and swap 2 machine words. This allows lockless algorithms to move more
context information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
cmpxchg detection has been built into the kernel. Note that each subsystem
using cmpxchg_double has to implement a fallback mechanism as long as
we offer support for processors that do not implement cmpxchg_double.

Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: tj@kernel.org
Signed-off-by: Christoph Lameter <cl@linux.com>

---
 arch/x86/Kconfig.cpu              |    3 ++
 arch/x86/include/asm/cmpxchg_32.h |   48 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |    2 +
 4 files changed, 98 insertions(+)

Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:05.002406114 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:48.222405834 -0500
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define cmpxchg16b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg16b %2;setz %1"	\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg16b %2;setz %1"			\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+ 		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 11:01:05.022406109 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 11:01:48.222405834 -0500
@@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(
 
 #endif
 
+#define cmpxchg8b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg8b %2; setz %1"	\
+		       : "=d"(__dummy), "=a" (__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg8b %2; setz %1"			\
+		       : "=d"(__dummy), "=a"(__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b_local((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx8
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
Index: linux-2.6/arch/x86/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-06-01 11:01:05.032406108 -0500
+++ linux-2.6/arch/x86/Kconfig.cpu	2011-06-01 11:02:20.912405628 -0500
@@ -312,6 +312,9 @@ config X86_CMPXCHG
 config CMPXCHG_LOCAL
 	def_bool X86_64 || (X86_32 && !M386)
 
+config CMPXCHG_DOUBLE
+	def_bool y
+
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
Index: linux-2.6/arch/x86/include/asm/cpufeature.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cpufeature.h	2011-06-01 11:01:05.012406112 -0500
+++ linux-2.6/arch/x86/include/asm/cpufeature.h	2011-06-01 11:01:48.222405834 -0500
@@ -288,6 +288,8 @@ extern const char * const x86_power_flag
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
+#define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1



* [slubllv7 05/17] mm: Rearrange struct page
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (3 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 04/17] x86: Add support for cmpxchg_double Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-09  9:57   ` Pekka Enberg
  2011-06-01 17:25 ` [slubllv7 06/17] slub: Add cmpxchg_double_slab() Christoph Lameter
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: resort_struct_page --]
[-- Type: text/plain, Size: 4950 bytes --]

We need to be able to use cmpxchg_double on the freelist and object count
field in struct page. Rearrange the fields in struct page according to
doubleword entities so that the freelist pointer comes before the counters.
Do the rearranging with a future in mind where we use more doubleword
atomics to avoid locking of updates to flags/mapping or lru pointers.

Create another union to allow access to counters in struct page as a
single unsigned long value.

The doublewords must be properly aligned for cmpxchg_double to work.
Sadly this increases the size of page struct by one word on some architectures.
But as a result, page structs are now cacheline aligned on x86_64.
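
The alignment requirement can be checked at compile time in a user-space
mock-up. The sketch below assumes a GCC-compatible compiler and an ABI where
pointers and unsigned long have the same size; struct page_sketch and its
field selection are hypothetical and much smaller than the real struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DWORD (2 * sizeof(unsigned long))	/* one double word */

/* Hypothetical stand-in for the leading blocks of struct page. */
struct page_sketch {
	/* First double word block */
	unsigned long flags;
	void *mapping;
	/* Second double word block: target of the double word cmpxchg */
	void *freelist;
	unsigned long counters;
	/* Remainder, not double word aligned */
	unsigned long private_data;
} __attribute__((__aligned__(2 * sizeof(unsigned long))));

/* The freelist/counters pair must start on a double word boundary. */
static_assert(offsetof(struct page_sketch, freelist) % DWORD == 0,
	      "freelist must be double word aligned");
/* counters must directly follow freelist so both form one double word. */
static_assert(offsetof(struct page_sketch, counters) ==
	      offsetof(struct page_sketch, freelist) + sizeof(void *),
	      "counters must follow freelist");

/* Run-time view of the same property for one array element. */
static int freelist_is_aligned(const struct page_sketch *p)
{
	return (uintptr_t)&p->freelist % DWORD == 0;
}
```

Here five words of payload are padded to six so that every element of an array
of such structs keeps the pair aligned, which is the same kind of size
increase described above.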

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/mm_types.h |   89 +++++++++++++++++++++++++++++++----------------
 1 file changed, 60 insertions(+), 29 deletions(-)

Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h	2011-05-31 09:46:41.912987862 -0500
+++ linux-2.6/include/linux/mm_types.h	2011-05-31 09:46:44.282987846 -0500
@@ -30,52 +30,74 @@ struct address_space;
  * moment. Note that we have no way to track which tasks are using
  * a page, though if it is a pagecache page, rmap structures can tell us
  * who is mapping it.
+ *
+ * The objects in struct page are organized in double word blocks in
+ * order to allow us to use atomic double word operations on portions
+ * of struct page. That is currently only used by slub but the arrangement
+ * allows the use of atomic double word operations on the flags/mapping
+ * and lru list pointers also.
  */
 struct page {
+	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
-	atomic_t _count;		/* Usage count, see below. */
+	struct address_space *mapping;	/* If low bit clear, points to
+					 * inode address_space, or NULL.
+					 * If page mapped as anonymous
+					 * memory, low bit is set, and
+					 * it points to anon_vma object:
+					 * see PAGE_MAPPING_ANON below.
+					 */
+	/* Second double word */
 	union {
-		atomic_t _mapcount;	/* Count of ptes mapped in mms,
-					 * to show when page is mapped
-					 * & limit reverse map searches.
+		struct {
+			pgoff_t index;		/* Our offset within mapping. */
+			atomic_t _mapcount;	/* Count of ptes mapped in mms,
+							 * to show when page is mapped
+							 * & limit reverse map searches.
+							 */
+			atomic_t _count;		/* Usage count, see below. */
+		};
+
+		struct {			/* SLUB cmpxchg_double area */
+			void *freelist;
+			union {
+				unsigned long counters;
+				struct {
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+					/*
+					 * Kernel may make use of this field even when slub
+					 * uses the rest of the double word!
 					 */
-		struct {		/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
+					atomic_t _count;
+				};
+			};
 		};
 	};
+
+	/* Third double word block */
+	struct list_head lru;		/* Pageout list, eg. active_list
+					 * protected by zone->lru_lock !
+					 */
+
+	/* Remainder is not double word aligned */
 	union {
-	    struct {
-		unsigned long private;		/* Mapping-private opaque data:
+	 	unsigned long private;		/* Mapping-private opaque data:
 					 	 * usually used for buffer_heads
 						 * if PagePrivate set; used for
 						 * swp_entry_t if PageSwapCache;
 						 * indicates order in the buddy
 						 * system if PG_buddy is set.
 						 */
-		struct address_space *mapping;	/* If low bit clear, points to
-						 * inode address_space, or NULL.
-						 * If page mapped as anonymous
-						 * memory, low bit is set, and
-						 * it points to anon_vma object:
-						 * see PAGE_MAPPING_ANON below.
-						 */
-	    };
 #if USE_SPLIT_PTLOCKS
-	    spinlock_t ptl;
+		spinlock_t ptl;
 #endif
-	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
-	    struct page *first_page;	/* Compound tail pages */
+		struct kmem_cache *slab;	/* SLUB: Pointer to slab */
+		struct page *first_page;	/* Compound tail pages */
 	};
-	union {
-		pgoff_t index;		/* Our offset within mapping. */
-		void *freelist;		/* SLUB: freelist req. slab lock */
-	};
-	struct list_head lru;		/* Pageout list, eg. active_list
-					 * protected by zone->lru_lock !
-					 */
+
 	/*
 	 * On machines where all RAM is mapped into kernel address space,
 	 * we can simply calculate the virtual address. On machines with
@@ -101,7 +123,16 @@ struct page {
 	 */
 	void *shadow;
 #endif
-};
+}
+/*
+ * If another subsystem starts using the double word pairing for atomic
+ * operations on struct page then it must change the #if to ensure
+ * proper alignment of the page struct.
+ */
+#if defined(CONFIG_SLUB) && defined(CONFIG_CMPXCHG_LOCAL)
+	__attribute__((__aligned__(2*sizeof(unsigned long))))
+#endif
+;
 
 typedef unsigned long __nocast vm_flags_t;
 



* [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (4 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 05/17] mm: Rearrange struct page Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-07-11 19:55   ` Eric Dumazet
  2011-06-01 17:25 ` [slubllv7 07/17] slub: explicit list_lock taking Christoph Lameter
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: cmpxchg_double_slab --]
[-- Type: text/plain, Size: 5211 bytes --]

Add a function that operates on the second doubleword in the page struct
and manipulates the object counters, the freelist and the frozen attribute.
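
The dispatch between the atomic fast mode and the plain fallback can be
sketched in user space as below. This is a simplified model, not the function
from the patch: freelist and counters are assumed to be 32 bits each and share
one 64-bit word so that C11 atomics suffice, and cache_sketch, make_state()
and cmpxchg_state_sketch() are names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache descriptor: one mode flag plus a failure counter. */
struct cache_sketch {
	bool use_cmpxchg;		/* plays the role of __CMPXCHG_DOUBLE */
	unsigned long cmpxchg_fail;	/* plays the role of CMPXCHG_DOUBLE_FAIL */
};

/* Freelist index (high half) and counters (low half) share one word. */
static uint64_t make_state(uint32_t freelist, uint32_t counters)
{
	return ((uint64_t)freelist << 32) | counters;
}

static bool cmpxchg_state_sketch(struct cache_sketch *s, _Atomic uint64_t *state,
				 uint64_t old_state, uint64_t new_state)
{
	if (s->use_cmpxchg) {
		/* Fast mode: a single atomic compare-and-swap. */
		if (atomic_compare_exchange_strong(state, &old_state, new_state))
			return true;
	} else {
		/* Fallback: compare, then store. The caller must hold a lock. */
		if (atomic_load(state) == old_state) {
			atomic_store(state, new_state);
			return true;
		}
	}

	s->cmpxchg_fail++;	/* the caller is expected to reload and retry */
	return false;
}
```

Either path reports a mismatch the same way, so callers can use one retry loop
regardless of whether the cache runs in fast mode or with debugging enabled.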

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/slub_def.h |    1 
 mm/slub.c                |   65 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 61 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 11:57:59.622937422 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 12:03:16.652935392 -0500
@@ -131,6 +131,9 @@ static inline int kmem_cache_debug(struc
 /* Enable to test recovery from slab corruption on boot */
 #undef SLUB_RESILIENCY_TEST
 
+/* Enable to log cmpxchg failures */
+#undef SLUB_DEBUG_CMPXCHG
+
 /*
  * Mininum number of partial slabs. These will be left on the partial
  * lists even if they are empty. kmem_cache_shrink may reclaim them.
@@ -170,6 +173,7 @@ static inline int kmem_cache_debug(struc
 
 /* Internal SLUB flags */
 #define __OBJECT_POISON		0x80000000UL /* Poison object */
+#define __CMPXCHG_DOUBLE	0x40000000UL /* Use cmpxchg_double */
 
 static int kmem_size = sizeof(struct kmem_cache);
 
@@ -338,6 +342,37 @@ static inline int oo_objects(struct kmem
 	return x.x & OO_MASK;
 }
 
+static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
+		void *freelist_old, unsigned long counters_old,
+		void *freelist_new, unsigned long counters_new,
+		const char *n)
+{
+#ifdef CONFIG_CMPXCHG_DOUBLE
+	if (s->flags & __CMPXCHG_DOUBLE) {
+		if (cmpxchg_double(&page->freelist,
+			freelist_old, counters_old,
+			freelist_new, counters_new))
+			return 1;
+	} else
+#endif
+	{
+		if (page->freelist == freelist_old && page->counters == counters_old) {
+			page->freelist = freelist_new;
+			page->counters = counters_new;
+			return 1;
+		}
+	}
+
+	cpu_relax();
+	stat(s, CMPXCHG_DOUBLE_FAIL);
+
+#ifdef SLUB_DEBUG_CMPXCHG
+	printk(KERN_INFO "%s %s: cmpxchg double redo ", n, s->name);
+#endif
+
+	return 0;
+}
+
 #ifdef CONFIG_SLUB_DEBUG
 /*
  * Determine a map of object in use on a page.
@@ -2600,6 +2635,12 @@ static int kmem_cache_open(struct kmem_c
 		}
 	}
 
+#ifdef CONFIG_CMPXCHG_DOUBLE
+	if (system_has_cmpxchg_double() && (s->flags & SLAB_DEBUG_FLAGS) == 0)
+		/* Enable fast mode */
+		s->flags |= __CMPXCHG_DOUBLE;
+#endif
+
 	/*
 	 * The larger the object size is, the more pages we want on the partial
 	 * list to avoid pounding the page allocator excessively.
@@ -4252,8 +4293,10 @@ static ssize_t sanity_checks_store(struc
 				const char *buf, size_t length)
 {
 	s->flags &= ~SLAB_DEBUG_FREE;
-	if (buf[0] == '1')
+	if (buf[0] == '1') {
+		s->flags &= ~__CMPXCHG_DOUBLE;
 		s->flags |= SLAB_DEBUG_FREE;
+	}
 	return length;
 }
 SLAB_ATTR(sanity_checks);
@@ -4267,8 +4310,10 @@ static ssize_t trace_store(struct kmem_c
 							size_t length)
 {
 	s->flags &= ~SLAB_TRACE;
-	if (buf[0] == '1')
+	if (buf[0] == '1') {
+		s->flags &= ~__CMPXCHG_DOUBLE;
 		s->flags |= SLAB_TRACE;
+	}
 	return length;
 }
 SLAB_ATTR(trace);
@@ -4285,8 +4330,10 @@ static ssize_t red_zone_store(struct kme
 		return -EBUSY;
 
 	s->flags &= ~SLAB_RED_ZONE;
-	if (buf[0] == '1')
+	if (buf[0] == '1') {
+		s->flags &= ~__CMPXCHG_DOUBLE;
 		s->flags |= SLAB_RED_ZONE;
+	}
 	calculate_sizes(s, -1);
 	return length;
 }
@@ -4304,8 +4351,10 @@ static ssize_t poison_store(struct kmem_
 		return -EBUSY;
 
 	s->flags &= ~SLAB_POISON;
-	if (buf[0] == '1')
+	if (buf[0] == '1') {
+		s->flags &= ~__CMPXCHG_DOUBLE;
 		s->flags |= SLAB_POISON;
+	}
 	calculate_sizes(s, -1);
 	return length;
 }
@@ -4323,8 +4372,10 @@ static ssize_t store_user_store(struct k
 		return -EBUSY;
 
 	s->flags &= ~SLAB_STORE_USER;
-	if (buf[0] == '1')
+	if (buf[0] == '1') {
+		s->flags &= ~__CMPXCHG_DOUBLE;
 		s->flags |= SLAB_STORE_USER;
+	}
 	calculate_sizes(s, -1);
 	return length;
 }
@@ -4497,6 +4548,8 @@ STAT_ATTR(DEACTIVATE_TO_HEAD, deactivate
 STAT_ATTR(DEACTIVATE_TO_TAIL, deactivate_to_tail);
 STAT_ATTR(DEACTIVATE_REMOTE_FREES, deactivate_remote_frees);
 STAT_ATTR(ORDER_FALLBACK, order_fallback);
+STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
+STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
 #endif
 
 static struct attribute *slab_attrs[] = {
@@ -4554,6 +4607,8 @@ static struct attribute *slab_attrs[] =
 	&deactivate_to_tail_attr.attr,
 	&deactivate_remote_frees_attr.attr,
 	&order_fallback_attr.attr,
+	&cmpxchg_double_fail_attr.attr,
+	&cmpxchg_double_cpu_fail_attr.attr,
 #endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2011-05-31 11:50:01.762940481 -0500
+++ linux-2.6/include/linux/slub_def.h	2011-05-31 11:58:01.742937411 -0500
@@ -33,6 +33,7 @@ enum stat_item {
 	DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */
 	ORDER_FALLBACK,		/* Number of times fallback was necessary */
 	CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */
+	CMPXCHG_DOUBLE_FAIL,	/* Number of times that cmpxchg double did not match */
 	NR_SLUB_STAT_ITEMS };
 
 struct kmem_cache_cpu {



* [slubllv7 07/17] slub: explicit list_lock taking
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (5 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 06/17] slub: Add cmpxchg_double_slab() Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 08/17] slub: Pass kmem_cache struct to lock and freeze slab Christoph Lameter
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: unlock_list_ops --]
[-- Type: text/plain, Size: 6845 bytes --]

The allocator fastpath rework changes how the list_lock is used. Remove the
list_lock handling from the helper functions that hid it from the critical
sections and take the lock explicitly in those critical sections instead.

This in turn simplifies the support functions (no __ variant needed anymore)
and simplifies the lock handling on bootstrap.

Inline add_partial since it becomes pretty simple.
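
A minimal userspace sketch of the pattern, assuming a toy spinlock built on
C11 atomic_flag in place of spin_lock() and n->list_lock. All toy_* names are
hypothetical and are not part of mm/slub.c:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct kmem_cache_node. */
struct toy_page {
	struct toy_page *next;
};

struct toy_node {
	atomic_flag list_lock;		/* models n->list_lock */
	unsigned long nr_partial;
	struct toy_page *partial;	/* singly linked, head insertion */
};

#define TOY_NODE_INIT { ATOMIC_FLAG_INIT, 0, NULL }

static void toy_lock(struct toy_node *n)
{
	while (atomic_flag_test_and_set_explicit(&n->list_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_node *n)
{
	atomic_flag_clear_explicit(&n->list_lock, memory_order_release);
}

/* list_lock must be held: the helper no longer locks on its own. */
static inline void toy_add_partial(struct toy_node *n, struct toy_page *page)
{
	page->next = n->partial;
	n->partial = page;
	n->nr_partial++;
}

/* list_lock must be held. Returns NULL if the list is empty. */
static inline struct toy_page *toy_remove_partial(struct toy_node *n)
{
	struct toy_page *page = n->partial;

	if (page) {
		n->partial = page->next;
		n->nr_partial--;
	}
	return page;
}

/* The caller owns the critical section and can batch list updates. */
static unsigned long toy_move_two(struct toy_node *n,
				  struct toy_page *a, struct toy_page *b)
{
	unsigned long count;

	toy_lock(n);
	toy_add_partial(n, a);
	toy_add_partial(n, b);
	count = n->nr_partial;
	toy_unlock(n);
	return count;
}
```

With the lock taken by the caller, a single lock/unlock pair can cover
several list operations, and only one variant of each helper is needed.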

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   89 ++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 49 insertions(+), 40 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 10:14:00.812977367 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 10:14:03.852977349 -0500
@@ -916,26 +916,27 @@ static inline void slab_free_hook(struct
 
 /*
  * Tracking of fully allocated slabs for debugging purposes.
+ *
+ * list_lock must be held.
  */
-static void add_full(struct kmem_cache_node *n, struct page *page)
+static void add_full(struct kmem_cache *s,
+	struct kmem_cache_node *n, struct page *page)
 {
-	spin_lock(&n->list_lock);
+	if (!(s->flags & SLAB_STORE_USER))
+		return;
+
 	list_add(&page->lru, &n->full);
-	spin_unlock(&n->list_lock);
 }
 
+/*
+ * list_lock must be held.
+ */
 static void remove_full(struct kmem_cache *s, struct page *page)
 {
-	struct kmem_cache_node *n;
-
 	if (!(s->flags & SLAB_STORE_USER))
 		return;
 
-	n = get_node(s, page_to_nid(page));
-
-	spin_lock(&n->list_lock);
 	list_del(&page->lru);
-	spin_unlock(&n->list_lock);
 }
 
 /* Tracking of the number of slabs for debugging purposes */
@@ -1060,8 +1061,13 @@ static noinline int free_debug_processin
 	}
 
 	/* Special debug activities for freeing objects */
-	if (!page->frozen && !page->freelist)
+	if (!page->frozen && !page->freelist) {
+		struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
+		spin_lock(&n->list_lock);
 		remove_full(s, page);
+		spin_unlock(&n->list_lock);
+	}
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
@@ -1170,7 +1176,8 @@ static inline int slab_pad_check(struct
 			{ return 1; }
 static inline int check_object(struct kmem_cache *s, struct page *page,
 			void *object, u8 val) { return 1; }
-static inline void add_full(struct kmem_cache_node *n, struct page *page) {}
+static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n,
+					struct page *page) {}
 static inline unsigned long kmem_cache_flags(unsigned long objsize,
 	unsigned long flags, const char *name,
 	void (*ctor)(void *))
@@ -1420,38 +1427,33 @@ static __always_inline int slab_trylock(
 }
 
 /*
- * Management of partially allocated slabs
+ * Management of partially allocated slabs.
+ *
+ * list_lock must be held.
  */
-static void add_partial(struct kmem_cache_node *n,
+static inline void add_partial(struct kmem_cache_node *n,
 				struct page *page, int tail)
 {
-	spin_lock(&n->list_lock);
 	n->nr_partial++;
 	if (tail)
 		list_add_tail(&page->lru, &n->partial);
 	else
 		list_add(&page->lru, &n->partial);
-	spin_unlock(&n->list_lock);
 }
 
-static inline void __remove_partial(struct kmem_cache_node *n,
+/*
+ * list_lock must be held.
+ */
+static inline void remove_partial(struct kmem_cache_node *n,
 					struct page *page)
 {
 	list_del(&page->lru);
 	n->nr_partial--;
 }
 
-static void remove_partial(struct kmem_cache *s, struct page *page)
-{
-	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-
-	spin_lock(&n->list_lock);
-	__remove_partial(n, page);
-	spin_unlock(&n->list_lock);
-}
-
 /*
- * Lock slab and remove from the partial list.
+ * Lock slab, remove from the partial list and put the object into the
+ * per cpu freelist.
  *
  * Must hold list_lock.
  */
@@ -1459,7 +1461,7 @@ static inline int lock_and_freeze_slab(s
 							struct page *page)
 {
 	if (slab_trylock(page)) {
-		__remove_partial(n, page);
+		remove_partial(n, page);
 		return 1;
 	}
 	return 0;
@@ -1576,12 +1578,17 @@ static void unfreeze_slab(struct kmem_ca
 	if (page->inuse) {
 
 		if (page->freelist) {
+			spin_lock(&n->list_lock);
 			add_partial(n, page, tail);
+			spin_unlock(&n->list_lock);
 			stat(s, tail ? DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD);
 		} else {
 			stat(s, DEACTIVATE_FULL);
-			if (kmem_cache_debug(s) && (s->flags & SLAB_STORE_USER))
-				add_full(n, page);
+			if (kmem_cache_debug(s) && (s->flags & SLAB_STORE_USER)) {
+				spin_lock(&n->list_lock);
+				add_full(s, n, page);
+				spin_unlock(&n->list_lock);
+			}
 		}
 		slab_unlock(page);
 	} else {
@@ -1597,7 +1604,9 @@ static void unfreeze_slab(struct kmem_ca
 			 * kmem_cache_shrink can reclaim any empty slabs from
 			 * the partial list.
 			 */
+			spin_lock(&n->list_lock);
 			add_partial(n, page, 1);
+			spin_unlock(&n->list_lock);
 			slab_unlock(page);
 		} else {
 			slab_unlock(page);
@@ -2099,7 +2108,11 @@ static void __slab_free(struct kmem_cach
 	 * then add it.
 	 */
 	if (unlikely(!prior)) {
+		struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
+		spin_lock(&n->list_lock);
 		add_partial(get_node(s, page_to_nid(page)), page, 1);
+		spin_unlock(&n->list_lock);
 		stat(s, FREE_ADD_PARTIAL);
 	}
 
@@ -2113,7 +2126,11 @@ slab_empty:
 		/*
 		 * Slab still on the partial list.
 		 */
-		remove_partial(s, page);
+		struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
+		spin_lock(&n->list_lock);
+		remove_partial(n, page);
+		spin_unlock(&n->list_lock);
 		stat(s, FREE_REMOVE_PARTIAL);
 	}
 	slab_unlock(page);
@@ -2395,7 +2412,6 @@ static void early_kmem_cache_node_alloc(
 {
 	struct page *page;
 	struct kmem_cache_node *n;
-	unsigned long flags;
 
 	BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));
 
@@ -2422,14 +2438,7 @@ static void early_kmem_cache_node_alloc(
 	init_kmem_cache_node(n, kmem_cache_node);
 	inc_slabs_node(kmem_cache_node, node, page->objects);
 
-	/*
-	 * lockdep requires consistent irq usage for each lock
-	 * so even though there cannot be a race this early in
-	 * the boot sequence, we still disable irqs.
-	 */
-	local_irq_save(flags);
 	add_partial(n, page, 0);
-	local_irq_restore(flags);
 }
 
 static void free_kmem_cache_nodes(struct kmem_cache *s)
@@ -2713,7 +2722,7 @@ static void free_partial(struct kmem_cac
 	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry_safe(page, h, &n->partial, lru) {
 		if (!page->inuse) {
-			__remove_partial(n, page);
+			remove_partial(n, page);
 			discard_slab(s, page);
 		} else {
 			list_slab_objects(s, page,
@@ -3051,7 +3060,7 @@ int kmem_cache_shrink(struct kmem_cache
 				 * may have freed the last object and be
 				 * waiting to release the slab.
 				 */
-				__remove_partial(n, page);
+				remove_partial(n, page);
 				slab_unlock(page);
 				discard_slab(s, page);
 			} else {



* [slubllv7 08/17] slub: Pass kmem_cache struct to lock and freeze slab
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (6 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 07/17] slub: explicit list_lock taking Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 09/17] slub: Rework allocator fastpaths Christoph Lameter
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: pass_kmem_cache_to_lock_and_freeze --]
[-- Type: text/plain, Size: 2227 bytes --]

We need more information about the slab for the cmpxchg implementation.

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>

---
 mm/slub.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 10:14:03.852977349 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 10:14:06.172977333 -0500
@@ -1457,8 +1457,8 @@ static inline void remove_partial(struct
  *
  * Must hold list_lock.
  */
-static inline int lock_and_freeze_slab(struct kmem_cache_node *n,
-							struct page *page)
+static inline int lock_and_freeze_slab(struct kmem_cache *s,
+		struct kmem_cache_node *n, struct page *page)
 {
 	if (slab_trylock(page)) {
 		remove_partial(n, page);
@@ -1470,7 +1470,8 @@ static inline int lock_and_freeze_slab(s
 /*
  * Try to allocate a partial slab from a specific node.
  */
-static struct page *get_partial_node(struct kmem_cache_node *n)
+static struct page *get_partial_node(struct kmem_cache *s,
+					struct kmem_cache_node *n)
 {
 	struct page *page;
 
@@ -1485,7 +1486,7 @@ static struct page *get_partial_node(str
 
 	spin_lock(&n->list_lock);
 	list_for_each_entry(page, &n->partial, lru)
-		if (lock_and_freeze_slab(n, page))
+		if (lock_and_freeze_slab(s, n, page))
 			goto out;
 	page = NULL;
 out:
@@ -1536,7 +1537,7 @@ static struct page *get_any_partial(stru
 
 		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
 				n->nr_partial > s->min_partial) {
-			page = get_partial_node(n);
+			page = get_partial_node(s, n);
 			if (page) {
 				put_mems_allowed();
 				return page;
@@ -1556,7 +1557,7 @@ static struct page *get_partial(struct k
 	struct page *page;
 	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 
-	page = get_partial_node(get_node(s, searchnode));
+	page = get_partial_node(s, get_node(s, searchnode));
 	if (page || node != NUMA_NO_NODE)
 		return page;
 
@@ -2081,7 +2082,7 @@ static void __slab_free(struct kmem_cach
 {
 	void *prior;
 	void **object = (void *)x;
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 
 	local_irq_save(flags);
 	slab_lock(page);



* [slubllv7 09/17] slub: Rework allocator fastpaths
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (7 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 08/17] slub: Pass kmem_cache struct to lock and freeze slab Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 10/17] slub: Invert locking and avoid slab lock Christoph Lameter
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: rework_fastpaths --]
[-- Type: text/plain, Size: 14347 bytes --]

Rework the allocation paths so that updates of the page freelist, frozen state
and number of objects use cmpxchg_double_slab().
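
The retry loops in this patch all follow the same shape: read the old state,
compute the new state, and repeat until the compare-and-exchange succeeds. A
rough userspace sketch of that shape follows. It assumes the freelist head,
inuse count and frozen bit are packed into a single 64-bit word, whereas the
real code exchanges a pointer plus a counters word. Every toy_* name and the
bit layout are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NIL 0xffffffffu	/* hypothetical "empty freelist" marker */

/*
 * Hypothetical packed slab state:
 *   bits  0..31  freelist head (object index)
 *   bits 32..47  inuse
 *   bit  48      frozen
 */
struct toy_slab {
	_Atomic uint64_t state;
	uint32_t next[8];	/* per-object free pointer, as an index */
	uint16_t objects;
};

static uint64_t toy_pack(uint32_t freelist, uint16_t inuse, bool frozen)
{
	return (uint64_t)freelist | ((uint64_t)inuse << 32) |
	       ((uint64_t)frozen << 48);
}

static uint32_t toy_freelist(uint64_t v) { return (uint32_t)v; }
static uint16_t toy_inuse(uint64_t v) { return (uint16_t)(v >> 32); }
static bool toy_frozen(uint64_t v) { return (v >> 48) & 1; }

/* Start with every object allocated, an empty freelist, and unfrozen. */
static void toy_slab_init(struct toy_slab *s, uint16_t objects)
{
	s->objects = objects;
	for (int i = 0; i < 8; i++)
		s->next[i] = TOY_NIL;
	atomic_init(&s->state, toy_pack(TOY_NIL, objects, false));
}

/* Free one object: link it in front of the freelist and drop inuse. */
static void toy_free(struct toy_slab *s, uint32_t object)
{
	uint64_t old = atomic_load(&s->state), new;

	do {
		s->next[object] = toy_freelist(old);
		new = toy_pack(object, (uint16_t)(toy_inuse(old) - 1),
			       toy_frozen(old));
		/* on failure, old is reloaded and the loop recomputes new */
	} while (!atomic_compare_exchange_weak(&s->state, &old, new));
}

/* Zap the freelist and set the frozen bit; returns the old list head. */
static uint32_t toy_acquire(struct toy_slab *s)
{
	uint64_t old = atomic_load(&s->state), new;

	do {
		new = toy_pack(TOY_NIL, s->objects, true);
	} while (!atomic_compare_exchange_weak(&s->state, &old, new));

	return toy_freelist(old);
}
```

Because the list head and the counters change in one atomic step, a
concurrent free either lands entirely before the freeze or entirely after
it; no intermediate state is visible.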

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |  409 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 280 insertions(+), 129 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 10:14:06.172977333 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 10:17:07.132976175 -0500
@@ -992,11 +992,6 @@ static noinline int alloc_debug_processi
 	if (!check_slab(s, page))
 		goto bad;
 
-	if (!on_freelist(s, page, object)) {
-		object_err(s, page, object, "Object already allocated");
-		goto bad;
-	}
-
 	if (!check_valid_pointer(s, page, object)) {
 		object_err(s, page, object, "Freelist Pointer check fails");
 		goto bad;
@@ -1060,14 +1055,6 @@ static noinline int free_debug_processin
 		goto fail;
 	}
 
-	/* Special debug activities for freeing objects */
-	if (!page->frozen && !page->freelist) {
-		struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-
-		spin_lock(&n->list_lock);
-		remove_full(s, page);
-		spin_unlock(&n->list_lock);
-	}
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
@@ -1178,6 +1165,7 @@ static inline int check_object(struct km
 			void *object, u8 val) { return 1; }
 static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n,
 					struct page *page) {}
+static inline void remove_full(struct kmem_cache *s, struct page *page) {}
 static inline unsigned long kmem_cache_flags(unsigned long objsize,
 	unsigned long flags, const char *name,
 	void (*ctor)(void *))
@@ -1460,11 +1448,52 @@ static inline void remove_partial(struct
 static inline int lock_and_freeze_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page)
 {
-	if (slab_trylock(page)) {
-		remove_partial(n, page);
+	void *freelist;
+	unsigned long counters;
+	struct page new;
+
+
+	if (!slab_trylock(page))
+		return 0;
+
+	/*
+	 * Zap the freelist and set the frozen bit.
+	 * The old freelist is the list of objects for the
+	 * per cpu allocation list.
+	 */
+	do {
+		freelist = page->freelist;
+		counters = page->counters;
+		new.counters = counters;
+		new.inuse = page->objects;
+
+		VM_BUG_ON(new.frozen);
+		new.frozen = 1;
+
+	} while (!cmpxchg_double_slab(s, page,
+			freelist, counters,
+			NULL, new.counters,
+			"lock and freeze"));
+
+	remove_partial(n, page);
+
+	if (freelist) {
+		/* Populate the per cpu freelist */
+		this_cpu_write(s->cpu_slab->freelist, freelist);
+		this_cpu_write(s->cpu_slab->page, page);
+		this_cpu_write(s->cpu_slab->node, page_to_nid(page));
 		return 1;
+	} else {
+		/*
+		 * Slab page came from the wrong list. No object to allocate
+		 * from. Put it onto the correct list and continue partial
+		 * scan.
+		 */
+		printk(KERN_ERR "SLUB: %s : Page without available objects on"
+			" partial list\n", s->name);
+		slab_unlock(page);
+		return 0;
 	}
-	return 0;
 }
 
 /*
@@ -1564,59 +1593,6 @@ static struct page *get_partial(struct k
 	return get_any_partial(s, flags);
 }
 
-/*
- * Move a page back to the lists.
- *
- * Must be called with the slab lock held.
- *
- * On exit the slab lock will have been dropped.
- */
-static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
-	__releases(bitlock)
-{
-	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-
-	if (page->inuse) {
-
-		if (page->freelist) {
-			spin_lock(&n->list_lock);
-			add_partial(n, page, tail);
-			spin_unlock(&n->list_lock);
-			stat(s, tail ? DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD);
-		} else {
-			stat(s, DEACTIVATE_FULL);
-			if (kmem_cache_debug(s) && (s->flags & SLAB_STORE_USER)) {
-				spin_lock(&n->list_lock);
-				add_full(s, n, page);
-				spin_unlock(&n->list_lock);
-			}
-		}
-		slab_unlock(page);
-	} else {
-		stat(s, DEACTIVATE_EMPTY);
-		if (n->nr_partial < s->min_partial) {
-			/*
-			 * Adding an empty slab to the partial slabs in order
-			 * to avoid page allocator overhead. This slab needs
-			 * to come after the other slabs with objects in
-			 * so that the others get filled first. That way the
-			 * size of the partial list stays small.
-			 *
-			 * kmem_cache_shrink can reclaim any empty slabs from
-			 * the partial list.
-			 */
-			spin_lock(&n->list_lock);
-			add_partial(n, page, 1);
-			spin_unlock(&n->list_lock);
-			slab_unlock(page);
-		} else {
-			slab_unlock(page);
-			stat(s, FREE_SLAB);
-			discard_slab(s, page);
-		}
-	}
-}
-
 #ifdef CONFIG_PREEMPT
 /*
  * Calculate the next globally unique transaction for disambiguiation
@@ -1686,37 +1662,158 @@ void init_kmem_cache_cpus(struct kmem_ca
 /*
  * Remove the cpu slab
  */
+
+/*
+ * Remove the cpu slab
+ */
 static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
-	__releases(bitlock)
 {
+	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
 	struct page *page = c->page;
-	int tail = 1;
+	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+	int lock = 0;
+	enum slab_modes l = M_NONE, m = M_NONE;
+	void *freelist;
+	void *nextfree;
+	int tail = 0;
+	struct page new;
+	struct page old;
 
-	if (page->freelist)
+	if (page->freelist) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
+		tail = 1;
+	}
+
+	c->tid = next_tid(c->tid);
+	c->page = NULL;
+	freelist = c->freelist;
+	c->freelist = NULL;
+
 	/*
-	 * Merge cpu freelist into slab freelist. Typically we get here
-	 * because both freelists are empty. So this is unlikely
-	 * to occur.
+	 * Stage one: Free all available per cpu objects back
+	 * to the page freelist while it is still frozen. Leave the
+	 * last one.
+	 *
+	 * There is no need to take the list->lock because the page
+	 * is still frozen.
 	 */
-	while (unlikely(c->freelist)) {
-		void **object;
+	while (freelist && (nextfree = get_freepointer(s, freelist))) {
+		void *prior;
+		unsigned long counters;
+
+		do {
+			prior = page->freelist;
+			counters = page->counters;
+			set_freepointer(s, freelist, prior);
+			new.counters = counters;
+			new.inuse--;
+			VM_BUG_ON(!new.frozen);
+
+		} while (!cmpxchg_double_slab(s, page,
+			prior, counters,
+			freelist, new.counters,
+			"drain percpu freelist"));
 
-		tail = 0;	/* Hot objects. Put the slab first */
+		freelist = nextfree;
+	}
 
-		/* Retrieve object from cpu_freelist */
-		object = c->freelist;
-		c->freelist = get_freepointer(s, c->freelist);
+	/*
+	 * Stage two: Ensure that the page is unfrozen while the
+	 * list presence reflects the actual number of objects
+	 * during unfreeze.
+	 *
+	 * We setup the list membership and then perform a cmpxchg
+	 * with the count. If there is a mismatch then the page
+	 * is not unfrozen but the page is on the wrong list.
+	 *
+	 * Then we restart the process which may have to remove
+	 * the page from the list that we just put it on again
+	 * because the number of objects in the slab may have
+	 * changed.
+	 */
+redo:
 
-		/* And put onto the regular freelist */
-		set_freepointer(s, object, page->freelist);
-		page->freelist = object;
-		page->inuse--;
+	old.freelist = page->freelist;
+	old.counters = page->counters;
+	VM_BUG_ON(!old.frozen);
+
+	/* Determine target state of the slab */
+	new.counters = old.counters;
+	if (freelist) {
+		new.inuse--;
+		set_freepointer(s, freelist, old.freelist);
+		new.freelist = freelist;
+	} else
+		new.freelist = old.freelist;
+
+	new.frozen = 0;
+
+	if (!new.inuse && n->nr_partial < s->min_partial)
+		m = M_FREE;
+	else if (new.freelist) {
+		m = M_PARTIAL;
+		if (!lock) {
+			lock = 1;
+			/*
+			 * Taking the spinlock removes the possibility
+			 * that acquire_slab() will see a slab page that
+			 * is frozen
+			 */
+			spin_lock(&n->list_lock);
+		}
+	} else {
+		m = M_FULL;
+		if (kmem_cache_debug(s) && !lock) {
+			lock = 1;
+			/*
+			 * This also ensures that the scanning of full
+			 * slabs from diagnostic functions will not see
+			 * any frozen slabs.
+			 */
+			spin_lock(&n->list_lock);
+		}
+	}
+
+	if (l != m) {
+
+		if (l == M_PARTIAL)
+
+			remove_partial(n, page);
+
+		else if (l == M_FULL)
+
+			remove_full(s, page);
+
+		if (m == M_PARTIAL) {
+
+			add_partial(n, page, tail);
+			stat(s, tail ? DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD);
+
+		} else if (m == M_FULL) {
+
+			stat(s, DEACTIVATE_FULL);
+			add_full(s, n, page);
+
+		}
+	}
+
+	l = m;
+	if (!cmpxchg_double_slab(s, page,
+				old.freelist, old.counters,
+				new.freelist, new.counters,
+				"unfreezing slab"))
+		goto redo;
+
+	slab_unlock(page);
+
+	if (lock)
+		spin_unlock(&n->list_lock);
+
+	if (m == M_FREE) {
+		stat(s, DEACTIVATE_EMPTY);
+		discard_slab(s, page);
+		stat(s, FREE_SLAB);
 	}
-	c->page = NULL;
-	c->tid = next_tid(c->tid);
-	page->frozen = 0;
-	unfreeze_slab(s, page, tail);
 }
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
@@ -1851,6 +1948,8 @@ static void *__slab_alloc(struct kmem_ca
 	void **object;
 	struct page *page;
 	unsigned long flags;
+	struct page new;
+	unsigned long counters;
 
 	local_irq_save(flags);
 #ifdef CONFIG_PREEMPT
@@ -1873,25 +1972,33 @@ static void *__slab_alloc(struct kmem_ca
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 
-	stat(s, ALLOC_REFILL);
+	stat(s, ALLOC_SLOWPATH);
+
+	do {
+		object = page->freelist;
+		counters = page->counters;
+		new.counters = counters;
+		new.inuse = page->objects;
+		VM_BUG_ON(!new.frozen);
+
+	} while (!cmpxchg_double_slab(s, page,
+			object, counters,
+			NULL, new.counters,
+			"__slab_alloc"));
 
 load_freelist:
 	VM_BUG_ON(!page->frozen);
 
-	object = page->freelist;
 	if (unlikely(!object))
 		goto another_slab;
-	if (kmem_cache_debug(s))
-		goto debug;
 
-	c->freelist = get_freepointer(s, object);
-	page->inuse = page->objects;
-	page->freelist = NULL;
+	stat(s, ALLOC_REFILL);
 
 	slab_unlock(page);
+
+	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
-	stat(s, ALLOC_SLOWPATH);
 	return object;
 
 another_slab:
@@ -1901,9 +2008,10 @@ new_slab:
 	page = get_partial(s, gfpflags, node);
 	if (page) {
 		stat(s, ALLOC_FROM_PARTIAL);
-		page->frozen = 1;
-		c->node = page_to_nid(page);
-		c->page = page;
+		object = c->freelist;
+
+		if (kmem_cache_debug(s))
+			goto debug;
 		goto load_freelist;
 	}
 
@@ -1911,12 +2019,19 @@ new_slab:
 
 	if (page) {
 		c = __this_cpu_ptr(s->cpu_slab);
-		stat(s, ALLOC_SLAB);
 		if (c->page)
 			flush_slab(s, c);
 
+		/*
+		 * No other reference to the page yet so we can
+		 * muck around with it freely without cmpxchg
+		 */
+		object = page->freelist;
+		page->freelist = NULL;
+		page->inuse = page->objects;
+
+		stat(s, ALLOC_SLAB);
 		slab_lock(page);
-		page->frozen = 1;
 		c->node = page_to_nid(page);
 		c->page = page;
 		goto load_freelist;
@@ -1925,12 +2040,12 @@ new_slab:
 		slab_out_of_memory(s, gfpflags, node);
 	local_irq_restore(flags);
 	return NULL;
+
 debug:
-	if (!alloc_debug_processing(s, page, object, addr))
-		goto another_slab;
+	if (!object || !alloc_debug_processing(s, page, object, addr))
+		goto new_slab;
 
-	page->inuse++;
-	page->freelist = get_freepointer(s, object);
+	c->freelist = get_freepointer(s, object);
 	deactivate_slab(s, c);
 	c->page = NULL;
 	c->node = NUMA_NO_NODE;
@@ -2082,6 +2197,11 @@ static void __slab_free(struct kmem_cach
 {
 	void *prior;
 	void **object = (void *)x;
+	int was_frozen;
+	int inuse;
+	struct page new;
+	unsigned long counters;
+	struct kmem_cache_node *n = NULL;
 	unsigned long uninitialized_var(flags);
 
 	local_irq_save(flags);
@@ -2091,32 +2211,65 @@ static void __slab_free(struct kmem_cach
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
 		goto out_unlock;
 
-	prior = page->freelist;
-	set_freepointer(s, object, prior);
-	page->freelist = object;
-	page->inuse--;
-
-	if (unlikely(page->frozen)) {
-		stat(s, FREE_FROZEN);
-		goto out_unlock;
-	}
+	do {
+		prior = page->freelist;
+		counters = page->counters;
+		set_freepointer(s, object, prior);
+		new.counters = counters;
+		was_frozen = new.frozen;
+		new.inuse--;
+		if ((!new.inuse || !prior) && !was_frozen && !n) {
+			n = get_node(s, page_to_nid(page));
+			/*
+			 * Speculatively acquire the list_lock.
+			 * If the cmpxchg does not succeed then we may
+			 * drop the list_lock without any processing.
+			 *
+			 * Otherwise the list_lock will synchronize with
+			 * other processors updating the list of slabs.
+			 */
+			spin_lock(&n->list_lock);
+		}
+		inuse = new.inuse;
 
-	if (unlikely(!page->inuse))
-		goto slab_empty;
+	} while (!cmpxchg_double_slab(s, page,
+		prior, counters,
+		object, new.counters,
+		"__slab_free"));
+
+	if (likely(!n)) {
+		/*
+		 * The list lock was not taken therefore no list
+		 * activity can be necessary.
+		 */
+		if (was_frozen)
+			stat(s, FREE_FROZEN);
+		goto out_unlock;
+	}
 
 	/*
-	 * Objects left in the slab. If it was not on the partial list before
-	 * then add it.
+	 * was_frozen may have been set after we acquired the list_lock in
+	 * an earlier loop. So we need to check it here again.
 	 */
-	if (unlikely(!prior)) {
-		struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+	if (was_frozen)
+		stat(s, FREE_FROZEN);
+	else {
+		if (unlikely(!inuse && n->nr_partial > s->min_partial))
+			goto slab_empty;
 
-		spin_lock(&n->list_lock);
-		add_partial(get_node(s, page_to_nid(page)), page, 1);
-		spin_unlock(&n->list_lock);
-		stat(s, FREE_ADD_PARTIAL);
+		/*
+		 * Objects left in the slab. If it was not on the partial list before
+		 * then add it.
+		 */
+		if (unlikely(!prior)) {
+			remove_full(s, page);
+			add_partial(n, page, 0);
+			stat(s, FREE_ADD_PARTIAL);
+		}
 	}
 
+	spin_unlock(&n->list_lock);
+
 out_unlock:
 	slab_unlock(page);
 	local_irq_restore(flags);
@@ -2127,13 +2280,11 @@ slab_empty:
 		/*
 		 * Slab still on the partial list.
 		 */
-		struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-
-		spin_lock(&n->list_lock);
 		remove_partial(n, page);
-		spin_unlock(&n->list_lock);
 		stat(s, FREE_REMOVE_PARTIAL);
 	}
+
+	spin_unlock(&n->list_lock);
 	slab_unlock(page);
 	local_irq_restore(flags);
 	stat(s, FREE_SLAB);



* [slubllv7 10/17] slub: Invert locking and avoid slab lock
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (8 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 09/17] slub: Rework allocator fastpaths Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 11/17] slub: Disable interrupts in free_debug processing Christoph Lameter
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: slab_lock_subsume --]
[-- Type: text/plain, Size: 10333 bytes --]

Locking slabs is no longer necessary if the arch supports cmpxchg operations
and if no debugging features are used on a slab. If the arch does not support
cmpxchg then we fall back to using the slab lock to do a cmpxchg-like operation.

The patch also changes the lock order. Slab locks now nest inside the node
lock. With that approach slab_trylock() is no longer necessary.
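
The fallback can be pictured roughly as below: a userspace sketch, assuming
a C11 atomic_flag in place of the PG_locked bit spinlock. The toy_* names
are hypothetical; the real helper is cmpxchg_double_slab() in the diff:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the second double word of struct page. */
struct toy_page {
	atomic_flag lock;	/* models slab_lock()/slab_unlock() */
	void *freelist;
	unsigned long counters;
};

#define TOY_PAGE_INIT(fl, cnt) { ATOMIC_FLAG_INIT, (fl), (cnt) }

/*
 * Emulate a double-word compare-and-exchange under a lock: both fields
 * must match the expected values, otherwise nothing is written.
 */
static bool toy_cmpxchg_double_locked(struct toy_page *page,
		void *freelist_old, unsigned long counters_old,
		void *freelist_new, unsigned long counters_new)
{
	bool ok = false;

	while (atomic_flag_test_and_set_explicit(&page->lock,
						 memory_order_acquire))
		;	/* spin */

	if (page->freelist == freelist_old &&
	    page->counters == counters_old) {
		page->freelist = freelist_new;
		page->counters = counters_new;
		ok = true;
	}

	atomic_flag_clear_explicit(&page->lock, memory_order_release);
	return ok;
}
```

Callers retry on failure exactly as they would with a hardware
cmpxchg_double, so the lock is held only for the compare and the two stores.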

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |  131 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 53 insertions(+), 78 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 10:17:07.132976175 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 10:20:05.472975031 -0500
@@ -2,10 +2,11 @@
  * SLUB: A slab allocator that limits cache line use instead of queuing
  * objects in per cpu and per node lists.
  *
- * The allocator synchronizes using per slab locks and only
- * uses a centralized lock to manage a pool of partial slabs.
+ * The allocator synchronizes using per slab locks or atomic operations
+ * and only uses a centralized lock to manage a pool of partial slabs.
  *
  * (C) 2007 SGI, Christoph Lameter
+ * (C) 2011 Linux Foundation, Christoph Lameter
  */
 
 #include <linux/mm.h>
@@ -32,15 +33,27 @@
 
 /*
  * Lock order:
- *   1. slab_lock(page)
- *   2. slab->list_lock
- *
- *   The slab_lock protects operations on the object of a particular
- *   slab and its metadata in the page struct. If the slab lock
- *   has been taken then no allocations nor frees can be performed
- *   on the objects in the slab nor can the slab be added or removed
- *   from the partial or full lists since this would mean modifying
- *   the page_struct of the slab.
+ *   1. slub_lock (Global Semaphore)
+ *   2. node->list_lock
+ *   3. slab_lock(page) (Only on some arches and for debugging)
+ *
+ *   slub_lock
+ *
+ *   The role of the slub_lock is to protect the list of all the slabs
+ *   and to synchronize major metadata changes to slab cache structures.
+ *
+ *   The slab_lock is only used for debugging and on arches that do not
+ *   have the ability to do a cmpxchg_double. It only protects the second
+ *   double word in the page struct. Meaning
+ *	A. page->freelist	-> List of object free in a page
+ *	B. page->counters	-> Counters of objects
+ *	C. page->frozen		-> frozen state
+ *
+ *   If a slab is frozen then it is exempt from list management. It is not
+ *   on any list. The processor that froze the slab is the one who can
+ *   perform list operations on the page. Other processors may put objects
+ *   onto the freelist but the processor that froze the slab is the only
+ *   one that can retrieve the objects from the page's freelist.
  *
  *   The list_lock protects the partial and full list on each node and
  *   the partial slab counter. If taken then no new slabs may be added or
@@ -53,20 +66,6 @@
  *   slabs, operations can continue without any centralized lock. F.e.
  *   allocating a long series of objects that fill up slabs does not require
  *   the list lock.
- *
- *   The lock order is sometimes inverted when we are trying to get a slab
- *   off a list. We take the list_lock and then look for a page on the list
- *   to use. While we do that objects in the slabs may be freed. We can
- *   only operate on the slab if we have also taken the slab_lock. So we use
- *   a slab_trylock() on the slab. If trylock was successful then no frees
- *   can occur anymore and we can use the slab for allocations etc. If the
- *   slab_trylock() does not succeed then frees are in progress in the slab and
- *   we must stay away from it for a while since we may cause a bouncing
- *   cacheline if we try to acquire the lock. So go onto the next slab.
- *   If all pages are busy then we may allocate a new slab instead of reusing
- *   a partial slab. A new slab has no one operating on it and thus there is
- *   no danger of cacheline contention.
- *
  *   Interrupts are disabled during allocation and deallocation in order to
  *   make the slab allocator safe to use in the context of an irq. In addition
  *   interrupts are disabled to ensure that the processor does not change
@@ -342,6 +341,19 @@ static inline int oo_objects(struct kmem
 	return x.x & OO_MASK;
 }
 
+/*
+ * Per slab locking using the pagelock
+ */
+static __always_inline void slab_lock(struct page *page)
+{
+	bit_spin_lock(PG_locked, &page->flags);
+}
+
+static __always_inline void slab_unlock(struct page *page)
+{
+	__bit_spin_unlock(PG_locked, &page->flags);
+}
+
 static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		void *freelist_old, unsigned long counters_old,
 		void *freelist_new, unsigned long counters_new,
@@ -356,11 +368,14 @@ static inline bool cmpxchg_double_slab(s
 	} else
 #endif
 	{
+		slab_lock(page);
 		if (page->freelist == freelist_old && page->counters == counters_old) {
 			page->freelist = freelist_new;
 			page->counters = counters_new;
+			slab_unlock(page);
 			return 1;
 		}
+		slab_unlock(page);
 	}
 
 	cpu_relax();
@@ -377,7 +392,7 @@ static inline bool cmpxchg_double_slab(s
 /*
  * Determine a map of object in use on a page.
  *
- * Slab lock or node listlock must be held to guarantee that the page does
+ * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
 static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
@@ -808,10 +823,11 @@ static int check_slab(struct kmem_cache
 static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 {
 	int nr = 0;
-	void *fp = page->freelist;
+	void *fp;
 	void *object = NULL;
 	unsigned long max_objects;
 
+	fp = page->freelist;
 	while (fp && nr <= page->objects) {
 		if (fp == search)
 			return 1;
@@ -1024,6 +1040,8 @@ bad:
 static noinline int free_debug_processing(struct kmem_cache *s,
 		 struct page *page, void *object, unsigned long addr)
 {
+	slab_lock(page);
+
 	if (!check_slab(s, page))
 		goto fail;
 
@@ -1059,10 +1077,12 @@ static noinline int free_debug_processin
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
 	init_object(s, object, SLUB_RED_INACTIVE);
+	slab_unlock(page);
 	return 1;
 
 fail:
 	slab_fix(s, "Object at 0x%p not freed", object);
+	slab_unlock(page);
 	return 0;
 }
 
@@ -1394,27 +1414,6 @@ static void discard_slab(struct kmem_cac
 }
 
 /*
- * Per slab locking using the pagelock
- */
-static __always_inline void slab_lock(struct page *page)
-{
-	bit_spin_lock(PG_locked, &page->flags);
-}
-
-static __always_inline void slab_unlock(struct page *page)
-{
-	__bit_spin_unlock(PG_locked, &page->flags);
-}
-
-static __always_inline int slab_trylock(struct page *page)
-{
-	int rc = 1;
-
-	rc = bit_spin_trylock(PG_locked, &page->flags);
-	return rc;
-}
-
-/*
  * Management of partially allocated slabs.
  *
  * list_lock must be held.
@@ -1445,17 +1444,13 @@ static inline void remove_partial(struct
  *
  * Must hold list_lock.
  */
-static inline int lock_and_freeze_slab(struct kmem_cache *s,
+static inline int acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page)
 {
 	void *freelist;
 	unsigned long counters;
 	struct page new;
 
-
-	if (!slab_trylock(page))
-		return 0;
-
 	/*
 	 * Zap the freelist and set the frozen bit.
 	 * The old freelist is the list of objects for the
@@ -1491,7 +1486,6 @@ static inline int lock_and_freeze_slab(s
 		 */
 		printk(KERN_ERR "SLUB: %s : Page without available objects on"
 			" partial list\n", s->name);
-		slab_unlock(page);
 		return 0;
 	}
 }
@@ -1515,7 +1509,7 @@ static struct page *get_partial_node(str
 
 	spin_lock(&n->list_lock);
 	list_for_each_entry(page, &n->partial, lru)
-		if (lock_and_freeze_slab(s, n, page))
+		if (acquire_slab(s, n, page))
 			goto out;
 	page = NULL;
 out:
@@ -1804,8 +1798,6 @@ redo:
 				"unfreezing slab"))
 		goto redo;
 
-	slab_unlock(page);
-
 	if (lock)
 		spin_unlock(&n->list_lock);
 
@@ -1819,7 +1811,6 @@ redo:
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	slab_lock(c->page);
 	deactivate_slab(s, c);
 }
 
@@ -1968,7 +1959,6 @@ static void *__slab_alloc(struct kmem_ca
 	if (!page)
 		goto new_slab;
 
-	slab_lock(page);
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 
@@ -1994,8 +1984,6 @@ load_freelist:
 
 	stat(s, ALLOC_REFILL);
 
-	slab_unlock(page);
-
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
@@ -2031,7 +2019,6 @@ new_slab:
 		page->inuse = page->objects;
 
 		stat(s, ALLOC_SLAB);
-		slab_lock(page);
 		c->node = page_to_nid(page);
 		c->page = page;
 		goto load_freelist;
@@ -2205,7 +2192,6 @@ static void __slab_free(struct kmem_cach
 	unsigned long uninitialized_var(flags);
 
 	local_irq_save(flags);
-	slab_lock(page);
 	stat(s, FREE_SLOWPATH);
 
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
@@ -2271,7 +2257,6 @@ static void __slab_free(struct kmem_cach
 	spin_unlock(&n->list_lock);
 
 out_unlock:
-	slab_unlock(page);
 	local_irq_restore(flags);
 	return;
 
@@ -2285,7 +2270,6 @@ slab_empty:
 	}
 
 	spin_unlock(&n->list_lock);
-	slab_unlock(page);
 	local_irq_restore(flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
@@ -3206,14 +3190,8 @@ int kmem_cache_shrink(struct kmem_cache
 		 * list_lock. page->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			if (!page->inuse && slab_trylock(page)) {
-				/*
-				 * Must hold slab lock here because slab_free
-				 * may have freed the last object and be
-				 * waiting to release the slab.
-				 */
+			if (!page->inuse) {
 				remove_partial(n, page);
-				slab_unlock(page);
 				discard_slab(s, page);
 			} else {
 				list_move(&page->lru,
@@ -3801,12 +3779,9 @@ static int validate_slab(struct kmem_cac
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
-	if (slab_trylock(page)) {
-		validate_slab(s, page, map);
-		slab_unlock(page);
-	} else
-		printk(KERN_INFO "SLUB %s: Skipped busy slab 0x%p\n",
-			s->name, page);
+	slab_lock(page);
+	validate_slab(s, page, map);
+	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,



* [slubllv7 11/17] slub: Disable interrupts in free_debug processing
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (9 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 10/17] slub: Invert locking and avoid slab lock Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 12/17] slub: Avoid disabling interrupts in free slowpath Christoph Lameter
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: irqoff_in_free_debug_processing --]
[-- Type: text/plain, Size: 1466 bytes --]

We will be calling free_debug_processing with interrupts enabled
in some cases once the later patches are applied. Some of the
functions called by free_debug_processing expect interrupts to be
off, so disable them inside the function.
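
The control flow this patch introduces (one exit label that undoes the lock and
the interrupt state, with the failure path jumping back to it) can be sketched
in userspace as below. This is only an illustration: fake_irq_save(),
fake_lock() and debug_check() are hypothetical stand-ins, not kernel functions,
and the counters exist only so the unwinding can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for local_irq_save() and slab_lock(). */
static int irq_depth;	/* > 0 while "interrupts" are off */
static int lock_depth;	/* > 0 while the "slab lock" is held */

static void fake_irq_save(void)    { irq_depth++; }
static void fake_irq_restore(void) { irq_depth--; }
static void fake_lock(void)        { lock_depth++; }
static void fake_unlock(void)      { lock_depth--; }

/* Returns 1 if the object may be freed, 0 if one of the checks failed. */
static int debug_check(bool slab_ok, bool object_ok)
{
	int rc = 0;

	fake_irq_save();
	fake_lock();

	if (!slab_ok)
		goto fail;
	if (!object_ok)
		goto out;	/* nothing to report, just unwind */

	rc = 1;
out:
	/* Single exit: every path drops the lock and restores "interrupts". */
	fake_unlock();
	fake_irq_restore();
	return rc;

fail:
	fprintf(stderr, "object not freed\n");
	goto out;
}
```

Calling debug_check() with any combination of arguments should leave both
irq_depth and lock_depth at zero, which is the property the single exit label
is meant to guarantee.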

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 10:20:05.472975031 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 10:20:09.792975006 -0500
@@ -1040,6 +1040,10 @@ bad:
 static noinline int free_debug_processing(struct kmem_cache *s,
 		 struct page *page, void *object, unsigned long addr)
 {
+	unsigned long flags;
+	int rc = 0;
+
+	local_irq_save(flags);
 	slab_lock(page);
 
 	if (!check_slab(s, page))
@@ -1056,7 +1060,7 @@ static noinline int free_debug_processin
 	}
 
 	if (!check_object(s, page, object, SLUB_RED_ACTIVE))
-		return 0;
+		goto out;
 
 	if (unlikely(s != page->slab)) {
 		if (!PageSlab(page)) {
@@ -1077,13 +1081,15 @@ static noinline int free_debug_processin
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
 	init_object(s, object, SLUB_RED_INACTIVE);
+	rc = 1;
+out:
 	slab_unlock(page);
-	return 1;
+	local_irq_restore(flags);
+	return rc;
 
 fail:
 	slab_fix(s, "Object at 0x%p not freed", object);
-	slab_unlock(page);
-	return 0;
+	goto out;
 }
 
 static int __init setup_slub_debug(char *str)



* [slubllv7 12/17] slub: Avoid disabling interrupts in free slowpath
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (10 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 11/17] slub: Disable interrupts in free_debug processing Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 13/17] slub: Get rid of the another_slab label Christoph Lameter
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: slab_free_without_irqoff --]
[-- Type: text/plain, Size: 2079 bytes --]

Disabling interrupts can be avoided now. However, list operations still require
disabling interrupts since allocations can occur from interrupt
contexts and there is no way to perform atomic list operations.

The acquisition of the list_lock therefore has to disable interrupts as well.

Dropping interrupt handling significantly simplifies the slowpath.
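
A rough userspace sketch of the idea, assuming a much simplified slab: the
freelist update is a lockless compare-and-swap, and a lock is only taken when
the slab has to change lists. struct slab, slab_free() and the atomic_flag
spinlock here are hypothetical; the kernel uses cmpxchg_double_slab() and
spin_lock_irqsave(), acquires the list_lock before the cmpxchg rather than
after it, and userspace has no equivalent of disabling interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct object {
	struct object *next;
};

/* Hypothetical, simplified slab; not the kernel's struct page. */
struct slab {
	_Atomic(struct object *) freelist;	/* updated without a lock */
	bool on_partial;			/* protected by list_lock */
};

static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static int list_lock_taken;	/* counts slow-path entries, for illustration */

static void list_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void list_lock_release(void)
{
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

static void slab_free(struct slab *s, struct object *obj)
{
	struct object *prior = atomic_load(&s->freelist);

	/* Lockless push: on failure prior is reloaded and we retry. */
	do {
		obj->next = prior;
	} while (!atomic_compare_exchange_weak(&s->freelist, &prior, obj));

	/* Only a slab that had no free objects needs a list operation. */
	if (prior == NULL) {
		list_lock_acquire();
		list_lock_taken++;
		s->on_partial = true;
		list_lock_release();
	}
}
```

With this shape, a free into a slab that already has free objects never
touches the lock at all, which is the case the patch is trying to make cheap.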

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 10:20:09.792975006 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 10:20:15.502974969 -0500
@@ -2197,11 +2197,10 @@ static void __slab_free(struct kmem_cach
 	struct kmem_cache_node *n = NULL;
 	unsigned long uninitialized_var(flags);
 
-	local_irq_save(flags);
 	stat(s, FREE_SLOWPATH);
 
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
-		goto out_unlock;
+		return;
 
 	do {
 		prior = page->freelist;
@@ -2220,7 +2219,7 @@ static void __slab_free(struct kmem_cach
 			 * Otherwise the list_lock will synchronize with
 			 * other processors updating the list of slabs.
 			 */
-                        spin_lock(&n->list_lock);
+                        spin_lock_irqsave(&n->list_lock, flags);
 		}
 		inuse = new.inuse;
 
@@ -2236,7 +2235,7 @@ static void __slab_free(struct kmem_cach
 		 */
                 if (was_frozen)
                         stat(s, FREE_FROZEN);
-                goto out_unlock;
+                return;
         }
 
 	/*
@@ -2259,11 +2258,7 @@ static void __slab_free(struct kmem_cach
 			stat(s, FREE_ADD_PARTIAL);
 		}
 	}
-
-	spin_unlock(&n->list_lock);
-
-out_unlock:
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&n->list_lock, flags);
 	return;
 
 slab_empty:
@@ -2275,8 +2270,7 @@ slab_empty:
 		stat(s, FREE_REMOVE_PARTIAL);
 	}
 
-	spin_unlock(&n->list_lock);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&n->list_lock, flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
 }



* [slubllv7 13/17] slub: Get rid of the another_slab label
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (11 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 12/17] slub: Avoid disabling interrupts in free slowpath Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 14/17] slub: Add statistics for the case that the current slab does not match the node Christoph Lameter
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: eliminate_another_slab --]
[-- Type: text/plain, Size: 1107 bytes --]

We can avoid calling deactivate_slab() in special cases if we do the
deactivation of slabs in each code flow that leads to new_slab.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 14:27:11.362880110 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 14:27:12.002880106 -0500
@@ -1965,8 +1965,10 @@ static void *__slab_alloc(struct kmem_ca
 	if (!page)
 		goto new_slab;
 
-	if (unlikely(!node_match(c, node)))
-		goto another_slab;
+	if (unlikely(!node_match(c, node))) {
+		deactivate_slab(s, c);
+		goto new_slab;
+	}
 
 	stat(s, ALLOC_SLOWPATH);
 
@@ -1986,7 +1988,7 @@ load_freelist:
 	VM_BUG_ON(!page->frozen);
 
 	if (unlikely(!object))
-		goto another_slab;
+		goto new_slab;
 
 	stat(s, ALLOC_REFILL);
 
@@ -1995,9 +1997,6 @@ load_freelist:
 	local_irq_restore(flags);
 	return object;
 
-another_slab:
-	deactivate_slab(s, c);
-
 new_slab:
 	page = get_partial(s, gfpflags, node);
 	if (page) {



* [slubllv7 14/17] slub: Add statistics for the case that the current slab does not match the node
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (12 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 13/17] slub: Get rid of the another_slab label Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 15/17] slub: fast release on full slab Christoph Lameter
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: node_mismatch --]
[-- Type: text/plain, Size: 2050 bytes --]

Slub reloads the per cpu slab if the page does not satisfy the NUMA condition. Track
those reloads since doing so has a performance impact.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 include/linux/slub_def.h |    1 +
 mm/slub.c                |    3 +++
 2 files changed, 4 insertions(+)

Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2011-05-31 14:27:07.332880140 -0500
+++ linux-2.6/include/linux/slub_def.h	2011-05-31 14:27:17.792880073 -0500
@@ -24,6 +24,7 @@ enum stat_item {
 	ALLOC_FROM_PARTIAL,	/* Cpu slab acquired from partial list */
 	ALLOC_SLAB,		/* Cpu slab acquired from page allocator */
 	ALLOC_REFILL,		/* Refill cpu slab from slab freelist */
+	ALLOC_NODE_MISMATCH,	/* Switching cpu slab */
 	FREE_SLAB,		/* Slab freed to the page allocator */
 	CPUSLAB_FLUSH,		/* Abandoning of the cpu slab */
 	DEACTIVATE_FULL,	/* Cpu slab was full when deactivated */
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 14:27:12.002880106 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 14:27:17.792880073 -0500
@@ -1966,6 +1966,7 @@ static void *__slab_alloc(struct kmem_ca
 		goto new_slab;
 
 	if (unlikely(!node_match(c, node))) {
+		stat(s, ALLOC_NODE_MISMATCH);
 		deactivate_slab(s, c);
 		goto new_slab;
 	}
@@ -4675,6 +4676,7 @@ STAT_ATTR(FREE_REMOVE_PARTIAL, free_remo
 STAT_ATTR(ALLOC_FROM_PARTIAL, alloc_from_partial);
 STAT_ATTR(ALLOC_SLAB, alloc_slab);
 STAT_ATTR(ALLOC_REFILL, alloc_refill);
+STAT_ATTR(ALLOC_NODE_MISMATCH, alloc_node_mismatch);
 STAT_ATTR(FREE_SLAB, free_slab);
 STAT_ATTR(CPUSLAB_FLUSH, cpuslab_flush);
 STAT_ATTR(DEACTIVATE_FULL, deactivate_full);
@@ -4734,6 +4736,7 @@ static struct attribute *slab_attrs[] =
 	&alloc_from_partial_attr.attr,
 	&alloc_slab_attr.attr,
 	&alloc_refill_attr.attr,
+	&alloc_node_mismatch_attr.attr,
 	&free_slab_attr.attr,
 	&cpuslab_flush_attr.attr,
 	&deactivate_full_attr.attr,



* [slubllv7 15/17] slub: fast release on full slab
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (13 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 14/17] slub: Add statistics for the case that the current slab does not match the node Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:25 ` [slubllv7 16/17] slub: Not necessary to check for empty slab on load_freelist Christoph Lameter
  2011-06-01 17:26 ` [slubllv7 17/17] slub: slabinfo update for cmpxchg handling Christoph Lameter
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: slab_alloc_fast_release --]
[-- Type: text/plain, Size: 3041 bytes --]

Make deactivation occur implicitly while checking out the current freelist.

This avoids one cmpxchg operation on a slab that is now fully in use.
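
The implicit deactivation can be illustrated with a userspace sketch. As an
assumption made purely for the example, the freelist and the counters are
packed into a single 64-bit word so that a plain C11 compare-and-swap can stand
in for cmpxchg_double_slab(); the layout and the helper names (pack(),
take_freelist() and so on) are hypothetical. The freelist is represented as an
object index plus one, with 0 meaning "no free objects".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FROZEN_BIT	(UINT64_C(1) << 63)

/* Layout: bits 0-31 freelist, 32-47 inuse, 48-62 objects, 63 frozen. */
static inline uint64_t pack(uint32_t freelist, uint16_t inuse,
			    uint16_t objects, bool frozen)
{
	return (uint64_t)freelist |
	       ((uint64_t)inuse << 32) |
	       ((uint64_t)(objects & 0x7fff) << 48) |
	       (frozen ? FROZEN_BIT : 0);
}

static inline uint32_t freelist_of(uint64_t w) { return (uint32_t)w; }
static inline uint16_t inuse_of(uint64_t w)    { return (uint16_t)(w >> 32); }
static inline uint16_t objects_of(uint64_t w)  { return (uint16_t)((w >> 48) & 0x7fff); }
static inline bool frozen_of(uint64_t w)       { return (w & FROZEN_BIT) != 0; }

/*
 * Take the whole freelist of a frozen slab in one atomic step.
 * If the freelist was empty, the same update clears the frozen bit,
 * so no second compare-and-swap is needed to deactivate the slab.
 * Returns the old freelist head (0 if there was none).
 */
static uint32_t take_freelist(_Atomic uint64_t *state)
{
	uint64_t old = atomic_load(state);
	uint64_t new;
	uint32_t object;

	do {
		object = freelist_of(old);
		/* All objects now count as in use; stay frozen only if we got some. */
		new = pack(0, objects_of(old), objects_of(old), object != 0);
	} while (!atomic_compare_exchange_weak(state, &old, new));

	return object;
}
```

A caller that gets 0 back knows the slab is already unfrozen and can move on
to a new slab directly, which corresponds to the DEACTIVATE_BYPASS case in the
patch.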

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/slub_def.h |    1 +
 mm/slub.c                |   21 +++++++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 14:27:17.792880073 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 14:27:21.372880046 -0500
@@ -1977,9 +1977,21 @@ static void *__slab_alloc(struct kmem_ca
 		object = page->freelist;
 		counters = page->counters;
 		new.counters = counters;
-		new.inuse = page->objects;
 		VM_BUG_ON(!new.frozen);
 
+		/*
+		 * If there is no object left then we use this loop to
+		 * deactivate the slab which is simple since no objects
+		 * are left in the slab and therefore we do not need to
+		 * put the page back onto the partial list.
+		 *
+		 * If there are objects left then we retrieve them
+		 * and use them to refill the per cpu queue.
+		*/
+
+		new.inuse = page->objects;
+		new.frozen = object != NULL;
+
 	} while (!cmpxchg_double_slab(s, page,
 			object, counters,
 			NULL, new.counters,
@@ -1988,8 +2000,11 @@ static void *__slab_alloc(struct kmem_ca
 load_freelist:
 	VM_BUG_ON(!page->frozen);
 
-	if (unlikely(!object))
+	if (unlikely(!object)) {
+		c->page = NULL;
+		stat(s, DEACTIVATE_BYPASS);
 		goto new_slab;
+	}
 
 	stat(s, ALLOC_REFILL);
 
@@ -4684,6 +4699,7 @@ STAT_ATTR(DEACTIVATE_EMPTY, deactivate_e
 STAT_ATTR(DEACTIVATE_TO_HEAD, deactivate_to_head);
 STAT_ATTR(DEACTIVATE_TO_TAIL, deactivate_to_tail);
 STAT_ATTR(DEACTIVATE_REMOTE_FREES, deactivate_remote_frees);
+STAT_ATTR(DEACTIVATE_BYPASS, deactivate_bypass);
 STAT_ATTR(ORDER_FALLBACK, order_fallback);
 STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
 STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
@@ -4744,6 +4760,7 @@ static struct attribute *slab_attrs[] =
 	&deactivate_to_head_attr.attr,
 	&deactivate_to_tail_attr.attr,
 	&deactivate_remote_frees_attr.attr,
+	&deactivate_bypass_attr.attr,
 	&order_fallback_attr.attr,
 	&cmpxchg_double_fail_attr.attr,
 	&cmpxchg_double_cpu_fail_attr.attr,
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2011-05-31 14:27:17.792880073 -0500
+++ linux-2.6/include/linux/slub_def.h	2011-05-31 14:27:21.382880050 -0500
@@ -32,6 +32,7 @@ enum stat_item {
 	DEACTIVATE_TO_HEAD,	/* Cpu slab was moved to the head of partials */
 	DEACTIVATE_TO_TAIL,	/* Cpu slab was moved to the tail of partials */
 	DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */
+	DEACTIVATE_BYPASS,	/* Implicit deactivation */
 	ORDER_FALLBACK,		/* Number of times fallback was necessary */
 	CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */
 	CMPXCHG_DOUBLE_FAIL,	/* Number of times that cmpxchg double did not match */



* [slubllv7 16/17] slub: Not necessary to check for empty slab on load_freelist
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (14 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 15/17] slub: fast release on full slab Christoph Lameter
@ 2011-06-01 17:25 ` Christoph Lameter
  2011-06-01 17:26 ` [slubllv7 17/17] slub: slabinfo update for cmpxchg handling Christoph Lameter
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: goto_load_freelist --]
[-- Type: text/plain, Size: 870 bytes --]

load_freelist is now branched to only if there are objects available.
So there is no need to check the object variable for NULL.

---
 mm/slub.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-31 14:27:21.372880046 -0500
+++ linux-2.6/mm/slub.c	2011-05-31 14:27:25.262880020 -0500
@@ -1997,9 +1997,6 @@ static void *__slab_alloc(struct kmem_ca
 			NULL, new.counters,
 			"__slab_alloc"));
 
-load_freelist:
-	VM_BUG_ON(!page->frozen);
-
 	if (unlikely(!object)) {
 		c->page = NULL;
 		stat(s, DEACTIVATE_BYPASS);
@@ -2008,6 +2005,8 @@ load_freelist:
 
 	stat(s, ALLOC_REFILL);
 
+load_freelist:
+	VM_BUG_ON(!page->frozen);
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);



* [slubllv7 17/17] slub: slabinfo update for cmpxchg handling
  2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
                   ` (15 preceding siblings ...)
  2011-06-01 17:25 ` [slubllv7 16/17] slub: Not necessary to check for empty slab on load_freelist Christoph Lameter
@ 2011-06-01 17:26 ` Christoph Lameter
  16 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-01 17:26 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

[-- Attachment #1: update_slabinfo --]
[-- Type: text/plain, Size: 6085 bytes --]

Update the statistics handling and the slabinfo tool to include the new
statistics in the reports it generates.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 tools/slub/slabinfo.c |   57 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 18 deletions(-)

Index: linux-2.6/tools/slub/slabinfo.c
===================================================================
--- linux-2.6.orig/tools/slub/slabinfo.c	2011-05-24 09:36:51.534876607 -0500
+++ linux-2.6/tools/slub/slabinfo.c	2011-05-24 09:41:18.114874899 -0500
@@ -2,8 +2,9 @@
  * Slabinfo: Tool to get reports about slabs
  *
  * (C) 2007 sgi, Christoph Lameter
+ * (C) 2011 Linux Foundation, Christoph Lameter
  *
- * Compile by:
+ * Compile with:
  *
  * gcc -o slabinfo slabinfo.c
  */
@@ -39,6 +40,8 @@ struct slabinfo {
 	unsigned long cpuslab_flush, deactivate_full, deactivate_empty;
 	unsigned long deactivate_to_head, deactivate_to_tail;
 	unsigned long deactivate_remote_frees, order_fallback;
+	unsigned long cmpxchg_double_cpu_fail, cmpxchg_double_fail;
+	unsigned long alloc_node_mismatch, deactivate_bypass;
 	int numa[MAX_NODES];
 	int numa_partial[MAX_NODES];
 } slabinfo[MAX_SLABS];
@@ -99,7 +102,7 @@ static void fatal(const char *x, ...)
 
 static void usage(void)
 {
-	printf("slabinfo 5/7/2007. (c) 2007 sgi.\n\n"
+	printf("slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.\n\n"
 		"slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n"
 		"-a|--aliases           Show aliases\n"
 		"-A|--activity          Most active slabs first\n"
@@ -293,7 +296,7 @@ int line = 0;
 static void first_line(void)
 {
 	if (show_activity)
-		printf("Name                   Objects      Alloc       Free   %%Fast Fallb O\n");
+		printf("Name                   Objects      Alloc       Free   %%Fast Fallb O CmpX   UL\n");
 	else
 		printf("Name                   Objects Objsize    Space "
 			"Slabs/Part/Cpu  O/S O %%Fr %%Ef Flg\n");
@@ -379,14 +382,14 @@ static void show_tracking(struct slabinf
 	printf("\n%s: Kernel object allocation\n", s->name);
 	printf("-----------------------------------------------------------------------\n");
 	if (read_slab_obj(s, "alloc_calls"))
-		printf(buffer);
+		printf("%s", buffer);
 	else
 		printf("No Data\n");
 
 	printf("\n%s: Kernel object freeing\n", s->name);
 	printf("------------------------------------------------------------------------\n");
 	if (read_slab_obj(s, "free_calls"))
-		printf(buffer);
+		printf("%s", buffer);
 	else
 		printf("No Data\n");
 
@@ -400,7 +403,7 @@ static void ops(struct slabinfo *s)
 	if (read_slab_obj(s, "ops")) {
 		printf("\n%s: kmem_cache operations\n", s->name);
 		printf("--------------------------------------------\n");
-		printf(buffer);
+		printf("%s", buffer);
 	} else
 		printf("\n%s has no kmem_cache operations\n", s->name);
 }
@@ -462,19 +465,32 @@ static void slab_stats(struct slabinfo *
 	if (s->cpuslab_flush)
 		printf("Flushes %8lu\n", s->cpuslab_flush);
 
-	if (s->alloc_refill)
-		printf("Refill %8lu\n", s->alloc_refill);
-
 	total = s->deactivate_full + s->deactivate_empty +
-			s->deactivate_to_head + s->deactivate_to_tail;
+			s->deactivate_to_head + s->deactivate_to_tail + s->deactivate_bypass;
 
-	if (total)
-		printf("Deactivate Full=%lu(%lu%%) Empty=%lu(%lu%%) "
-			"ToHead=%lu(%lu%%) ToTail=%lu(%lu%%)\n",
-			s->deactivate_full, (s->deactivate_full * 100) / total,
-			s->deactivate_empty, (s->deactivate_empty * 100) / total,
-			s->deactivate_to_head, (s->deactivate_to_head * 100) / total,
+	if (total) {
+		printf("\nSlab Deactivation             Occurrences %%\n");
+		printf("-------------------------------------------------\n");
+		printf("Slab full                     %7lu  %3lu%%\n",
+			s->deactivate_full, (s->deactivate_full * 100) / total);
+		printf("Slab empty                    %7lu  %3lu%%\n",
+			s->deactivate_empty, (s->deactivate_empty * 100) / total);
+		printf("Moved to head of partial list %7lu  %3lu%%\n",
+			s->deactivate_to_head, (s->deactivate_to_head * 100) / total);
+		printf("Moved to tail of partial list %7lu  %3lu%%\n",
 			s->deactivate_to_tail, (s->deactivate_to_tail * 100) / total);
+		printf("Deactivation bypass           %7lu  %3lu%%\n",
+			s->deactivate_bypass, (s->deactivate_bypass * 100) / total);
+		printf("Refilled from foreign frees   %7lu  %3lu%%\n",
+			s->alloc_refill, (s->alloc_refill * 100) / total);
+		printf("Node mismatch                 %7lu  %3lu%%\n",
+			s->alloc_node_mismatch, (s->alloc_node_mismatch * 100) / total);
+	}
+
+	if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail)
+		printf("\nCmpxchg_double Looping\n------------------------\n"
+			"Locked Cmpxchg Double redos   %lu\nUnlocked Cmpxchg Double redos %lu\n",
+			s->cmpxchg_double_fail, s->cmpxchg_double_cpu_fail);
 }
 
 static void report(struct slabinfo *s)
@@ -573,12 +589,13 @@ static void slabcache(struct slabinfo *s
 		total_alloc = s->alloc_fastpath + s->alloc_slowpath;
 		total_free = s->free_fastpath + s->free_slowpath;
 
-		printf("%-21s %8ld %10ld %10ld %3ld %3ld %5ld %1d\n",
+		printf("%-21s %8ld %10ld %10ld %3ld %3ld %5ld %1d %4ld %4ld\n",
 			s->name, s->objects,
 			total_alloc, total_free,
 			total_alloc ? (s->alloc_fastpath * 100 / total_alloc) : 0,
 			total_free ? (s->free_fastpath * 100 / total_free) : 0,
-			s->order_fallback, s->order);
+			s->order_fallback, s->order, s->cmpxchg_double_fail,
+			s->cmpxchg_double_cpu_fail);
 	}
 	else
 		printf("%-21s %8ld %7d %8s %14s %4d %1d %3ld %3ld %s\n",
@@ -1190,6 +1207,10 @@ static void read_slab_dir(void)
 			slab->deactivate_to_tail = get_obj("deactivate_to_tail");
 			slab->deactivate_remote_frees = get_obj("deactivate_remote_frees");
 			slab->order_fallback = get_obj("order_fallback");
+			slab->cmpxchg_double_cpu_fail = get_obj("cmpxchg_double_cpu_fail");
+			slab->cmpxchg_double_fail = get_obj("cmpxchg_double_fail");
+			slab->alloc_node_mismatch = get_obj("alloc_node_mismatch");
+			slab->deactivate_bypass = get_obj("deactivate_bypass");
 			chdir("..");
 			if (slab->name[0] == ':')
 				alias_targets++;



* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-01 17:25 ` [slubllv7 04/17] x86: Add support for cmpxchg_double Christoph Lameter
@ 2011-06-09  9:53   ` Pekka Enberg
  2011-06-10 15:17     ` Christoph Lameter
  2011-06-15  8:55   ` Tejun Heo
  2011-06-25 23:49   ` [tip:x86/atomic] " tip-bot for Christoph Lameter
  2 siblings, 1 reply; 41+ messages in thread
From: Pekka Enberg @ 2011-06-09  9:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, tj, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner, mingo

On 6/1/11 8:25 PM, Christoph Lameter wrote:
> A simple implementation that only supports the word size and does not
> have a fallback mode (would require a spinlock).
>
> Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
> the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
> and swap 2 machine words. This allows lockless algorithms to move more
> context information through critical sections.
>
> Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
> cmpxchg detection has been built into the kernel. Note that each subsystem
> using cmpxchg_double has to implement a fallback mechanism as long as
> we offer support for processors that do not implement cmpxchg_double.
>
> Reviewed-by: H. Peter Anvin<hpa@zytor.com>
> Cc: tj@kernel.org
> Signed-off-by: Christoph Lameter<cl@linux.com>

Tejun, I'm going to queue this series for linux-next in the next few 
days. Could you either pick up this patch so I can pull from your tree 
or alternatively, is it OK to queue this through slab.git?

			Pekka


* Re: [slubllv7 05/17] mm: Rearrange struct page
  2011-06-01 17:25 ` [slubllv7 05/17] mm: Rearrange struct page Christoph Lameter
@ 2011-06-09  9:57   ` Pekka Enberg
  2011-06-09 16:45     ` Andrew Morton
  0 siblings, 1 reply; 41+ messages in thread
From: Pekka Enberg @ 2011-06-09  9:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner, kamezawa.hiroyu, akpm, kosaki.motohiro

On 6/1/11 8:25 PM, Christoph Lameter wrote:
> We need to be able to use cmpxchg_double on the freelist and object count
> field in struct page. Rearrange the fields in struct page according to
> doubleword entities so that the freelist pointer comes before the counters.
> Do the rearranging with a future in mind where we use more doubleword
> atomics to avoid locking of updates to flags/mapping or lru pointers.
>
> Create another union to allow access to counters in struct page as a
> single unsigned long value.
>
> The doublewords must be properly aligned for cmpxchg_double to work.
> Sadly this increases the size of page struct by one word on some architectures.
> But as a result page structs are now cacheline aligned on x86_64.
>
> Signed-off-by: Christoph Lameter<cl@linux.com>

I'd like to queue this SLUB patch series to linux-next through slab.git. 
Any NAKs or ACKs for this patch?

			Pekka

> ---
>   include/linux/mm_types.h |   89 +++++++++++++++++++++++++++++++----------------
>   1 file changed, 60 insertions(+), 29 deletions(-)
>
> Index: linux-2.6/include/linux/mm_types.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm_types.h	2011-05-31 09:46:41.912987862 -0500
> +++ linux-2.6/include/linux/mm_types.h	2011-05-31 09:46:44.282987846 -0500
> @@ -30,52 +30,74 @@ struct address_space;
>    * moment. Note that we have no way to track which tasks are using
>    * a page, though if it is a pagecache page, rmap structures can tell us
>    * who is mapping it.
> + *
> + * The objects in struct page are organized in double word blocks in
> + * order to allow us to use atomic double word operations on portions
> + * of struct page. That is currently only used by slub but the arrangement
> + * allows the use of atomic double word operations on the flags/mapping
> + * and lru list pointers also.
>    */
>   struct page {
> +	/* First double word block */
>   	unsigned long flags;		/* Atomic flags, some possibly
>   					 * updated asynchronously */
> -	atomic_t _count;		/* Usage count, see below. */
> +	struct address_space *mapping;	/* If low bit clear, points to
> +					 * inode address_space, or NULL.
> +					 * If page mapped as anonymous
> +					 * memory, low bit is set, and
> +					 * it points to anon_vma object:
> +					 * see PAGE_MAPPING_ANON below.
> +					 */
> +	/* Second double word */
>   	union {
> -		atomic_t _mapcount;	/* Count of ptes mapped in mms,
> -					 * to show when page is mapped
> -					 * & limit reverse map searches.
> +		struct {
> +			pgoff_t index;		/* Our offset within mapping. */
> +			atomic_t _mapcount;	/* Count of ptes mapped in mms,
> +							 * to show when page is mapped
> +							 * & limit reverse map searches.
> +							 */
> +			atomic_t _count;		/* Usage count, see below. */
> +		};
> +
> +		struct {			/* SLUB cmpxchg_double area */
> +			void *freelist;
> +			union {
> +				unsigned long counters;
> +				struct {
> +					unsigned inuse:16;
> +					unsigned objects:15;
> +					unsigned frozen:1;
> +					/*
> +					 * Kernel may make use of this field even when slub
> +					 * uses the rest of the double word!
>   					 */
> -		struct {		/* SLUB */
> -			unsigned inuse:16;
> -			unsigned objects:15;
> -			unsigned frozen:1;
> +					atomic_t _count;
> +				};
> +			};
>   		};
>   	};
> +
> +	/* Third double word block */
> +	struct list_head lru;		/* Pageout list, eg. active_list
> +					 * protected by zone->lru_lock !
> +					 */
> +
> +	/* Remainder is not double word aligned */
>   	union {
> -	    struct {
> -		unsigned long private;		/* Mapping-private opaque data:
> +	 	unsigned long private;		/* Mapping-private opaque data:
>   					 	 * usually used for buffer_heads
>   						 * if PagePrivate set; used for
>   						 * swp_entry_t if PageSwapCache;
>   						 * indicates order in the buddy
>   						 * system if PG_buddy is set.
>   						 */
> -		struct address_space *mapping;	/* If low bit clear, points to
> -						 * inode address_space, or NULL.
> -						 * If page mapped as anonymous
> -						 * memory, low bit is set, and
> -						 * it points to anon_vma object:
> -						 * see PAGE_MAPPING_ANON below.
> -						 */
> -	    };
>   #if USE_SPLIT_PTLOCKS
> -	    spinlock_t ptl;
> +		spinlock_t ptl;
>   #endif
> -	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
> -	    struct page *first_page;	/* Compound tail pages */
> +		struct kmem_cache *slab;	/* SLUB: Pointer to slab */
> +		struct page *first_page;	/* Compound tail pages */
>   	};
> -	union {
> -		pgoff_t index;		/* Our offset within mapping. */
> -		void *freelist;		/* SLUB: freelist req. slab lock */
> -	};
> -	struct list_head lru;		/* Pageout list, eg. active_list
> -					 * protected by zone->lru_lock !
> -					 */
> +
>   	/*
>   	 * On machines where all RAM is mapped into kernel address space,
>   	 * we can simply calculate the virtual address. On machines with
> @@ -101,7 +123,16 @@ struct page {
>   	 */
>   	void *shadow;
>   #endif
> -};
> +}
> +/*
> + * If another subsystem starts using the double word pairing for atomic
> + * operations on struct page then it must change the #if to ensure
> + * proper alignment of the page struct.
> + */
> +#if defined(CONFIG_SLUB) && defined(CONFIG_CMPXCHG_LOCAL)
> +	__attribute__((__aligned__(2*sizeof(unsigned long))))
> +#endif
> +;
>
>   typedef unsigned long __nocast vm_flags_t;
>
>



* Re: [slubllv7 05/17] mm: Rearrange struct page
  2011-06-09  9:57   ` Pekka Enberg
@ 2011-06-09 16:45     ` Andrew Morton
  2011-06-09 17:03       ` [PATCH] checkpatch: Add a "prefer __aligned" check Joe Perches
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2011-06-09 16:45 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, David Rientjes, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner, kamezawa.hiroyu, kosaki.motohiro,
	Joe Perches

On Thu, 09 Jun 2011 12:57:50 +0300 Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On 6/1/11 8:25 PM, Christoph Lameter wrote:
> > We need to be able to use cmpxchg_double on the freelist and object count
> > field in struct page. Rearrange the fields in struct page according to
> > doubleword entities so that the freelist pointer comes before the counters.
> > Do the rearranging with a future in mind where we use more doubleword
> > atomics to avoid locking of updates to flags/mapping or lru pointers.
> >
> > Create another union to allow access to counters in struct page as a
> > single unsigned long value.
> >
> > The doublewords must be properly aligned for cmpxchg_double to work.
> > Sadly this increases the size of page struct by one word on some architectures.
> > But as a result page structs are now cacheline aligned on x86_64.

Is it worth this cost?

> > Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> I'd like to queue this SLUB patch series to linux-next through slab.git. 
> Any NAKs or ACKs for this patch?
> 
> >   include/linux/mm_types.h |   89 +++++++++++++++++++++++++++++++----------------
> >   1 file changed, 60 insertions(+), 29 deletions(-)
> >
> > Index: linux-2.6/include/linux/mm_types.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/mm_types.h	2011-05-31 09:46:41.912987862 -0500
> > +++ linux-2.6/include/linux/mm_types.h	2011-05-31 09:46:44.282987846 -0500
> > @@ -30,52 +30,74 @@ struct address_space;
> >    * moment. Note that we have no way to track which tasks are using
> >    * a page, though if it is a pagecache page, rmap structures can tell us
> >    * who is mapping it.
> > + *
> > + * The objects in struct page are organized in double word blocks in
> > + * order to allows us to use atomic double word operations on portions
> > + * of struct page. That is currently only used by slub but the arrangement
> > + * allows the use of atomic double word operations on the flags/mapping
> > + * and lru list pointers also.

I don't really like the word "word" much.  There's always uncertainty
about whether thee and me are talking about the same thing.  And
perhaps one day words will be 64-bit.

So if we mean 32-bit, let's say 32-bit?

> > +/*
> > + * If another subsystem starts using the double word pairing for atomic
> > + * operations on struct page then it must change the #if to ensure
> > + * proper alignment of the page struct.
> > + */
> > +#if defined(CONFIG_SLUB) && defined(CONFIG_CMPXCHG_LOCAL)
> > +	__attribute__((__aligned__(2*sizeof(unsigned long))))
> > +#endif

I guess we need a "hey, use __aligned" checkpatch rule.
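
The alignment requirement under discussion can be sketched in userspace C.
This is only an illustration, not the kernel's struct page:
struct demo_slab_page and demo_pair_is_aligned() are hypothetical names,
and __aligned(x) is defined locally as shorthand for the GCC/Clang
attribute, in the spirit of the kernel macro mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local shorthand for the compiler attribute (assumes GCC or Clang). */
#ifndef __aligned
#define __aligned(x) __attribute__((__aligned__(x)))
#endif

/*
 * Hypothetical, cut-down stand-in for struct page. The freelist pointer
 * and the counters word sit next to each other so that one double word
 * compare-and-swap could cover both.
 */
struct demo_slab_page {
	/* First double word */
	unsigned long flags;
	void *mapping;
	/* Second double word: freelist pointer followed by the counters */
	void *freelist;
	union {
		unsigned long counters;	/* all counters as a single value */
		struct {
			unsigned inuse:16;
			unsigned objects:15;
			unsigned frozen:1;
		};
	};
} __aligned(2 * sizeof(unsigned long));

/* The pair must start on a double word boundary inside the struct. */
_Static_assert(offsetof(struct demo_slab_page, freelist) %
	       (2 * sizeof(unsigned long)) == 0,
	       "freelist/counters pair is not double word aligned");
_Static_assert(_Alignof(struct demo_slab_page) >= 2 * sizeof(unsigned long),
	       "struct alignment is too small");

/* Runtime check, similar in spirit to the VM_BUG_ON() in cmpxchg_double(). */
static int demo_pair_is_aligned(const struct demo_slab_page *p)
{
	return ((uintptr_t)&p->freelist % (2 * sizeof(unsigned long))) == 0;
}
```

Because the struct alignment is raised, sizeof(struct demo_slab_page) is
rounded up to a multiple of two words, which is the size cost the patch
description mentions.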


* [PATCH] checkpatch: Add a "prefer __aligned" check
  2011-06-09 16:45     ` Andrew Morton
@ 2011-06-09 17:03       ` Joe Perches
  0 siblings, 0 replies; 41+ messages in thread
From: Joe Perches @ 2011-06-09 17:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pekka Enberg, Christoph Lameter, David Rientjes, Eric Dumazet,
	H. Peter Anvin, linux-kernel, Thomas Gleixner, kamezawa.hiroyu,
	kosaki.motohiro

Prefer the use of __aligned(size) over __attribute__((__aligned__(size)))

Link: http://lkml.kernel.org/r/20110609094526.1571774c.akpm@linux-foundation.org
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
---

On Thu, 2011-06-09 at 09:45 -0700, Andrew Morton wrote:
> I guess we need a "hey, use __aligned" checkpatch rule.

 scripts/checkpatch.pl |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
 
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 8657f99..352626c 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2743,6 +2743,11 @@ sub process {
 			WARN("__packed is preferred over __attribute__((packed))\n" . $herecurr);
 		}
 
+# Check for __attribute__ aligned, prefer __aligned
+		if ($line =~ /\b__attribute__\s*\(\s*\(.*aligned/) {
+			WARN("__aligned(size) is preferred over __attribute__((aligned(size)))\n" . $herecurr);
+		}
+
 # check for sizeof(&)
 		if ($line =~ /\bsizeof\s*\(\s*\&/) {
 			WARN("sizeof(& should be avoided\n" . $herecurr);





* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-09  9:53   ` Pekka Enberg
@ 2011-06-10 15:17     ` Christoph Lameter
  2011-06-11  9:50       ` Pekka Enberg
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-06-10 15:17 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, tj, Eric Dumazet, H. Peter Anvin, linux-kernel,
	Thomas Gleixner, mingo

On Thu, 9 Jun 2011, Pekka Enberg wrote:

> Tejun, I'm going to queue this series for linux-next in the next few days.
> Could you either pick up this patch so I can pull from your tree or
> alternatively, is it OK to queue this through slab.git?

This patch is not related to percpu functionality. Does Tejun handle general
x86 patches?



* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-10 15:17     ` Christoph Lameter
@ 2011-06-11  9:50       ` Pekka Enberg
  2011-06-11 17:02         ` Christoph Lameter
  0 siblings, 1 reply; 41+ messages in thread
From: Pekka Enberg @ 2011-06-11  9:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, tj, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner, mingo

On Fri, Jun 10, 2011 at 6:17 PM, Christoph Lameter <cl@linux.com> wrote:
> On Thu, 9 Jun 2011, Pekka Enberg wrote:
>
>> Tejun, I'm going to queue this series for linux-next in the next few days.
>> Could you either pick up this patch so I can pull from your tree or
>> alternatively, is it OK to queue this through slab.git?
>
> This patch is not related to percpu functionality. Does Tejun handle general
> x86 patches?

Oh, I don't know - but I definitely need more ACKs to even consider
pushing this through slab.git.


* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-11  9:50       ` Pekka Enberg
@ 2011-06-11 17:02         ` Christoph Lameter
  2011-06-14  5:49           ` Pekka Enberg
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-06-11 17:02 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Pekka Enberg, David Rientjes, tj, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner, mingo

On Sat, 11 Jun 2011, Pekka Enberg wrote:

> Oh, I don't know - but I definitely need more ACKs to even consider
> pushing this through slab.git.

Any additional ackers out there?;-)



* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-11 17:02         ` Christoph Lameter
@ 2011-06-14  5:49           ` Pekka Enberg
  2011-06-14  8:04             ` Ingo Molnar
  0 siblings, 1 reply; 41+ messages in thread
From: Pekka Enberg @ 2011-06-14  5:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, tj, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner, mingo

On 6/11/11 8:02 PM, Christoph Lameter wrote:
> On Sat, 11 Jun 2011, Pekka Enberg wrote:
>
>> Oh, I don't know - but I definitely need more ACKs to even consider
>> pushing this through slab.git.
>
> Any additional ackers out there?;-)

Ingo, ping? Could we have this in -tip so I can pull from there? This is 
blocking me from queuing SLUB performance improvements to linux-next.

			Pekka


* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-14  5:49           ` Pekka Enberg
@ 2011-06-14  8:04             ` Ingo Molnar
  2011-06-14 14:04               ` Christoph Lameter
  0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2011-06-14  8:04 UTC (permalink / raw)
  To: Pekka Enberg, H. Peter Anvin
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, tj,
	Eric Dumazet, H. Peter Anvin, linux-kernel, Thomas Gleixner


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On 6/11/11 8:02 PM, Christoph Lameter wrote:
> >On Sat, 11 Jun 2011, Pekka Enberg wrote:
> >
> >>Oh, I don't know - but I definitely need more ACKs to even consider
> >>pushing this through slab.git.
> >
> >Any additional ackers out there?;-)
> 
> Ingo, ping? Could we have this in -tip so I can pull from there? 
> This is blocking me from queuing SLUB performance improvements to 
> linux-next.

hpa, what's your take on this patch?

Thanks,

	Ingo


* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-14  8:04             ` Ingo Molnar
@ 2011-06-14 14:04               ` Christoph Lameter
  2011-06-14 15:05                 ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-06-14 14:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, H. Peter Anvin, Pekka Enberg, David Rientjes, tj,
	Eric Dumazet, linux-kernel, Thomas Gleixner

On Tue, 14 Jun 2011, Ingo Molnar wrote:

>
> * Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>
> > On 6/11/11 8:02 PM, Christoph Lameter wrote:
> > >On Sat, 11 Jun 2011, Pekka Enberg wrote:
> > >
> > >>Oh, I don't know - but I definitely need more ACKs to even consider
> > >>pushing this through slab.git.
> > >
> > >Any additional ackers out there?;-)
> >
> > Ingo, ping? Could we have this in -tip so I can pull from there?
> > This is blocking me from queuing SLUB performance improvements to
> > linux-next.
>
> hpa, what's your take on this patch?

He acked it.



* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-14 14:04               ` Christoph Lameter
@ 2011-06-14 15:05                 ` H. Peter Anvin
  0 siblings, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2011-06-14 15:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Pekka Enberg, Pekka Enberg, David Rientjes, tj,
	Eric Dumazet, linux-kernel, Thomas Gleixner

On 06/14/2011 07:04 AM, Christoph Lameter wrote:
> On Tue, 14 Jun 2011, Ingo Molnar wrote:
> 
>>
>> * Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>>
>>> On 6/11/11 8:02 PM, Christoph Lameter wrote:
>>>> On Sat, 11 Jun 2011, Pekka Enberg wrote:
>>>>
>>>>> Oh, I don't know - but I definitely need more ACKs to even consider
>>>>> pushing this through slab.git.
>>>>
>>>> Any additional ackers out there?;-)
>>>
>>> Ingo, ping? Could we have this in -tip so I can pull from there?
>>> This is blocking me from queuing SLUB performance improvements to
>>> linux-next.
>>
>> hpa, what's your take on this patch?
> 
> He acked it.
> 

Yep, looks fine now.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-01 17:25 ` [slubllv7 04/17] x86: Add support for cmpxchg_double Christoph Lameter
  2011-06-09  9:53   ` Pekka Enberg
@ 2011-06-15  8:55   ` Tejun Heo
  2011-06-15 14:26     ` Christoph Lameter
  2011-06-25 23:49   ` [tip:x86/atomic] " tip-bot for Christoph Lameter
  2 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2011-06-15  8:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner

Hello, Christoph, Pekka.  Sorry about the delay.

On Wed, Jun 01, 2011 at 12:25:47PM -0500, Christoph Lameter wrote:
> Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:05.002406114 -0500
> +++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:48.222405834 -0500
> +#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
> +({									\
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> +	VM_BUG_ON((unsigned long)(ptr) % 16);				\
> +	cmpxchg16b((ptr), (o1), (o2), (n1), (n2));			\
> +})
> +
> +#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
> +({									\
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> +	VM_BUG_ON((unsigned long)(ptr) % 16);				\
> +	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
> +})

Do we really need cmpxchg16b*() macros separately?  Why not just
collapse them into cmpxchg_double*()?  Also, it would be better if we
have the same level of VM_BUG_ON() checks as in percpu cmpxchg_double
ops.  Maybe we should put them in a separate macro?

> +#define system_has_cmpxchg_double() cpu_has_cx16

Where's the fallback %false definition for the above feature macro for
archs which don't support cmpxchg_double?  Also, is system_has_*()
conventional?  Isn't arch_has_*() more conventional for this purpose?

>  #endif /* _ASM_X86_CMPXCHG_64_H */
> Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 11:01:05.022406109 -0500
> +++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 11:01:48.222405834 -0500
> @@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(
>  
>  #endif
>  
> +#define cmpxchg8b(ptr, o1, o2, n1, n2)				\
> +({								\
> +	char __ret;						\
> +	__typeof__(o2) __dummy;					\
> +	__typeof__(*(ptr)) __old1 = (o1);			\
> +	__typeof__(o2) __old2 = (o2);				\
> +	__typeof__(*(ptr)) __new1 = (n1);			\
> +	__typeof__(o2) __new2 = (n2);				\
> +	asm volatile(LOCK_PREFIX "cmpxchg8b %2; setz %1"	\
> +		       : "=d"(__dummy), "=a" (__ret), "+m" (*ptr)\
> +		       : "a" (__old1), "d"(__old2),		\
> +		         "b" (__new1), "c" (__new2)		\
> +		       : "memory");				\
> +	__ret; })
> +
> +
> +#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
> +({								\
> +	char __ret;						\
> +	__typeof__(o2) __dummy;					\
> +	__typeof__(*(ptr)) __old1 = (o1);			\
> +	__typeof__(o2) __old2 = (o2);				\
> +	__typeof__(*(ptr)) __new1 = (n1);			\
> +	__typeof__(o2) __new2 = (n2);				\
> +	asm volatile("cmpxchg8b %2; setz %1"			\
> +		       : "=d"(__dummy), "=a"(__ret), "+m" (*ptr)\
> +		       : "a" (__old), "d"(__old2),		\
> +		         "b" (__new1), "c" (__new2),		\
> +		       : "memory");				\
> +	__ret; })

Wouldn't it be better to use cmpxchg64() for cmpxchg_double()?

Another thing is that choosing different code path depending on
has_cmpxchg_double() would be quite messy and won't bode well with
many people.  I agree that fallback implementation would be heavier
for SMP safe operations but some archs already do that for cmpxchg
(forgot which one).  If we're gonna export this to generic code,
wouldn't it be better to implement proper generic fallbacks and
provide has_*() as hint?

Thank you.

-- 
tejun


* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-15  8:55   ` Tejun Heo
@ 2011-06-15 14:26     ` Christoph Lameter
  2011-06-15 16:39       ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-06-15 14:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Pekka Enberg, David Rientjes, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner

On Wed, 15 Jun 2011, Tejun Heo wrote:

> Hello, Christoph, Pekka.  Sorry about the delay.
>
> On Wed, Jun 01, 2011 at 12:25:47PM -0500, Christoph Lameter wrote:
> > Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:05.002406114 -0500
> > +++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:48.222405834 -0500
> > +#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
> > +({									\
> > +	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> > +	VM_BUG_ON((unsigned long)(ptr) % 16);				\
> > +	cmpxchg16b((ptr), (o1), (o2), (n1), (n2));			\
> > +})
> > +
> > +#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
> > +({									\
> > +	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> > +	VM_BUG_ON((unsigned long)(ptr) % 16);				\
> > +	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
> > +})
>
> Do we really need cmpxchg16b*() macros separately?  Why not just
> collapse them into cmpxchg_double*()?  Also, it would be better if we
> have the same level of VM_BUG_ON() checks as in percpu cmpxchg_double
> ops.  Maybe we should put them in a separate macro?

The method here is to put all the high level checks in cmpxchg_double()
and then do the low level asm stuff in cmpxchg16b macros. I think that is
a good separation.

> > +#define system_has_cmpxchg_double() cpu_has_cx16
>
> Where's the fallback %false definition for the above feature macro for
> archs which don't support cmpxchg_double?  Also, is system_has_*()
> conventional?  Isn't arch_has_*() more conventional for this purpose?

There is a convention for querying processor flags from core code?

The system_has_cmpxchg_double() is only used if the arch defines
CONFIG_CMPXCHG_DOUBLE

> > +#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
> > +({								\
> > +	char __ret;						\
> > +	__typeof__(o2) __dummy;					\
> > +	__typeof__(*(ptr)) __old1 = (o1);			\
> > +	__typeof__(o2) __old2 = (o2);				\
> > +	__typeof__(*(ptr)) __new1 = (n1);			\
> > +	__typeof__(o2) __new2 = (n2);				\
> > +	asm volatile("cmpxchg8b %2; setz %1"			\
> > +		       : "=d"(__dummy), "=a"(__ret), "+m" (*ptr)\
> > +		       : "a" (__old), "d"(__old2),		\
> > +		         "b" (__new1), "c" (__new2),		\
> > +		       : "memory");				\
> > +	__ret; })
>
> Wouldn't it be better to use cmpxchg64() for cmpxchg_double()?

This way it is done in the same way on 32 bit as on 64 bit. The use of
cmpxchg64 also means that some of the parameters would have to be combined
to form 64 bit ints from the 32 bit ones before __cmpxchg64 could be used.

__cmpxchg64 has different parameter conventions.
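
To make the conversion concrete, here is a rough userspace sketch of what
such a layer would involve, using C11 atomics rather than the kernel's
__cmpxchg64. The names demo_pack() and demo_cmpxchg_pair() are made up for
this illustration, and the low-word-first packing assumes a little-endian
layout as on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Combine two 32-bit words into one 64-bit value, low word first. */
static inline uint64_t demo_pack(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/*
 * Pair-style interface (old1, old2, new1, new2) layered on top of a
 * single 64-bit compare-and-swap. Returns true if the swap happened.
 */
static inline bool demo_cmpxchg_pair(_Atomic uint64_t *ptr,
				     uint32_t o1, uint32_t o2,
				     uint32_t n1, uint32_t n2)
{
	uint64_t expected = demo_pack(o1, o2);

	return atomic_compare_exchange_strong(ptr, &expected,
					      demo_pack(n1, n2));
}
```

Every caller pays for the packing, and the two words are no longer
addressed as separate fields, which is the mismatch in parameter
conventions referred to above.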

> Another thing is that choosing different code path depending on
> has_cmpxchg_double() would be quite messy and won't bode well with
> many people.  I agree that fallback implementation would be heavier
> for SMP safe operations but some archs already do that for cmpxchg
> (forgot which one).  If we're gonna export this to generic code,
> wouldn't it be better to implement proper generic fallbacks and
> provide has_*() as hint?

A generic fallback for cmpxchg_double would mean having to disable
interrupts and then take a global spinlock. There are significant scaling
problems with such an implementation.

The fallback through the subsystem means that the subsystem can do locking
that scales better. In the case of SLUB we fall back to a bit lock in the
page struct which is a hot cache line in the hotpaths. This is the same
approach as used before the lockless patches and we expect the performance
on platforms not supporting cmpxchg_double to stay the same.
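
As a rough userspace sketch of this subsystem-level fallback (C11 atomics,
not the SLUB code): the update uses a double word compare-and-swap when the
system has one and otherwise takes a bit lock stored in the object itself.
struct demo_slab, DEMO_LOCK_BIT, demo_has_double_cas and demo_update() are
hypothetical names, and the pointer/counter pair is modelled as a single
64-bit value to keep the example portable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_LOCK_BIT 1u	/* hypothetical lock bit in the object's flags */

struct demo_slab {
	atomic_uint flags;	/* carries the bit lock */
	_Atomic uint64_t pair;	/* models the freelist/counters double word */
};

/* Stand-in for system_has_cmpxchg_double(); fixed once at boot in reality. */
static bool demo_has_double_cas = true;

static void demo_bit_lock(struct demo_slab *s)
{
	/* Spin until we are the one who set the bit. */
	while (atomic_fetch_or_explicit(&s->flags, DEMO_LOCK_BIT,
					memory_order_acquire) & DEMO_LOCK_BIT)
		;
}

static void demo_bit_unlock(struct demo_slab *s)
{
	atomic_fetch_and_explicit(&s->flags, ~DEMO_LOCK_BIT,
				  memory_order_release);
}

/* Replace oldval by newval; returns true on success. */
static bool demo_update(struct demo_slab *s, uint64_t oldval, uint64_t newval)
{
	bool ok = false;

	if (demo_has_double_cas)
		return atomic_compare_exchange_strong(&s->pair, &oldval,
						      newval);

	/* Fallback: compare and store under the per-object bit lock. */
	demo_bit_lock(s);
	if (atomic_load_explicit(&s->pair, memory_order_relaxed) == oldval) {
		atomic_store_explicit(&s->pair, newval, memory_order_relaxed);
		ok = true;
	}
	demo_bit_unlock(s);
	return ok;
}
```

The lock lives in the same object as the data, so the fallback touches no
cache line beyond the one the fast path already needs.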



* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-15 14:26     ` Christoph Lameter
@ 2011-06-15 16:39       ` Tejun Heo
  2011-06-15 17:19         ` Christoph Lameter
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2011-06-15 16:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner

Hello, Christoph.

On Wed, Jun 15, 2011 at 09:26:15AM -0500, Christoph Lameter wrote:
> > Do we really need cmpxchg16b*() macros separately?  Why not just
> > collapse them into cmpxchg_double*()?  Also, it would be better if we
> > have the same level of VM_BUG_ON() checks as in percpu cmpxchg_double
> > ops.  Maybe we should put them in a separate macro?
> 
> The method here is to put all the high level checks in cmpxchg_double()
> and then do the low level asm stuff in cmpxchg16b macros. I think that is
> a good separation.

I don't know; in that case, I think the names should clearly indicate
that they're not meant to be used outside of the implementation.  I don't
see merit in keeping them separate.

> > > +#define system_has_cmpxchg_double() cpu_has_cx16
> >
> > Where's the fallback %false definition for the above feature macro for
> > archs which don't support cmpxchg_double?  Also, is system_has_*()
> > conventional?  Isn't arch_has_*() more conventional for this purpose?
> 
> There is a convention for querying processor flags from core code?

At least generic ptrace code uses arch_has_block/single_step().
Probably better than introducing something completely new.

> The system_has_cmpxchg_double() is only used if the arch defines
> CONFIG_CMPXCHG_DOUBLE

Why?  What's the benefit of that?

> This way it is done in the same way on 32 bit as on 64 bit. The use of
> cmpxchg64 also means that some of the parameters would have to be combined
> to form 64 bit ints from the 32 bit ones before __cmpxchg64 could be used.
> 
> __cmpxchg64 has different parameter conventions.

But they all just deal with the starting addresses and the _local
version already has proper fallback implementation.

> > Another thing is that choosing different code path depending on
> > has_cmpxchg_double() would be quite messy and won't bode well with
> > many people.  I agree that fallback implementation would be heavier
> > for SMP safe operations but some archs already do that for cmpxchg
> > (forgot which one).  If we're gonna export this to generic code,
> > wouldn't it be better to implement proper generic fallbacks and
> > provide has_*() as hint?
> 
> A generic fallback for cmpxchg_double would mean having to disable
> interrupts and then take a global spinlock. There are significant scaling
> problems with such an implementation.
> 
> The fallback through the subsystem means that the subsystem can do locking
> that scales better. In the case of SLUB we fall back to a bit lock in the
> page struct which is a hot cache line in the hotpaths. This is the same
> approach as used before the lockless patches and we expect the performance
> on platforms not supporting cmpxchg_double to stay the same.

Yes, that's nice, but you're introducing new operations and they should
meet the usual conventions.  The cmpxchg fallback on that arch (I don't
recall which one right now) already uses a hashed lock, so it's not like
this is completely new.  As added, the interface basically requires
extreme ifdeffery, which isn't good.

Thanks.

-- 
tejun
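
For illustration, a generic hashed-lock fallback of the kind suggested in
this message might look roughly like the userspace sketch below. It is an
assumption-laden sketch, not code from any architecture: DEMO_NR_LOCKS,
demo_lock_for() and demo_cmpxchg_double_fallback() are hypothetical names,
and a kernel version would also have to disable interrupts around the
critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_LOCKS 16	/* must be a power of two */

/* Small table of spinlocks; static storage starts out unlocked (false). */
static atomic_bool demo_locks[DEMO_NR_LOCKS];

/* Pick a lock by hashing the address of the double word. */
static atomic_bool *demo_lock_for(const void *addr)
{
	/* Drop the low bits, which are the same for all aligned pairs. */
	return &demo_locks[((uintptr_t)addr >> 4) & (DEMO_NR_LOCKS - 1)];
}

/*
 * Emulated double word compare-and-swap on p[0] and p[1].
 * Returns true if both words matched and were replaced.
 */
static bool demo_cmpxchg_double_fallback(unsigned long *p,
					 unsigned long o1, unsigned long o2,
					 unsigned long n1, unsigned long n2)
{
	atomic_bool *lock = demo_lock_for(p);
	bool ok = false;

	while (atomic_exchange_explicit(lock, true, memory_order_acquire))
		;	/* spin until the previous value was "unlocked" */

	if (p[0] == o1 && p[1] == o2) {
		p[0] = n1;
		p[1] = n2;
		ok = true;
	}

	atomic_store_explicit(lock, false, memory_order_release);
	return ok;
}
```

Unrelated double words can hash to the same lock, and every other access
to the pair has to go through the same table for the emulation to be
correct.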


* Re: [slubllv7 04/17] x86: Add support for cmpxchg_double
  2011-06-15 16:39       ` Tejun Heo
@ 2011-06-15 17:19         ` Christoph Lameter
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Lameter @ 2011-06-15 17:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Pekka Enberg, David Rientjes, Eric Dumazet, H. Peter Anvin,
	linux-kernel, Thomas Gleixner

On Wed, 15 Jun 2011, Tejun Heo wrote:

> > > > +#define system_has_cmpxchg_double() cpu_has_cx16
> > >
> > > Where's the fallback %false definition for the above feature macro for
> > > archs which don't support cmpxchg_double?  Also, is system_has_*()
> > > conventional?  Isn't arch_has_*() more conventional for this purpose?
> >
> > There is a convention for querying processor flags from core code?
>
> At least generic ptrace code uses arch_has_block/single_step().
> Probably better than introducing something completely new.

It's not a property of the arch that we are after. We need to know whether
the hardware that is running the kernel (the system) has that capability.
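
A userspace analogue of such a runtime check, as a sketch only: it queries
CPUID leaf 1 through the GCC/Clang <cpuid.h> helper and tests the CX8 and
CX16 feature bits. demo_system_has_cmpxchg_double() and the DEMO_* bit
names are invented for this example; the kernel uses cpu_has_cx8 and
cpu_has_cx16 instead.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper for the CPUID instruction */
#endif

#define DEMO_CX8_BIT	(1u << 8)	/* CPUID.1:EDX bit 8, CMPXCHG8B */
#define DEMO_CX16_BIT	(1u << 13)	/* CPUID.1:ECX bit 13, CMPXCHG16B */

/*
 * Returns true if the CPU we are running on can compare-and-swap two
 * machine words at once. The answer depends on the actual hardware,
 * not only on the architecture the binary was built for.
 */
static bool demo_system_has_cmpxchg_double(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;

	/* 64-bit words need cmpxchg16b, 32-bit words need cmpxchg8b. */
	if (sizeof(long) == 8)
		return (ecx & DEMO_CX16_BIT) != 0;
	return (edx & DEMO_CX8_BIT) != 0;
#else
	return false;	/* unknown hardware: report no support */
#endif
}
```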

> > The system_has_cmpxchg_double() is only used if the arch defines
> > CONFIG_CMPXCHG_DOUBLE
>
> Why?  What's the benefit of that?

You don't have to define system_has_cmpxchg_double() for systems without
CONFIG_CMPXCHG_DOUBLE.

> > This way it is done in the same way on 32 bit as on 64 bit. The use of
> > cmpxchg64 also means that some of the parameters would have to be combined
> > to form 64 bit ints from the 32 bit ones before __cmpxchg64 could be used.
> >
> > __cmpxchg64 has different parameter conventions.
>
> But they all just deal with the starting addresses and the _local
> version already has proper fallback implementation.

The local version is not that problematic since it is just provided for
completeness. If you want to go through the gyrations of providing a
conversion layer with casting and conversion between 32 and 64 bit
entities then please do so. But it's not going to be nice.

> > The fallback through the subsystem means that the subsystem can do locking
> > that scales better. In the case of SLUB we fall back to a bit lock in the
> > page struct which is a hot cache line in the hotpaths. This is the same
> > approach as used before the lockless patches and we expect the performance
> > on platforms not supporting cmpxchg_double to stay the same.
>
> Yes, that's nice, but you're introducing new operations and they should
> meet the usual conventions.  The cmpxchg fallback on that arch (I don't
> recall which one right now) already uses a hashed lock, so it's not like
> this is completely new.  As added, the interface basically requires
> extreme ifdeffery, which isn't good.

I'd be glad if you could improve on it. But the fallback to a hashed lock
would also mean additional cache footprint which would not be acceptable
for the hotpaths that this is used for now.


* [tip:x86/atomic] x86: Add support for cmpxchg_double
  2011-06-01 17:25 ` [slubllv7 04/17] x86: Add support for cmpxchg_double Christoph Lameter
  2011-06-09  9:53   ` Pekka Enberg
  2011-06-15  8:55   ` Tejun Heo
@ 2011-06-25 23:49   ` tip-bot for Christoph Lameter
  2 siblings, 0 replies; 41+ messages in thread
From: tip-bot for Christoph Lameter @ 2011-06-25 23:49 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, penberg, tj, cl, tglx

Commit-ID:  3824abd1279ef75f791c43a6b1e3162ae0692b42
Gitweb:     http://git.kernel.org/tip/3824abd1279ef75f791c43a6b1e3162ae0692b42
Author:     Christoph Lameter <cl@linux.com>
AuthorDate: Wed, 1 Jun 2011 12:25:47 -0500
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Sat, 25 Jun 2011 12:17:32 -0700

x86: Add support for cmpxchg_double

A simple implementation that only supports the word size and does not
have a fallback mode (would require a spinlock).

Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
and swap 2 machine words. This allows lockless algorithms to move more
context information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
cmpxchg detection has been built into the kernel. Note that each subsystem
using cmpxchg_double has to implement a fallback mechanism as long as
we offer support for processors that do not implement cmpxchg_double.

Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/20110601172614.173427964@linux.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/Kconfig.cpu              |    3 ++
 arch/x86/include/asm/cmpxchg_32.h |   48 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   45 ++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |    2 +
 4 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 6a7cfdf..e3ca7e0 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -312,6 +312,9 @@ config X86_CMPXCHG
 config CMPXCHG_LOCAL
 	def_bool X86_64 || (X86_32 && !M386)
 
+config CMPXCHG_DOUBLE
+	def_bool y
+
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 284a6e8..3deb725 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 
 #endif
 
+#define cmpxchg8b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg8b %2; setz %1"	\
+		       : "=d"(__dummy), "=a" (__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg8b %2; setz %1"			\
+		       : "=d"(__dummy), "=a"(__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b_local((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx8
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 423ae58..7cf5c0a 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define cmpxchg16b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg16b %2;setz %1"	\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg16b %2;setz %1"			\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 71cc380..d1053cd 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -288,6 +288,8 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
+#define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
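
The calling convention of cmpxchg_double() in the patch above (compare two
adjacent words against o1/o2 and, only if both match, store n1/n2, returning
nonzero on success) can be tried out in user space. What follows is only a
sketch, not kernel code: demo_pair, demo_cmpxchg_double() and
demo_cmpxchg_double_fallback() are made-up names, a single 64-bit
compare-and-swap over two 32-bit words stands in for cmpxchg8b, and the
GCC/Clang __atomic_compare_exchange_n() builtin is assumed. The fallback
illustrates what a subsystem has to provide when system_has_cmpxchg_double()
is false: a plain compare-and-store that is only safe under a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two adjacent 32-bit words, 8-byte aligned like a cmpxchg8b operand. */
union demo_pair {
	struct {
		uint32_t first;
		uint32_t second;
	} w;
	uint64_t raw;	/* both words viewed as one 64-bit quantity */
} __attribute__((aligned(8)));

/* Atomic variant: succeeds only if BOTH words still hold the old values. */
static bool demo_cmpxchg_double(union demo_pair *ptr,
				uint32_t o1, uint32_t o2,
				uint32_t n1, uint32_t n2)
{
	union demo_pair oldp = { .w = { o1, o2 } };
	union demo_pair newp = { .w = { n1, n2 } };

	assert((uintptr_t)ptr % 8 == 0);	/* mirrors the VM_BUG_ON() check */
	return __atomic_compare_exchange_n(&ptr->raw, &oldp.raw, newp.raw,
					   false, __ATOMIC_SEQ_CST,
					   __ATOMIC_SEQ_CST);
}

/* Fallback variant: NOT atomic, the caller must hold a lock around it. */
static bool demo_cmpxchg_double_fallback(union demo_pair *ptr,
					 uint32_t o1, uint32_t o2,
					 uint32_t n1, uint32_t n2)
{
	if (ptr->w.first != o1 || ptr->w.second != o2)
		return false;
	ptr->w.first = n1;
	ptr->w.second = n2;
	return true;
}
```

Both variants leave the pair untouched when either word differs from the
expected value, which is the property lockless callers rely on to retry.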

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-06-01 17:25 ` [slubllv7 06/17] slub: Add cmpxchg_double_slab() Christoph Lameter
@ 2011-07-11 19:55   ` Eric Dumazet
  2011-07-12 15:59     ` Christoph Lameter
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2011-07-11 19:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

On Wednesday, 01 June 2011 at 12:25 -0500, Christoph Lameter wrote:
> plain text document attachment (cmpxchg_double_slab)
> Add a function that operates on the second doubleword in the page struct
> and manipulates the object counters, the freelist and the frozen attribute.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  include/linux/slub_def.h |    1 
>  mm/slub.c                |   65 +++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 61 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2011-05-31 11:57:59.622937422 -0500
> +++ linux-2.6/mm/slub.c	2011-05-31 12:03:16.652935392 -0500
> @@ -131,6 +131,9 @@ static inline int kmem_cache_debug(struc
>  /* Enable to test recovery from slab corruption on boot */
>  #undef SLUB_RESILIENCY_TEST
>  
> +/* Enable to log cmpxchg failures */
> +#undef SLUB_DEBUG_CMPXCHG
> +
>  /*
>   * Mininum number of partial slabs. These will be left on the partial
>   * lists even if they are empty. kmem_cache_shrink may reclaim them.
> @@ -170,6 +173,7 @@ static inline int kmem_cache_debug(struc
>  
>  /* Internal SLUB flags */
>  #define __OBJECT_POISON		0x80000000UL /* Poison object */
> +#define __CMPXCHG_DOUBLE	0x40000000UL /* Use cmpxchg_double */
>  
>  static int kmem_size = sizeof(struct kmem_cache);
>  
> @@ -338,6 +342,37 @@ static inline int oo_objects(struct kmem
>  	return x.x & OO_MASK;
>  }
>  
> +static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
> +		void *freelist_old, unsigned long counters_old,
> +		void *freelist_new, unsigned long counters_new,
> +		const char *n)
> +{
> +#ifdef CONFIG_CMPXCHG_DOUBLE
> +	if (s->flags & __CMPXCHG_DOUBLE) {
> +		if (cmpxchg_double(&page->freelist,
> +			freelist_old, counters_old,
> +			freelist_new, counters_new))
> +			return 1;
> +	} else
> +#endif
> +	{
> +		if (page->freelist == freelist_old && page->counters == counters_old) {
> +			page->freelist = freelist_new;
> +			page->counters = counters_new;
> +			return 1;
> +		}
> +	}

This works only on 64-bit arches, where page->counters combines all of the
following fields: inuse, objects, frozen, _count

On 32-bit arches, I am afraid you have to disable the cmpxchg_double()
thing?





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-07-11 19:55   ` Eric Dumazet
@ 2011-07-12 15:59     ` Christoph Lameter
  2011-07-12 16:06       ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-07-12 15:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pekka Enberg, David Rientjes, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

On Mon, 11 Jul 2011, Eric Dumazet wrote:

> > @@ -338,6 +342,37 @@ static inline int oo_objects(struct kmem
> >  	return x.x & OO_MASK;
> >  }
> >
> > +static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
> > +		void *freelist_old, unsigned long counters_old,
> > +		void *freelist_new, unsigned long counters_new,
> > +		const char *n)
> > +{
> > +#ifdef CONFIG_CMPXCHG_DOUBLE
> > +	if (s->flags & __CMPXCHG_DOUBLE) {
> > +		if (cmpxchg_double(&page->freelist,
> > +			freelist_old, counters_old,
> > +			freelist_new, counters_new))
> > +			return 1;
> > +	} else
> > +#endif
> > +	{
> > +		if (page->freelist == freelist_old && page->counters == counters_old) {
> > +			page->freelist = freelist_new;
> > +			page->counters = counters_new;
> > +			return 1;
> > +		}
> > +	}
>
> This works only on 64-bit arches, where page->counters combines all of the
> following fields: inuse, objects, frozen, _count
>
> On 32-bit arches, I am afraid you have to disable the cmpxchg_double()
> thing?

We do not need to have _count included. This is just there because the
field is in the way on 64 bit and we can only do 2x 64 bit cmpxchges.  On
32 bit we can drop _count from "counters".
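
The point about _count can be illustrated with a user-space layout sketch.
This is not the real struct page: the demo_* names are hypothetical,
fixed-width integers stand in for pointers and atomic_t, and the 16/15/1
bit-field widths are an assumption. It only shows which bytes the second word
of a double cmpxchg would cover in each case (GCC or Clang on x86 assumed for
the bit-field layout).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64 bit: the second compared word is 8 bytes, so it swallows _count. */
struct demo_slab64 {
	uint64_t freelist;		/* stands in for a 64-bit pointer */
	union {
		uint64_t counters;	/* second word given to cmpxchg16b */
		struct {
			unsigned inuse:16;
			unsigned objects:15;
			unsigned frozen:1;
			int32_t _count;	/* lands inside "counters" */
		};
	};
};

/* 32 bit: the second compared word is 4 bytes, _count stays outside. */
struct demo_slab32 {
	uint32_t freelist;		/* stands in for a 32-bit pointer */
	union {
		uint32_t counters;	/* second word given to cmpxchg8b */
		struct {
			unsigned inuse:16;
			unsigned objects:15;
			unsigned frozen:1;
		};
	};
	int32_t _count;			/* not part of the compared 8 bytes */
};

/* Return 1 if changing _count alters the 64-bit "counters" word. */
static int demo_count_in_counters64(void)
{
	struct demo_slab64 s = { 0 };
	uint64_t before = s.counters;

	s._count = 1;
	return s.counters != before;
}

/* Return 1 if changing _count alters the 32-bit "counters" word. */
static int demo_count_in_counters32(void)
{
	struct demo_slab32 s = { 0 };
	uint32_t before = s.counters;

	s._count = 1;
	return s.counters != before;
}
```

In the 64-bit layout a concurrent change to _count makes the double cmpxchg
fail and retry; in the 32-bit layout it is simply invisible to the compare.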




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-07-12 15:59     ` Christoph Lameter
@ 2011-07-12 16:06       ` Eric Dumazet
  2011-07-12 16:47         ` Christoph Lameter
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2011-07-12 16:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

On Tuesday, 12 July 2011 at 10:59 -0500, Christoph Lameter wrote:

> We do not need to have _count included. This is just there because the
> field is in the way on 64 bit and we can only do 2x 64 bit cmpxchges.  On
> 32 bit we can drop _count from "counters".
> 

OK, thanks for clarification.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-07-12 16:06       ` Eric Dumazet
@ 2011-07-12 16:47         ` Christoph Lameter
  2011-07-12 18:40           ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-07-12 16:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pekka Enberg, David Rientjes, H. Peter Anvin, linux-kernel,
	Thomas Gleixner

On Tue, 12 Jul 2011, Eric Dumazet wrote:

> On Tuesday, 12 July 2011 at 10:59 -0500, Christoph Lameter wrote:
>
> > We do not need to have _count included. This is just there because the
> > field is in the way on 64 bit and we can only do 2x 64 bit cmpxchges.  On
> > 32 bit we can drop _count from "counters".
> >
>
> OK, thanks for clarification.

Still, I'd like to get some ideas on how to make the whole thing much
cleaner. Isn't there some way to convert a struct to an unsigned long
without going through a union? And a way to convert a struct + 32 bit
atomic_t into a 64 bit unsigned long? That would simplify things significantly.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-07-12 16:47         ` Christoph Lameter
@ 2011-07-12 18:40           ` H. Peter Anvin
  2011-07-12 18:53             ` Christoph Lameter
  0 siblings, 1 reply; 41+ messages in thread
From: H. Peter Anvin @ 2011-07-12 18:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-kernel,
	Thomas Gleixner

On 07/12/2011 09:47 AM, Christoph Lameter wrote:
> On Tue, 12 Jul 2011, Eric Dumazet wrote:
> 
>> On Tuesday, 12 July 2011 at 10:59 -0500, Christoph Lameter wrote:
>>
>>> We do not need to have _count included. This is just there because the
>>> field is in the way on 64 bit and we can only do 2x 64 bit cmpxchges.  On
>>> 32 bit we can drop _count from "counters".
>>
>> OK, thanks for clarification.
> 
> Still, I'd like to get some ideas on how to make the whole thing much
> cleaner. Isn't there some way to convert a struct to an unsigned long
> without going through a union? And a way to convert a struct + 32 bit
> atomic_t into a 64 bit unsigned long? That would simplify things significantly.

If you know it is in memory you can cast ("pun") the pointer.

If it's not in memory that can be inefficient, though.

	-hpa
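
Punning the pointer, as suggested above, might look like the following
user-space sketch. The demo_* names are hypothetical. The kernel is built with
-fno-strict-aliasing, which makes a plain cast usable there; in ordinary ISO C
the plain cast would break the aliasing rules, so this sketch uses the
GCC/Clang may_alias attribute for the cast and shows memcpy() as the portable
alternative. Both require the object to live in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A struct plus a 32-bit counter that together fill one 64-bit word. */
struct demo_wide_counters {
	unsigned inuse:16;
	unsigned objects:15;
	unsigned frozen:1;
	int32_t _count;		/* stands in for the 32-bit atomic_t */
} __attribute__((aligned(8)));

_Static_assert(sizeof(struct demo_wide_counters) == sizeof(uint64_t),
	       "struct must be exactly one 64-bit word");

/* 64-bit type that may alias other types (GCC/Clang extension). */
typedef uint64_t __attribute__((may_alias)) demo_aliasing_u64;

/* Pun the pointer: reinterpret the struct's memory as one 64-bit value. */
static uint64_t demo_pun_pointer(const struct demo_wide_counters *c)
{
	return *(const demo_aliasing_u64 *)c;
}

/* Portable alternative: copy the bytes out of the struct. */
static uint64_t demo_pun_memcpy(const struct demo_wide_counters *c)
{
	uint64_t v;

	memcpy(&v, c, sizeof(v));
	return v;
}
```

Both helpers read the same eight bytes, so they return the same value for any
contents of the struct.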

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-07-12 18:40           ` H. Peter Anvin
@ 2011-07-12 18:53             ` Christoph Lameter
  2011-07-12 20:40               ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Lameter @ 2011-07-12 18:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-kernel,
	Thomas Gleixner

On Tue, 12 Jul 2011, H. Peter Anvin wrote:

> On 07/12/2011 09:47 AM, Christoph Lameter wrote:
> > On Tue, 12 Jul 2011, Eric Dumazet wrote:
> >
> >> On Tuesday, 12 July 2011 at 10:59 -0500, Christoph Lameter wrote:
> >>
> >>> We do not need to have _count included. This is just there because the
> >>> field is in the way on 64 bit and we can only do 2x 64 bit cmpxchges.  On
> >>> 32 bit we can drop _count from "counters".
> >>
> >> OK, thanks for clarification.
> >
> > Still, I'd like to get some ideas on how to make the whole thing much
> > cleaner. Isn't there some way to convert a struct to an unsigned long
> > without going through a union? And a way to convert a struct + 32 bit
> > atomic_t into a 64 bit unsigned long? That would simplify things significantly.
>
> If you know it is in memory you can cast ("pun") the pointer.

Well I use a page struct on the stack now to put the information in memory
and then use the union to cast it.

> If it's not in memory that can be inefficient, though.

Yeah. Isn't there some C trick to cast a word-size struct to unsigned long?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [slubllv7 06/17] slub: Add cmpxchg_double_slab()
  2011-07-12 18:53             ` Christoph Lameter
@ 2011-07-12 20:40               ` H. Peter Anvin
  0 siblings, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2011-07-12 20:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-kernel,
	Thomas Gleixner

On 07/12/2011 11:53 AM, Christoph Lameter wrote:
> Yeah. Isn't there some C trick to cast a word-size struct to unsigned long?

Only a union.

	-hpa
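
A minimal sketch of the union approach, for values that are not already in
memory. The names are hypothetical, and a 32-bit word is assumed so that the
three bit-fields (whose widths are also an assumption) fill it exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Word-size struct: three bit-fields filling exactly 32 bits. */
struct demo_word_counters {
	unsigned inuse:16;
	unsigned objects:15;
	unsigned frozen:1;
};

/* The union gives the same storage two views: struct and plain word. */
union demo_conv {
	struct demo_word_counters s;
	uint32_t word;
};

_Static_assert(sizeof(struct demo_word_counters) == sizeof(uint32_t),
	       "struct must be exactly one 32-bit word");

/* Convert by value: write the struct member, read the integer member. */
static inline uint32_t demo_counters_to_word(struct demo_word_counters c)
{
	union demo_conv u = { .s = c };

	return u.word;
}

/* And back: write the integer member, read the struct member. */
static inline struct demo_word_counters demo_word_to_counters(uint32_t w)
{
	union demo_conv u = { .word = w };

	return u.s;
}
```

Since the union is a local temporary, a compiler is free to keep the whole
conversion in registers.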


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2011-07-12 20:46 UTC | newest]

Thread overview: 41+ messages
2011-06-01 17:25 [slubllv7 00/17] SLUB: Lockless freelists for objects V7 Christoph Lameter
2011-06-01 17:25 ` [slubllv7 01/17] slub: Push irq disable into allocate_slab() Christoph Lameter
2011-06-01 17:25 ` [slubllv7 02/17] slub: Do not use frozen page flag but a bit in the page counters Christoph Lameter
2011-06-01 17:25 ` [slubllv7 03/17] slub: Move page->frozen handling near where the page->freelist handling occurs Christoph Lameter
2011-06-01 17:25 ` [slubllv7 04/17] x86: Add support for cmpxchg_double Christoph Lameter
2011-06-09  9:53   ` Pekka Enberg
2011-06-10 15:17     ` Christoph Lameter
2011-06-11  9:50       ` Pekka Enberg
2011-06-11 17:02         ` Christoph Lameter
2011-06-14  5:49           ` Pekka Enberg
2011-06-14  8:04             ` Ingo Molnar
2011-06-14 14:04               ` Christoph Lameter
2011-06-14 15:05                 ` H. Peter Anvin
2011-06-15  8:55   ` Tejun Heo
2011-06-15 14:26     ` Christoph Lameter
2011-06-15 16:39       ` Tejun Heo
2011-06-15 17:19         ` Christoph Lameter
2011-06-25 23:49   ` [tip:x86/atomic] " tip-bot for Christoph Lameter
2011-06-01 17:25 ` [slubllv7 05/17] mm: Rearrange struct page Christoph Lameter
2011-06-09  9:57   ` Pekka Enberg
2011-06-09 16:45     ` Andrew Morton
2011-06-09 17:03       ` [PATCH] checkpatch: Add a "prefer __aligned" check Joe Perches
2011-06-01 17:25 ` [slubllv7 06/17] slub: Add cmpxchg_double_slab() Christoph Lameter
2011-07-11 19:55   ` Eric Dumazet
2011-07-12 15:59     ` Christoph Lameter
2011-07-12 16:06       ` Eric Dumazet
2011-07-12 16:47         ` Christoph Lameter
2011-07-12 18:40           ` H. Peter Anvin
2011-07-12 18:53             ` Christoph Lameter
2011-07-12 20:40               ` H. Peter Anvin
2011-06-01 17:25 ` [slubllv7 07/17] slub: explicit list_lock taking Christoph Lameter
2011-06-01 17:25 ` [slubllv7 08/17] slub: Pass kmem_cache struct to lock and freeze slab Christoph Lameter
2011-06-01 17:25 ` [slubllv7 09/17] slub: Rework allocator fastpaths Christoph Lameter
2011-06-01 17:25 ` [slubllv7 10/17] slub: Invert locking and avoid slab lock Christoph Lameter
2011-06-01 17:25 ` [slubllv7 11/17] slub: Disable interrupts in free_debug processing Christoph Lameter
2011-06-01 17:25 ` [slubllv7 12/17] slub: Avoid disabling interrupts in free slowpath Christoph Lameter
2011-06-01 17:25 ` [slubllv7 13/17] slub: Get rid of the another_slab label Christoph Lameter
2011-06-01 17:25 ` [slubllv7 14/17] slub: Add statistics for the case that the current slab does not match the node Christoph Lameter
2011-06-01 17:25 ` [slubllv7 15/17] slub: fast release on full slab Christoph Lameter
2011-06-01 17:25 ` [slubllv7 16/17] slub: Not necessary to check for empty slab on load_freelist Christoph Lameter
2011-06-01 17:26 ` [slubllv7 17/17] slub: slabinfo update for cmpxchg handling Christoph Lameter
