* [slub rfc1 00/12] slub: RFC lockless allocation paths V1
@ 2011-09-02 20:46 Christoph Lameter
  2011-09-02 20:46 ` [slub rfc1 01/12] slub: free slabs without holding locks (V2) Christoph Lameter
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:46 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

Draft of a patchset to make the allocation paths lockless as well.

So far I have only run a hackbench test on this to make sure that it works.
Various additional overhead is added to the fastpaths, so this may require
additional work before it becomes mergeable.

The first two patches are cleanup patches that have been posted a couple of
times. Those can be merged.

The basic principle is to use double word atomic operations (cmpxchg_double)
to check lists of objects in and out of the per cpu structures and the
per page structures.

Since we can only handle two words atomically, the state kept for the per cpu
queues has to be reduced. Thus the page and the node fields in kmem_cache_cpu
have to be dropped. Both of those values can be determined from an object
pointer after all, but recalculating them impacts the performance of the
allocator. I am not sure yet how big that impact is. It could be offset by the
removal of the overhead for interrupt disabling/enabling and by the code
savings because the per cpu state for queueing is much smaller.
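
As a hedged illustration of the principle (this is not code from the patches;
it assumes the SLUB-internal helpers get_freepointer_safe() and next_tid()
from mm/slub.c): the per cpu state shrinks to a freelist pointer plus a
transaction id that are updated together with one double word cmpxchg, while
page and node are recomputed from the object address when needed.

static void *lockless_alloc_sketch(struct kmem_cache *s)
{
	void *object;
	unsigned long tid;

	do {
		tid = this_cpu_read(s->cpu_slab->tid);	/* transaction id */
		barrier();
		object = this_cpu_read(s->cpu_slab->freelist);
		if (!object)
			return NULL;	/* queue empty: take the slow path */

		/*
		 * Check the object out of the per cpu queue: freelist and tid
		 * are replaced together in one double word cmpxchg, so neither
		 * interrupts nor preemption need to be disabled.
		 */
	} while (!this_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			get_freepointer_safe(s, object), next_tid(tid)));

	/*
	 * page and node are no longer cached in kmem_cache_cpu. When needed
	 * they are derived from the object itself:
	 *	page = virt_to_head_page(object);
	 *	node = page_to_nid(page);
	 */
	return object;
}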


* [slub rfc1 01/12] slub: free slabs without holding locks (V2)
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
@ 2011-09-02 20:46 ` Christoph Lameter
  2011-09-02 20:46 ` [slub rfc1 02/12] slub: Remove useless statements in __slab_alloc Christoph Lameter
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:46 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: slub_free_wo_locks --]
[-- Type: text/plain, Size: 3365 bytes --]

There are two situations in which slub holds a lock while releasing
pages:

	A. During kmem_cache_shrink()
	B. During kmem_cache_close()

For A, build a list of the slabs to be freed while holding the lock and
release the pages later. In case B we are the last remaining user of the
cache, so there is no need to take the list_lock at all.

After this patch all calls to the page allocator to free pages are
done without holding any spinlocks. kmem_cache_destroy() will still
hold the slub_lock semaphore.
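
For case A the pattern is, in a simplified sketch (not the exact hunk below;
it assumes the SLUB-internal discard_slab() and struct kmem_cache_node):

static void shrink_node_sketch(struct kmem_cache *s, struct kmem_cache_node *n)
{
	struct page *page, *t;
	unsigned long flags;
	LIST_HEAD(empty);	/* collect empty slabs here */

	spin_lock_irqsave(&n->list_lock, flags);
	list_for_each_entry_safe(page, t, &n->partial, lru) {
		if (!page->inuse) {
			list_move(&page->lru, &empty);
			n->nr_partial--;
		}
	}
	spin_unlock_irqrestore(&n->list_lock, flags);

	/* The page allocator is only called after the list_lock is dropped. */
	list_for_each_entry_safe(page, t, &empty, lru)
		discard_slab(s, page);
}

(The actual patch reuses slot 0 of the slabs_by_inuse array as that list.)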

V1->V2. Remove kfree. Avoid locking in free_partial.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-08-09 13:01:59.071582163 -0500
+++ linux-2.6/mm/slub.c	2011-08-09 13:05:00.051582012 -0500
@@ -2970,13 +2970,13 @@ static void list_slab_objects(struct kme
 
 /*
  * Attempt to free all partial slabs on a node.
+ * This is called from kmem_cache_close(). We must be the last thread
+ * using the cache and therefore we do not need to lock anymore.
  */
 static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
 {
-	unsigned long flags;
 	struct page *page, *h;
 
-	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry_safe(page, h, &n->partial, lru) {
 		if (!page->inuse) {
 			remove_partial(n, page);
@@ -2986,7 +2986,6 @@ static void free_partial(struct kmem_cac
 				"Objects remaining on kmem_cache_close()");
 		}
 	}
-	spin_unlock_irqrestore(&n->list_lock, flags);
 }
 
 /*
@@ -3020,6 +3019,7 @@ void kmem_cache_destroy(struct kmem_cach
 	s->refcount--;
 	if (!s->refcount) {
 		list_del(&s->list);
+		up_write(&slub_lock);
 		if (kmem_cache_close(s)) {
 			printk(KERN_ERR "SLUB %s: %s called for cache that "
 				"still has objects.\n", s->name, __func__);
@@ -3028,8 +3028,8 @@ void kmem_cache_destroy(struct kmem_cach
 		if (s->flags & SLAB_DESTROY_BY_RCU)
 			rcu_barrier();
 		sysfs_slab_remove(s);
-	}
-	up_write(&slub_lock);
+	} else
+		up_write(&slub_lock);
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
@@ -3347,23 +3347,23 @@ int kmem_cache_shrink(struct kmem_cache
 		 * list_lock. page->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			if (!page->inuse) {
-				remove_partial(n, page);
-				discard_slab(s, page);
-			} else {
-				list_move(&page->lru,
-				slabs_by_inuse + page->inuse);
-			}
+			list_move(&page->lru, slabs_by_inuse + page->inuse);
+			if (!page->inuse)
+				n->nr_partial--;
 		}
 
 		/*
 		 * Rebuild the partial list with the slabs filled up most
 		 * first and the least used slabs at the end.
 		 */
-		for (i = objects - 1; i >= 0; i--)
+		for (i = objects - 1; i > 0; i--)
 			list_splice(slabs_by_inuse + i, n->partial.prev);
 
 		spin_unlock_irqrestore(&n->list_lock, flags);
+
+		/* Release empty slabs */
+		list_for_each_entry_safe(page, t, slabs_by_inuse, lru)
+			discard_slab(s, page);
 	}
 
 	kfree(slabs_by_inuse);


* [slub rfc1 02/12] slub: Remove useless statements in __slab_alloc
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
  2011-09-02 20:46 ` [slub rfc1 01/12] slub: free slabs without holding locks (V2) Christoph Lameter
@ 2011-09-02 20:46 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 03/12] slub: Get rid of the node field Christoph Lameter
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:46 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, torvalds, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: remove_useless_page_null --]
[-- Type: text/plain, Size: 1458 bytes --]

Two statements in __slab_alloc() do not have any effect.

1. c->page is already set to NULL by deactivate_slab() called right before.

2. gfpflags are masked in new_slab() before being passed to the page
   allocator. There is no need to mask gfpflags in __slab_alloc() itself,
   especially since the most frequent paths through __slab_alloc() do not
   use a gfp mask at all.

Cc: torvalds@linux-foundation.org
Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |    4 ----
 1 file changed, 4 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-08-01 11:03:15.000000000 -0500
+++ linux-2.6/mm/slub.c	2011-08-01 11:04:06.385859038 -0500
@@ -2064,9 +2064,6 @@ static void *__slab_alloc(struct kmem_ca
 	c = this_cpu_ptr(s->cpu_slab);
 #endif
 
-	/* We handle __GFP_ZERO in the caller */
-	gfpflags &= ~__GFP_ZERO;
-
 	page = c->page;
 	if (!page)
 		goto new_slab;
@@ -2163,7 +2160,6 @@ debug:
 
 	c->freelist = get_freepointer(s, object);
 	deactivate_slab(s, c);
-	c->page = NULL;
 	c->node = NUMA_NO_NODE;
 	local_irq_restore(flags);
 	return object;


* [slub rfc1 03/12] slub: Get rid of the node field
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
  2011-09-02 20:46 ` [slub rfc1 01/12] slub: free slabs without holding locks (V2) Christoph Lameter
  2011-09-02 20:46 ` [slub rfc1 02/12] slub: Remove useless statements in __slab_alloc Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 04/12] slub: Separate out kmem_cache_cpu processing from deactivate_slab Christoph Lameter
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: get_rid_of_cnode --]
[-- Type: text/plain, Size: 2937 bytes --]

The node field is always page_to_nid(c->page), so it is rather easy to
replace. Note that there will be additional overhead in various hot paths.
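
To make that cost concrete, a hypothetical helper (not part of the patch;
page_to_nid() and virt_to_head_page() are existing kernel helpers) showing how
the node is recomputed once it is no longer cached, here straight from an
object pointer as the cover letter describes:

static inline int slab_object_node(const void *object)
{
	return page_to_nid(virt_to_head_page(object));
}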

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/slub_def.h |    1 -
 mm/slub.c                |   12 +++++-------
 2 files changed, 5 insertions(+), 8 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-01 07:27:22.000000000 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:20:19.021219504 -0500
@@ -1588,7 +1588,6 @@ static inline int acquire_slab(struct km
 		/* Populate the per cpu freelist */
 		this_cpu_write(s->cpu_slab->freelist, freelist);
 		this_cpu_write(s->cpu_slab->page, page);
-		this_cpu_write(s->cpu_slab->node, page_to_nid(page));
 		return 1;
 	} else {
 		/*
@@ -1958,7 +1957,7 @@ static void flush_all(struct kmem_cache
 static inline int node_match(struct kmem_cache_cpu *c, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && c->node != node)
+	if (node != NUMA_NO_NODE && page_to_nid(c->page) != node)
 		return 0;
 #endif
 	return 1;
@@ -2142,7 +2141,6 @@ new_slab:
 		page->inuse = page->objects;
 
 		stat(s, ALLOC_SLAB);
-		c->node = page_to_nid(page);
 		c->page = page;
 
 		if (kmem_cache_debug(s))
@@ -2160,7 +2158,6 @@ debug:
 
 	c->freelist = get_freepointer(s, object);
 	deactivate_slab(s, c);
-	c->node = NUMA_NO_NODE;
 	local_irq_restore(flags);
 	return object;
 }
@@ -4316,9 +4313,10 @@ static ssize_t show_slab_objects(struct
 		for_each_possible_cpu(cpu) {
 			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 
-			if (!c || c->node < 0)
+			if (!c || !c->freelist)
 				continue;
 
+			node = page_to_nid(c->page);
 			if (c->page) {
 					if (flags & SO_TOTAL)
 						x = c->page->objects;
@@ -4328,9 +4326,9 @@ static ssize_t show_slab_objects(struct
 					x = 1;
 
 				total += x;
-				nodes[c->node] += x;
+				nodes[node] += x;
 			}
-			per_cpu[c->node]++;
+			per_cpu[node]++;
 		}
 	}
 
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2011-09-01 07:26:53.000000000 -0500
+++ linux-2.6/include/linux/slub_def.h	2011-09-02 08:18:46.071220101 -0500
@@ -42,7 +42,6 @@ struct kmem_cache_cpu {
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
-	int node;		/* The node of the page (or -1 for debug) */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif


* [slub rfc1 04/12] slub: Separate out kmem_cache_cpu processing from deactivate_slab
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (2 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 03/12] slub: Get rid of the node field Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 05/12] slub: Extract get_freelist from __slab_alloc Christoph Lameter
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: separate_deactivate_slab --]
[-- Type: text/plain, Size: 2524 bytes --]

Processing of the kmem_cache_cpu fields needs to be separated out of
deactivate_slab() since we will be handling them with cmpxchg_double later.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 08:20:19.021219504 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:20:25.911219458 -0500
@@ -1771,14 +1771,12 @@ void init_kmem_cache_cpus(struct kmem_ca
 /*
  * Remove the cpu slab
  */
-static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
+static void deactivate_slab(struct kmem_cache *s, struct page *page, void *freelist)
 {
 	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
-	struct page *page = c->page;
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 	int lock = 0;
 	enum slab_modes l = M_NONE, m = M_NONE;
-	void *freelist;
 	void *nextfree;
 	int tail = 0;
 	struct page new;
@@ -1789,11 +1787,6 @@ static void deactivate_slab(struct kmem_
 		tail = 1;
 	}
 
-	c->tid = next_tid(c->tid);
-	c->page = NULL;
-	freelist = c->freelist;
-	c->freelist = NULL;
-
 	/*
 	 * Stage one: Free all available per cpu objects back
 	 * to the page freelist while it is still frozen. Leave the
@@ -1922,7 +1915,11 @@ redo:
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	deactivate_slab(s, c);
+	deactivate_slab(s, c->page, c->freelist);
+
+	c->tid = next_tid(c->tid);
+	c->page = NULL;
+	c->freelist = NULL;
 }
 
 /*
@@ -2069,7 +2066,9 @@ static void *__slab_alloc(struct kmem_ca
 
 	if (unlikely(!node_match(c, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, c);
+		deactivate_slab(s, c->page, c->freelist);
+		c->page = NULL;
+		c->freelist = NULL;
 		goto new_slab;
 	}
 
@@ -2156,8 +2155,9 @@ debug:
 	if (!object || !alloc_debug_processing(s, page, object, addr))
 		goto new_slab;
 
-	c->freelist = get_freepointer(s, object);
-	deactivate_slab(s, c);
+	deactivate_slab(s, c->page, get_freepointer(s, object));
+	c->page = NULL;
+	c->freelist = NULL;
 	local_irq_restore(flags);
 	return object;
 }


* [slub rfc1 05/12] slub: Extract get_freelist from __slab_alloc
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (3 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 04/12] slub: Separate out kmem_cache_cpu processing from deactivate_slab Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 06/12] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: extract_get_freelist --]
[-- Type: text/plain, Size: 2797 bytes --]

get_freelist retrieves free objects from the page freelist (put there by remote
frees) or deactivates a slab page if no more objects are available.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   57 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 08:20:25.911219458 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:20:32.491219417 -0500
@@ -2024,6 +2024,37 @@ slab_out_of_memory(struct kmem_cache *s,
 }
 
 /*
+ * Check the page->freelist of a page and either transfer the freelist to the per cpu freelist
+ * or deactivate the page.
+ *
+ * The page is still frozen if the return value is not NULL.
+ *
+ * If this function returns NULL then the page has been unfrozen.
+ */
+static inline void *get_freelist(struct kmem_cache *s, struct page *page)
+{
+	struct page new;
+	unsigned long counters;
+	void *freelist;
+
+	do {
+		freelist = page->freelist;
+		counters = page->counters;
+		new.counters = counters;
+		VM_BUG_ON(!new.frozen);
+
+		new.inuse = page->objects;
+		new.frozen = freelist != NULL;
+
+	} while (!cmpxchg_double_slab(s, page,
+		freelist, counters,
+		NULL, new.counters,
+		"get_freelist"));
+
+	return freelist;
+}
+
+/*
  * Slow path. The lockless freelist is empty or we need to perform
  * debugging duties.
  *
@@ -2047,8 +2078,6 @@ static void *__slab_alloc(struct kmem_ca
 	void **object;
 	struct page *page;
 	unsigned long flags;
-	struct page new;
-	unsigned long counters;
 
 	local_irq_save(flags);
 #ifdef CONFIG_PREEMPT
@@ -2074,29 +2103,7 @@ static void *__slab_alloc(struct kmem_ca
 
 	stat(s, ALLOC_SLOWPATH);
 
-	do {
-		object = page->freelist;
-		counters = page->counters;
-		new.counters = counters;
-		VM_BUG_ON(!new.frozen);
-
-		/*
-		 * If there is no object left then we use this loop to
-		 * deactivate the slab which is simple since no objects
-		 * are left in the slab and therefore we do not need to
-		 * put the page back onto the partial list.
-		 *
-		 * If there are objects left then we retrieve them
-		 * and use them to refill the per cpu queue.
-		*/
-
-		new.inuse = page->objects;
-		new.frozen = object != NULL;
-
-	} while (!__cmpxchg_double_slab(s, page,
-			object, counters,
-			NULL, new.counters,
-			"__slab_alloc"));
+	object = get_freelist(s, page);
 
 	if (unlikely(!object)) {
 		c->page = NULL;


* [slub rfc1 06/12] slub: Use freelist instead of "object" in __slab_alloc
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (4 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 05/12] slub: Extract get_freelist from __slab_alloc Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 07/12] slub: pass page to node_match() instead of kmem_cache_cpu structure Christoph Lameter
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: use_freelist_instead_of_object --]
[-- Type: text/plain, Size: 3190 bytes --]

The variable "object" really refers to a list of objects that we
are handling. Since the lockless allocation path will depend on it,
we rename the variable now.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 08:20:32.491219417 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:20:39.221219372 -0500
@@ -2075,7 +2075,7 @@ static inline void *get_freelist(struct
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
-	void **object;
+	void *freelist;
 	struct page *page;
 	unsigned long flags;
 
@@ -2089,13 +2089,15 @@ static void *__slab_alloc(struct kmem_ca
 	c = this_cpu_ptr(s->cpu_slab);
 #endif
 
+	freelist = c->freelist;
 	page = c->page;
 	if (!page)
 		goto new_slab;
 
+
 	if (unlikely(!node_match(c, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, c->page, c->freelist);
+		deactivate_slab(s, page, freelist);
 		c->page = NULL;
 		c->freelist = NULL;
 		goto new_slab;
@@ -2103,9 +2105,9 @@ static void *__slab_alloc(struct kmem_ca
 
 	stat(s, ALLOC_SLOWPATH);
 
-	object = get_freelist(s, page);
+	freelist = get_freelist(s, page);
 
-	if (unlikely(!object)) {
+	if (unlikely(!freelist)) {
 		c->page = NULL;
 		stat(s, DEACTIVATE_BYPASS);
 		goto new_slab;
@@ -2114,18 +2116,21 @@ static void *__slab_alloc(struct kmem_ca
 	stat(s, ALLOC_REFILL);
 
 load_freelist:
+	/*
+	 * freelist is pointing to the list of objects to be used.
+	 * page is pointing to the page from which the objects are obtained.
+	 */
 	VM_BUG_ON(!page->frozen);
-	c->freelist = get_freepointer(s, object);
+	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
-	return object;
+	return freelist;
 
 new_slab:
 	page = get_partial(s, gfpflags, node);
 	if (page) {
 		stat(s, ALLOC_FROM_PARTIAL);
-		object = c->freelist;
-
+		freelist = c->freelist;
 		if (kmem_cache_debug(s))
 			goto debug;
 		goto load_freelist;
@@ -2142,7 +2147,7 @@ new_slab:
 		 * No other reference to the page yet so we can
 		 * muck around with it freely without cmpxchg
 		 */
-		object = page->freelist;
+		freelist = page->freelist;
 		page->freelist = NULL;
 		page->inuse = page->objects;
 
@@ -2159,14 +2164,14 @@ new_slab:
 	return NULL;
 
 debug:
-	if (!object || !alloc_debug_processing(s, page, object, addr))
+	if (!freelist || !alloc_debug_processing(s, page, freelist, addr))
 		goto new_slab;
 
-	deactivate_slab(s, c->page, get_freepointer(s, object));
+	deactivate_slab(s, c->page, get_freepointer(s, freelist));
 	c->page = NULL;
 	c->freelist = NULL;
 	local_irq_restore(flags);
-	return object;
+	return freelist;
 }
 
 /*


* [slub rfc1 07/12] slub: pass page to node_match() instead of kmem_cache_cpu structure
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (5 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 06/12] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 08/12] slub: enable use of deactivate_slab with interrupts on Christoph Lameter
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: page_parameter_to_node_match --]
[-- Type: text/plain, Size: 1912 bytes --]

The page field will go away, so it is more convenient to pass the page
struct to node_match() instead of the kmem_cache_cpu structure.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 08:20:39.221219372 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:20:45.611219333 -0500
@@ -1951,10 +1951,10 @@ static void flush_all(struct kmem_cache
  * Check if the objects in a per cpu structure fit numa
  * locality expectations.
  */
-static inline int node_match(struct kmem_cache_cpu *c, int node)
+static inline int node_match(struct page *page, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && page_to_nid(c->page) != node)
+	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
 		return 0;
 #endif
 	return 1;
@@ -2095,7 +2095,7 @@ static void *__slab_alloc(struct kmem_ca
 		goto new_slab;
 
 
-	if (unlikely(!node_match(c, node))) {
+	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
 		deactivate_slab(s, page, freelist);
 		c->page = NULL;
@@ -2189,6 +2189,7 @@ static __always_inline void *slab_alloc(
 {
 	void **object;
 	struct kmem_cache_cpu *c;
+	struct page *page;
 	unsigned long tid;
 
 	if (slab_pre_alloc_hook(s, gfpflags))
@@ -2214,7 +2215,8 @@ redo:
 	barrier();
 
 	object = c->freelist;
-	if (unlikely(!object || !node_match(c, node)))
+	page = c->page;
+	if (unlikely(!object || !node_match(page, node)))
 
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 


* [slub rfc1 08/12] slub: enable use of deactivate_slab with interrupts on
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (6 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 07/12] slub: pass page to node_match() instead of kmem_cache_cpu structure Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 09/12] slub: Run deactivate_slab with interrupts enabled Christoph Lameter
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: allocate_slab_with_irq_enabled --]
[-- Type: text/plain, Size: 1970 bytes --]

Locking needs to change a bit: since deactivate_slab() may now be called with
interrupts enabled, take the node list_lock with spin_lock_irqsave() and use
the interrupt safe cmpxchg_double_slab() variant.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 08:21:28.931219055 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:22:38.141218609 -0500
@@ -1781,6 +1781,7 @@ static void deactivate_slab(struct kmem_
 	int tail = 0;
 	struct page new;
 	struct page old;
+	unsigned long uninitialized_var(flags);
 
 	if (page->freelist) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
@@ -1807,7 +1808,7 @@ static void deactivate_slab(struct kmem_
 			new.inuse--;
 			VM_BUG_ON(!new.frozen);
 
-		} while (!__cmpxchg_double_slab(s, page,
+		} while (!cmpxchg_double_slab(s, page,
 			prior, counters,
 			freelist, new.counters,
 			"drain percpu freelist"));
@@ -1857,7 +1858,7 @@ redo:
 			 * that acquire_slab() will see a slab page that
 			 * is frozen
 			 */
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 	} else {
 		m = M_FULL;
@@ -1868,7 +1869,7 @@ redo:
 			 * slabs from diagnostic functions will not see
 			 * any frozen slabs.
 			 */
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 	}
 
@@ -1896,14 +1897,14 @@ redo:
 	}
 
 	l = m;
-	if (!__cmpxchg_double_slab(s, page,
+	if (!cmpxchg_double_slab(s, page,
 				old.freelist, old.counters,
 				new.freelist, new.counters,
 				"unfreezing slab"))
 		goto redo;
 
 	if (lock)
-		spin_unlock(&n->list_lock);
+		spin_unlock_irqrestore(&n->list_lock, flags);
 
 	if (m == M_FREE) {
 		stat(s, DEACTIVATE_EMPTY);


* [slub rfc1 09/12] slub: Run deactivate_slab with interrupts enabled
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (7 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 08/12] slub: enable use of deactivate_slab with interrupts on Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 10/12] slub: Enable use of get_partial " Christoph Lameter
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: irq_enabled_deactivate_slab --]
[-- Type: text/plain, Size: 1297 bytes --]

Do not enable and disable interrupts if we were called with interrupts
enabled.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 08:22:38.141218609 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 08:22:48.311218548 -0500
@@ -1348,10 +1348,11 @@ static struct page *allocate_slab(struct
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
+	int irqs_were_disabled = irqs_disabled();
 
 	flags &= gfp_allowed_mask;
 
-	if (flags & __GFP_WAIT)
+	if (irqs_were_disabled && flags & __GFP_WAIT)
 		local_irq_enable();
 
 	flags |= s->allocflags;
@@ -1375,7 +1376,7 @@ static struct page *allocate_slab(struct
 			stat(s, ORDER_FALLBACK);
 	}
 
-	if (flags & __GFP_WAIT)
+	if (irqs_were_disabled && flags & __GFP_WAIT)
 		local_irq_disable();
 
 	if (!page)


* [slub rfc1 10/12] slub: Enable use of get_partial with interrupts enabled
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (8 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 09/12] slub: Run deactivate_slab with interrupts enabled Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 11/12] slub: Remove kmem_cache_cpu dependency from acquire slab Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 12/12] slub: Drop page field from kmem_cache_cpu Christoph Lameter
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: irq_enabled_acquire_slab --]
[-- Type: text/plain, Size: 1318 bytes --]

Need to disable interrupts when taking the node list lock.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 10:12:03.601176577 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 10:12:25.981176437 -0500
@@ -1609,6 +1609,7 @@ static struct page *get_partial_node(str
 					struct kmem_cache_node *n)
 {
 	struct page *page;
+	unsigned long flags;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -1619,13 +1620,13 @@ static struct page *get_partial_node(str
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	spin_lock(&n->list_lock);
+	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry(page, &n->partial, lru)
 		if (acquire_slab(s, n, page))
 			goto out;
 	page = NULL;
 out:
-	spin_unlock(&n->list_lock);
+	spin_unlock_irqrestore(&n->list_lock, flags);
 	return page;
 }
 


* [slub rfc1 11/12] slub: Remove kmem_cache_cpu dependency from acquire slab
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (9 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 10/12] slub: Enable use of get_partial " Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  2011-09-02 20:47 ` [slub rfc1 12/12] slub: Drop page field from kmem_cache_cpu Christoph Lameter
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: remove_kmem_cache_cpu_dependency_from_acquire_slab --]
[-- Type: text/plain, Size: 2363 bytes --]

Instead of putting the freelist pointer into the kmem_cache_cpu structure,
put it into the page struct, reusing the lru.next field.

Also convert the manual warning into a WARN_ON.
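
A minimal sketch of the hand-off this enables (hypothetical helper names; the
patch open-codes this in the hunk below): after remove_partial() the page is
no longer on the node partial list, so its lru field is unused and can carry
the acquired freelist from acquire_slab() to __slab_alloc().

static inline void stash_freelist(struct page *page, void *freelist)
{
	/* Valid because remove_partial() just took the page off its list. */
	page->lru.next = freelist;
}

static inline void *fetch_freelist(struct page *page)
{
	return page->lru.next;
}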

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   41 +++++++++++++++--------------------------
 1 file changed, 15 insertions(+), 26 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 10:12:25.981176437 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 10:12:30.881176403 -0500
@@ -1569,37 +1569,25 @@ static inline int acquire_slab(struct km
 	 * The old freelist is the list of objects for the
 	 * per cpu allocation list.
 	 */
-	do {
-		freelist = page->freelist;
-		counters = page->counters;
-		new.counters = counters;
-		new.inuse = page->objects;
+	freelist = page->freelist;
+	counters = page->counters;
+	new.counters = counters;
+	new.inuse = page->objects;
 
-		VM_BUG_ON(new.frozen);
-		new.frozen = 1;
+	VM_BUG_ON(new.frozen);
+	new.frozen = 1;
 
-	} while (!__cmpxchg_double_slab(s, page,
+	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
 			NULL, new.counters,
-			"lock and freeze"));
-
-	remove_partial(n, page);
+			"acquire_slab"))
 
-	if (freelist) {
-		/* Populate the per cpu freelist */
-		this_cpu_write(s->cpu_slab->freelist, freelist);
-		this_cpu_write(s->cpu_slab->page, page);
-		return 1;
-	} else {
-		/*
-		 * Slab page came from the wrong list. No object to allocate
-		 * from. Put it onto the correct list and continue partial
-		 * scan.
-		 */
-		printk(KERN_ERR "SLUB: %s : Page without available objects on"
-			" partial list\n", s->name);
 		return 0;
-	}
+
+	remove_partial(n, page);
+	WARN_ON(!freelist);
+	page->lru.next = freelist;
+	return 1;
 }
 
 /*
@@ -2133,7 +2121,8 @@ new_slab:
 	page = get_partial(s, gfpflags, node);
 	if (page) {
 		stat(s, ALLOC_FROM_PARTIAL);
-		freelist = c->freelist;
+		freelist = page->lru.next;
+		c->page  = page;
 		if (kmem_cache_debug(s))
 			goto debug;
 		goto load_freelist;


* [slub rfc1 12/12] slub: Drop page field from kmem_cache_cpu
  2011-09-02 20:46 [slub rfc1 00/12] slub: RFC lockless allocation paths V1 Christoph Lameter
                   ` (10 preceding siblings ...)
  2011-09-02 20:47 ` [slub rfc1 11/12] slub: Remove kmem_cache_cpu dependency from acquire slab Christoph Lameter
@ 2011-09-02 20:47 ` Christoph Lameter
  11 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-09-02 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-mm

[-- Attachment #1: drop_kmem_cache_cpu_page --]
[-- Type: text/plain, Size: 8043 bytes --]

The page field can be calculated from the freelist pointer because

	page == virt_to_head_page(object)

This introduces additional inefficiencies since the calculation is complex.

We then end up with a special case for freelist == NULL because we can then no
longer determine which page is the active per cpu slab. Therefore we must
deactivate the slab page when the last object is allocated from the per cpu
list.

This patch in effect makes the slub allocation paths lockless as well; they no
longer require disabling interrupts or preemption.
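
A rough sketch of that special case (not the literal patch code; it uses the
get_freelist() and put_cpu_objects() helpers added below): when the fast path
hands out the last per cpu object, the slab page must either be refilled from
its page->freelist or deactivated, because a NULL per cpu freelist can no
longer identify the active slab.

static void refill_or_deactivate(struct kmem_cache *s, void *object)
{
	/* The active cpu slab is now implied by the object, not by c->page. */
	struct page *page = virt_to_head_page(object);

	/* Take over any objects that were freed remotely in the meantime. */
	void *next = get_freelist(s, page);

	if (next)
		/* Refill the per cpu queue with them. */
		put_cpu_objects(s, page, next);
	/* Otherwise get_freelist() has already unfrozen (deactivated) the page. */
}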

Signed-off-by: Christoph Lameter <cl@linux.com>



---
 mm/slub.c |  140 ++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 92 insertions(+), 48 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-09-02 10:12:30.881176403 -0500
+++ linux-2.6/mm/slub.c	2011-09-02 10:12:50.291176282 -0500
@@ -1906,11 +1906,11 @@ redo:
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	deactivate_slab(s, c->page, c->freelist);
-
-	c->tid = next_tid(c->tid);
-	c->page = NULL;
-	c->freelist = NULL;
+	if (c->freelist) {
+		deactivate_slab(s, virt_to_head_page(c->freelist), c->freelist);
+		c->tid = next_tid(c->tid);
+		c->freelist = NULL;
+	}
 }
 
 /*
@@ -1922,7 +1922,7 @@ static inline void __flush_cpu_slab(stru
 {
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 
-	if (likely(c && c->page))
+	if (likely(c && c->freelist))
 		flush_slab(s, c);
 }
 
@@ -2015,6 +2015,55 @@ slab_out_of_memory(struct kmem_cache *s,
 }
 
 /*
+ * Retrieve pointer to the current freelist and
+ * zap the per cpu object list.
+ *
+ * Returns NULL if there was no object on the freelist.
+ */
+void *get_cpu_objects(struct kmem_cache *s)
+{
+	void *freelist;
+	unsigned long tid;
+
+	do {
+		struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
+
+		tid = c->tid;
+		barrier();
+		freelist = c->freelist;
+		if (!freelist)
+			return NULL;
+
+	} while (!this_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
+			freelist, tid,
+			NULL, next_tid(tid)));
+
+	return freelist;
+}
+
+/*
+ * Set the per cpu object list to the freelist. The page must
+ * be frozen.
+ *
+ * Page will be unfrozen (and the freelist object put onto the pages freelist)
+ * if the per cpu freelist has been used in the meantime.
+ */
+static inline void put_cpu_objects(struct kmem_cache *s,
+				struct page *page, void *freelist)
+{
+	unsigned long tid = this_cpu_read(s->cpu_slab->tid);
+
+	VM_BUG_ON(!page->frozen);
+	if (!irqsafe_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
+		NULL, tid, freelist, next_tid(tid)))
+
+		/*
+		 * There was an intervening free or alloc. Cannot free to the
+		 * per cpu queue. Must unfreeze page.
+		 */
+		deactivate_slab(s, page, freelist);
+}
+/*
  * Check the page->freelist of a page and either transfer the freelist to the per cpu freelist
  * or deactivate the page.
  *
@@ -2064,33 +2113,21 @@ static inline void *get_freelist(struct
  * a call to the page allocator and the setup of a new slab.
  */
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			  unsigned long addr, struct kmem_cache_cpu *c)
+			  unsigned long addr)
 {
-	void *freelist;
+	void *freelist, *next;
 	struct page *page;
-	unsigned long flags;
 
-	local_irq_save(flags);
-#ifdef CONFIG_PREEMPT
-	/*
-	 * We may have been preempted and rescheduled on a different
-	 * cpu before disabling interrupts. Need to reload cpu area
-	 * pointer.
-	 */
-	c = this_cpu_ptr(s->cpu_slab);
-#endif
-
-	freelist = c->freelist;
-	page = c->page;
-	if (!page)
+	freelist = get_cpu_objects(s);
+	if (!freelist)
 		goto new_slab;
 
+	page = virt_to_head_page(freelist);
+	BUG_ON(!page->frozen);
 
 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
 		deactivate_slab(s, page, freelist);
-		c->page = NULL;
-		c->freelist = NULL;
 		goto new_slab;
 	}
 
@@ -2099,7 +2136,6 @@ static void *__slab_alloc(struct kmem_ca
 	freelist = get_freelist(s, page);
 
 	if (unlikely(!freelist)) {
-		c->page = NULL;
 		stat(s, DEACTIVATE_BYPASS);
 		goto new_slab;
 	}
@@ -2111,10 +2147,19 @@ load_freelist:
 	 * freelist is pointing to the list of objects to be used.
 	 * page is pointing to the page from which the objects are obtained.
 	 */
+	next = get_freepointer(s, freelist);
 	VM_BUG_ON(!page->frozen);
-	c->freelist = get_freepointer(s, freelist);
-	c->tid = next_tid(c->tid);
-	local_irq_restore(flags);
+
+	if (!next)
+		/*
+		 * last object so we either unfreeze the page or
+		 * get more objects.
+		 */
+		next = get_freelist(s, page);
+
+	if (next)
+		put_cpu_objects(s, page, next);
+
 	return freelist;
 
 new_slab:
@@ -2122,7 +2167,6 @@ new_slab:
 	if (page) {
 		stat(s, ALLOC_FROM_PARTIAL);
 		freelist = page->lru.next;
-		c->page  = page;
 		if (kmem_cache_debug(s))
 			goto debug;
 		goto load_freelist;
@@ -2131,10 +2175,6 @@ new_slab:
 	page = new_slab(s, gfpflags, node);
 
 	if (page) {
-		c = __this_cpu_ptr(s->cpu_slab);
-		if (c->page)
-			flush_slab(s, c);
-
 		/*
 		 * No other reference to the page yet so we can
 		 * muck around with it freely without cmpxchg
@@ -2144,7 +2184,6 @@ new_slab:
 		page->inuse = page->objects;
 
 		stat(s, ALLOC_SLAB);
-		c->page = page;
 
 		if (kmem_cache_debug(s))
 			goto debug;
@@ -2152,17 +2191,13 @@ new_slab:
 	}
 	if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
 		slab_out_of_memory(s, gfpflags, node);
-	local_irq_restore(flags);
 	return NULL;
 
 debug:
 	if (!freelist || !alloc_debug_processing(s, page, freelist, addr))
 		goto new_slab;
 
-	deactivate_slab(s, c->page, get_freepointer(s, freelist));
-	c->page = NULL;
-	c->freelist = NULL;
-	local_irq_restore(flags);
+	deactivate_slab(s, page, get_freepointer(s, freelist));
 	return freelist;
 }
 
@@ -2207,12 +2242,13 @@ redo:
 	barrier();
 
 	object = c->freelist;
-	page = c->page;
-	if (unlikely(!object || !node_match(page, node)))
+	if (unlikely(!object || !node_match((page = virt_to_head_page(object)), node)))
 
-		object = __slab_alloc(s, gfpflags, node, addr, c);
+		object = __slab_alloc(s, gfpflags, node, addr);
 
 	else {
+		void *next = get_freepointer_safe(s, object);
+
 		/*
 		 * The cmpxchg will only match if there was no additional
 		 * operation and if we are on the right processor.
@@ -2228,12 +2264,18 @@ redo:
 		if (unlikely(!irqsafe_cpu_cmpxchg_double(
 				s->cpu_slab->freelist, s->cpu_slab->tid,
 				object, tid,
-				get_freepointer_safe(s, object), next_tid(tid)))) {
+				next, next_tid(tid)))) {
 
 			note_cmpxchg_failure("slab_alloc", s, tid);
 			goto redo;
 		}
 		stat(s, ALLOC_FASTPATH);
+		if (!next) {
+			next = get_freelist(s, page);
+			if (next)
+				/* Refill the per cpu queue */
+				put_cpu_objects(s, page, next);
+		}
 	}
 
 	if (unlikely(gfpflags & __GFP_ZERO) && object)
@@ -2432,7 +2474,7 @@ redo:
 	tid = c->tid;
 	barrier();
 
-	if (likely(page == c->page)) {
+	if (c->freelist && likely(page == virt_to_head_page(c->freelist))) {
 		set_freepointer(s, object, c->freelist);
 
 		if (unlikely(!irqsafe_cpu_cmpxchg_double(
@@ -4318,16 +4360,18 @@ static ssize_t show_slab_objects(struct
 
 		for_each_possible_cpu(cpu) {
 			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+			struct page *page;
 
 			if (!c || !c->freelist)
 				continue;
 
-			node = page_to_nid(c->page);
-			if (c->page) {
+			page = virt_to_head_page(c->freelist);
+			node = page_to_nid(page);
+			if (page) {
 					if (flags & SO_TOTAL)
-						x = c->page->objects;
+						x = page->objects;
 				else if (flags & SO_OBJECTS)
-					x = c->page->inuse;
+					x = page->inuse;
 				else
 					x = 1;
 

