* [RFC][PATCH 0/3] re-shrink 'struct page' when SLUB is on.
@ 2013-12-11 22:40 ` Dave Hansen
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2013-12-11 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, cl, kirill.shutemov, Andi Kleen, Dave Hansen

OK, here's the real reason I was digging around for exactly which
fields in 'struct page' slub uses.

SLUB depends on a 16-byte cmpxchg for an optimization.  For the
purposes of this series, I'm assuming that it is a very important
optimization that we desperately need to keep around.

In order to get guaranteed 16-byte alignment (required by the
hardware on x86), 'struct page' is padded out from 56 to 64
bytes.

Those 8 bytes matter.  We've gone to great lengths to keep
'struct page' small in the past.  It's a shame to bloat it now
just for alignment when there is already unused space inside it
that we could use instead.  These patches attempt _internal_
alignment instead of external alignment for slub.
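
For reference, the external alignment being replaced works roughly like
this today (a simplified sketch of the CONFIG_HAVE_ALIGNED_STRUCT_PAGE
handling in mm_types.h, with unrelated fields elided):

	struct page {
		unsigned long flags;
		...
		void *freelist;		/* paired with 'counters' for cmpxchg_double() */
		unsigned long counters;
		...
	}
	#ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
		__aligned(2 * sizeof(unsigned long))	/* pads 56 bytes out to 64 */
	#endif
	;

On x86_64, cmpxchg_double() ends up as a cmpxchg16b, which faults if
its target is not 16-byte aligned, hence the whole-structure padding.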

I also got a bug report from some folks running a large database
benchmark.  Their old kernel uses slab and their new one uses
slub.  They were swapping and couldn't figure out why.  It turned
out to be the 2GB of RAM that the slub padding wastes on their
system.
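(At 8 wasted bytes per 4k page, the padding eats 1/512 of RAM, so 2GB
of waste corresponds to a machine with roughly 1TB of memory.)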

On my box, that 2GB cost about $200 to populate back when we
bought it.  I want my $200 back.

This is compile-tested and lightly runtime-tested.  I'm curious
what people think of it before we push it further.  It is on top
of my previous patch to create a 'struct slab_page', but it doesn't
really rely on that and could easily be done independently.

* [RFC][PATCH 1/3] mm: slab: create helpers for slab ->freelist pointer
  2013-12-11 22:40 ` Dave Hansen
@ 2013-12-11 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2013-12-11 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, cl, kirill.shutemov, Andi Kleen, Dave Hansen


We need to move the ->freelist data around within 'struct page'
in order to keep the cmpxchg target aligned.  The first step is
to add accessor helpers that we will hook into in the next patch.

I'm not super-happy with how this looks.  It's a bit ugly, but it
does work.  I'm open to some better suggestions for how to do
this.
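
To make the conversion concrete, the call-site pattern (taken from the
hunks below) is:

	/* before: direct field access */
	freelist = page->freelist;
	page->freelist = NULL;

	/* after: reads via slab_freelist(), writes via slab_freelist_ptr() */
	freelist = slab_freelist(page);
	*slab_freelist_ptr(page) = NULL;

The indirection is what lets the next patch change where the pointer
actually lives without touching the callers again.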

---

 linux.git-davehans/include/linux/mm_types.h |    2 
 linux.git-davehans/mm/slab.c                |   25 ++++----
 linux.git-davehans/mm/slob.c                |   34 +++++++---
 linux.git-davehans/mm/slub.c                |   87 +++++++++++++++-------------
 4 files changed, 87 insertions(+), 61 deletions(-)

diff -puN include/linux/mm_types.h~slub-internally-align-freelist-and-counters include/linux/mm_types.h
--- linux.git/include/linux/mm_types.h~slub-internally-align-freelist-and-counters	2013-12-11 13:19:54.001948772 -0800
+++ linux.git-davehans/include/linux/mm_types.h	2013-12-11 13:19:54.010949170 -0800
@@ -143,7 +143,7 @@ struct slab_page {
 	void *s_mem;			/* slab first object */
 
 	/* Second double word */
-	void *freelist;		/* sl[aou]b first free object */
+	void *_freelist;		/* sl[aou]b first free object */
 
 	union {
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
diff -puN mm/slab.c~slub-internally-align-freelist-and-counters mm/slab.c
--- linux.git/mm/slab.c~slub-internally-align-freelist-and-counters	2013-12-11 13:19:54.003948861 -0800
+++ linux.git-davehans/mm/slab.c	2013-12-11 13:19:54.011949215 -0800
@@ -1950,6 +1950,16 @@ static void slab_destroy_debugcheck(stru
 }
 #endif
 
+static inline unsigned int **slab_freelist_ptr(struct slab_page *page)
+{
+	return (unsigned int **)&page->_freelist;
+}
+
+static inline unsigned int *slab_freelist(struct slab_page *page)
+{
+	return *slab_freelist_ptr(page);
+}
+
 /**
  * slab_destroy - destroy and release all objects in a slab
  * @cachep: cache pointer being destroyed
@@ -1963,7 +1973,7 @@ static void slab_destroy(struct kmem_cac
 {
 	void *freelist;
 
-	freelist = page->freelist;
+	freelist = slab_freelist(page);
 	slab_destroy_debugcheck(cachep, page);
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
 		struct rcu_head *head;
@@ -2549,11 +2559,6 @@ static void *alloc_slabmgmt(struct kmem_
 	return freelist;
 }
 
-static inline unsigned int *slab_freelist(struct slab_page *page)
-{
-	return (unsigned int *)(page->freelist);
-}
-
 static void cache_init_objs(struct kmem_cache *cachep,
 			    struct slab_page *page)
 {
@@ -2615,7 +2620,7 @@ static void *slab_get_obj(struct kmem_ca
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
+	objp = index_to_obj(cachep, page, (slab_freelist(page))[page->active]);
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2636,7 +2641,7 @@ static void slab_put_obj(struct kmem_cac
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (slab_freelist(page)[i] == objnr) {
+		if ((slab_freelist(page))[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2656,7 +2661,7 @@ static void slab_map_pages(struct kmem_c
 			   void *freelist)
 {
 	page->slab_cache = cache;
-	page->freelist = freelist;
+	*slab_freelist_ptr(page) = freelist;
 }
 
 /*
@@ -4217,7 +4222,7 @@ static void handle_slab(unsigned long *n
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_freelist(page)[j] == i) {
+			if ((slab_freelist(page))[j] == i) {
 				active = false;
 				break;
 			}
diff -puN mm/slob.c~slub-internally-align-freelist-and-counters mm/slob.c
--- linux.git/mm/slob.c~slub-internally-align-freelist-and-counters	2013-12-11 13:19:54.004948905 -0800
+++ linux.git-davehans/mm/slob.c	2013-12-11 13:19:54.012949259 -0800
@@ -211,6 +211,16 @@ static void slob_free_pages(void *b, int
 	free_pages((unsigned long)b, order);
 }
 
+static inline void **slab_freelist_ptr(struct slab_page *sp)
+{
+	return &sp->_freelist;
+}
+
+static inline void *slab_freelist(struct slab_page *sp)
+{
+	return *slab_freelist_ptr(sp);
+}
+
 /*
  * Allocate a slob block within a given slob_page sp.
  */
@@ -219,7 +229,7 @@ static void *slob_page_alloc(struct slab
 	slob_t *prev, *cur, *aligned = NULL;
 	int delta = 0, units = SLOB_UNITS(size);
 
-	for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
+	for (prev = NULL, cur = slab_freelist(sp); ; prev = cur, cur = slob_next(cur)) {
 		slobidx_t avail = slob_units(cur);
 
 		if (align) {
@@ -243,12 +253,12 @@ static void *slob_page_alloc(struct slab
 				if (prev)
 					set_slob(prev, slob_units(prev), next);
 				else
-					sp->freelist = next;
+					*slab_freelist_ptr(sp) = next;
 			} else { /* fragment */
 				if (prev)
 					set_slob(prev, slob_units(prev), cur + units);
 				else
-					sp->freelist = cur + units;
+					*slab_freelist_ptr(sp) = cur + units;
 				set_slob(cur + units, avail - units, next);
 			}
 
@@ -322,7 +332,7 @@ static void *slob_alloc(size_t size, gfp
 
 		spin_lock_irqsave(&slob_lock, flags);
 		sp->units = SLOB_UNITS(PAGE_SIZE);
-		sp->freelist = b;
+		*slab_freelist_ptr(sp) = b;
 		INIT_LIST_HEAD(&sp->list);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
@@ -369,7 +379,7 @@ static void slob_free(void *block, int s
 	if (!slob_page_free(sp)) {
 		/* This slob page is about to become partially free. Easy! */
 		sp->units = units;
-		sp->freelist = b;
+		*slab_freelist_ptr(sp) = b;
 		set_slob(b, units,
 			(void *)((unsigned long)(b +
 					SLOB_UNITS(PAGE_SIZE)) & PAGE_MASK));
@@ -389,15 +399,15 @@ static void slob_free(void *block, int s
 	 */
 	sp->units += units;
 
-	if (b < (slob_t *)sp->freelist) {
-		if (b + units == sp->freelist) {
-			units += slob_units(sp->freelist);
-			sp->freelist = slob_next(sp->freelist);
+	if (b < (slob_t *)slab_freelist(sp)) {
+		if (b + units == slab_freelist(sp)) {
+			units += slob_units(slab_freelist(sp));
+			*slab_freelist_ptr(sp) = slob_next(slab_freelist(sp));
 		}
-		set_slob(b, units, sp->freelist);
-		sp->freelist = b;
+		set_slob(b, units, slab_freelist(sp));
+		*slab_freelist_ptr(sp) = b;
 	} else {
-		prev = sp->freelist;
+		prev = slab_freelist(sp);
 		next = slob_next(prev);
 		while (b > next) {
 			prev = next;
diff -puN mm/slub.c~slub-internally-align-freelist-and-counters mm/slub.c
--- linux.git/mm/slub.c~slub-internally-align-freelist-and-counters	2013-12-11 13:19:54.006948993 -0800
+++ linux.git-davehans/mm/slub.c	2013-12-11 13:19:54.014949347 -0800
@@ -52,7 +52,7 @@
  *   The slab_lock is only used for debugging and on arches that do not
  *   have the ability to do a cmpxchg_double. It only protects the second
  *   double word in the page struct. Meaning
- *	A. page->freelist	-> List of object free in a page
+ *	A. slab_freelist(page)	-> (pointer to) list of object free in a page
  *	B. page->counters	-> Counters of objects
  *	C. page->frozen		-> frozen state
  *
@@ -228,6 +228,16 @@ static inline void stat(const struct kme
 #endif
 }
 
+static inline void **slab_freelist_ptr(struct slab_page *spage)
+{
+	return &spage->_freelist;
+}
+
+static inline void *slab_freelist(struct slab_page *spage)
+{
+	return *slab_freelist_ptr(spage);
+}
+
 /********************************************************************
  * 			Core slab cache functions
  *******************************************************************/
@@ -380,7 +390,7 @@ static inline bool __cmpxchg_double_slab
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(slab_freelist_ptr(page), &page->counters,
 			freelist_old, counters_old,
 			freelist_new, counters_new))
 		return 1;
@@ -388,9 +398,9 @@ static inline bool __cmpxchg_double_slab
 #endif
 	{
 		slab_lock(page);
-		if (page->freelist == freelist_old &&
+		if (slab_freelist(page) == freelist_old &&
 					page->counters == counters_old) {
-			page->freelist = freelist_new;
+			*slab_freelist_ptr(page) = freelist_new;
 			page->counters = counters_new;
 			slab_unlock(page);
 			return 1;
@@ -416,7 +426,7 @@ static inline bool cmpxchg_double_slab(s
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(slab_freelist_ptr(page), &page->counters,
 			freelist_old, counters_old,
 			freelist_new, counters_new))
 		return 1;
@@ -427,9 +437,9 @@ static inline bool cmpxchg_double_slab(s
 
 		local_irq_save(flags);
 		slab_lock(page);
-		if (page->freelist == freelist_old &&
+		if (slab_freelist(page) == freelist_old &&
 					page->counters == counters_old) {
-			page->freelist = freelist_new;
+			*slab_freelist_ptr(page) = freelist_new;
 			page->counters = counters_new;
 			slab_unlock(page);
 			local_irq_restore(flags);
@@ -461,7 +471,7 @@ static void get_map(struct kmem_cache *s
 	void *p;
 	void *addr = slub_page_address(page);
 
-	for (p = page->freelist; p; p = get_freepointer(s, p))
+	for (p = slab_freelist(page); p; p = get_freepointer(s, p))
 		set_bit(slab_index(p, s, addr), map);
 }
 
@@ -572,7 +582,7 @@ static void print_page_info(struct slab_
 {
 	printk(KERN_ERR
 	       "INFO: Slab 0x%p objects=%u used=%u fp=0x%p flags=0x%04lx\n",
-	       page, page->objects, page->inuse, page->freelist, page->flags);
+	       page, page->objects, page->inuse, slab_freelist(page), page->flags);
 
 }
 
@@ -884,7 +894,7 @@ static int on_freelist(struct kmem_cache
 	void *object = NULL;
 	unsigned long max_objects;
 
-	fp = page->freelist;
+	fp = slab_freelist(page);
 	while (fp && nr <= page->objects) {
 		if (fp == search)
 			return 1;
@@ -895,7 +905,7 @@ static int on_freelist(struct kmem_cache
 				set_freepointer(s, object, NULL);
 			} else {
 				slab_err(s, page, "Freepointer corrupt");
-				page->freelist = NULL;
+				*slab_freelist_ptr(page) = NULL;
 				page->inuse = page->objects;
 				slab_fix(s, "Freelist cleared");
 				return 0;
@@ -934,7 +944,7 @@ static void trace(struct kmem_cache *s,
 			s->name,
 			alloc ? "alloc" : "free",
 			object, page->inuse,
-			page->freelist);
+			slab_freelist(page));
 
 		if (!alloc)
 			print_section("Object ", (void *)object,
@@ -1101,7 +1111,7 @@ bad:
 		 */
 		slab_fix(s, "Marking all objects used");
 		page->inuse = page->objects;
-		page->freelist = NULL;
+		*slab_freelist_ptr(page) = NULL;
 	}
 	return 0;
 }
@@ -1440,7 +1450,7 @@ static struct slab_page *new_slab(struct
 	setup_object(s, spage, last);
 	set_freepointer(s, last, NULL);
 
-	spage->freelist = start;
+	*slab_freelist_ptr(spage) = start;
 	spage->inuse = spage->objects;
 	spage->frozen = 1;
 out:
@@ -1570,15 +1580,15 @@ static inline void *acquire_slab(struct
 	 * The old freelist is the list of objects for the
 	 * per cpu allocation list.
 	 */
-	freelist = page->freelist;
+	freelist = slab_freelist(page);
 	counters = page->counters;
 	new.counters = counters;
 	*objects = new.objects - new.inuse;
 	if (mode) {
 		new.inuse = page->objects;
-		new.freelist = NULL;
+		*slab_freelist_ptr(&new) = NULL;
 	} else {
-		new.freelist = freelist;
+		*slab_freelist_ptr(&new) = freelist;
 	}
 
 	VM_BUG_ON(new.frozen);
@@ -1586,7 +1596,8 @@ static inline void *acquire_slab(struct
 
 	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
-			new.freelist, new.counters,
+			slab_freelist(&new),
+			new.counters,
 			"acquire_slab"))
 		return NULL;
 
@@ -1812,7 +1823,7 @@ static void deactivate_slab(struct kmem_
 	struct slab_page new;
 	struct slab_page old;
 
-	if (page->freelist) {
+	if (slab_freelist(page)) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
 		tail = DEACTIVATE_TO_TAIL;
 	}
@@ -1830,7 +1841,7 @@ static void deactivate_slab(struct kmem_
 		unsigned long counters;
 
 		do {
-			prior = page->freelist;
+			prior = slab_freelist(page);
 			counters = page->counters;
 			set_freepointer(s, freelist, prior);
 			new.counters = counters;
@@ -1861,7 +1872,7 @@ static void deactivate_slab(struct kmem_
 	 */
 redo:
 
-	old.freelist = page->freelist;
+	*slab_freelist_ptr(&old) = slab_freelist(page);
 	old.counters = page->counters;
 	VM_BUG_ON(!old.frozen);
 
@@ -1869,16 +1880,16 @@ redo:
 	new.counters = old.counters;
 	if (freelist) {
 		new.inuse--;
-		set_freepointer(s, freelist, old.freelist);
-		new.freelist = freelist;
+		set_freepointer(s, freelist, slab_freelist(&old));
+		*slab_freelist_ptr(&new) = freelist;
 	} else
-		new.freelist = old.freelist;
+		*slab_freelist_ptr(&new) = slab_freelist(&old);
 
 	new.frozen = 0;
 
 	if (!new.inuse && n->nr_partial > s->min_partial)
 		m = M_FREE;
-	else if (new.freelist) {
+	else if (slab_freelist(&new)) {
 		m = M_PARTIAL;
 		if (!lock) {
 			lock = 1;
@@ -1927,8 +1938,8 @@ redo:
 
 	l = m;
 	if (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				slab_freelist(&old), old.counters,
+				slab_freelist(&new), new.counters,
 				"unfreezing slab"))
 		goto redo;
 
@@ -1973,18 +1984,18 @@ static void unfreeze_partials(struct kme
 
 		do {
 
-			old.freelist = page->freelist;
+			*slab_freelist_ptr(&old) = slab_freelist(page);
 			old.counters = page->counters;
 			VM_BUG_ON(!old.frozen);
 
 			new.counters = old.counters;
-			new.freelist = old.freelist;
+			*slab_freelist_ptr(&new) = slab_freelist(&old);
 
 			new.frozen = 0;
 
 		} while (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				slab_freelist(&old), old.counters,
+				slab_freelist(&new), new.counters,
 				"unfreezing slab"));
 
 		if (unlikely(!new.inuse && n->nr_partial > s->min_partial)) {
@@ -2208,8 +2219,8 @@ static inline void *new_slab_objects(str
 		 * No other reference to the page yet so we can
 		 * muck around with it freely without cmpxchg
 		 */
-		freelist = page->freelist;
-		page->freelist = NULL;
+		freelist = slab_freelist(page);
+		*slab_freelist_ptr(page) = NULL;
 
 		stat(s, ALLOC_SLAB);
 		c->page = page;
@@ -2229,7 +2240,7 @@ static inline bool pfmemalloc_match(stru
 }
 
 /*
- * Check the page->freelist of a page and either transfer the freelist to the
+ * Check the slab_freelist(page) of a page and either transfer the freelist to the
  * per cpu freelist or deactivate the page.
  *
  * The page is still frozen if the return value is not NULL.
@@ -2245,7 +2256,7 @@ static inline void *get_freelist(struct
 	void *freelist;
 
 	do {
-		freelist = page->freelist;
+		freelist = slab_freelist(page);
 		counters = page->counters;
 
 		new.counters = counters;
@@ -2557,7 +2568,7 @@ static void __slab_free(struct kmem_cach
 			spin_unlock_irqrestore(&n->list_lock, flags);
 			n = NULL;
 		}
-		prior = page->freelist;
+		prior = slab_freelist(page);
 		counters = page->counters;
 		set_freepointer(s, object, prior);
 		new.counters = counters;
@@ -2901,9 +2912,9 @@ static void early_kmem_cache_node_alloc(
 				"in order to be able to continue\n");
 	}
 
-	n = page->freelist;
+	n = slab_freelist(page);
 	BUG_ON(!n);
-	page->freelist = get_freepointer(kmem_cache_node, n);
+	*slab_freelist_ptr(page) = get_freepointer(kmem_cache_node, n);
 	page->inuse = 1;
 	page->frozen = 0;
 	kmem_cache_node->node[node] = n;
_

* [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
  2013-12-11 22:40 ` Dave Hansen
@ 2013-12-11 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2013-12-11 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, cl, kirill.shutemov, Andi Kleen, Dave Hansen


The write-argument to cmpxchg_double() must be 16-byte aligned.
We used to align 'struct page' itself in order to guarantee this,
but that wastes 8 bytes per page.  Instead, we take 8 bytes
internal to the page before page->counters and move the freelist
between there and the existing 8 bytes after counters.  That way,
no matter how 'struct page' itself is aligned, we can ensure that
we have a 16-byte area with which to do this cmpxchg.
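
To spell out the alignment trick (assuming 64-bit, with the field order
from the mm_types.h hunk below):

	offset  0: flags
	offset  8: union { s_mem, _freelist_first }
	offset 16: union { counters, ... }
	offset 24: _freelist_second

If 'struct page' happens to start on a 16-byte boundary, &counters is
aligned and the {counters, _freelist_second} pair is used; if it starts
on an odd 8-byte boundary, &_freelist_first is aligned and the
{_freelist_first, counters} pair is used instead.  Either way
cmpxchg_double() gets a naturally aligned 16-byte target.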



---

 linux.git-davehans/include/linux/mm_types.h |   17 +++++--
 linux.git-davehans/mm/slab.c                |    2 
 linux.git-davehans/mm/slab.h                |    1 
 linux.git-davehans/mm/slob.c                |    2 
 linux.git-davehans/mm/slub.c                |   67 +++++++++++++++++++++++-----
 5 files changed, 74 insertions(+), 15 deletions(-)

diff -puN include/linux/mm_types.h~move-around-freelist-to-align include/linux/mm_types.h
--- linux.git/include/linux/mm_types.h~move-around-freelist-to-align	2013-12-11 13:19:54.334963497 -0800
+++ linux.git-davehans/include/linux/mm_types.h	2013-12-11 13:19:54.344963939 -0800
@@ -140,11 +140,20 @@ struct slab_page {
 	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
-	void *s_mem;			/* slab first object */
+	union {
+		void *s_mem;			/* slab first object */
+		/*
+		 * The combination of ->counters and ->freelist
+		 * needs to be doubleword-aligned in order for
+		 * slub's cmpxchg_double() to work properly.
+		 * slub does not use 's_mem', so we reuse it here
+		 * so we can always have alignment no matter how
+		 * struct page is aligned.
+		 */
+		void *_freelist_first;		/* sl[aou]b first free object */
+	};
 
 	/* Second double word */
-	void *_freelist;		/* sl[aou]b first free object */
-
 	union {
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
 	defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
@@ -173,6 +182,8 @@ struct slab_page {
 		unsigned int active;	/* SLAB */
 	};
 
+	void *_freelist_second;		/* sl[aou]b first free object */
+
 	/* Third double word block */
 	union {
 		struct {		/* slub per cpu partial pages */
diff -puN mm/slab.c~move-around-freelist-to-align mm/slab.c
--- linux.git/mm/slab.c~move-around-freelist-to-align	2013-12-11 13:19:54.335963541 -0800
+++ linux.git-davehans/mm/slab.c	2013-12-11 13:19:54.345963983 -0800
@@ -1952,7 +1952,7 @@ static void slab_destroy_debugcheck(stru
 
 static inline unsigned int **slab_freelist_ptr(struct slab_page *page)
 {
-	return (unsigned int **)&page->_freelist;
+	return (unsigned int **)&page->_freelist_first;
 }
 
 static inline unsigned int *slab_freelist(struct slab_page *page)
diff -puN mm/slab.h~move-around-freelist-to-align mm/slab.h
--- linux.git/mm/slab.h~move-around-freelist-to-align	2013-12-11 13:19:54.337963630 -0800
+++ linux.git-davehans/mm/slab.h	2013-12-11 13:19:54.346964027 -0800
@@ -278,3 +278,4 @@ struct kmem_cache_node {
 
 void *slab_next(struct seq_file *m, void *p, loff_t *pos);
 void slab_stop(struct seq_file *m, void *p);
+
diff -puN mm/slob.c~move-around-freelist-to-align mm/slob.c
--- linux.git/mm/slob.c~move-around-freelist-to-align	2013-12-11 13:19:54.339963718 -0800
+++ linux.git-davehans/mm/slob.c	2013-12-11 13:19:54.346964027 -0800
@@ -213,7 +213,7 @@ static void slob_free_pages(void *b, int
 
 static inline void **slab_freelist_ptr(struct slab_page *sp)
 {
-	return &sp->_freelist;
+	return &sp->_freelist_first;
 }
 
 static inline void *slab_freelist(struct slab_page *sp)
diff -puN mm/slub.c~move-around-freelist-to-align mm/slub.c
--- linux.git/mm/slub.c~move-around-freelist-to-align	2013-12-11 13:19:54.340963762 -0800
+++ linux.git-davehans/mm/slub.c	2013-12-11 13:19:54.348964116 -0800
@@ -228,9 +228,23 @@ static inline void stat(const struct kme
 #endif
 }
 
-static inline void **slab_freelist_ptr(struct slab_page *spage)
+static inline bool ptr_doubleword_aligned(void *ptr)
 {
-	return &spage->_freelist;
+	int doubleword_bytes = BITS_PER_LONG * 2 / 8;
+	if (PTR_ALIGN(ptr, doubleword_bytes) == ptr)
+		return 1;
+	return 0;
+}
+
+void **slab_freelist_ptr(struct slab_page *spage)
+{
+	/*
+	 * If counters is aligned, then we use the ->freelist
+	 * slot _after_ it.
+	 */
+	if (ptr_doubleword_aligned(&spage->counters))
+		return &spage->_freelist_second;
+	return &spage->_freelist_first;
 }
 
 static inline void *slab_freelist(struct slab_page *spage)
@@ -380,6 +394,39 @@ static __always_inline void slab_unlock(
 	__bit_spin_unlock(PG_locked, &page->flags);
 }
 
+/*
+ * Take two adjacent 8-byte-aligned, but non-doubleword-aligned
+ * arguments and swap them around to guarantee that the
+ * first arg is doubleword-aligned.
+ *
+ * The write-argument to cmpxchg_double() must be 16-byte
+ * aligned.  We used to align 'struct page' itself in order
+ * to guarantee this, but that wastes 8-bytes per page.
+ * Instead, we take 8-bytes internal to the page before
+ * page->counters and move freelist between there and the
+ * existing 8-bytes after counters.  That way, no matter
+ * how 'struct page' itself is aligned, we can ensure that
+ * we have a 16-byte area with which to do this cmpxchg.
+ */
+static inline bool __cmpxchg_double_slab_unaligned(struct slab_page *page,
+		void *freelist_old, unsigned long counters_old,
+		void *freelist_new, unsigned long counters_new)
+{
+	void **freelist = slab_freelist_ptr(page);
+	if (ptr_doubleword_aligned(&page->counters)) {
+		if (cmpxchg_double(&page->counters, freelist,
+			counters_old, freelist_old,
+			counters_new, freelist_new))
+			return 1;
+	} else {
+		if (cmpxchg_double(freelist, &page->counters,
+			freelist_old, counters_old,
+			freelist_new, counters_new))
+			return 1;
+	}
+	return 0;
+}
+
 /* Interrupts must be disabled (for the fallback code to work right) */
 static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab_page *page,
 		void *freelist_old, unsigned long counters_old,
@@ -390,10 +437,10 @@ static inline bool __cmpxchg_double_slab
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(slab_freelist_ptr(page), &page->counters,
-			freelist_old, counters_old,
-			freelist_new, counters_new))
-		return 1;
+		if (__cmpxchg_double_slab_unaligned(page,
+				freelist_old, counters_old,
+				freelist_new, counters_new))
+			return 1;
 	} else
 #endif
 	{
@@ -426,10 +473,10 @@ static inline bool cmpxchg_double_slab(s
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(slab_freelist_ptr(page), &page->counters,
-			freelist_old, counters_old,
-			freelist_new, counters_new))
-		return 1;
+		if (__cmpxchg_double_slab_unaligned(page,
+				freelist_old, counters_old,
+				freelist_new, counters_new))
+			return 1;
 	} else
 #endif
 	{
_

* [RFC][PATCH 3/3] mm: slabs: reset page at free
  2013-12-11 22:40 ` Dave Hansen
@ 2013-12-11 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2013-12-11 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, cl, kirill.shutemov, Andi Kleen, Dave Hansen


We now have slub's ->freelist usage impinging on page->mapping's
storage space.  The buddy allocator wants ->mapping to be NULL
when a page is handed back, so we have to make sure that it is
cleared.
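(As I understand it, the page allocator's free-path checks flag a
non-NULL ->mapping as a bad page, so leaving stale slab state there
would trip them.)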

Note that slab already does this, so just create a common helper
and have all the slabs do it this way.  ->mapping is right next
to ->flags, so it's virtually guaranteed to be in the L1 at this
point, so this shouldn't cost very much to do in practice.

---

 linux.git-davehans/mm/slab.c |    3 +--
 linux.git-davehans/mm/slab.h |   11 +++++++++++
 linux.git-davehans/mm/slob.c |    2 +-
 linux.git-davehans/mm/slub.c |    2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff -puN mm/slab.c~slub-reset-page-at-free mm/slab.c
--- linux.git/mm/slab.c~slub-reset-page-at-free	2013-12-11 13:19:55.026994096 -0800
+++ linux.git-davehans/mm/slab.c	2013-12-11 13:19:55.036994538 -0800
@@ -1725,8 +1725,7 @@ static void kmem_freepages(struct kmem_c
 	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
-	page_mapcount_reset(page);
-	page->mapping = NULL;
+	slab_reset_page(page);
 
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
diff -puN mm/slab.h~slub-reset-page-at-free mm/slab.h
--- linux.git/mm/slab.h~slub-reset-page-at-free	2013-12-11 13:19:55.028994185 -0800
+++ linux.git-davehans/mm/slab.h	2013-12-11 13:19:55.036994538 -0800
@@ -279,3 +279,14 @@ struct kmem_cache_node {
 void *slab_next(struct seq_file *m, void *p, loff_t *pos);
 void slab_stop(struct seq_file *m, void *p);
 
+/*
+ * The slab allocators use 'struct page' fields for all kinds of
+ * things.  This resets the page so that the buddy allocator will
+ * be happy with it.
+ */
+static inline void slab_reset_page(struct page *page)
+{
+	page->mapping = NULL;
+	page_mapcount_reset(page);
+}
+
diff -puN mm/slob.c~slub-reset-page-at-free mm/slob.c
--- linux.git/mm/slob.c~slub-reset-page-at-free	2013-12-11 13:19:55.029994229 -0800
+++ linux.git-davehans/mm/slob.c	2013-12-11 13:19:55.036994538 -0800
@@ -371,7 +371,7 @@ static void slob_free(void *block, int s
 			clear_slob_page_free(sp);
 		spin_unlock_irqrestore(&slob_lock, flags);
 		__ClearPageSlab((struct page *)sp);
-		page_mapcount_reset((struct page *)sp);
+		slab_reset_page((struct page *)sp);
 		slob_free_pages(b, 0);
 		return;
 	}
diff -puN mm/slub.c~slub-reset-page-at-free mm/slub.c
--- linux.git/mm/slub.c~slub-reset-page-at-free	2013-12-11 13:19:55.031994317 -0800
+++ linux.git-davehans/mm/slub.c	2013-12-11 13:19:55.039994671 -0800
@@ -1530,7 +1530,7 @@ static void __free_slab(struct kmem_cach
 	__ClearPageSlab(page);
 
 	memcg_release_pages(s, order);
-	page_mapcount_reset(page);
+	slab_reset_page(page);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
 	__free_memcg_kmem_pages(page, order);
_

* Re: [RFC][PATCH 3/3] mm: slabs: reset page at free
  2013-12-11 22:40   ` Dave Hansen
@ 2013-12-12 17:44     ` Christoph Lameter
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2013-12-12 17:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On Wed, 11 Dec 2013, Dave Hansen wrote:

> We now have slub's ->freelist usage impinging on page->mapping's
> storage space.  The buddy allocator wants ->mapping to be NULL
> when a page is handed back, so we have to make sure that it is
> cleared.
>
> Note that slab already does this, so just create a common helper
> and have all the slab allocators do it this way.  ->mapping is right
> next to ->flags, so it is virtually guaranteed to be in the L1 cache
> at this point; clearing it should cost very little in practice.

Maybe add a common page alloc and free function in mm/slab_common.c?

All the allocators do similar things after all.
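
A minimal sketch of what such a shared free path could look like (the
name slab_common_free_pages() and its exact arguments are made up for
illustration; memcg and per-allocator accounting are left out):

static void slab_common_free_pages(struct page *page, int order)
{
	BUG_ON(!PageSlab(page));
	__ClearPageSlabPfmemalloc(page);
	__ClearPageSlab(page);

	/* hand the page back to the buddy allocator in a clean state */
	page->mapping = NULL;
	page_mapcount_reset(page);

	if (current->reclaim_state)
		current->reclaim_state->reclaimed_slab += 1 << order;
	__free_pages(page, order);
}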

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 3/3] mm: slabs: reset page at free
@ 2013-12-12 17:44     ` Christoph Lameter
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2013-12-12 17:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On Wed, 11 Dec 2013, Dave Hansen wrote:

> We now have slub's ->freelist usage impinging on page->mapping's
> storage space.  The buddy allocator wants ->mapping to be NULL
> when a page is handed back, so we have to make sure that it is
> cleared.
>
> Note that slab already does this, so just create a common helper
> and have all the slab allocators do it this way.  ->mapping is right
> next to ->flags, so it is virtually guaranteed to be in the L1 cache
> at this point; clearing it should cost very little in practice.

Maybe add a common page alloc and free function in mm/slab_common.c?

All the allocators do similar things after all.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
  2013-12-11 22:40   ` Dave Hansen
@ 2013-12-12 17:46     ` Christoph Lameter
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2013-12-12 17:46 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On Wed, 11 Dec 2013, Dave Hansen wrote:

>
> The write-argument to cmpxchg_double() must be 16-byte aligned.
> We used to align 'struct page' itself in order to guarantee this,
> but that wastes 8-bytes per page.  Instead, we take 8-bytes
> internal to the page before page->counters and move freelist
> between there and the existing 8-bytes after counters.  That way,
> no matter how 'struct page' itself is aligned, we can ensure that
> we have a 16-byte area with which to do this cmpxchg.

Well this adds additional branching to the fast paths.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
@ 2013-12-12 17:46     ` Christoph Lameter
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2013-12-12 17:46 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On Wed, 11 Dec 2013, Dave Hansen wrote:

>
> The write-argument to cmpxchg_double() must be 16-byte aligned.
> We used to align 'struct page' itself in order to guarantee this,
> but that wastes 8-bytes per page.  Instead, we take 8-bytes
> internal to the page before page->counters and move freelist
> between there and the existing 8-bytes after counters.  That way,
> no matter how 'struct page' itself is aligned, we can ensure that
> we have a 16-byte area with which to do this cmpxchg.

Well this adds additional branching to the fast paths.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 1/3] mm: slab: create helpers for slab ->freelist pointer
  2013-12-11 22:40   ` Dave Hansen
@ 2013-12-12 19:32     ` Christoph Lameter
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2013-12-12 19:32 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On Wed, 11 Dec 2013, Dave Hansen wrote:

>
> We have a need to move the ->freelist data around 'struct page'
> in order to keep a cmpxchg aligned.  First step is to add an
> accessor function which we will hook in to in the next patch.
>
> I'm not super-happy with how this looks.  It's a bit ugly, but it
> does work.  I'm open to some better suggestions for how to do
> this.


I think the mapping field is not used by SLUB, so it's OK to use (SLAB
uses it for its memory pointer).  Maybe you can use that to get the
correct alignment?  Do an AND of the address used for the cmpxchg with
0xffff..ff0 to ensure proper alignment (the resulting address may
overlap the mapping field).
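
For what it's worth, a minimal sketch of that masking arithmetic
(slub_cmpxchg_area() is a made-up name; which two words end up inside
the 16 bytes it selects still depends on the layout of 'struct page'):

static inline void *slub_cmpxchg_area(struct page *page)
{
	unsigned long addr = (unsigned long)&page->freelist;

	/* clear the low four bits: round down to a 16-byte boundary */
	return (void *)(addr & ~0xfUL);
}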


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 1/3] mm: slab: create helpers for slab ->freelist pointer
@ 2013-12-12 19:32     ` Christoph Lameter
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2013-12-12 19:32 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On Wed, 11 Dec 2013, Dave Hansen wrote:

>
> We have a need to move the ->freelist data around 'struct page'
> in order to keep a cmpxchg aligned.  First step is to add an
> accessor function which we will hook in to in the next patch.
>
> I'm not super-happy with how this looks.  It's a bit ugly, but it
> does work.  I'm open to some better suggestions for how to do
> this.


I think the mapping field is not used by SLUB, so it's OK to use (SLAB
uses it for its memory pointer).  Maybe you can use that to get the
correct alignment?  Do an AND of the address used for the cmpxchg with
0xffff..ff0 to ensure proper alignment (the resulting address may
overlap the mapping field).


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
  2013-12-12 17:46     ` Christoph Lameter
@ 2013-12-12 19:39       ` Dave Hansen
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2013-12-12 19:39 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On 12/12/2013 09:46 AM, Christoph Lameter wrote:
> On Wed, 11 Dec 2013, Dave Hansen wrote:
>> The write-argument to cmpxchg_double() must be 16-byte aligned.
>> We used to align 'struct page' itself in order to guarantee this,
>> but that wastes 8-bytes per page.  Instead, we take 8-bytes
>> internal to the page before page->counters and move freelist
>> between there and the existing 8-bytes after counters.  That way,
>> no matter how 'struct page' itself is aligned, we can ensure that
>> we have a 16-byte area with which to do this cmpxchg.
> 
> Well this adds additional branching to the fast paths.

I don't think it *HAS* to inherently.  The reason here is really that we
swap the _order_ of the arguments to the cmpxchg() since their order in
memory changes.  Essentially, we do:

| flags | freelist  | counters |          |
| flags |           | counters | freelist |

I did this so I wouldn't have to make a helper for ->counters.  But, if
we also move counters around, we can do:

| flags | counters | freelist |          |
| flags |          | counters | freelist |

I believe we can do all of that with plain pointer arithmetic and masks,
so that it won't cost any branches.
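
A minimal sketch of that branch-free selection (the helper name is made
up and the offsets are illustrative; it simply rounds the first
candidate slot up to the next 16-byte boundary, so either layout above
falls out of the page's own alignment):

static inline void *slub_pair_addr(struct page *page)
{
	/* the first candidate word sits right after ->flags */
	unsigned long base = (unsigned long)page + sizeof(unsigned long);

	/* round up to a 16-byte boundary: a mask, no branches */
	return (void *)((base + 0xf) & ~0xfUL);
}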


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
@ 2013-12-12 19:39       ` Dave Hansen
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2013-12-12 19:39 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, linux-mm, kirill.shutemov, Andi Kleen

On 12/12/2013 09:46 AM, Christoph Lameter wrote:
> On Wed, 11 Dec 2013, Dave Hansen wrote:
>> The write-argument to cmpxchg_double() must be 16-byte aligned.
>> We used to align 'struct page' itself in order to guarantee this,
>> but that wastes 8-bytes per page.  Instead, we take 8-bytes
>> internal to the page before page->counters and move freelist
>> between there and the existing 8-bytes after counters.  That way,
>> no matter how 'struct page' itself is aligned, we can ensure that
>> we have a 16-byte area with which to do this cmpxchg.
> 
> Well this adds additional branching to the fast paths.

I don't think it *HAS* to inherently.  The reason here is really that we
swap the _order_ of the arguments to the cmpxchg() since their order in
memory changes.  Essentially, we do:

| flags | freelist  | counters |          |
| flags |           | counters | freelist |

I did this so I wouldn't have to make a helper for ->counters.  But, if
we also move counters around, we can do:

| flags | counters | freelist |          |
| flags |          | counters | freelist |

I believe we can do all of that with plain pointer arithmetic and masks,
so that it won't cost any branches.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
  2013-12-12 17:46     ` Christoph Lameter
@ 2013-12-12 23:40       ` Andi Kleen
  -1 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2013-12-12 23:40 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Dave Hansen, linux-kernel, linux-mm, kirill.shutemov

On Thu, Dec 12, 2013 at 05:46:02PM +0000, Christoph Lameter wrote:
> On Wed, 11 Dec 2013, Dave Hansen wrote:
> 
> >
> > The write-argument to cmpxchg_double() must be 16-byte aligned.
> > We used to align 'struct page' itself in order to guarantee this,
> > but that wastes 8-bytes per page.  Instead, we take 8-bytes
> > internal to the page before page->counters and move freelist
> > between there and the existing 8-bytes after counters.  That way,
> > no matter how 'struct page' itself is aligned, we can ensure that
> > we have a 16-byte area with which to do this cmpxchg.
> 
> Well this adds additional branching to the fast paths.

The branch should be predictable.  And compare the cost of a branch
(near nothing on a modern OOO CPU with low-IPC code like this, when
predicted) to the cost of a cache miss (due to the larger struct page).

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg
@ 2013-12-12 23:40       ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2013-12-12 23:40 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Dave Hansen, linux-kernel, linux-mm, kirill.shutemov

On Thu, Dec 12, 2013 at 05:46:02PM +0000, Christoph Lameter wrote:
> On Wed, 11 Dec 2013, Dave Hansen wrote:
> 
> >
> > The write-argument to cmpxchg_double() must be 16-byte aligned.
> > We used to align 'struct page' itself in order to guarantee this,
> > but that wastes 8-bytes per page.  Instead, we take 8-bytes
> > internal to the page before page->counters and move freelist
> > between there and the existing 8-bytes after counters.  That way,
> > no matter how 'struct page' itself is aligned, we can ensure that
> > we have a 16-byte area with which to do this cmpxchg.
> 
> Well this adds additional branching to the fast paths.

The branch should be predictable.  And compare the cost of a branch
(near nothing on a modern OOO CPU with low-IPC code like this, when
predicted) to the cost of a cache miss (due to the larger struct page).

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-12-12 23:40 UTC | newest]

Thread overview: 18+ messages
-- links below jump to the message on this page --
2013-12-11 22:40 [RFC][PATCH 0/3] re-shrink 'struct page' when SLUB is on Dave Hansen
2013-12-11 22:40 ` Dave Hansen
2013-12-11 22:40 ` [RFC][PATCH 1/3] mm: slab: create helpers for slab ->freelist pointer Dave Hansen
2013-12-11 22:40   ` Dave Hansen
2013-12-12 19:32   ` Christoph Lameter
2013-12-12 19:32     ` Christoph Lameter
2013-12-11 22:40 ` [RFC][PATCH 2/3] mm: slab: move around slab ->freelist for cmpxchg Dave Hansen
2013-12-11 22:40   ` Dave Hansen
2013-12-12 17:46   ` Christoph Lameter
2013-12-12 17:46     ` Christoph Lameter
2013-12-12 19:39     ` Dave Hansen
2013-12-12 19:39       ` Dave Hansen
2013-12-12 23:40     ` Andi Kleen
2013-12-12 23:40       ` Andi Kleen
2013-12-11 22:40 ` [RFC][PATCH 3/3] mm: slabs: reset page at free Dave Hansen
2013-12-11 22:40   ` Dave Hansen
2013-12-12 17:44   ` Christoph Lameter
2013-12-12 17:44     ` Christoph Lameter
