* [RFC][PATCH 0/9] re-shrink 'struct page' when SLUB is on.
From: Dave Hansen @ 2014-01-14 18:00 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen

This is a minor update from the last version.  The most notable
thing is that I was able to demonstrate that maintaining the
cmpxchg16 optimization has _some_ value.

These are still of RFC quality.  They're stable, but definitely
need some wider testing, especially on 32-bit.  Mostly just
resending for Christoph to take a look.

These currently apply on top of linux-next.

Otherwise, the code changes are just a few minor cleanups.

---

SLUB depends on a 16-byte cmpxchg for an optimization which
allows it to not disable interrupts in its fast path.  This
optimization has some small but measurable benefits stemming from
the cmpxchg code not needing to disable interrupts:

	http://www.sr71.net/~dave/intel/slub/slub-perf-20140109.png

In order to get guaranteed 16-byte alignment (required by the
hardware on x86), 'struct page' is padded out from 56 to 64
bytes.
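
To see the cost in isolation (a standalone sketch, not the
kernel's actual 'struct page' definition), forcing 16-byte
alignment on a 56-byte structure rounds its size up to 64 bytes:

	struct padded_example {
		unsigned long words[7];	/* 7 * 8 = 56 bytes of fields on 64-bit */
	} __attribute__((aligned(16)));
	/* sizeof(struct padded_example) is now 64: rounded up to the alignment */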

Those 8 bytes matter.  We've gone to great lengths to keep
'struct page' small in the past.  It's a shame that we bloat it
now just for alignment reasons when we have extra space.  Plus,
bloating such a commonly-touched structure *HAS* cache footprint
implications.  The implications can be easily shown with 'perf
stat' when doing 16.8M kmalloc(32)/kfree() pairs:

vanilla 64-byte struct page:
>            883,412 LLC-loads                 #    0.296 M/sec
>            566,546 LLC-load-misses           #   64.13% of all LL-cache hits
patched 56-byte struct page:
>            556,751 LLC-loads                 #    0.186 M/sec
>            339,106 LLC-load-misses           #   60.91% of all LL-cache hits
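
(For reference, numbers like these can be gathered with perf; the
workload binary named here is only a placeholder for the
kmalloc/kfree test:)

	perf stat -e LLC-loads,LLC-load-misses ./kmalloc-32-test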

These patches attempt _internal_ alignment instead of external
alignment for slub.

I also got a bug report from some folks running a large database
benchmark.  Their old kernel uses slab and their new one uses
slub.  They were swapping and couldn't figure out why.  It turned
out to be the 2GB of RAM that the slub padding wastes on their
system (8 bytes per 4k page works out to 2GB for every 1TB of RAM).

On my box, that 2GB cost about $200 to populate back when we
bought it.  I want my $200 back.

This set takes me from 16909584K of reserved memory at boot down
to 14814472K, so almost *exactly* 2GB of savings!  It also helps
performance, presumably because it touches 14% fewer struct page
cachelines.  A 30GB dd to a ramfs file:

	dd if=/dev/zero of=bigfile bs=$((1<<30)) count=30

is sped up by about 4.4% in my testing.

I've run this through its paces and have not had stability issues
with it.  It definitely needs some more testing, but it's ready
for a wider audience.

I also wrote up a document describing 'struct page's layout:

	http://tinyurl.com/n6kmedz


* [RFC][PATCH 1/9] mm: slab/slub: use page->lru consistently instead of page->list
From: Dave Hansen @ 2014-01-14 18:00 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

'struct page' has two list_head fields: 'lru' and 'list'.
Conveniently, they are unioned together.  This means that code
can use them interchangeably, which gets horribly confusing, as
with this nugget from slab.c:

>	list_del(&page->lru);
>	if (page->active == cachep->num)
>		list_add(&page->list, &n->slabs_full);

This patch makes the slab and slub code use page->lru
universally instead of mixing ->list and ->lru.

So, the new rule is: page->lru is what you use if you want to
keep your page on a list.  Don't like the fact that it's not
called ->list?  Too bad.
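
For example, a private user of page->lru looks roughly like this
(a minimal sketch; 'my_pages' and the helpers are illustrative,
not part of this patch):

	static LIST_HEAD(my_pages);

	static void track_page(struct page *page)
	{
		/* the page owner is free to use ->lru for its own list */
		list_add(&page->lru, &my_pages);
	}

	static void untrack_pages(void)
	{
		struct page *page, *next;

		list_for_each_entry_safe(page, next, &my_pages, lru)
			list_del(&page->lru);
	}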

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/mm_types.h |    3 ++-
 b/mm/slab.c                |    4 ++--
 b/mm/slob.c                |   10 +++++-----
 3 files changed, 9 insertions(+), 8 deletions(-)

diff -puN include/linux/mm_types.h~make-slab-use-page-lru-vs-list-consistently include/linux/mm_types.h
--- a/include/linux/mm_types.h~make-slab-use-page-lru-vs-list-consistently	2014-01-14 09:57:56.099621967 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:56.106622281 -0800
@@ -124,6 +124,8 @@ struct page {
 	union {
 		struct list_head lru;	/* Pageout list, eg. active_list
 					 * protected by zone->lru_lock !
+					 * Can be used as a generic list
+					 * by the page owner.
 					 */
 		struct {		/* slub per cpu partial pages */
 			struct page *next;	/* Next partial slab */
@@ -136,7 +138,6 @@ struct page {
 #endif
 		};
 
-		struct list_head list;	/* slobs list of pages */
 		struct slab *slab_page; /* slab fields */
 		struct rcu_head rcu_head;	/* Used by SLAB
 						 * when destroying via RCU
diff -puN mm/slab.c~make-slab-use-page-lru-vs-list-consistently mm/slab.c
--- a/mm/slab.c~make-slab-use-page-lru-vs-list-consistently	2014-01-14 09:57:56.101622056 -0800
+++ b/mm/slab.c	2014-01-14 09:57:56.108622370 -0800
@@ -2886,9 +2886,9 @@ retry:
 		/* move slabp to correct slabp list: */
 		list_del(&page->lru);
 		if (page->active == cachep->num)
-			list_add(&page->list, &n->slabs_full);
+			list_add(&page->lru, &n->slabs_full);
 		else
-			list_add(&page->list, &n->slabs_partial);
+			list_add(&page->lru, &n->slabs_partial);
 	}
 
 must_grow:
diff -puN mm/slob.c~make-slab-use-page-lru-vs-list-consistently mm/slob.c
--- a/mm/slob.c~make-slab-use-page-lru-vs-list-consistently	2014-01-14 09:57:56.103622146 -0800
+++ b/mm/slob.c	2014-01-14 09:57:56.109622415 -0800
@@ -111,13 +111,13 @@ static inline int slob_page_free(struct
 
 static void set_slob_page_free(struct page *sp, struct list_head *list)
 {
-	list_add(&sp->list, list);
+	list_add(&sp->lru, list);
 	__SetPageSlobFree(sp);
 }
 
 static inline void clear_slob_page_free(struct page *sp)
 {
-	list_del(&sp->list);
+	list_del(&sp->lru);
 	__ClearPageSlobFree(sp);
 }
 
@@ -282,7 +282,7 @@ static void *slob_alloc(size_t size, gfp
 
 	spin_lock_irqsave(&slob_lock, flags);
 	/* Iterate through each partially free page, try to find room */
-	list_for_each_entry(sp, slob_list, list) {
+	list_for_each_entry(sp, slob_list, lru) {
 #ifdef CONFIG_NUMA
 		/*
 		 * If there's a node specification, search for a partial
@@ -296,7 +296,7 @@ static void *slob_alloc(size_t size, gfp
 			continue;
 
 		/* Attempt to alloc */
-		prev = sp->list.prev;
+		prev = sp->lru.prev;
 		b = slob_page_alloc(sp, size, align);
 		if (!b)
 			continue;
@@ -322,7 +322,7 @@ static void *slob_alloc(size_t size, gfp
 		spin_lock_irqsave(&slob_lock, flags);
 		sp->units = SLOB_UNITS(PAGE_SIZE);
 		sp->freelist = b;
-		INIT_LIST_HEAD(&sp->list);
+		INIT_LIST_HEAD(&sp->lru);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
 		b = slob_page_alloc(sp, size, align);
_

* [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
From: Dave Hansen @ 2014-01-14 18:00 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

I found this useful to have in my testing.  I would like to have
it available for a bit, at least until other folks have had a
chance to do some testing with it.

Rather than ripping this out entirely later, we should probably
just pull the prompt and the help text so that folks are not
asked about it.
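
For reference, the operation the option gates is the 16-byte
compare-and-exchange that slub already does in its fast path.  A
sketch of the call shape (mirroring the hunks below; the wrapper
name here is made up):

	static inline bool try_swap_freelist(struct page *page,
			void *freelist_old, unsigned long counters_old,
			void *freelist_new, unsigned long counters_new)
	{
		/* one atomic op covers both fields, so no irq-disabling */
		return cmpxchg_double(&page->freelist, &page->counters,
				      freelist_old, counters_old,
				      freelist_new, counters_new);
	}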

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/mm_types.h |    3 +--
 b/init/Kconfig             |   11 +++++++++++
 b/mm/slub.c                |    9 +++------
 3 files changed, 15 insertions(+), 8 deletions(-)

diff -puN include/linux/mm_types.h~abstract-out-slub-double-cmpxchg-operation include/linux/mm_types.h
--- a/include/linux/mm_types.h~abstract-out-slub-double-cmpxchg-operation	2014-01-14 09:57:56.410635912 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:56.417636226 -0800
@@ -73,8 +73,7 @@ struct page {
 		};
 
 		union {
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-	defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+#if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 			/* Used for cmpxchg_double in slub */
 			unsigned long counters;
 #else
diff -puN init/Kconfig~abstract-out-slub-double-cmpxchg-operation init/Kconfig
--- a/init/Kconfig~abstract-out-slub-double-cmpxchg-operation	2014-01-14 09:57:56.412636002 -0800
+++ b/init/Kconfig	2014-01-14 09:57:56.418636271 -0800
@@ -1603,6 +1603,17 @@ config SLUB_CPU_PARTIAL
 	  which requires the taking of locks that may cause latency spikes.
 	  Typically one would choose no for a realtime system.
 
+config SLUB_ATTEMPT_CMPXCHG_DOUBLE
+	default y
+	depends on SLUB && HAVE_CMPXCHG_DOUBLE && HAVE_ALIGNED_STRUCT_PAGE
+	bool "SLUB: attempt to use double-cmpxchg operations"
+	help
+	  Some CPUs support instructions that let you do a large double-word
+	  atomic cmpxchg operation.  This keeps the SLUB fastpath from
+	  needing to disable interrupts.
+
+	  If you are unsure, say y.
+
 config MMAP_ALLOW_UNINITIALIZED
 	bool "Allow mmapped anonymous memory to be uninitialized"
 	depends on EXPERT && !MMU
diff -puN mm/slub.c~abstract-out-slub-double-cmpxchg-operation mm/slub.c
--- a/mm/slub.c~abstract-out-slub-double-cmpxchg-operation	2014-01-14 09:57:56.414636092 -0800
+++ b/mm/slub.c	2014-01-14 09:57:56.420636361 -0800
@@ -362,8 +362,7 @@ static inline bool __cmpxchg_double_slab
 		const char *n)
 {
 	VM_BUG_ON(!irqs_disabled());
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+#if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
 		if (cmpxchg_double(&page->freelist, &page->counters,
 			freelist_old, counters_old,
@@ -398,8 +397,7 @@ static inline bool cmpxchg_double_slab(s
 		void *freelist_new, unsigned long counters_new,
 		const char *n)
 {
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+#if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
 		if (cmpxchg_double(&page->freelist, &page->counters,
 			freelist_old, counters_old,
@@ -3078,8 +3076,7 @@ static int kmem_cache_open(struct kmem_c
 		}
 	}
 
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+#if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (system_has_cmpxchg_double() && (s->flags & SLAB_DEBUG_FLAGS) == 0)
 		/* Enable fast mode */
 		s->flags |= __CMPXCHG_DOUBLE;
_

* [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
From: Dave Hansen @ 2014-01-14 18:00 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

page->pfmemalloc does not deserve a spot in 'struct page'.  It is
only used transiently _just_ after a page leaves the buddy
allocator.

Instead of declaring a union, we move its functionality behind a
few quick accessor functions.  This way we could also much more
easily audit that it is being used correctly in debugging
scenarios.  For instance, we could store a magic number in there
which could never get reused as a page->index and check that the
magic number exists in page_pfmemalloc().
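
A sketch of what such a debugging variant could look like (the
magic value and helper names here are hypothetical, not part of
this patch):

	#define PFMEMALLOC_MAGIC	0x2badb002UL	/* hypothetical poison value */

	static inline void set_page_pfmemalloc_debug(struct page *page, int pfmemalloc)
	{
		page->index = pfmemalloc ? PFMEMALLOC_MAGIC : 0;
	}

	static inline int page_pfmemalloc_debug(struct page *page)
	{
		/* catch anyone reading this after ->index has been reused */
		WARN_ON(page->index && page->index != PFMEMALLOC_MAGIC);
		return page->index == PFMEMALLOC_MAGIC;
	}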

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/mm.h       |   17 +++++++++++++++++
 b/include/linux/mm_types.h |    9 ---------
 b/include/linux/skbuff.h   |   10 +++++-----
 b/mm/page_alloc.c          |    2 +-
 b/mm/slab.c                |    4 ++--
 b/mm/slub.c                |    2 +-
 6 files changed, 26 insertions(+), 18 deletions(-)

diff -puN include/linux/mm.h~page_pfmemalloc-only-used-by-slab include/linux/mm.h
--- a/include/linux/mm.h~page_pfmemalloc-only-used-by-slab	2014-01-14 09:57:56.726650082 -0800
+++ b/include/linux/mm.h	2014-01-14 09:57:56.740650710 -0800
@@ -2059,5 +2059,22 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+/*
+ * If set by the page allocator, ALLOC_NO_WATERMARKS was set and the
+ * low watermark was not met implying that the system is under some
+ * pressure. The caller should try ensure this page is only used to
+ * free other pages.  Currently only used by sl[au]b.  Note that
+ * this is only valid for a short time after the page returns
+ * from the allocator.
+ */
+static inline int page_pfmemalloc(struct page *page)
+{
+	return !!page->index;
+}
+static inline void set_page_pfmemalloc(struct page *page, int pfmemalloc)
+{
+	page->index = pfmemalloc;
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff -puN include/linux/mm_types.h~page_pfmemalloc-only-used-by-slab include/linux/mm_types.h
--- a/include/linux/mm_types.h~page_pfmemalloc-only-used-by-slab	2014-01-14 09:57:56.727650127 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:56.741650755 -0800
@@ -61,15 +61,6 @@ struct page {
 		union {
 			pgoff_t index;		/* Our offset within mapping. */
 			void *freelist;		/* sl[aou]b first free object */
-			bool pfmemalloc;	/* If set by the page allocator,
-						 * ALLOC_NO_WATERMARKS was set
-						 * and the low watermark was not
-						 * met implying that the system
-						 * is under some pressure. The
-						 * caller should try ensure
-						 * this page is only used to
-						 * free other pages.
-						 */
 		};
 
 		union {
diff -puN include/linux/skbuff.h~page_pfmemalloc-only-used-by-slab include/linux/skbuff.h
--- a/include/linux/skbuff.h~page_pfmemalloc-only-used-by-slab	2014-01-14 09:57:56.729650217 -0800
+++ b/include/linux/skbuff.h	2014-01-14 09:57:56.743650845 -0800
@@ -1399,11 +1399,11 @@ static inline void __skb_fill_page_desc(
 	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
 	/*
-	 * Propagate page->pfmemalloc to the skb if we can. The problem is
+	 * Propagate page_pfmemalloc() to the skb if we can. The problem is
 	 * that not all callers have unique ownership of the page. If
 	 * pfmemalloc is set, we check the mapping as a mapping implies
 	 * page->index is set (index and pfmemalloc share space).
-	 * If it's a valid mapping, we cannot use page->pfmemalloc but we
+	 * If it's a valid mapping, we cannot use page_pfmemalloc() but we
 	 * do not lose pfmemalloc information as the pages would not be
 	 * allocated using __GFP_MEMALLOC.
 	 */
@@ -1412,7 +1412,7 @@ static inline void __skb_fill_page_desc(
 	skb_frag_size_set(frag, size);
 
 	page = compound_head(page);
-	if (page->pfmemalloc && !page->mapping)
+	if (page_pfmemalloc(page) && !page->mapping)
 		skb->pfmemalloc	= true;
 }
 
@@ -1999,7 +1999,7 @@ static inline struct page *__skb_alloc_p
 		gfp_mask |= __GFP_MEMALLOC;
 
 	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
-	if (skb && page && page->pfmemalloc)
+	if (skb && page && page_pfmemalloc(page))
 		skb->pfmemalloc = true;
 
 	return page;
@@ -2028,7 +2028,7 @@ static inline struct page *__skb_alloc_p
 static inline void skb_propagate_pfmemalloc(struct page *page,
 					     struct sk_buff *skb)
 {
-	if (page && page->pfmemalloc)
+	if (page && page_pfmemalloc(page))
 		skb->pfmemalloc = true;
 }
 
diff -puN mm/page_alloc.c~page_pfmemalloc-only-used-by-slab mm/page_alloc.c
--- a/mm/page_alloc.c~page_pfmemalloc-only-used-by-slab	2014-01-14 09:57:56.731650307 -0800
+++ b/mm/page_alloc.c	2014-01-14 09:57:56.745650934 -0800
@@ -2073,7 +2073,7 @@ this_zone_full:
 		 * memory. The caller should avoid the page being used
 		 * for !PFMEMALLOC purposes.
 		 */
-		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
+		set_page_pfmemalloc(page, alloc_flags & ALLOC_NO_WATERMARKS);
 
 	return page;
 }
diff -puN mm/slab.c~page_pfmemalloc-only-used-by-slab mm/slab.c
--- a/mm/slab.c~page_pfmemalloc-only-used-by-slab	2014-01-14 09:57:56.733650396 -0800
+++ b/mm/slab.c	2014-01-14 09:57:56.747651024 -0800
@@ -1672,7 +1672,7 @@ static struct page *kmem_getpages(struct
 	}
 
 	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
-	if (unlikely(page->pfmemalloc))
+	if (unlikely(page_pfmemalloc(page)))
 		pfmemalloc_active = true;
 
 	nr_pages = (1 << cachep->gfporder);
@@ -1683,7 +1683,7 @@ static struct page *kmem_getpages(struct
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
 	__SetPageSlab(page);
-	if (page->pfmemalloc)
+	if (page_pfmemalloc(page))
 		SetPageSlabPfmemalloc(page);
 	memcg_bind_pages(cachep, cachep->gfporder);
 
diff -puN mm/slub.c~page_pfmemalloc-only-used-by-slab mm/slub.c
--- a/mm/slub.c~page_pfmemalloc-only-used-by-slab	2014-01-14 09:57:56.735650486 -0800
+++ b/mm/slub.c	2014-01-14 09:57:56.749651114 -0800
@@ -1401,7 +1401,7 @@ static struct page *new_slab(struct kmem
 	memcg_bind_pages(s, order);
 	page->slab_cache = s;
 	__SetPageSlab(page);
-	if (page->pfmemalloc)
+	if (page_pfmemalloc(page))
 		SetPageSlabPfmemalloc(page);
 
 	start = page_address(page);
_

* [RFC][PATCH 4/9] mm: slabs: reset page at free
From: Dave Hansen @ 2014-01-14 18:00 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

In order to simplify 'struct page', we will shortly be moving
some fields around.  This causes slub's ->freelist usage to
impinge on page->mapping's storage space.  The buddy allocator
wants ->mapping to be NULL when a page is handed back, so we have
to make sure that it is cleared.

Note that slab already does this, so just create a common helper
and have all the slabs do it this way.  ->mapping is right next
to ->flags, so it's virtually guaranteed to be in the L1 at this
point, and this shouldn't cost very much to do in practice.

Other allocators and users of 'struct page' may also want to call
this if they use parts of 'struct page' for nonstandard purposes.
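
For instance, a hypothetical out-of-tree user (not anything in
this series) that borrows ->mapping for its own bookkeeping would
do:

	static void my_release_page(struct page *page)
	{
		/* clear the borrowed fields so the buddy allocator is happy */
		allocator_reset_page(page);
		__free_page(page);
	}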

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/mm.h |   11 +++++++++++
 b/mm/slab.c          |    3 +--
 b/mm/slob.c          |    2 +-
 b/mm/slub.c          |    2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff -puN include/linux/mm.h~slub-reset-page-at-free include/linux/mm.h
--- a/include/linux/mm.h~slub-reset-page-at-free	2014-01-14 09:57:57.099666808 -0800
+++ b/include/linux/mm.h	2014-01-14 09:57:57.110667301 -0800
@@ -2076,5 +2076,16 @@ static inline void set_page_pfmemalloc(s
 	page->index = pfmemalloc;
 }
 
+/*
+ * Custom allocators (like the slabs) use 'struct page' fields
+ * for all kinds of things.  This resets the page's state so that
+ * the buddy allocator will be happy with it.
+ */
+static inline void allocator_reset_page(struct page *page)
+{
+	page->mapping = NULL;
+	page_mapcount_reset(page);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff -puN mm/slab.c~slub-reset-page-at-free mm/slab.c
--- a/mm/slab.c~slub-reset-page-at-free	2014-01-14 09:57:57.101666898 -0800
+++ b/mm/slab.c	2014-01-14 09:57:57.111667346 -0800
@@ -1718,8 +1718,7 @@ static void kmem_freepages(struct kmem_c
 	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
-	page_mapcount_reset(page);
-	page->mapping = NULL;
+	allocator_reset_page(page);
 
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
diff -puN mm/slob.c~slub-reset-page-at-free mm/slob.c
--- a/mm/slob.c~slub-reset-page-at-free	2014-01-14 09:57:57.103666988 -0800
+++ b/mm/slob.c	2014-01-14 09:57:57.112667391 -0800
@@ -360,7 +360,7 @@ static void slob_free(void *block, int s
 			clear_slob_page_free(sp);
 		spin_unlock_irqrestore(&slob_lock, flags);
 		__ClearPageSlab(sp);
-		page_mapcount_reset(sp);
+		allocator_reset_page(sp);
 		slob_free_pages(b, 0);
 		return;
 	}
diff -puN mm/slub.c~slub-reset-page-at-free mm/slub.c
--- a/mm/slub.c~slub-reset-page-at-free	2014-01-14 09:57:57.105667077 -0800
+++ b/mm/slub.c	2014-01-14 09:57:57.114667481 -0800
@@ -1450,7 +1450,7 @@ static void __free_slab(struct kmem_cach
 	__ClearPageSlab(page);
 
 	memcg_release_pages(s, order);
-	page_mapcount_reset(page);
+	allocator_reset_page(page);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
 	__free_memcg_kmem_pages(page, order);
_

* [RFC][PATCH 5/9] mm: rearrange struct page
From: Dave Hansen @ 2014-01-14 18:00 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

To make the layout of 'struct page' look nicer, I broke
up a few of the unions.  But, this has a cost: things that
were guaranteed to line up before might not any more.  To make up
for that, some BUILD_BUG_ON()s are added to manually check for
the alignment dependencies.

This makes it *MUCH* more clear how the first few fields of
'struct page' get used by the slab allocators.
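
The checks boil down to compile-time offset comparisons; this is
the expanded form of one of the SLAB_PAGE_CHECK() calls made in
slab_build_checks() below:

	BUILD_BUG_ON(offsetof(struct page, _count) !=
		     offsetof(struct page, dontuse_slub_count));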

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/mm_types.h |   99 ++++++++++++++++++++++-----------------------
 b/mm/slab.c                |    6 +-
 b/mm/slab_common.c         |   17 +++++++
 b/mm/slob.c                |   25 +++++------
 4 files changed, 83 insertions(+), 64 deletions(-)

diff -puN include/linux/mm_types.h~rearrange-struct-page include/linux/mm_types.h
--- a/include/linux/mm_types.h~rearrange-struct-page	2014-01-14 09:57:57.429681606 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:57.437681965 -0800
@@ -46,26 +46,59 @@ struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
 	union {
-		struct address_space *mapping;	/* If low bit clear, points to
-						 * inode address_space, or NULL.
-						 * If page mapped as anonymous
-						 * memory, low bit is set, and
-						 * it points to anon_vma object:
-						 * see PAGE_MAPPING_ANON below.
-						 */
-		void *s_mem;			/* slab first object */
-	};
-
-	/* Second double word */
-	struct {
-		union {
+		struct /* the normal uses */ {
 			pgoff_t index;		/* Our offset within mapping. */
-			void *freelist;		/* sl[aou]b first free object */
+			/*
+			 * mapping: If low bit clear, points to
+			 * inode address_space, or NULL.  If page
+			 * mapped as anonymous memory, low bit is
+			 * set, and it points to anon_vma object:
+			 * see PAGE_MAPPING_ANON below.
+			 */
+			struct address_space *mapping;
+			/*
+			 * Count of ptes mapped in mms, to show when page
+			 * is mapped & limit reverse map searches.
+			 *
+			 * Used also for tail pages refcounting instead
+			 * of _count. Tail pages cannot be mapped and
+			 * keeping the tail page _count zero at all times
+			 * guarantees get_page_unless_zero() will never
+			 * succeed on tail pages.
+			 */
+			atomic_t _mapcount;
+			atomic_t _count;
+		}; /* end of the "normal" use */
+
+		struct { /* SLUB */
+			void *unused;
+			void *freelist;
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+			atomic_t dontuse_slub_count;
 		};
-
-		union {
+		struct { /* SLAB */
+			void *s_mem;
+			void *slab_freelist;
+			unsigned int active;
+			atomic_t dontuse_slab_count;
+		};
+		struct { /* SLOB */
+			void *slob_unused;
+			void *slob_freelist;
+			unsigned int units;
+			atomic_t dontuse_slob_count;
+		};
+		/*
+		 * This is here to help the slub code deal with
+		 * its inuse/objects/frozen bitfields as a single
+		 * blob.
+		 */
+		struct { /* slub helpers */
+			void *slubhelp_unused;
+			void *slubhelp_freelist;
 #if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
-			/* Used for cmpxchg_double in slub */
 			unsigned long counters;
 #else
 			/*
@@ -75,38 +108,6 @@ struct page {
 			 */
 			unsigned counters;
 #endif
-
-			struct {
-
-				union {
-					/*
-					 * Count of ptes mapped in
-					 * mms, to show when page is
-					 * mapped & limit reverse map
-					 * searches.
-					 *
-					 * Used also for tail pages
-					 * refcounting instead of
-					 * _count. Tail pages cannot
-					 * be mapped and keeping the
-					 * tail page _count zero at
-					 * all times guarantees
-					 * get_page_unless_zero() will
-					 * never succeed on tail
-					 * pages.
-					 */
-					atomic_t _mapcount;
-
-					struct { /* SLUB */
-						unsigned inuse:16;
-						unsigned objects:15;
-						unsigned frozen:1;
-					};
-					int units;	/* SLOB */
-				};
-				atomic_t _count;		/* Usage count, see below. */
-			};
-			unsigned int active;	/* SLAB */
 		};
 	};
 
diff -puN mm/slab.c~rearrange-struct-page mm/slab.c
--- a/mm/slab.c~rearrange-struct-page	2014-01-14 09:57:57.431681696 -0800
+++ b/mm/slab.c	2014-01-14 09:57:57.439682054 -0800
@@ -1955,7 +1955,7 @@ static void slab_destroy(struct kmem_cac
 {
 	void *freelist;
 
-	freelist = page->freelist;
+	freelist = page->slab_freelist;
 	slab_destroy_debugcheck(cachep, page);
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
 		struct rcu_head *head;
@@ -2543,7 +2543,7 @@ static void *alloc_slabmgmt(struct kmem_
 
 static inline unsigned int *slab_freelist(struct page *page)
 {
-	return (unsigned int *)(page->freelist);
+	return (unsigned int *)(page->slab_freelist);
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2648,7 +2648,7 @@ static void slab_map_pages(struct kmem_c
 			   void *freelist)
 {
 	page->slab_cache = cache;
-	page->freelist = freelist;
+	page->slab_freelist = freelist;
 }
 
 /*
diff -puN mm/slab_common.c~rearrange-struct-page mm/slab_common.c
--- a/mm/slab_common.c~rearrange-struct-page	2014-01-14 09:57:57.432681741 -0800
+++ b/mm/slab_common.c	2014-01-14 09:57:57.440682099 -0800
@@ -676,3 +676,20 @@ static int __init slab_proc_init(void)
 }
 module_init(slab_proc_init);
 #endif /* CONFIG_SLABINFO */
+#define SLAB_PAGE_CHECK(field1, field2)        \
+	BUILD_BUG_ON(offsetof(struct page, field1) !=   \
+		     offsetof(struct page, field2))
+/*
+ * To make the layout of 'struct page' look nicer, we've broken
+ * up a few of the unions.  Folks declaring their own use of the
+ * first few fields need to make sure that their use does not
+ * interfere with page->_count.  This ensures that the individual
+ * users' use actually lines up with the real ->_count.
+ */
+void slab_build_checks(void)
+{
+	SLAB_PAGE_CHECK(_count, dontuse_slab_count);
+	SLAB_PAGE_CHECK(_count, dontuse_slub_count);
+	SLAB_PAGE_CHECK(_count, dontuse_slob_count);
+}
+
diff -puN mm/slob.c~rearrange-struct-page mm/slob.c
--- a/mm/slob.c~rearrange-struct-page	2014-01-14 09:57:57.434681830 -0800
+++ b/mm/slob.c	2014-01-14 09:57:57.440682099 -0800
@@ -219,7 +219,8 @@ static void *slob_page_alloc(struct page
 	slob_t *prev, *cur, *aligned = NULL;
 	int delta = 0, units = SLOB_UNITS(size);
 
-	for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
+	for (prev = NULL, cur = sp->slob_freelist; ;
+	     prev = cur,  cur = slob_next(cur)) {
 		slobidx_t avail = slob_units(cur);
 
 		if (align) {
@@ -243,12 +244,12 @@ static void *slob_page_alloc(struct page
 				if (prev)
 					set_slob(prev, slob_units(prev), next);
 				else
-					sp->freelist = next;
+					sp->slob_freelist = next;
 			} else { /* fragment */
 				if (prev)
 					set_slob(prev, slob_units(prev), cur + units);
 				else
-					sp->freelist = cur + units;
+					sp->slob_freelist = cur + units;
 				set_slob(cur + units, avail - units, next);
 			}
 
@@ -321,7 +322,7 @@ static void *slob_alloc(size_t size, gfp
 
 		spin_lock_irqsave(&slob_lock, flags);
 		sp->units = SLOB_UNITS(PAGE_SIZE);
-		sp->freelist = b;
+		sp->slob_freelist = b;
 		INIT_LIST_HEAD(&sp->lru);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
@@ -368,7 +369,7 @@ static void slob_free(void *block, int s
 	if (!slob_page_free(sp)) {
 		/* This slob page is about to become partially free. Easy! */
 		sp->units = units;
-		sp->freelist = b;
+		sp->slob_freelist = b;
 		set_slob(b, units,
 			(void *)((unsigned long)(b +
 					SLOB_UNITS(PAGE_SIZE)) & PAGE_MASK));
@@ -388,15 +389,15 @@ static void slob_free(void *block, int s
 	 */
 	sp->units += units;
 
-	if (b < (slob_t *)sp->freelist) {
-		if (b + units == sp->freelist) {
-			units += slob_units(sp->freelist);
-			sp->freelist = slob_next(sp->freelist);
+	if (b < (slob_t *)sp->slob_freelist) {
+		if (b + units == sp->slob_freelist) {
+			units += slob_units(sp->slob_freelist);
+			sp->slob_freelist = slob_next(sp->slob_freelist);
 		}
-		set_slob(b, units, sp->freelist);
-		sp->freelist = b;
+		set_slob(b, units, sp->slob_freelist);
+		sp->slob_freelist = b;
 	} else {
-		prev = sp->freelist;
+		prev = sp->slob_freelist;
 		next = slob_next(prev);
 		while (b > next) {
 			prev = next;
_

* [RFC][PATCH 6/9] mm: slub: rearrange 'struct page' fields
  2014-01-14 18:00 ` Dave Hansen
@ 2014-01-14 18:01   ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 18:01 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

SLUB places some unique alignment constraints on 'struct page'.
Break those out into a separate structure which will not pollute
'struct page'.

This structure will be moved around inside 'struct page' at
runtime in the next patch, so it is necessary to break it out for
those uses as well.

Vim pattern used for the renames:
%s/\(page\|new\)\(->\|\.\)\(freelist\|counters\)/slub_data(\1)\2\3/g
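
For illustration only, here is a minimal userspace sketch of the
resulting pattern.  The slub_data field names follow the patch below;
'struct fake_page', the plain-int stand-ins for atomic_t, and the
runtime assert are invented:

	#include <assert.h>
	#include <stddef.h>
	#include <stdio.h>

	struct slub_data {
		void *unused;
		void *freelist;
		union {
			struct {
				unsigned inuse:16;
				unsigned objects:15;
				unsigned frozen:1;
				int dontuse_slub_count; /* stand-in for atomic_t */
			};
			unsigned long counters;
		};
	};

	struct fake_page {
		unsigned long flags;
		union {
			struct {                /* "normal" use */
				void *mapping;
				void *index;
				int _mapcount;
				int _count;
			};
			struct slub_data slub_data;
		};
	};

	/* same trick as the patch: hide the location behind an accessor */
	static inline struct slub_data *slub_data(struct fake_page *page)
	{
		void *ptr = &page->slub_data;
		return ptr;
	}

	int main(void)
	{
		struct fake_page page = { 0 };

		/* slab_build_checks() does this at build time */
		assert(offsetof(struct fake_page, _count) ==
		       offsetof(struct fake_page, slub_data.dontuse_slub_count));

		slub_data(&page)->inuse = 3;
		printf("inuse=%u\n", slub_data(&page)->inuse);
		return 0;
	}

The accessor means later patches can move slub_data around inside
'struct page' without touching the slub code again.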

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/mm_types.h |   46 ++++++--
 b/mm/slab_common.c         |   29 ++++-
 b/mm/slub.c                |  253 +++++++++++++++++++++++----------------------
 3 files changed, 192 insertions(+), 136 deletions(-)

diff -puN include/linux/mm_types.h~slub-rearrange include/linux/mm_types.h
--- a/include/linux/mm_types.h~slub-rearrange	2014-01-14 09:57:57.755696224 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:57.762696538 -0800
@@ -23,6 +23,43 @@
 
 struct address_space;
 
+struct slub_data {
+	void *unused;
+	void *freelist;
+	union {
+		struct {
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+			atomic_t dontuse_slub_count;
+		};
+		/*
+		 * ->counters is used to make it easier to copy
+		 * all of the above counters in one chunk.
+		 * The actual counts are never accessed via this.
+		 */
+#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
+    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+		unsigned long counters;
+#else
+		/*
+		 * Keep _count separate from slub cmpxchg_double data.
+		 * As the rest of the double word is protected by
+		 * slab_lock but _count is not.
+		 */
+		struct {
+			unsigned counters;
+			/*
+			 * This isn't used directly, but declare it here
+			 * for clarity since it must line up with _count
+			 * from 'struct page'
+			 */
+			atomic_t separate_count;
+		};
+#endif
+	};
+};
+
 #define USE_SPLIT_PTE_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
 #define USE_SPLIT_PMD_PTLOCKS	(USE_SPLIT_PTE_PTLOCKS && \
 		IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))
@@ -70,14 +107,7 @@ struct page {
 			atomic_t _count;
 		}; /* end of the "normal" use */
 
-		struct { /* SLUB */
-			void *unused;
-			void *freelist;
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
-			atomic_t dontuse_slub_count;
-		};
+		struct slub_data slub_data;
 		struct { /* SLAB */
 			void *s_mem;
 			void *slab_freelist;
diff -puN mm/slab_common.c~slub-rearrange mm/slab_common.c
--- a/mm/slab_common.c~slub-rearrange	2014-01-14 09:57:57.757696314 -0800
+++ b/mm/slab_common.c	2014-01-14 09:57:57.763696583 -0800
@@ -676,20 +676,39 @@ static int __init slab_proc_init(void)
 }
 module_init(slab_proc_init);
 #endif /* CONFIG_SLABINFO */
+
 #define SLAB_PAGE_CHECK(field1, field2)        \
 	BUILD_BUG_ON(offsetof(struct page, field1) !=   \
 		     offsetof(struct page, field2))
 /*
  * To make the layout of 'struct page' look nicer, we've broken
- * up a few of the unions.  Folks declaring their own use of the
- * first few fields need to make sure that their use does not
- * interfere with page->_count.  This ensures that the individual
- * users' use actually lines up with the real ->_count.
+ * up a few of the unions.  But, this has made it hard to see if
+ * any given use will interfere with page->_count.
+ *
+ * To work around this, each user declares their own _count field
+ * and we check them at build time to ensure that the independent
+ * definitions actually line up with the real ->_count.
  */
 void slab_build_checks(void)
 {
 	SLAB_PAGE_CHECK(_count, dontuse_slab_count);
-	SLAB_PAGE_CHECK(_count, dontuse_slub_count);
+	SLAB_PAGE_CHECK(_count, slub_data.dontuse_slub_count);
 	SLAB_PAGE_CHECK(_count, dontuse_slob_count);
+
+	/*
+	 * When doing a double-cmpxchg, the slub code sucks in
+	 * _count.  But, this is harmless since if _count is
+	 * modified, the cmpxchg will fail.  When not using a
+	 * real cmpxchg, the slub code uses a lock.  But, _count
+	 * is not modified under that lock and updates can be
+	 * lost if they race with one of the "faked" cmpxchg
+	 * under that lock.  This makes sure that the space we
+	 * carve out for _count in that case actually lines up
+	 * with the real _count.
+	 */
+#if !(defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
+	    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE))
+	SLAB_PAGE_CHECK(_count, slub_data.separate_count);
+#endif
 }
 
diff -puN mm/slub.c~slub-rearrange mm/slub.c
--- a/mm/slub.c~slub-rearrange	2014-01-14 09:57:57.759696404 -0800
+++ b/mm/slub.c	2014-01-14 09:57:57.766696718 -0800
@@ -52,8 +52,8 @@
  *   The slab_lock is only used for debugging and on arches that do not
  *   have the ability to do a cmpxchg_double. It only protects the second
  *   double word in the page struct. Meaning
- *	A. page->freelist	-> List of object free in a page
- *	B. page->counters	-> Counters of objects
+ *	A. slub_data(page)->freelist	-> List of object free in a page
+ *	B. slub_data(page)->counters	-> Counters of objects
  *	C. page->frozen		-> frozen state
  *
  *   If a slab is frozen then it is exempt from list management. It is not
@@ -237,6 +237,12 @@ static inline struct kmem_cache_node *ge
 	return s->node[node];
 }
 
+static inline struct slub_data *slub_data(struct page *page)
+{
+	void *ptr = &page->slub_data;
+	return ptr;
+}
+
 /* Verify that a pointer has an address that is valid within a slab page */
 static inline int check_valid_pointer(struct kmem_cache *s,
 				struct page *page, const void *object)
@@ -247,7 +253,7 @@ static inline int check_valid_pointer(st
 		return 1;
 
 	base = page_address(page);
-	if (object < base || object >= base + page->objects * s->size ||
+	if (object < base || object >= base + slub_data(page)->objects * s->size ||
 		(object - base) % s->size) {
 		return 0;
 	}
@@ -364,7 +370,7 @@ static inline bool __cmpxchg_double_slab
 	VM_BUG_ON(!irqs_disabled());
 #if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(&slub_data(page)->freelist, &slub_data(page)->counters,
 			freelist_old, counters_old,
 			freelist_new, counters_new))
 		return 1;
@@ -372,10 +378,10 @@ static inline bool __cmpxchg_double_slab
 #endif
 	{
 		slab_lock(page);
-		if (page->freelist == freelist_old &&
-					page->counters == counters_old) {
-			page->freelist = freelist_new;
-			page->counters = counters_new;
+		if (slub_data(page)->freelist == freelist_old &&
+					slub_data(page)->counters == counters_old) {
+			slub_data(page)->freelist = freelist_new;
+			slub_data(page)->counters = counters_new;
 			slab_unlock(page);
 			return 1;
 		}
@@ -399,7 +405,7 @@ static inline bool cmpxchg_double_slab(s
 {
 #if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(&slub_data(page)->freelist, &slub_data(page)->counters,
 			freelist_old, counters_old,
 			freelist_new, counters_new))
 		return 1;
@@ -410,10 +416,10 @@ static inline bool cmpxchg_double_slab(s
 
 		local_irq_save(flags);
 		slab_lock(page);
-		if (page->freelist == freelist_old &&
-					page->counters == counters_old) {
-			page->freelist = freelist_new;
-			page->counters = counters_new;
+		if (slub_data(page)->freelist == freelist_old &&
+		    slub_data(page)->counters == counters_old) {
+			slub_data(page)->freelist = freelist_new;
+			slub_data(page)->counters = counters_new;
 			slab_unlock(page);
 			local_irq_restore(flags);
 			return 1;
@@ -444,7 +450,7 @@ static void get_map(struct kmem_cache *s
 	void *p;
 	void *addr = page_address(page);
 
-	for (p = page->freelist; p; p = get_freepointer(s, p))
+	for (p = slub_data(page)->freelist; p; p = get_freepointer(s, p))
 		set_bit(slab_index(p, s, addr), map);
 }
 
@@ -555,7 +561,8 @@ static void print_page_info(struct page
 {
 	printk(KERN_ERR
 	       "INFO: Slab 0x%p objects=%u used=%u fp=0x%p flags=0x%04lx\n",
-	       page, page->objects, page->inuse, page->freelist, page->flags);
+	       page, slub_data(page)->objects, slub_data(page)->inuse,
+	       slub_data(page)->freelist, page->flags);
 
 }
 
@@ -841,14 +848,14 @@ static int check_slab(struct kmem_cache
 	}
 
 	maxobj = order_objects(compound_order(page), s->size, s->reserved);
-	if (page->objects > maxobj) {
+	if (slub_data(page)->objects > maxobj) {
 		slab_err(s, page, "objects %u > max %u",
-			s->name, page->objects, maxobj);
+			s->name, slub_data(page)->objects, maxobj);
 		return 0;
 	}
-	if (page->inuse > page->objects) {
+	if (slub_data(page)->inuse > slub_data(page)->objects) {
 		slab_err(s, page, "inuse %u > max %u",
-			s->name, page->inuse, page->objects);
+			s->name, slub_data(page)->inuse, slub_data(page)->objects);
 		return 0;
 	}
 	/* Slab_pad_check fixes things up after itself */
@@ -867,8 +874,8 @@ static int on_freelist(struct kmem_cache
 	void *object = NULL;
 	unsigned long max_objects;
 
-	fp = page->freelist;
-	while (fp && nr <= page->objects) {
+	fp = slub_data(page)->freelist;
+	while (fp && nr <= slub_data(page)->objects) {
 		if (fp == search)
 			return 1;
 		if (!check_valid_pointer(s, page, fp)) {
@@ -878,8 +885,8 @@ static int on_freelist(struct kmem_cache
 				set_freepointer(s, object, NULL);
 			} else {
 				slab_err(s, page, "Freepointer corrupt");
-				page->freelist = NULL;
-				page->inuse = page->objects;
+				slub_data(page)->freelist = NULL;
+				slub_data(page)->inuse = slub_data(page)->objects;
 				slab_fix(s, "Freelist cleared");
 				return 0;
 			}
@@ -894,16 +901,16 @@ static int on_freelist(struct kmem_cache
 	if (max_objects > MAX_OBJS_PER_PAGE)
 		max_objects = MAX_OBJS_PER_PAGE;
 
-	if (page->objects != max_objects) {
+	if (slub_data(page)->objects != max_objects) {
 		slab_err(s, page, "Wrong number of objects. Found %d but "
-			"should be %d", page->objects, max_objects);
-		page->objects = max_objects;
+			"should be %d", slub_data(page)->objects, max_objects);
+		slub_data(page)->objects = max_objects;
 		slab_fix(s, "Number of objects adjusted.");
 	}
-	if (page->inuse != page->objects - nr) {
+	if (slub_data(page)->inuse != slub_data(page)->objects - nr) {
 		slab_err(s, page, "Wrong object count. Counter is %d but "
-			"counted were %d", page->inuse, page->objects - nr);
-		page->inuse = page->objects - nr;
+			"counted were %d", slub_data(page)->inuse, slub_data(page)->objects - nr);
+		slub_data(page)->inuse = slub_data(page)->objects - nr;
 		slab_fix(s, "Object count adjusted.");
 	}
 	return search == NULL;
@@ -916,8 +923,8 @@ static void trace(struct kmem_cache *s,
 		printk(KERN_INFO "TRACE %s %s 0x%p inuse=%d fp=0x%p\n",
 			s->name,
 			alloc ? "alloc" : "free",
-			object, page->inuse,
-			page->freelist);
+			object, slub_data(page)->inuse,
+			slub_data(page)->freelist);
 
 		if (!alloc)
 			print_section("Object ", (void *)object,
@@ -1082,8 +1089,8 @@ bad:
 		 * as used avoids touching the remaining objects.
 		 */
 		slab_fix(s, "Marking all objects used");
-		page->inuse = page->objects;
-		page->freelist = NULL;
+		slub_data(page)->inuse = slub_data(page)->objects;
+		slub_data(page)->freelist = NULL;
 	}
 	return 0;
 }
@@ -1364,7 +1371,7 @@ static struct page *allocate_slab(struct
 	if (!page)
 		return NULL;
 
-	page->objects = oo_objects(oo);
+	slub_data(page)->objects = oo_objects(oo);
 	mod_zone_page_state(page_zone(page),
 		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
@@ -1397,7 +1404,7 @@ static struct page *new_slab(struct kmem
 		goto out;
 
 	order = compound_order(page);
-	inc_slabs_node(s, page_to_nid(page), page->objects);
+	inc_slabs_node(s, page_to_nid(page), slub_data(page)->objects);
 	memcg_bind_pages(s, order);
 	page->slab_cache = s;
 	__SetPageSlab(page);
@@ -1410,7 +1417,7 @@ static struct page *new_slab(struct kmem
 		memset(start, POISON_INUSE, PAGE_SIZE << order);
 
 	last = start;
-	for_each_object(p, s, start, page->objects) {
+	for_each_object(p, s, start, slub_data(page)->objects) {
 		setup_object(s, page, last);
 		set_freepointer(s, last, p);
 		last = p;
@@ -1418,9 +1425,9 @@ static struct page *new_slab(struct kmem
 	setup_object(s, page, last);
 	set_freepointer(s, last, NULL);
 
-	page->freelist = start;
-	page->inuse = page->objects;
-	page->frozen = 1;
+	slub_data(page)->freelist = start;
+	slub_data(page)->inuse = slub_data(page)->objects;
+	slub_data(page)->frozen = 1;
 out:
 	return page;
 }
@@ -1435,7 +1442,7 @@ static void __free_slab(struct kmem_cach
 
 		slab_pad_check(s, page);
 		for_each_object(p, s, page_address(page),
-						page->objects)
+						slub_data(page)->objects)
 			check_object(s, page, p, SLUB_RED_INACTIVE);
 	}
 
@@ -1496,7 +1503,7 @@ static void free_slab(struct kmem_cache
 
 static void discard_slab(struct kmem_cache *s, struct page *page)
 {
-	dec_slabs_node(s, page_to_nid(page), page->objects);
+	dec_slabs_node(s, page_to_nid(page), slub_data(page)->objects);
 	free_slab(s, page);
 }
 
@@ -1545,23 +1552,23 @@ static inline void *acquire_slab(struct
 	 * The old freelist is the list of objects for the
 	 * per cpu allocation list.
 	 */
-	freelist = page->freelist;
-	counters = page->counters;
-	new.counters = counters;
-	*objects = new.objects - new.inuse;
+	freelist = slub_data(page)->freelist;
+	counters = slub_data(page)->counters;
+	slub_data(&new)->counters = counters;
+	*objects = slub_data(&new)->objects - slub_data(&new)->inuse;
 	if (mode) {
-		new.inuse = page->objects;
-		new.freelist = NULL;
+		slub_data(&new)->inuse = slub_data(page)->objects;
+		slub_data(&new)->freelist = NULL;
 	} else {
-		new.freelist = freelist;
+		slub_data(&new)->freelist = freelist;
 	}
 
-	VM_BUG_ON_PAGE(new.frozen, &new);
-	new.frozen = 1;
+	VM_BUG_ON_PAGE(slub_data(&new)->frozen, &new);
+	slub_data(&new)->frozen = 1;
 
 	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
-			new.freelist, new.counters,
+			slub_data(&new)->freelist, slub_data(&new)->counters,
 			"acquire_slab"))
 		return NULL;
 
@@ -1786,7 +1793,7 @@ static void deactivate_slab(struct kmem_
 	struct page new;
 	struct page old;
 
-	if (page->freelist) {
+	if (slub_data(page)->freelist) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
 		tail = DEACTIVATE_TO_TAIL;
 	}
@@ -1804,16 +1811,16 @@ static void deactivate_slab(struct kmem_
 		unsigned long counters;
 
 		do {
-			prior = page->freelist;
-			counters = page->counters;
+			prior = slub_data(page)->freelist;
+			counters = slub_data(page)->counters;
 			set_freepointer(s, freelist, prior);
-			new.counters = counters;
-			new.inuse--;
-			VM_BUG_ON_PAGE(!new.frozen, &new);
+			slub_data(&new)->counters = counters;
+			slub_data(&new)->inuse--;
+			VM_BUG_ON_PAGE(!slub_data(&new)->frozen, &new);
 
 		} while (!__cmpxchg_double_slab(s, page,
 			prior, counters,
-			freelist, new.counters,
+			freelist, slub_data(&new)->counters,
 			"drain percpu freelist"));
 
 		freelist = nextfree;
@@ -1835,24 +1842,24 @@ static void deactivate_slab(struct kmem_
 	 */
 redo:
 
-	old.freelist = page->freelist;
-	old.counters = page->counters;
-	VM_BUG_ON_PAGE(!old.frozen, &old);
+	slub_data(&old)->freelist = slub_data(page)->freelist;
+	slub_data(&old)->counters = slub_data(page)->counters;
+	VM_BUG_ON_PAGE(!slub_data(&old)->frozen, &old);
 
 	/* Determine target state of the slab */
-	new.counters = old.counters;
+	slub_data(&new)->counters = slub_data(&old)->counters;
 	if (freelist) {
-		new.inuse--;
-		set_freepointer(s, freelist, old.freelist);
-		new.freelist = freelist;
+		slub_data(&new)->inuse--;
+		set_freepointer(s, freelist, slub_data(&old)->freelist);
+		slub_data(&new)->freelist = freelist;
 	} else
-		new.freelist = old.freelist;
+		slub_data(&new)->freelist = slub_data(&old)->freelist;
 
-	new.frozen = 0;
+	slub_data(&new)->frozen = 0;
 
-	if (!new.inuse && n->nr_partial > s->min_partial)
+	if (!slub_data(&new)->inuse && n->nr_partial > s->min_partial)
 		m = M_FREE;
-	else if (new.freelist) {
+	else if (slub_data(&new)->freelist) {
 		m = M_PARTIAL;
 		if (!lock) {
 			lock = 1;
@@ -1901,8 +1908,8 @@ redo:
 
 	l = m;
 	if (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				slub_data(&old)->freelist, slub_data(&old)->counters,
+				slub_data(&new)->freelist, slub_data(&new)->counters,
 				"unfreezing slab"))
 		goto redo;
 
@@ -1947,21 +1954,21 @@ static void unfreeze_partials(struct kme
 
 		do {
 
-			old.freelist = page->freelist;
-			old.counters = page->counters;
-			VM_BUG_ON_PAGE(!old.frozen, &old);
+			slub_data(&old)->freelist = slub_data(page)->freelist;
+			slub_data(&old)->counters = slub_data(page)->counters;
+			VM_BUG_ON_PAGE(!slub_data(&old)->frozen, &old);
 
-			new.counters = old.counters;
-			new.freelist = old.freelist;
+			slub_data(&new)->counters = slub_data(&old)->counters;
+			slub_data(&new)->freelist = slub_data(&old)->freelist;
 
-			new.frozen = 0;
+			slub_data(&new)->frozen = 0;
 
 		} while (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				slub_data(&old)->freelist, slub_data(&old)->counters,
+				slub_data(&new)->freelist, slub_data(&new)->counters,
 				"unfreezing slab"));
 
-		if (unlikely(!new.inuse && n->nr_partial > s->min_partial)) {
+		if (unlikely(!slub_data(&new)->inuse && n->nr_partial > s->min_partial)) {
 			page->next = discard_page;
 			discard_page = page;
 		} else {
@@ -2025,7 +2032,7 @@ static void put_cpu_partial(struct kmem_
 		}
 
 		pages++;
-		pobjects += page->objects - page->inuse;
+		pobjects += slub_data(page)->objects - slub_data(page)->inuse;
 
 		page->pages = pages;
 		page->pobjects = pobjects;
@@ -2098,7 +2105,7 @@ static inline int node_match(struct page
 
 static int count_free(struct page *page)
 {
-	return page->objects - page->inuse;
+	return slub_data(page)->objects - slub_data(page)->inuse;
 }
 
 static unsigned long count_partial(struct kmem_cache_node *n,
@@ -2181,8 +2188,8 @@ static inline void *new_slab_objects(str
 		 * No other reference to the page yet so we can
 		 * muck around with it freely without cmpxchg
 		 */
-		freelist = page->freelist;
-		page->freelist = NULL;
+		freelist = slub_data(page)->freelist;
+		slub_data(page)->freelist = NULL;
 
 		stat(s, ALLOC_SLAB);
 		c->page = page;
@@ -2202,7 +2209,7 @@ static inline bool pfmemalloc_match(stru
 }
 
 /*
- * Check the page->freelist of a page and either transfer the freelist to the
+ * Check the ->freelist of a page and either transfer the freelist to the
  * per cpu freelist or deactivate the page.
  *
  * The page is still frozen if the return value is not NULL.
@@ -2218,18 +2225,18 @@ static inline void *get_freelist(struct
 	void *freelist;
 
 	do {
-		freelist = page->freelist;
-		counters = page->counters;
+		freelist = slub_data(page)->freelist;
+		counters = slub_data(page)->counters;
 
-		new.counters = counters;
-		VM_BUG_ON_PAGE(!new.frozen, &new);
+		slub_data(&new)->counters = counters;
+		VM_BUG_ON_PAGE(!slub_data(&new)->frozen, &new);
 
-		new.inuse = page->objects;
-		new.frozen = freelist != NULL;
+		slub_data(&new)->inuse = slub_data(page)->objects;
+		slub_data(&new)->frozen = freelist != NULL;
 
 	} while (!__cmpxchg_double_slab(s, page,
 		freelist, counters,
-		NULL, new.counters,
+		NULL, slub_data(&new)->counters,
 		"get_freelist"));
 
 	return freelist;
@@ -2316,7 +2323,7 @@ load_freelist:
 	 * page is pointing to the page from which the objects are obtained.
 	 * That page must be frozen for per cpu allocations to work.
 	 */
-	VM_BUG_ON_PAGE(!c->page->frozen, c->page);
+	VM_BUG_ON_PAGE(!slub_data(c->page)->frozen, c->page);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
@@ -2530,13 +2537,13 @@ static void __slab_free(struct kmem_cach
 			spin_unlock_irqrestore(&n->list_lock, flags);
 			n = NULL;
 		}
-		prior = page->freelist;
-		counters = page->counters;
+		prior = slub_data(page)->freelist;
+		counters = slub_data(page)->counters;
 		set_freepointer(s, object, prior);
-		new.counters = counters;
-		was_frozen = new.frozen;
-		new.inuse--;
-		if ((!new.inuse || !prior) && !was_frozen) {
+		slub_data(&new)->counters = counters;
+		was_frozen = slub_data(&new)->frozen;
+		slub_data(&new)->inuse--;
+		if ((!slub_data(&new)->inuse || !prior) && !was_frozen) {
 
 			if (kmem_cache_has_cpu_partial(s) && !prior) {
 
@@ -2546,7 +2553,7 @@ static void __slab_free(struct kmem_cach
 				 * We can defer the list move and instead
 				 * freeze it.
 				 */
-				new.frozen = 1;
+				slub_data(&new)->frozen = 1;
 
 			} else { /* Needs to be taken off a list */
 
@@ -2566,7 +2573,7 @@ static void __slab_free(struct kmem_cach
 
 	} while (!cmpxchg_double_slab(s, page,
 		prior, counters,
-		object, new.counters,
+		object, slub_data(&new)->counters,
 		"__slab_free"));
 
 	if (likely(!n)) {
@@ -2575,7 +2582,7 @@ static void __slab_free(struct kmem_cach
 		 * If we just froze the page then put it onto the
 		 * per cpu partial list.
 		 */
-		if (new.frozen && !was_frozen) {
+		if (slub_data(&new)->frozen && !was_frozen) {
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
@@ -2588,7 +2595,7 @@ static void __slab_free(struct kmem_cach
                 return;
         }
 
-	if (unlikely(!new.inuse && n->nr_partial > s->min_partial))
+	if (unlikely(!slub_data(&new)->inuse && n->nr_partial > s->min_partial))
 		goto slab_empty;
 
 	/*
@@ -2875,18 +2882,18 @@ static void early_kmem_cache_node_alloc(
 				"in order to be able to continue\n");
 	}
 
-	n = page->freelist;
+	n = slub_data(page)->freelist;
 	BUG_ON(!n);
-	page->freelist = get_freepointer(kmem_cache_node, n);
-	page->inuse = 1;
-	page->frozen = 0;
+	slub_data(page)->freelist = get_freepointer(kmem_cache_node, n);
+	slub_data(page)->inuse = 1;
+	slub_data(page)->frozen = 0;
 	kmem_cache_node->node[node] = n;
 #ifdef CONFIG_SLUB_DEBUG
 	init_object(kmem_cache_node, n, SLUB_RED_ACTIVE);
 	init_tracking(kmem_cache_node, n);
 #endif
 	init_kmem_cache_node(n);
-	inc_slabs_node(kmem_cache_node, node, page->objects);
+	inc_slabs_node(kmem_cache_node, node, slub_data(page)->objects);
 
 	add_partial(n, page, DEACTIVATE_TO_HEAD);
 }
@@ -3141,7 +3148,7 @@ static void list_slab_objects(struct kme
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = kzalloc(BITS_TO_LONGS(page->objects) *
+	unsigned long *map = kzalloc(BITS_TO_LONGS(slub_data(page)->objects) *
 				     sizeof(long), GFP_ATOMIC);
 	if (!map)
 		return;
@@ -3149,7 +3156,7 @@ static void list_slab_objects(struct kme
 	slab_lock(page);
 
 	get_map(s, page, map);
-	for_each_object(p, s, addr, page->objects) {
+	for_each_object(p, s, addr, slub_data(page)->objects) {
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
 			printk(KERN_ERR "INFO: Object 0x%p @offset=%tu\n",
@@ -3172,7 +3179,7 @@ static void free_partial(struct kmem_cac
 	struct page *page, *h;
 
 	list_for_each_entry_safe(page, h, &n->partial, lru) {
-		if (!page->inuse) {
+		if (!slub_data(page)->inuse) {
 			remove_partial(n, page);
 			discard_slab(s, page);
 		} else {
@@ -3409,11 +3416,11 @@ int kmem_cache_shrink(struct kmem_cache
 		 * Build lists indexed by the items in use in each slab.
 		 *
 		 * Note that concurrent frees may occur while we hold the
-		 * list_lock. page->inuse here is the upper limit.
+		 * list_lock.  ->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			list_move(&page->lru, slabs_by_inuse + page->inuse);
-			if (!page->inuse)
+			list_move(&page->lru, slabs_by_inuse + slub_data(page)->inuse);
+			if (!slub_data(page)->inuse)
 				n->nr_partial--;
 		}
 
@@ -3852,12 +3859,12 @@ void *__kmalloc_node_track_caller(size_t
 #ifdef CONFIG_SYSFS
 static int count_inuse(struct page *page)
 {
-	return page->inuse;
+	return slub_data(page)->inuse;
 }
 
 static int count_total(struct page *page)
 {
-	return page->objects;
+	return slub_data(page)->objects;
 }
 #endif
 
@@ -3873,16 +3880,16 @@ static int validate_slab(struct kmem_cac
 		return 0;
 
 	/* Now we know that a valid freelist exists */
-	bitmap_zero(map, page->objects);
+	bitmap_zero(map, slub_data(page)->objects);
 
 	get_map(s, page, map);
-	for_each_object(p, s, addr, page->objects) {
+	for_each_object(p, s, addr, slub_data(page)->objects) {
 		if (test_bit(slab_index(p, s, addr), map))
 			if (!check_object(s, page, p, SLUB_RED_INACTIVE))
 				return 0;
 	}
 
-	for_each_object(p, s, addr, page->objects)
+	for_each_object(p, s, addr, slub_data(page)->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			if (!check_object(s, page, p, SLUB_RED_ACTIVE))
 				return 0;
@@ -4083,10 +4090,10 @@ static void process_slab(struct loc_trac
 	void *addr = page_address(page);
 	void *p;
 
-	bitmap_zero(map, page->objects);
+	bitmap_zero(map, slub_data(page)->objects);
 	get_map(s, page, map);
 
-	for_each_object(p, s, addr, page->objects)
+	for_each_object(p, s, addr, slub_data(page)->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
 }
@@ -4285,9 +4292,9 @@ static ssize_t show_slab_objects(struct
 
 			node = page_to_nid(page);
 			if (flags & SO_TOTAL)
-				x = page->objects;
+				x = slub_data(page)->objects;
 			else if (flags & SO_OBJECTS)
-				x = page->inuse;
+				x = slub_data(page)->inuse;
 			else
 				x = 1;
 
_

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC][PATCH 7/9] mm: slub: remove 'struct page' alignment restrictions
  2014-01-14 18:00 ` Dave Hansen
@ 2014-01-14 18:01   ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 18:01 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

SLUB depends on a 16-byte cmpxchg for an optimization.  In order
to get guaranteed 16-byte alignment (required by the hardware on
x86), 'struct page' is padded out from 56 to 64 bytes.
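
Purely as an illustration, the padding effect is easy to reproduce in
userspace ('struct fake_page' below is a stand-in, not the real
structure):

	#include <stdio.h>

	/* A 56-byte structure grows to 64 bytes as soon as the compiler
	 * must guarantee 16-byte alignment, because sizeof is rounded up
	 * to a multiple of the alignment. */
	struct fake_page         { unsigned long words[7]; };
	struct fake_page_aligned { _Alignas(16) unsigned long words[7]; };

	int main(void)
	{
		printf("unaligned: %zu bytes\n",
		       sizeof(struct fake_page));          /* 56 on 64-bit */
		printf("  aligned: %zu bytes\n",
		       sizeof(struct fake_page_aligned));  /* 64 on 64-bit */
		return 0;
	}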

Those 8 bytes matter.  We've gone to great lengths to keep
'struct page' small in the past.  It's a shame that we bloat it
now just for alignment reasons when we already have *extra*,
unused space inside it.  Also, increasing the size of
'struct page' by 14% makes it 14% more likely that we will
miss a cacheline when fetching it.

This patch takes an unused 8-byte area of slub's 'struct page'
and reuses it to internally align to the 16-bytes that we need.
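
Just to sketch the idea (this is not the kernel implementation): with
those extra 8 bytes, SLUB's window inside 'struct page' is 24 bytes,
and any 8-byte-aligned 24-byte window contains a 16-byte-aligned
16-byte region, either at its start or 8 bytes in.  The layout and
helper below are invented for illustration:

	#include <stdint.h>
	#include <stdio.h>

	/* round a window start up to the next 16-byte boundary */
	static void *align16_within(void *window)
	{
		uintptr_t p = (uintptr_t)window;

		return (void *)((p + 15) & ~(uintptr_t)15);
	}

	int main(void)
	{
		/* four consecutive fake 56-byte, 8-byte-aligned pages */
		_Alignas(16) unsigned char memmap[4 * 56];
		int i;

		for (i = 0; i < 4; i++) {
			/* 24-byte window: the 16 bytes slub needs + the unused 8 */
			unsigned char *window = memmap + i * 56 + 16;

			printf("page %d: window %p, cmpxchg16 operands %p\n",
			       i, (void *)window, align16_within(window));
		}
		return 0;
	}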

Note that this also gets rid of the ugly slub #ifdef that we use
to segregate ->counters and ->_count for cases where we need to
manipulate ->counters without the benefit of a hardware cmpxchg.

This patch takes me from 16909584K of reserved memory at boot
down to 14814472K, so almost *exactly* 2GB of savings!  It also
helps performance, presumably because we now touch 14% fewer
'struct page' cachelines.  A 30GB dd to a ramfs file:

	dd if=/dev/zero of=bigfile bs=$((1<<30)) count=30

is sped up by about 4.4% in my testing.

The value of maintaining the cmpxchg16 operation can be
demonstrated in some tiny little microbenchmarks, so it is
probably something we should keep around instead of just using
the spinlock for everything:

	http://lkml.kernel.org/r/52B345A3.6090700@sr71.net

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/Kconfig             |    8 -------
 b/arch/s390/Kconfig        |    1 
 b/arch/x86/Kconfig         |    1 
 b/include/linux/mm_types.h |   51 +++++++++++++--------------------------------
 b/init/Kconfig             |    2 -
 b/mm/slab_common.c         |   10 +++++---
 b/mm/slub.c                |    4 +++
 7 files changed, 26 insertions(+), 51 deletions(-)

diff -puN arch/Kconfig~remove-struct-page-alignment-restrictions arch/Kconfig
--- a/arch/Kconfig~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.074710529 -0800
+++ b/arch/Kconfig	2014-01-14 09:57:58.088711157 -0800
@@ -289,14 +289,6 @@ config HAVE_RCU_TABLE_FREE
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
-config HAVE_ALIGNED_STRUCT_PAGE
-	bool
-	help
-	  This makes sure that struct pages are double word aligned and that
-	  e.g. the SLUB allocator can perform double word atomic operations
-	  on a struct page for better performance. However selecting this
-	  might increase the size of a struct page by a word.
-
 config HAVE_CMPXCHG_LOCAL
 	bool
 
diff -puN arch/s390/Kconfig~remove-struct-page-alignment-restrictions arch/s390/Kconfig
--- a/arch/s390/Kconfig~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.075710574 -0800
+++ b/arch/s390/Kconfig	2014-01-14 09:57:58.089711202 -0800
@@ -102,7 +102,6 @@ config S390
 	select GENERIC_FIND_FIRST_BIT
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_TIME_VSYSCALL
-	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_JUMP_LABEL if !MARCH_G5
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
diff -puN arch/x86/Kconfig~remove-struct-page-alignment-restrictions arch/x86/Kconfig
--- a/arch/x86/Kconfig~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.077710664 -0800
+++ b/arch/x86/Kconfig	2014-01-14 09:57:58.090711246 -0800
@@ -78,7 +78,6 @@ config X86
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_DEBUG_KMEMLEAK
 	select ANON_INODES
-	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_CMPXCHG_LOCAL
 	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_ARCH_KMEMCHECK
diff -puN include/linux/mm_types.h~remove-struct-page-alignment-restrictions include/linux/mm_types.h
--- a/include/linux/mm_types.h~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.079710753 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:58.091711291 -0800
@@ -24,39 +24,35 @@
 struct address_space;
 
 struct slub_data {
-	void *unused;
 	void *freelist;
 	union {
 		struct {
 			unsigned inuse:16;
 			unsigned objects:15;
 			unsigned frozen:1;
-			atomic_t dontuse_slub_count;
 		};
 		/*
-		 * ->counters is used to make it easier to copy
-		 * all of the above counters in one chunk.
-		 * The actual counts are never accessed via this.
-		 */
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
-		unsigned long counters;
-#else
-		/*
 		 * Keep _count separate from slub cmpxchg_double data.
 		 * As the rest of the double word is protected by
 		 * slab_lock but _count is not.
 		 */
 		struct {
-			unsigned counters;
-			/*
-			 * This isn't used directly, but declare it here
-			 * for clarity since it must line up with _count
-			 * from 'struct page'
-			 */
+			/* counters is just a helperfor the above bitfield */
+			unsigned long counters;
+			atomic_t padding;
 			atomic_t separate_count;
 		};
-#endif
+		/*
+		 * the double-cmpxchg case:
+		 * counters and _count overlap
+		 */
+		union {
+			unsigned long counters2;
+			struct {
+				atomic_t padding2;
+				atomic_t _count;
+			};
+		};
 	};
 };
 
@@ -71,15 +67,8 @@ struct slub_data {
  * moment. Note that we have no way to track which tasks are using
  * a page, though if it is a pagecache page, rmap structures can tell us
  * who is mapping it.
- *
- * The objects in struct page are organized in double word blocks in
- * order to allows us to use atomic double word operations on portions
- * of struct page. That is currently only used by slub but the arrangement
- * allows the use of atomic double word operations on the flags/mapping
- * and lru list pointers also.
  */
 struct page {
-	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
 	union {
@@ -141,7 +130,6 @@ struct page {
 		};
 	};
 
-	/* Third double word block */
 	union {
 		struct list_head lru;	/* Pageout list, eg. active_list
 					 * protected by zone->lru_lock !
@@ -168,7 +156,6 @@ struct page {
 #endif
 	};
 
-	/* Remainder is not double word aligned */
 	union {
 		unsigned long private;		/* Mapping-private opaque data:
 					 	 * usually used for buffer_heads
@@ -217,15 +204,7 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
-}
-/*
- * The struct page can be forced to be double word aligned so that atomic ops
- * on double words work. The SLUB allocator can make use of such a feature.
- */
-#ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
-	__aligned(2 * sizeof(unsigned long))
-#endif
-;
+};
 
 struct page_frag {
 	struct page *page;
diff -puN init/Kconfig~remove-struct-page-alignment-restrictions init/Kconfig
--- a/init/Kconfig~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.081710843 -0800
+++ b/init/Kconfig	2014-01-14 09:57:58.092711336 -0800
@@ -1605,7 +1605,7 @@ config SLUB_CPU_PARTIAL
 
 config SLUB_ATTEMPT_CMPXCHG_DOUBLE
 	default y
-	depends on SLUB && HAVE_CMPXCHG_DOUBLE && HAVE_ALIGNED_STRUCT_PAGE
+	depends on SLUB && HAVE_CMPXCHG_DOUBLE
 	bool "SLUB: attempt to use double-cmpxchg operations"
 	help
 	  Some CPUs support instructions that let you do a large double-word
diff -puN mm/slab_common.c~remove-struct-page-alignment-restrictions mm/slab_common.c
--- a/mm/slab_common.c~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.083710932 -0800
+++ b/mm/slab_common.c	2014-01-14 09:57:58.092711336 -0800
@@ -692,7 +692,6 @@ module_init(slab_proc_init);
 void slab_build_checks(void)
 {
 	SLAB_PAGE_CHECK(_count, dontuse_slab_count);
-	SLAB_PAGE_CHECK(_count, slub_data.dontuse_slub_count);
 	SLAB_PAGE_CHECK(_count, dontuse_slob_count);
 
 	/*
@@ -706,9 +705,12 @@ void slab_build_checks(void)
 	 * carve out for _count in that case actually lines up
 	 * with the real _count.
 	 */
-#if !(defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-	    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE))
 	SLAB_PAGE_CHECK(_count, slub_data.separate_count);
-#endif
+
+	/*
+	 * We need at least three double-words worth of space to
+	 * ensure that we can align to a double-wordk internally.
+	 */
+	BUILD_BUG_ON(sizeof(struct slub_data) != sizeof(unsigned long) * 3);
 }
 
diff -puN mm/slub.c~remove-struct-page-alignment-restrictions mm/slub.c
--- a/mm/slub.c~remove-struct-page-alignment-restrictions	2014-01-14 09:57:58.085711022 -0800
+++ b/mm/slub.c	2014-01-14 09:57:58.094711426 -0800
@@ -239,7 +239,11 @@ static inline struct kmem_cache_node *ge
 
 static inline struct slub_data *slub_data(struct page *page)
 {
+	int doubleword_bytes = BITS_PER_LONG * 2 / 8;
 	void *ptr = &page->slub_data;
+#if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
+	ptr = PTR_ALIGN(ptr, doubleword_bytes);
+#endif
 	return ptr;
 }
 
_

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC][PATCH 8/9] mm: slub: cleanups after code churn
  2014-01-14 18:00 ` Dave Hansen
@ 2014-01-14 18:01   ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 18:01 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

I added a bunch of longer-than-80-column lines and various other
messes.  But, doing line-to-line code replacements makes the
previous patch much easier to audit.  I stuck the cleanups in
here instead.

The slub code also declares a bunch of 'struct page's on the
stack.  Now that 'struct slub_data' is separate, we can declare
those smaller structures instead.  This ends up saving us a
couple hundred bytes in object size.
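
The shape of the conversion, condensed from the acquire_slab() and
get_freelist() hunks below (not a complete function, just the
pattern):

	/* before: a full 'struct page' on the stack, used via slub_data() */
	struct page new;
	slub_data(&new)->counters = counters;
	slub_data(&new)->frozen = 1;

	/* after: only the three-word 'struct slub_data' we actually touch */
	struct slub_data new;
	new.counters = counters;
	new.frozen = 1;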

Doing the pointer alignment operations over
and over costs us some code size.  In the end (not this patch
alone), we take slub.o's size from 26672->27168, so about 500
bytes.  But, on an 8GB system, we saved about 256k in 'struct
page' overhead.  That's a pretty good tradeoff.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/mm/slub.c |  149 +++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 78 insertions(+), 71 deletions(-)

diff -puN mm/slub.c~slub-cleanups mm/slub.c
--- a/mm/slub.c~slub-cleanups	2014-01-14 09:57:58.458727748 -0800
+++ b/mm/slub.c	2014-01-14 09:57:58.463727972 -0800
@@ -257,7 +257,8 @@ static inline int check_valid_pointer(st
 		return 1;
 
 	base = page_address(page);
-	if (object < base || object >= base + slub_data(page)->objects * s->size ||
+	if (object < base ||
+	    object >= base + slub_data(page)->objects * s->size ||
 		(object - base) % s->size) {
 		return 0;
 	}
@@ -374,16 +375,17 @@ static inline bool __cmpxchg_double_slab
 	VM_BUG_ON(!irqs_disabled());
 #if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&slub_data(page)->freelist, &slub_data(page)->counters,
-			freelist_old, counters_old,
-			freelist_new, counters_new))
-		return 1;
+		if (cmpxchg_double(&slub_data(page)->freelist,
+				   &slub_data(page)->counters,
+				   freelist_old, counters_old,
+				   freelist_new, counters_new))
+			return 1;
 	} else
 #endif
 	{
 		slab_lock(page);
 		if (slub_data(page)->freelist == freelist_old &&
-					slub_data(page)->counters == counters_old) {
+		    slub_data(page)->counters == counters_old) {
 			slub_data(page)->freelist = freelist_new;
 			slub_data(page)->counters = counters_new;
 			slab_unlock(page);
@@ -407,11 +409,12 @@ static inline bool cmpxchg_double_slab(s
 		void *freelist_new, unsigned long counters_new,
 		const char *n)
 {
+	struct slub_data *sd = slub_data(page);
 #if defined(CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&slub_data(page)->freelist, &slub_data(page)->counters,
-			freelist_old, counters_old,
-			freelist_new, counters_new))
+		if (cmpxchg_double(&sd->freelist, &sd->counters,
+				   freelist_old, counters_old,
+				   freelist_new, counters_new))
 		return 1;
 	} else
 #endif
@@ -420,10 +423,10 @@ static inline bool cmpxchg_double_slab(s
 
 		local_irq_save(flags);
 		slab_lock(page);
-		if (slub_data(page)->freelist == freelist_old &&
-		    slub_data(page)->counters == counters_old) {
-			slub_data(page)->freelist = freelist_new;
-			slub_data(page)->counters = counters_new;
+		if (sd->freelist == freelist_old &&
+		    sd->counters == counters_old) {
+			sd->freelist = freelist_new;
+			sd->counters = counters_new;
 			slab_unlock(page);
 			local_irq_restore(flags);
 			return 1;
@@ -859,7 +862,8 @@ static int check_slab(struct kmem_cache
 	}
 	if (slub_data(page)->inuse > slub_data(page)->objects) {
 		slab_err(s, page, "inuse %u > max %u",
-			s->name, slub_data(page)->inuse, slub_data(page)->objects);
+			s->name, slub_data(page)->inuse,
+			slub_data(page)->objects);
 		return 0;
 	}
 	/* Slab_pad_check fixes things up after itself */
@@ -890,7 +894,8 @@ static int on_freelist(struct kmem_cache
 			} else {
 				slab_err(s, page, "Freepointer corrupt");
 				slub_data(page)->freelist = NULL;
-				slub_data(page)->inuse = slub_data(page)->objects;
+				slub_data(page)->inuse =
+					slub_data(page)->objects;
 				slab_fix(s, "Freelist cleared");
 				return 0;
 			}
@@ -913,7 +918,8 @@ static int on_freelist(struct kmem_cache
 	}
 	if (slub_data(page)->inuse != slub_data(page)->objects - nr) {
 		slab_err(s, page, "Wrong object count. Counter is %d but "
-			"counted were %d", slub_data(page)->inuse, slub_data(page)->objects - nr);
+			"counted were %d", slub_data(page)->inuse,
+			slub_data(page)->objects - nr);
 		slub_data(page)->inuse = slub_data(page)->objects - nr;
 		slab_fix(s, "Object count adjusted.");
 	}
@@ -1547,7 +1553,7 @@ static inline void *acquire_slab(struct
 {
 	void *freelist;
 	unsigned long counters;
-	struct page new;
+	struct slub_data new;
 
 	lockdep_assert_held(&n->list_lock);
 
@@ -1558,21 +1564,21 @@ static inline void *acquire_slab(struct
 	 */
 	freelist = slub_data(page)->freelist;
 	counters = slub_data(page)->counters;
-	slub_data(&new)->counters = counters;
-	*objects = slub_data(&new)->objects - slub_data(&new)->inuse;
+	new.counters = counters;
+	*objects = new.objects - new.inuse;
 	if (mode) {
-		slub_data(&new)->inuse = slub_data(page)->objects;
-		slub_data(&new)->freelist = NULL;
+		new.inuse = slub_data(page)->objects;
+		new.freelist = NULL;
 	} else {
-		slub_data(&new)->freelist = freelist;
+		new.freelist = freelist;
 	}
 
-	VM_BUG_ON_PAGE(slub_data(&new)->frozen, &new);
-	slub_data(&new)->frozen = 1;
+	VM_BUG_ON_PAGE(new.frozen, &new);
+	new.frozen = 1;
 
 	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
-			slub_data(&new)->freelist, slub_data(&new)->counters,
+			new.freelist, new.counters,
 			"acquire_slab"))
 		return NULL;
 
@@ -1794,8 +1800,8 @@ static void deactivate_slab(struct kmem_
 	enum slab_modes l = M_NONE, m = M_NONE;
 	void *nextfree;
 	int tail = DEACTIVATE_TO_HEAD;
-	struct page new;
-	struct page old;
+	struct slub_data new;
+	struct slub_data old;
 
 	if (slub_data(page)->freelist) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
@@ -1818,13 +1824,13 @@ static void deactivate_slab(struct kmem_
 			prior = slub_data(page)->freelist;
 			counters = slub_data(page)->counters;
 			set_freepointer(s, freelist, prior);
-			slub_data(&new)->counters = counters;
-			slub_data(&new)->inuse--;
-			VM_BUG_ON_PAGE(!slub_data(&new)->frozen, &new);
+			new.counters = counters;
+			new.inuse--;
+			VM_BUG_ON_PAGE(!new.frozen, &new);
 
 		} while (!__cmpxchg_double_slab(s, page,
 			prior, counters,
-			freelist, slub_data(&new)->counters,
+			freelist, new.counters,
 			"drain percpu freelist"));
 
 		freelist = nextfree;
@@ -1846,24 +1852,24 @@ static void deactivate_slab(struct kmem_
 	 */
 redo:
 
-	slub_data(&old)->freelist = slub_data(page)->freelist;
-	slub_data(&old)->counters = slub_data(page)->counters;
-	VM_BUG_ON_PAGE(!slub_data(&old)->frozen, &old);
+	old.freelist = slub_data(page)->freelist;
+	old.counters = slub_data(page)->counters;
+	VM_BUG_ON_PAGE(!old.frozen, &old);
 
 	/* Determine target state of the slab */
-	slub_data(&new)->counters = slub_data(&old)->counters;
+	new.counters = old.counters;
 	if (freelist) {
-		slub_data(&new)->inuse--;
-		set_freepointer(s, freelist, slub_data(&old)->freelist);
-		slub_data(&new)->freelist = freelist;
+		new.inuse--;
+		set_freepointer(s, freelist, old.freelist);
+		new.freelist = freelist;
 	} else
-		slub_data(&new)->freelist = slub_data(&old)->freelist;
+		new.freelist = old.freelist;
 
-	slub_data(&new)->frozen = 0;
+	new.frozen = 0;
 
-	if (!slub_data(&new)->inuse && n->nr_partial > s->min_partial)
+	if (!new.inuse && n->nr_partial > s->min_partial)
 		m = M_FREE;
-	else if (slub_data(&new)->freelist) {
+	else if (new.freelist) {
 		m = M_PARTIAL;
 		if (!lock) {
 			lock = 1;
@@ -1912,8 +1918,8 @@ redo:
 
 	l = m;
 	if (!__cmpxchg_double_slab(s, page,
-				slub_data(&old)->freelist, slub_data(&old)->counters,
-				slub_data(&new)->freelist, slub_data(&new)->counters,
+				old.freelist, old.counters,
+				new.freelist, new.counters,
 				"unfreezing slab"))
 		goto redo;
 
@@ -1942,8 +1948,8 @@ static void unfreeze_partials(struct kme
 	struct page *page, *discard_page = NULL;
 
 	while ((page = c->partial)) {
-		struct page new;
-		struct page old;
+		struct slub_data new;
+		struct slub_data old;
 
 		c->partial = page->next;
 
@@ -1958,21 +1964,21 @@ static void unfreeze_partials(struct kme
 
 		do {
 
-			slub_data(&old)->freelist = slub_data(page)->freelist;
-			slub_data(&old)->counters = slub_data(page)->counters;
-			VM_BUG_ON_PAGE(!slub_data(&old)->frozen, &old);
+			old.freelist = slub_data(page)->freelist;
+			old.counters = slub_data(page)->counters;
+			VM_BUG_ON_PAGE(!old.frozen, &old);
 
-			slub_data(&new)->counters = slub_data(&old)->counters;
-			slub_data(&new)->freelist = slub_data(&old)->freelist;
+			new.counters = old.counters;
+			new.freelist = old.freelist;
 
-			slub_data(&new)->frozen = 0;
+			new.frozen = 0;
 
 		} while (!__cmpxchg_double_slab(s, page,
-				slub_data(&old)->freelist, slub_data(&old)->counters,
-				slub_data(&new)->freelist, slub_data(&new)->counters,
+				old.freelist, old.counters,
+				new.freelist, new.counters,
 				"unfreezing slab"));
 
-		if (unlikely(!slub_data(&new)->inuse && n->nr_partial > s->min_partial)) {
+		if (unlikely(!new.inuse && n->nr_partial > s->min_partial)) {
 			page->next = discard_page;
 			discard_page = page;
 		} else {
@@ -2224,7 +2230,7 @@ static inline bool pfmemalloc_match(stru
  */
 static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 {
-	struct page new;
+	struct slub_data new;
 	unsigned long counters;
 	void *freelist;
 
@@ -2232,15 +2238,15 @@ static inline void *get_freelist(struct
 		freelist = slub_data(page)->freelist;
 		counters = slub_data(page)->counters;
 
-		slub_data(&new)->counters = counters;
-		VM_BUG_ON_PAGE(!slub_data(&new)->frozen, &new);
+		new.counters = counters;
+		VM_BUG_ON_PAGE(!new.frozen, &new);
 
-		slub_data(&new)->inuse = slub_data(page)->objects;
-		slub_data(&new)->frozen = freelist != NULL;
+		new.inuse = slub_data(page)->objects;
+		new.frozen = freelist != NULL;
 
 	} while (!__cmpxchg_double_slab(s, page,
 		freelist, counters,
-		NULL, slub_data(&new)->counters,
+		NULL, new.counters,
 		"get_freelist"));
 
 	return freelist;
@@ -2525,7 +2531,7 @@ static void __slab_free(struct kmem_cach
 	void *prior;
 	void **object = (void *)x;
 	int was_frozen;
-	struct page new;
+	struct slub_data new;
 	unsigned long counters;
 	struct kmem_cache_node *n = NULL;
 	unsigned long uninitialized_var(flags);
@@ -2544,10 +2550,10 @@ static void __slab_free(struct kmem_cach
 		prior = slub_data(page)->freelist;
 		counters = slub_data(page)->counters;
 		set_freepointer(s, object, prior);
-		slub_data(&new)->counters = counters;
-		was_frozen = slub_data(&new)->frozen;
-		slub_data(&new)->inuse--;
-		if ((!slub_data(&new)->inuse || !prior) && !was_frozen) {
+		new.counters = counters;
+		was_frozen = new.frozen;
+		new.inuse--;
+		if ((!new.inuse || !prior) && !was_frozen) {
 
 			if (kmem_cache_has_cpu_partial(s) && !prior) {
 
@@ -2557,7 +2563,7 @@ static void __slab_free(struct kmem_cach
 				 * We can defer the list move and instead
 				 * freeze it.
 				 */
-				slub_data(&new)->frozen = 1;
+				new.frozen = 1;
 
 			} else { /* Needs to be taken off a list */
 
@@ -2577,7 +2583,7 @@ static void __slab_free(struct kmem_cach
 
 	} while (!cmpxchg_double_slab(s, page,
 		prior, counters,
-		object, slub_data(&new)->counters,
+		object, new.counters,
 		"__slab_free"));
 
 	if (likely(!n)) {
@@ -2586,7 +2592,7 @@ static void __slab_free(struct kmem_cach
 		 * If we just froze the page then put it onto the
 		 * per cpu partial list.
 		 */
-		if (slub_data(&new)->frozen && !was_frozen) {
+		if (new.frozen && !was_frozen) {
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
@@ -2599,7 +2605,7 @@ static void __slab_free(struct kmem_cach
                 return;
         }
 
-	if (unlikely(!slub_data(&new)->inuse && n->nr_partial > s->min_partial))
+	if (unlikely(!new.inuse && n->nr_partial > s->min_partial))
 		goto slab_empty;
 
 	/*
@@ -3423,7 +3429,8 @@ int kmem_cache_shrink(struct kmem_cache
 		 * list_lock.  ->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			list_move(&page->lru, slabs_by_inuse + slub_data(page)->inuse);
+			list_move(&page->lru, slabs_by_inuse +
+						slub_data(page)->inuse);
 			if (!slub_data(page)->inuse)
 				n->nr_partial--;
 		}
_

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC][PATCH 9/9] mm: fix alignment checks on 32-bit
  2014-01-14 18:00 ` Dave Hansen
@ 2014-01-14 18:01   ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 18:01 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, penberg, cl, Dave Hansen


The checks fail on 32-bit.  This needs to get folded back into
the original patches.  Please ignore this for now.

---

 b/include/linux/mm_types.h |   14 ++++++++------
 b/mm/slab_common.c         |   16 ++++++----------
 2 files changed, 14 insertions(+), 16 deletions(-)

diff -puN include/linux/mm_types.h~fix-slub-size-32-bit include/linux/mm_types.h
--- a/include/linux/mm_types.h~fix-slub-size-32-bit	2014-01-14 09:57:58.738740304 -0800
+++ b/include/linux/mm_types.h	2014-01-14 09:57:58.743740528 -0800
@@ -37,19 +37,21 @@ struct slub_data {
 		 * slab_lock but _count is not.
 		 */
 		struct {
-			/* counters is just a helperfor the above bitfield */
+ 			/* counters is just a helper for the above bitfield */
 			unsigned long counters;
-			atomic_t padding;
+			atomic_t mapcount_padding;
 			atomic_t separate_count;
 		};
 		/*
-		 * the double-cmpxchg case:
-		 * counters and _count overlap
+		 * the double-cmpxchg case (never used on 32-bit):
+		 * counters overlaps _count, and we are careful
+		 * to only use 32-bits of 'counters' so that we
+		 * do not interfere with _count.
 		 */
 		union {
 			unsigned long counters2;
 			struct {
-				atomic_t padding2;
+				atomic_t _unused_mapcount;
 				atomic_t _count;
 			};
 		};
@@ -73,7 +75,6 @@ struct page {
 					 * updated asynchronously */
 	union {
 		struct /* the normal uses */ {
-			pgoff_t index;		/* Our offset within mapping. */
 			/*
 			 * mapping: If low bit clear, points to
 			 * inode address_space, or NULL.  If page
@@ -82,6 +83,7 @@ struct page {
 			 * see PAGE_MAPPING_ANON below.
 			 */
 			struct address_space *mapping;
+			pgoff_t index;		/* Our offset within mapping. */
 			/*
 			 * Count of ptes mapped in mms, to show when page
 			 * is mapped & limit reverse map searches.
diff -puN mm/slab_common.c~fix-slub-size-32-bit mm/slab_common.c
--- a/mm/slab_common.c~fix-slub-size-32-bit	2014-01-14 09:57:58.739740349 -0800
+++ b/mm/slab_common.c	2014-01-14 09:57:58.743740528 -0800
@@ -695,15 +695,11 @@ void slab_build_checks(void)
 	SLAB_PAGE_CHECK(_count, dontuse_slob_count);
 
 	/*
-	 * When doing a double-cmpxchg, the slub code sucks in
-	 * _count.  But, this is harmless since if _count is
-	 * modified, the cmpxchg will fail.  When not using a
-	 * real cmpxchg, the slub code uses a lock.  But, _count
-	 * is not modified under that lock and updates can be
-	 * lost if they race with one of the "faked" cmpxchg
-	 * under that lock.  This makes sure that the space we
-	 * carve out for _count in that case actually lines up
-	 * with the real _count.
+	 * The slub code uses page->_mapcount's space for some
+	 * internal counters.  But, since ->_count and
+	 * ->_mapcount are 32-bit everywhere and the slub
+	 * counters are an unsigned long which changes size,
+	 * we need to change the checks on 32 vs. 64-bit.
 	 */
 	SLAB_PAGE_CHECK(_count, slub_data.separate_count);
 
@@ -711,6 +707,6 @@ void slab_build_checks(void)
 	 * We need at least three double-words worth of space to
 	 * ensure that we can align to a double-wordk internally.
 	 */
-	BUILD_BUG_ON(sizeof(struct slub_data) != sizeof(unsigned long) * 3);
+	BUILD_BUG_ON(sizeof(struct slub_data) < sizeof(unsigned long) * 3);
 }
 
_

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-14 19:31     ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-14 19:31 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> This patch makes the slab and slub code use page->lru
> universally instead of mixing ->list and ->lru.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-14 19:49     ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-14 19:49 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
> only used transiently _just_ after a page leaves the buddy
> allocator.

Why would we need to do this if we are removing the cmpxchg_double?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-14 19:49     ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-14 19:49 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> I found this useful to have in my testing.  I would like to have
> it available for a bit, at least until other folks have had a
> chance to do some testing with it.

I don't really see the point of this patch since we already have
CONFIG_HAVE_ALIGNED_STRUCT_PAGE to play with.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
  2014-01-14 19:49     ` Christoph Lameter
@ 2014-01-14 21:41       ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 21:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/14/2014 11:49 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> I found this useful to have in my testing.  I would like to have
>> it available for a bit, at least until other folks have had a
>> chance to do some testing with it.
> 
> I dont really see the point of this patch since we already have
> CONFIG_HAVE_ALIGNED_STRUCT_PAGE to play with.

With the current code, if you wanted to turn off the double-cmpxchg and
get a 56-byte 'struct page', how would you do it?  Can you do it with a
.config, or do you need to hack the code?
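
(For reference: with this series applied, that becomes a one-line
.config change, assuming the CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE
option introduced earlier in the series:

	# CONFIG_SLUB_ATTEMPT_CMPXCHG_DOUBLE is not set

which is exactly the point of keeping that option around.)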

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
  2014-01-14 19:49     ` Christoph Lameter
@ 2014-01-14 22:17       ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 22:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/14/2014 11:49 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
>> only used transiently _just_ after a page leaves the buddy
>> allocator.
> 
> Why would we need to do this if we are removing the cmpxchg_double?

Why do we need the patch?

'struct page' is a mess.  It's really hard to follow, and the space in
the definition is a limited resource.  We should not waste that space on
such a transient and unimportant value as pfmemalloc.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
@ 2014-01-14 22:17       ` Dave Hansen
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-14 22:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/14/2014 11:49 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
>> only used transiently _just_ after a page leaves the buddy
>> allocator.
> 
> Why would we need to do this if we are removing the cmpxchg_double?

Why do we need the patch?

'struct page' is a mess.  It's really hard to follow, and the space in
the definition is a limited resource.  We should not waste that space on
such a transient and unimportant value as pfmemalloc.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-15  2:31     ` David Rientjes
  -1 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, Andrew Morton, Pekka Enberg, Christoph Lameter

On Tue, 14 Jan 2014, Dave Hansen wrote:

> diff -puN include/linux/mm_types.h~make-slab-use-page-lru-vs-list-consistently include/linux/mm_types.h
> --- a/include/linux/mm_types.h~make-slab-use-page-lru-vs-list-consistently	2014-01-14 09:57:56.099621967 -0800
> +++ b/include/linux/mm_types.h	2014-01-14 09:57:56.106622281 -0800
> @@ -124,6 +124,8 @@ struct page {
>  	union {
>  		struct list_head lru;	/* Pageout list, eg. active_list
>  					 * protected by zone->lru_lock !
> +					 * Can be used as a generic list
> +					 * by the page owner.
>  					 */
>  		struct {		/* slub per cpu partial pages */
>  			struct page *next;	/* Next partial slab */
> @@ -136,7 +138,6 @@ struct page {
>  #endif
>  		};
>  
> -		struct list_head list;	/* slobs list of pages */
>  		struct slab *slab_page; /* slab fields */
>  		struct rcu_head rcu_head;	/* Used by SLAB
>  						 * when destroying via RCU

Did you try with a CONFIG_BLOCK config?

block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
block/blk-mq.c:1094:10: warning: initialization from incompatible pointer type [enabled by default]
block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
@ 2014-01-15  2:31     ` David Rientjes
  0 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, Andrew Morton, Pekka Enberg, Christoph Lameter

On Tue, 14 Jan 2014, Dave Hansen wrote:

> diff -puN include/linux/mm_types.h~make-slab-use-page-lru-vs-list-consistently include/linux/mm_types.h
> --- a/include/linux/mm_types.h~make-slab-use-page-lru-vs-list-consistently	2014-01-14 09:57:56.099621967 -0800
> +++ b/include/linux/mm_types.h	2014-01-14 09:57:56.106622281 -0800
> @@ -124,6 +124,8 @@ struct page {
>  	union {
>  		struct list_head lru;	/* Pageout list, eg. active_list
>  					 * protected by zone->lru_lock !
> +					 * Can be used as a generic list
> +					 * by the page owner.
>  					 */
>  		struct {		/* slub per cpu partial pages */
>  			struct page *next;	/* Next partial slab */
> @@ -136,7 +138,6 @@ struct page {
>  #endif
>  		};
>  
> -		struct list_head list;	/* slobs list of pages */
>  		struct slab *slab_page; /* slab fields */
>  		struct rcu_head rcu_head;	/* Used by SLAB
>  						 * when destroying via RCU

Did you try with a CONFIG_BLOCK config?

block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
block/blk-mq.c:1094:10: warning: initialization from incompatible pointer type [enabled by default]
block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
  2014-01-14 21:41       ` Dave Hansen
@ 2014-01-15  2:37         ` David Rientjes
  -1 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Christoph Lameter, linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> > On Tue, 14 Jan 2014, Dave Hansen wrote:
> >> I found this useful to have in my testing.  I would like to have
> >> it available for a bit, at least until other folks have had a
> >> chance to do some testing with it.
> > 
> > I don't really see the point of this patch since we already have
> > CONFIG_HAVE_ALIGNED_STRUCT_PAGE to play with.
> 
> With the current code, if you wanted to turn off the double-cmpxchg and
> get a 56-byte 'struct page', how would you do it?  Can you do it with a
> .config, or do you need to hack the code?
> 

If that's the intention of disabling this new config option, then it 
should probably mention the savings in the "help" section, similar to what
CONFIG_HAVE_ALIGNED_STRUCT_PAGE says.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
@ 2014-01-15  2:37         ` David Rientjes
  0 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Christoph Lameter, linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> > On Tue, 14 Jan 2014, Dave Hansen wrote:
> >> I found this useful to have in my testing.  I would like to have
> >> it available for a bit, at least until other folks have had a
> >> chance to do some testing with it.
> > 
> > I don't really see the point of this patch since we already have
> > CONFIG_HAVE_ALIGNED_STRUCT_PAGE to play with.
> 
> With the current code, if you wanted to turn off the double-cmpxchg and
> get a 56-byte 'struct page', how would you do it?  Can you do it with a
> .config, or do you need to hack the code?
> 

If that's the intention of disabling this new config option, then it 
should probably mention the savings in the "help" section, similar to what
CONFIG_HAVE_ALIGNED_STRUCT_PAGE says.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
  2014-01-14 22:17       ` Dave Hansen
@ 2014-01-15  2:45         ` David Rientjes
  -1 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:45 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Christoph Lameter, linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> >> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
> >> only used transiently _just_ after a page leaves the buddy
> >> allocator.
> > 
> > Why would we need to do this if we are removing the cmpxchg_double?
> 
> Why do we need the patch?
> 
> 'struct page' is a mess.  It's really hard to follow, and the space in
> the definition is a limited resource.  We should not waste that space on
> such a transient and unimportant value as pfmemalloc.
> 

I don't have any strong opinions on whether this patch is merged or not,
but I'm not sure it's cleaner to do it with an accessor function that
overloads page->index when its placement within the union inside
struct page already makes that obvious.  Nor is it good that the patch
adds more code than it removes solely because it introduces those
accessor functions.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
@ 2014-01-15  2:45         ` David Rientjes
  0 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:45 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Christoph Lameter, linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> >> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
> >> only used transiently _just_ after a page leaves the buddy
> >> allocator.
> > 
> > Why would we need to do this if we are removing the cmpxchg_double?
> 
> Why do we need the patch?
> 
> 'struct page' is a mess.  It's really hard to follow, and the space in
> the definition is a limited resource.  We should not waste that space on
> such a transient and unimportant value as pfmemalloc.
> 

I don't have any strong opinions on whether this patch is merged or not,
but I'm not sure it's cleaner to do it with an accessor function that
overloads page->index when its placement within the union inside
struct page already makes that obvious.  Nor is it good that the patch
adds more code than it removes solely because it introduces those
accessor functions.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 4/9] mm: slabs: reset page at free
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-15  2:48     ` David Rientjes
  -1 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:48 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, 14 Jan 2014, Dave Hansen wrote:

> diff -puN include/linux/mm.h~slub-reset-page-at-free include/linux/mm.h
> --- a/include/linux/mm.h~slub-reset-page-at-free	2014-01-14 09:57:57.099666808 -0800
> +++ b/include/linux/mm.h	2014-01-14 09:57:57.110667301 -0800
> @@ -2076,5 +2076,16 @@ static inline void set_page_pfmemalloc(s
>  	page->index = pfmemalloc;
>  }
>  
> +/*
> + * Custom allocators (like the slabs) use 'struct page' fields
> + * for all kinds of things.  This resets the page's state so that
> + * the buddy allocator will be happy with it.
> + */
> +static inline void allocator_reset_page(struct page *page)

This is ambiguous as to what "allocator" you're referring to unless we 
look at the comment.  I think it would be better to name it 
slab_reset_page() or something similar.

> +{
> +	page->mapping = NULL;
> +	page_mapcount_reset(page);
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 4/9] mm: slabs: reset page at free
@ 2014-01-15  2:48     ` David Rientjes
  0 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  2:48 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, 14 Jan 2014, Dave Hansen wrote:

> diff -puN include/linux/mm.h~slub-reset-page-at-free include/linux/mm.h
> --- a/include/linux/mm.h~slub-reset-page-at-free	2014-01-14 09:57:57.099666808 -0800
> +++ b/include/linux/mm.h	2014-01-14 09:57:57.110667301 -0800
> @@ -2076,5 +2076,16 @@ static inline void set_page_pfmemalloc(s
>  	page->index = pfmemalloc;
>  }
>  
> +/*
> + * Custom allocators (like the slabs) use 'struct page' fields
> + * for all kinds of things.  This resets the page's state so that
> + * the buddy allocator will be happy with it.
> + */
> +static inline void allocator_reset_page(struct page *page)

This is ambiguous as to what "allocator" you're referring to unless we 
look at the comment.  I think it would be better to name it 
slab_reset_page() or something similar.

> +{
> +	page->mapping = NULL;
> +	page_mapcount_reset(page);
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
  2014-01-15  2:31     ` David Rientjes
@ 2014-01-15  6:58       ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-15  6:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Pekka Enberg, Christoph Lameter

On 01/14/2014 06:31 PM, David Rientjes wrote:
> Did you try with a CONFIG_BLOCK config?
> 
> block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
> block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c:1094:10: warning: initialization from incompatible pointer type [enabled by default]
> block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
> block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’

As I mentioned in the introduction, these are against linux-next.
There's a patch in there at the moment which fixes this.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
@ 2014-01-15  6:58       ` Dave Hansen
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-15  6:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Pekka Enberg, Christoph Lameter

On 01/14/2014 06:31 PM, David Rientjes wrote:
> Did you try with a CONFIG_BLOCK config?
> 
> block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
> block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c:1094:10: warning: initialization from incompatible pointer type [enabled by default]
> block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
> block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’

As I mentioned in the introduction, these are against linux-next.
There's a patch in there at the moment which fixes this.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
  2014-01-15  6:58       ` Dave Hansen
@ 2014-01-15  7:16         ` David Rientjes
  -1 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  7:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, Andrew Morton, Pekka Enberg, Christoph Lameter

On Tue, 14 Jan 2014, Dave Hansen wrote:

> > block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
> > block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c:1094:10: warning: initialization from incompatible pointer type [enabled by default]
> > block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
> > block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’
> 
> As I mentioned in the introduction, these are against linux-next.
> There's a patch in there at the moment which fixed this.
> 

Ok, thanks, I like this patch.

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
@ 2014-01-15  7:16         ` David Rientjes
  0 siblings, 0 replies; 62+ messages in thread
From: David Rientjes @ 2014-01-15  7:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, Andrew Morton, Pekka Enberg, Christoph Lameter

On Tue, 14 Jan 2014, Dave Hansen wrote:

> > block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
> > block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c:1094:10: warning: initialization from incompatible pointer type [enabled by default]
> > block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
> > block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’
> 
> As I mentioned in the introduction, these are against linux-next.
> There's a patch in there at the moment which fixed this.
> 

Ok, thanks, I like this patch.

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-16  0:11     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 62+ messages in thread
From: Kirill A. Shutemov @ 2014-01-16  0:11 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, Jan 14, 2014 at 10:00:44AM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> 'struct page' has two list_head fields: 'lru' and 'list'.
> Conveniently, they are unioned together.  This means that code
> can use them interchangeably, which gets horribly confusing like
> with this nugget from slab.c:
> 
> >	list_del(&page->lru);
> >	if (page->active == cachep->num)
> >		list_add(&page->list, &n->slabs_full);
> 
> This patch makes the slab and slub code use page->lru
> universally instead of mixing ->list and ->lru.
> 
> So, the new rule is: page->lru is what you use if you want to
> keep your page on a list.  Don't like the fact that it's not
> called ->list?  Too bad.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru
@ 2014-01-16  0:11     ` Kirill A. Shutemov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill A. Shutemov @ 2014-01-16  0:11 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, Jan 14, 2014 at 10:00:44AM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> 'struct page' has two list_head fields: 'lru' and 'list'.
> Conveniently, they are unioned together.  This means that code
> can use them interchangeably, which gets horribly confusing like
> with this nugget from slab.c:
> 
> >	list_del(&page->lru);
> >	if (page->active == cachep->num)
> >		list_add(&page->list, &n->slabs_full);
> 
> This patch makes the slab and slub code use page->lru
> universally instead of mixing ->list and ->lru.
> 
> So, the new rule is: page->lru is what you use if you want to
> keep your page on a list.  Don't like the fact that it's not
> called ->list?  Too bad.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 62+ messages in thread
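
For contrast with the slab.c fragment quoted above, the consistent form the
patch moves to would look roughly like this (same logic, with ->lru used on
both lines):

	/* both list operations now use the same 'struct page' field */
	list_del(&page->lru);
	if (page->active == cachep->num)
		list_add(&page->lru, &n->slabs_full);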

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-16  0:16     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 62+ messages in thread
From: Kirill A. Shutemov @ 2014-01-16  0:16 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, Jan 14, 2014 at 10:00:51AM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
> only used transiently _just_ after a page leaves the buddy
> allocator.
> 
> Instead of declaring a union, we move its functionality behind a
> few quick accessor functions.  This way we could also much more
> easily audit that it is being used correctly in debugging
> scenarios.  For instance, we could store a magic number in there
> which could never get reused as a page->index and check that the
> magic number exists in page_pfmemalloc().
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb
@ 2014-01-16  0:16     ` Kirill A. Shutemov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill A. Shutemov @ 2014-01-16  0:16 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, Jan 14, 2014 at 10:00:51AM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> page->pfmemalloc does not deserve a spot in 'struct page'.  It is
> only used transiently _just_ after a page leaves the buddy
> allocator.
> 
> Instead of declaring a union, we move its functionality behind a
> few quick accessor functions.  This way we could also much more
> easily audit that it is being used correctly in debugging
> scenarios.  For instance, we could store a magic number in there
> which could never get reused as a page->index and check that the
> magic number exists in page_pfmemalloc().
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 62+ messages in thread
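
To make the accessor discussion concrete, here is a minimal sketch of the kind
of helpers being described, parking the flag in page->index for the short
window after a page leaves the buddy allocator.  The helper names come from
the thread; the magic-number check is only the debugging idea floated in the
changelog above, so treat the constant and the exact bodies as assumptions:

/*
 * Illustrative sketch only.  The hunk quoted elsewhere in the thread simply
 * stores the boolean in page->index; the magic value here is the optional
 * debugging variant suggested in the changelog.
 */
#define PAGE_PFMEMALLOC_MAGIC	0x1badcafeUL

static inline void set_page_pfmemalloc(struct page *page, bool pfmemalloc)
{
	page->index = pfmemalloc ? PAGE_PFMEMALLOC_MAGIC : 0;
}

static inline bool page_pfmemalloc(struct page *page)
{
	/* catch anyone who reused ->index before the flag was consumed */
	VM_BUG_ON(page->index && page->index != PAGE_PFMEMALLOC_MAGIC);
	return page->index == PAGE_PFMEMALLOC_MAGIC;
}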

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-16  0:20     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 62+ messages in thread
From: Kirill A. Shutemov @ 2014-01-16  0:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, Jan 14, 2014 at 10:00:55AM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> To make the layout of 'struct page' look nicer, I broke
> up a few of the unions.  But, this has a cost: things that
> were guaranteed to line up before might not any more.  To make up
> for that, some BUILD_BUG_ON()s are added to manually check for
> the alignment dependencies.
> 
> This makes it *MUCH* more clear how the first few fields of
> 'struct page' get used by the slab allocators.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Looks much cleaner!

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
@ 2014-01-16  0:20     ` Kirill A. Shutemov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill A. Shutemov @ 2014-01-16  0:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On Tue, Jan 14, 2014 at 10:00:55AM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> To make the layout of 'struct page' look nicer, I broke
> up a few of the unions.  But, this has a cost: things that
> were guaranteed to line up before might not any more.  To make up
> for that, some BUILD_BUG_ON()s are added to manually check for
> the alignment dependencies.
> 
> This makes it *MUCH* more clear how the first few fields of
> 'struct page' get used by the slab allocators.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Looks much cleaner!

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
  2014-01-14 21:41       ` Dave Hansen
@ 2014-01-16 16:45         ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-16 16:45 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> On 01/14/2014 11:49 AM, Christoph Lameter wrote:
> > On Tue, 14 Jan 2014, Dave Hansen wrote:
> >> I found this useful to have in my testing.  I would like to have
> >> it available for a bit, at least until other folks have had a
> >> chance to do some testing with it.
> >
> > I don't really see the point of this patch since we already have
> > CONFIG_HAVE_ALIGNED_STRUCT_PAGE to play with.
>
> With the current code, if you wanted to turn off the double-cmpxchg and
> get a 56-byte 'struct page', how would you do it?  Can you do it with a
> .config, or do you need to hack the code?

Remove HAVE_ALIGNED_STRUCT_PAGE from a Kconfig file.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
@ 2014-01-16 16:45         ` Christoph Lameter
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-16 16:45 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> On 01/14/2014 11:49 AM, Christoph Lameter wrote:
> > On Tue, 14 Jan 2014, Dave Hansen wrote:
> >> I found this useful to have in my testing.  I would like to have
> >> it available for a bit, at least until other folks have had a
> >> chance to do some testing with it.
> >
> > I don't really see the point of this patch since we already have
> > CONFIG_HAVE_ALIGNED_STRUCT_PAGE to play with.
>
> With the current code, if you wanted to turn off the double-cmpxchg and
> get a 56-byte 'struct page', how would you do it?  Can you do it with a
> .config, or do you need to hack the code?

Remove HAVE_ALIGNED_STRUCT_PAGE from a Kconfig file.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
  2014-01-16 16:45         ` Christoph Lameter
@ 2014-01-16 17:13           ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-16 17:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/16/2014 08:45 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> With the current code, if you wanted to turn off the double-cmpxchg and
>> get a 56-byte 'struct page', how would you do it?  Can you do it with a
>> .config, or do you need to hack the code?
> 
> Remove HAVE_ALIGNED_STRUCT_PAGE from a Kconfig file.

SLUB 'selects' it, so it seems to pop back up whenever SLUB is on:

$ grep STRUCT_PAGE ~/build/64bit/.config
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
$ vi ~/build/64bit/.config
... remove the CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y line from the .config
$ grep STRUCT_PAGE ~/build/64bit/.config
$ make oldconfig
... no prompt
$ grep STRUCT_PAGE ~/build/64bit/.config
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option
@ 2014-01-16 17:13           ` Dave Hansen
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-16 17:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/16/2014 08:45 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> With the current code, if you wanted to turn off the double-cmpxchg abd
>> get a 56-byte 'struct page' how would you do it?  Can you do it with a
>> .config, or do you need to hack the code?
> 
> Remove HAVE_ALIGNED_STRUCT_PAGE from a Kconfig file.

SLUB 'selects' it, so it seems to pop back up whenever SLUB is on:

$ grep STRUCT_PAGE ~/build/64bit/.config
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
$ vi ~/build/64bit/.config
... remove the CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y line from the .config
$ grep STRUCT_PAGE ~/build/64bit/.config
$ make oldconfig
... no prompt
$ grep STRUCT_PAGE ~/build/64bit/.config
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 4/9] mm: slabs: reset page at free
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-16 18:32     ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-16 18:32 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> diff -puN include/linux/mm.h~slub-reset-page-at-free include/linux/mm.h
> --- a/include/linux/mm.h~slub-reset-page-at-free	2014-01-14 09:57:57.099666808 -0800
> +++ b/include/linux/mm.h	2014-01-14 09:57:57.110667301 -0800
> @@ -2076,5 +2076,16 @@ static inline void set_page_pfmemalloc(s
>  	page->index = pfmemalloc;
>  }
>
> +/*
> + * Custom allocators (like the slabs) use 'struct page' fields
> + * for all kinds of things.  This resets the page's state so that
> + * the buddy allocator will be happy with it.
> + */
> +static inline void allocator_reset_page(struct page *page)
> +{
> +	page->mapping = NULL;
> +	page_mapcount_reset(page);
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */

This belongs into mm/slab.h

Otherwise Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 4/9] mm: slabs: reset page at free
@ 2014-01-16 18:32     ` Christoph Lameter
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-16 18:32 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

> diff -puN include/linux/mm.h~slub-reset-page-at-free include/linux/mm.h
> --- a/include/linux/mm.h~slub-reset-page-at-free	2014-01-14 09:57:57.099666808 -0800
> +++ b/include/linux/mm.h	2014-01-14 09:57:57.110667301 -0800
> @@ -2076,5 +2076,16 @@ static inline void set_page_pfmemalloc(s
>  	page->index = pfmemalloc;
>  }
>
> +/*
> + * Custom allocators (like the slabs) use 'struct page' fields
> + * for all kinds of things.  This resets the page's state so that
> + * the buddy allocator will be happy with it.
> + */
> +static inline void allocator_reset_page(struct page *page)
> +{
> +	page->mapping = NULL;
> +	page_mapcount_reset(page);
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */

This belongs into mm/slab.h

Otherwise Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
  2014-01-14 18:00   ` Dave Hansen
@ 2014-01-16 18:34     ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-16 18:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

>
> This makes it *MUCH* more clear how the first few fields of
> 'struct page' get used by the slab allocators.

I think this adds to the confusion. What you want to know is which other
fields overlap a certain field. After this patch you won't have that
anymore.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
@ 2014-01-16 18:34     ` Christoph Lameter
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-16 18:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Tue, 14 Jan 2014, Dave Hansen wrote:

>
> This makes it *MUCH* more clear how the first few fields of
> 'struct page' get used by the slab allocators.

I think this adds to the confusion. What you want to know is which other
fields overlap a certain field. After this patch you won't have that
anymore.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 4/9] mm: slabs: reset page at free
  2014-01-15  2:48     ` David Rientjes
@ 2014-01-16 18:35       ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-16 18:35 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On 01/14/2014 06:48 PM, David Rientjes wrote:
>> > +/*
>> > + * Custom allocators (like the slabs) use 'struct page' fields
>> > + * for all kinds of things.  This resets the page's state so that
>> > + * the buddy allocator will be happy with it.
>> > + */
>> > +static inline void allocator_reset_page(struct page *page)
> This is ambiguous as to what "allocator" you're referring to unless we 
> look at the comment.  I think it would be better to name it 
> slab_reset_page() or something similar.

I stuck it in mm.h and deliberately didn't call it 'slab_something' so
that zsmalloc (in staging) could use this as well.  The "allocator" part
of the name was to indicate that any allocator could use it.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 4/9] mm: slabs: reset page at free
@ 2014-01-16 18:35       ` Dave Hansen
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-16 18:35 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-mm, linux-kernel, akpm, penberg, cl

On 01/14/2014 06:48 PM, David Rientjes wrote:
>> > +/*
>> > + * Custom allocators (like the slabs) use 'struct page' fields
>> > + * for all kinds of things.  This resets the page's state so that
>> > + * the buddy allocator will be happy with it.
>> > + */
>> > +static inline void allocator_reset_page(struct page *page)
> This is ambiguous as to what "allocator" you're referring to unless we 
> look at the comment.  I think it would be better to name it 
> slab_reset_page() or something similar.

I stuck it in mm.h and deliberately didn't call it 'slab_something' so
that zsmalloc (in staging) could use this as well.  The "allocator" part
of the name was to indicate that any allocator could use it.


^ permalink raw reply	[flat|nested] 62+ messages in thread
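
To illustrate the intended call site, here is a hypothetical free path for
such an allocator.  Only allocator_reset_page() comes from the quoted hunk;
free_object_page() and the calls around it are assumptions for illustration:

/*
 * Hypothetical free path: undo the 'struct page' field reuse before the page
 * goes back to the buddy allocator.
 */
static void free_object_page(struct page *page, int order)
{
	__ClearPageSlab(page);		/* no longer a slab page            */
	allocator_reset_page(page);	/* clear ->mapping, reset _mapcount */
	__free_pages(page, order);	/* hand it back to the buddy allocator */
}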

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
  2014-01-16 18:34     ` Christoph Lameter
@ 2014-01-16 22:29       ` Dave Hansen
  -1 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-16 22:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/16/2014 10:34 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> This makes it *MUCH* more clear how the first few fields of
>> 'struct page' get used by the slab allocators.
> 
> I think this adds to the confusion. What you want to know is which other
> fields overlap a certain field. After this patch you wont have that
> anymore.

Why does it matter *specifically* that "index shares space with
freelist", or that "mapping shares space with s_mem"?  No data is ever
handed off in those fields.

All that matters is that we know the _set_ of fields that gets reused,
and that we preserve the ones which *need* their contents preserved
(only flags and _count as far as I can tell).

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
@ 2014-01-16 22:29       ` Dave Hansen
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2014-01-16 22:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, akpm, penberg

On 01/16/2014 10:34 AM, Christoph Lameter wrote:
> On Tue, 14 Jan 2014, Dave Hansen wrote:
>> This makes it *MUCH* more clear how the first few fields of
>> 'struct page' get used by the slab allocators.
> 
> I think this adds to the confusion. What you want to know is which other
> fields overlap a certain field. After this patch you wont have that
> anymore.

Why does it matter *specifically* that "index shares space with
freelist", or that "mapping shares space with s_mem"?  No data is ever
handed off in those fields.

All that matters is that we know the _set_ of fields that gets reused,
and that we preserve the ones which *need* their contents preserved
(only flags and _count as far as I can tell).


^ permalink raw reply	[flat|nested] 62+ messages in thread
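
The compile-time checks referred to in the patch description could be as
simple as pinning down the two overlaps named above.  A minimal sketch,
assuming the pairs Dave lists (freelist over index, s_mem over mapping) are
the ones that must keep lining up; the function is illustrative, not the
patch:

/*
 * Sketch of BUILD_BUG_ON()-style layout checks: state the required overlaps
 * explicitly so a future rearrangement of 'struct page' fails to build
 * instead of silently corrupting fields.
 */
static inline void check_struct_page_layout(void)
{
	BUILD_BUG_ON(offsetof(struct page, freelist) !=
		     offsetof(struct page, index));
	BUILD_BUG_ON(offsetof(struct page, s_mem) !=
		     offsetof(struct page, mapping));
}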

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
  2014-01-16 22:29       ` Dave Hansen
@ 2014-01-17 14:58         ` Christoph Lameter
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-17 14:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Thu, 16 Jan 2014, Dave Hansen wrote:

> Why does it matter *specifically* that "index shares space with
> freelist", or that "mapping shares space with s_mem"?  No data is ever
> handed off in those fields.

If the field is corrupted, then one needs to figure out what other ways of
using this field exist.  If one wants to change the page struct fields,
then knowledge about possible interconnections is also important.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC][PATCH 5/9] mm: rearrange struct page
@ 2014-01-17 14:58         ` Christoph Lameter
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Lameter @ 2014-01-17 14:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, penberg

On Thu, 16 Jan 2014, Dave Hansen wrote:

> Why does it matter *specifically* that "index shares space with
> freelist", or that "mapping shares space with s_mem"?  No data is ever
> handed off in those fields.

If the field is corrupted, then one needs to figure out what other ways of
using this field exist.  If one wants to change the page struct fields,
then knowledge about possible interconnections is also important.


^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2014-01-17 14:58 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-14 18:00 [RFC][PATCH 0/9] re-shrink 'struct page' when SLUB is on Dave Hansen
2014-01-14 18:00 ` Dave Hansen
2014-01-14 18:00 ` [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru Dave Hansen
2014-01-14 18:00   ` Dave Hansen
2014-01-14 19:31   ` Christoph Lameter
2014-01-14 19:31     ` Christoph Lameter
2014-01-15  2:31   ` David Rientjes
2014-01-15  2:31     ` David Rientjes
2014-01-15  6:58     ` Dave Hansen
2014-01-15  6:58       ` Dave Hansen
2014-01-15  7:16       ` David Rientjes
2014-01-15  7:16         ` David Rientjes
2014-01-16  0:11   ` Kirill A. Shutemov
2014-01-16  0:11     ` Kirill A. Shutemov
2014-01-14 18:00 ` [RFC][PATCH 2/9] mm: slub: abstract out double cmpxchg option Dave Hansen
2014-01-14 18:00   ` Dave Hansen
2014-01-14 19:49   ` Christoph Lameter
2014-01-14 19:49     ` Christoph Lameter
2014-01-14 21:41     ` Dave Hansen
2014-01-14 21:41       ` Dave Hansen
2014-01-15  2:37       ` David Rientjes
2014-01-15  2:37         ` David Rientjes
2014-01-16 16:45       ` Christoph Lameter
2014-01-16 16:45         ` Christoph Lameter
2014-01-16 17:13         ` Dave Hansen
2014-01-16 17:13           ` Dave Hansen
2014-01-14 18:00 ` [RFC][PATCH 3/9] mm: page->pfmemalloc only used by slab/skb Dave Hansen
2014-01-14 18:00   ` Dave Hansen
2014-01-14 19:49   ` Christoph Lameter
2014-01-14 19:49     ` Christoph Lameter
2014-01-14 22:17     ` Dave Hansen
2014-01-14 22:17       ` Dave Hansen
2014-01-15  2:45       ` David Rientjes
2014-01-15  2:45         ` David Rientjes
2014-01-16  0:16   ` Kirill A. Shutemov
2014-01-16  0:16     ` Kirill A. Shutemov
2014-01-14 18:00 ` [RFC][PATCH 4/9] mm: slabs: reset page at free Dave Hansen
2014-01-14 18:00   ` Dave Hansen
2014-01-15  2:48   ` David Rientjes
2014-01-15  2:48     ` David Rientjes
2014-01-16 18:35     ` Dave Hansen
2014-01-16 18:35       ` Dave Hansen
2014-01-16 18:32   ` Christoph Lameter
2014-01-16 18:32     ` Christoph Lameter
2014-01-14 18:00 ` [RFC][PATCH 5/9] mm: rearrange struct page Dave Hansen
2014-01-14 18:00   ` Dave Hansen
2014-01-16  0:20   ` Kirill A. Shutemov
2014-01-16  0:20     ` Kirill A. Shutemov
2014-01-16 18:34   ` Christoph Lameter
2014-01-16 18:34     ` Christoph Lameter
2014-01-16 22:29     ` Dave Hansen
2014-01-16 22:29       ` Dave Hansen
2014-01-17 14:58       ` Christoph Lameter
2014-01-17 14:58         ` Christoph Lameter
2014-01-14 18:01 ` [RFC][PATCH 6/9] mm: slub: rearrange 'struct page' fields Dave Hansen
2014-01-14 18:01   ` Dave Hansen
2014-01-14 18:01 ` [RFC][PATCH 7/9] mm: slub: remove 'struct page' alignment restrictions Dave Hansen
2014-01-14 18:01   ` Dave Hansen
2014-01-14 18:01 ` [RFC][PATCH 8/9] mm: slub: cleanups after code churn Dave Hansen
2014-01-14 18:01   ` Dave Hansen
2014-01-14 18:01 ` [RFC][PATCH 9/9] mm: fix alignment checks on 32-bit Dave Hansen
2014-01-14 18:01   ` Dave Hansen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.