* [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
@ 2013-08-22  8:44 ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

There are two main topics in this patchset. One is to reduce memory usage
and the other is to change how the free objects of a slab are managed.

SLAB allocates a struct slab for each slab. The size of this structure,
excluding the bufctl array, is 40 bytes on a 64-bit machine. We can reduce
memory waste and cache footprint if we overload struct slab onto struct page.
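
For reference, this is roughly the management structure in question as it
looks in mm/slab.c before this series (simplified here; the real definition
wraps these fields in a union with struct slab_rcu). Later patches drop
colouroff and nodeid and finally fold the rest into the slab's first
struct page:

struct slab {
	struct list_head list;		/* full/partial/free list linkage */
	unsigned long colouroff;	/* dropped by patch 03 */
	void *s_mem;			/* first object, including colour offset */
	unsigned int inuse;		/* num of objs active in slab */
	kmem_bufctl_t free;		/* index of the first free object */
	unsigned short nodeid;		/* dropped by patch 04 */
};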

This patchset also changes how the free objects of a slab are managed.
The current scheme is awkward, because it touches random positions in the
kmem_bufctl_t array whenever we try to get a free object. See the
following example.
    
struct slab's free = 6
kmem_bufctl_t array: 1 END 5 7 0 4 3 2
    
To get free objects, we access this array in the following pattern.
6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END
    
If we have many objects, this array becomes larger and no longer fits in
a single cache line, which is bad for performance.
    
We can do the same thing in a simpler way, by treating the array as a
stack. This patchset implements that and removes the complex code for the
above algorithm, making the slab code much cleaner.
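
As a rough illustration only (simplified names; 'freelist' here stands for
the kmem_bufctl_t array that follows struct slab), the difference between
the two schemes for getting a free object looks like this:

/* current scheme: free objects form a linked list threaded through the array */
static unsigned int old_get_free(struct slab *slabp, kmem_bufctl_t *freelist)
{
	unsigned int objnr = slabp->free;	/* e.g. 6 */
	slabp->free = freelist[objnr];		/* next hop can be anywhere in the array */
	return objnr;
}

/* new scheme: the array is a stack of free object indexes, slabp->free is the top */
static unsigned int new_get_free(struct slab *slabp, kmem_bufctl_t *freelist)
{
	return freelist[slabp->free++];		/* sequential access */
}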

This patchset is based on v3.11-rc6, but was tested on v3.10.

Thanks.

Joonsoo Kim (16):
  slab: correct pfmemalloc check
  slab: change return type of kmem_getpages() to struct page
  slab: remove colouroff in struct slab
  slab: remove nodeid in struct slab
  slab: remove cachep in struct slab_rcu
  slab: put forward freeing slab management object
  slab: overloading the RCU head over the LRU for RCU free
  slab: use well-defined macro, virt_to_slab()
  slab: use __GFP_COMP flag for allocating slab pages
  slab: change the management method of free objects of the slab
  slab: remove kmem_bufctl_t
  slab: remove SLAB_LIMIT
  slab: replace free and inuse in struct slab with newly introduced
    active
  slab: use struct page for slab management
  slab: remove useless statement for checking pfmemalloc
  slab: rename slab_bufctl to slab_freelist

 include/linux/mm_types.h |   21 +-
 include/linux/slab.h     |    9 +-
 include/linux/slab_def.h |    4 +-
 mm/slab.c                |  563 ++++++++++++++++++----------------------------
 4 files changed, 237 insertions(+), 360 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 01/16] slab: correct pfmemalloc check
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim, Mel Gorman

We check pfmemalloc per slab, not per page. You can see this in
is_slab_pfmemalloc(). So the pfmemalloc flag does not need to be
set/cleared on the other pages of the slab.

Therefore we should check the pfmemalloc flag of the first page, but the
current implementation doesn't do that. virt_to_head_page(obj) just
returns the 'struct page' of that object, not the one of the first page,
since SLAB doesn't use __GFP_COMP when CONFIG_MMU is enabled. To get the
'struct page' of the first page, we first get the slab and reach it via
virt_to_head_page(slab->s_mem).

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 8ccd296..d9eae39 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -930,7 +930,8 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct page *page = virt_to_head_page(objp);
+		struct slab *slabp = virt_to_slab(objp);
+		struct page *page = virt_to_head_page(slabp->s_mem);
 		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}
@@ -1770,7 +1771,7 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 		__SetPageSlab(page + i);
 
 		if (page->pfmemalloc)
-			SetPageSlabPfmemalloc(page + i);
+			SetPageSlabPfmemalloc(page);
 	}
 	memcg_bind_pages(cachep, cachep->gfporder);
 
@@ -1803,9 +1804,10 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 	else
 		sub_zone_page_state(page_zone(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
+
+	__ClearPageSlabPfmemalloc(page);
 	while (i--) {
 		BUG_ON(!PageSlab(page));
-		__ClearPageSlabPfmemalloc(page);
 		__ClearPageSlab(page);
 		page++;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 02/16] slab: change return type of kmem_getpages() to struct page
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

It is more understandable for kmem_getpages() to return a struct page.
With this, we also avoid one translation from virtual address to page and
get better code than before. Below is the size change from this patch.

* Before
   text	   data	    bss	    dec	    hex	filename
  22123	  23434	      4	  45561	   b1f9	mm/slab.o

* After
   text	   data	    bss	    dec	    hex	filename
  22074	  23434	      4	  45512	   b1c8	mm/slab.o

This also helps a following patch remove struct slab's colouroff.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index d9eae39..180f532 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -205,7 +205,7 @@ typedef unsigned int kmem_bufctl_t;
 struct slab_rcu {
 	struct rcu_head head;
 	struct kmem_cache *cachep;
-	void *addr;
+	struct page *page;
 };
 
 /*
@@ -1731,7 +1731,8 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
  * did not request dmaable memory, we might get it, but that
  * would be relatively rare and ignorable.
  */
-static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
+								int nodeid)
 {
 	struct page *page;
 	int nr_pages;
@@ -1784,16 +1785,15 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 			kmemcheck_mark_unallocated_pages(page, nr_pages);
 	}
 
-	return page_address(page);
+	return page;
 }
 
 /*
  * Interface to system's page release.
  */
-static void kmem_freepages(struct kmem_cache *cachep, void *addr)
+static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 {
 	unsigned long i = (1 << cachep->gfporder);
-	struct page *page = virt_to_page(addr);
 	const unsigned long nr_freed = i;
 
 	kmemcheck_free_shadow(page, cachep->gfporder);
@@ -1815,7 +1815,7 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += nr_freed;
-	free_memcg_kmem_pages((unsigned long)addr, cachep->gfporder);
+	__free_memcg_kmem_pages(page, cachep->gfporder);
 }
 
 static void kmem_rcu_free(struct rcu_head *head)
@@ -1823,7 +1823,7 @@ static void kmem_rcu_free(struct rcu_head *head)
 	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
 	struct kmem_cache *cachep = slab_rcu->cachep;
 
-	kmem_freepages(cachep, slab_rcu->addr);
+	kmem_freepages(cachep, slab_rcu->page);
 	if (OFF_SLAB(cachep))
 		kmem_cache_free(cachep->slabp_cache, slab_rcu);
 }
@@ -2042,7 +2042,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
  */
 static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 {
-	void *addr = slabp->s_mem - slabp->colouroff;
+	struct page *page = virt_to_head_page(slabp->s_mem);
 
 	slab_destroy_debugcheck(cachep, slabp);
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
@@ -2050,10 +2050,10 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 
 		slab_rcu = (struct slab_rcu *)slabp;
 		slab_rcu->cachep = cachep;
-		slab_rcu->addr = addr;
+		slab_rcu->page = page;
 		call_rcu(&slab_rcu->head, kmem_rcu_free);
 	} else {
-		kmem_freepages(cachep, addr);
+		kmem_freepages(cachep, page);
 		if (OFF_SLAB(cachep))
 			kmem_cache_free(cachep->slabp_cache, slabp);
 	}
@@ -2598,11 +2598,12 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
  * kmem_find_general_cachep till the initialization is complete.
  * Hence we cannot have slabp_cache same as the original cache.
  */
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
-				   int colour_off, gfp_t local_flags,
-				   int nodeid)
+static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
+				   struct page *page, int colour_off,
+				   gfp_t local_flags, int nodeid)
 {
 	struct slab *slabp;
+	void *addr = page_address(page);
 
 	if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
@@ -2619,12 +2620,12 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
 		if (!slabp)
 			return NULL;
 	} else {
-		slabp = objp + colour_off;
+		slabp = addr + colour_off;
 		colour_off += cachep->slab_size;
 	}
 	slabp->inuse = 0;
 	slabp->colouroff = colour_off;
-	slabp->s_mem = objp + colour_off;
+	slabp->s_mem = addr + colour_off;
 	slabp->nodeid = nodeid;
 	slabp->free = 0;
 	return slabp;
@@ -2735,12 +2736,9 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
  * virtual address for kfree, ksize, and slab debugging.
  */
 static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
-			   void *addr)
+			   struct page *page)
 {
 	int nr_pages;
-	struct page *page;
-
-	page = virt_to_page(addr);
 
 	nr_pages = 1;
 	if (likely(!PageCompound(page)))
@@ -2758,7 +2756,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
  * kmem_cache_alloc() when there are no active objs left in a cache.
  */
 static int cache_grow(struct kmem_cache *cachep,
-		gfp_t flags, int nodeid, void *objp)
+		gfp_t flags, int nodeid, struct page *page)
 {
 	struct slab *slabp;
 	size_t offset;
@@ -2801,18 +2799,18 @@ static int cache_grow(struct kmem_cache *cachep,
 	 * Get mem for the objs.  Attempt to allocate a physical page from
 	 * 'nodeid'.
 	 */
-	if (!objp)
-		objp = kmem_getpages(cachep, local_flags, nodeid);
-	if (!objp)
+	if (!page)
+		page = kmem_getpages(cachep, local_flags, nodeid);
+	if (!page)
 		goto failed;
 
 	/* Get slab management. */
-	slabp = alloc_slabmgmt(cachep, objp, offset,
+	slabp = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
 	if (!slabp)
 		goto opps1;
 
-	slab_map_pages(cachep, slabp, objp);
+	slab_map_pages(cachep, slabp, page);
 
 	cache_init_objs(cachep, slabp);
 
@@ -2828,7 +2826,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	spin_unlock(&n->list_lock);
 	return 1;
 opps1:
-	kmem_freepages(cachep, objp);
+	kmem_freepages(cachep, page);
 failed:
 	if (local_flags & __GFP_WAIT)
 		local_irq_disable();
@@ -3244,18 +3242,20 @@ retry:
 		 * We may trigger various forms of reclaim on the allowed
 		 * set and go into memory reserves if necessary.
 		 */
+		struct page *page;
+
 		if (local_flags & __GFP_WAIT)
 			local_irq_enable();
 		kmem_flagcheck(cache, flags);
-		obj = kmem_getpages(cache, local_flags, numa_mem_id());
+		page = kmem_getpages(cache, local_flags, numa_mem_id());
 		if (local_flags & __GFP_WAIT)
 			local_irq_disable();
-		if (obj) {
+		if (page) {
 			/*
 			 * Insert into the appropriate per node queues
 			 */
-			nid = page_to_nid(virt_to_page(obj));
-			if (cache_grow(cache, flags, nid, obj)) {
+			nid = page_to_nid(page);
+			if (cache_grow(cache, flags, nid, page)) {
 				obj = ____cache_alloc_node(cache,
 					flags | GFP_THISNODE, nid);
 				if (!obj)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 03/16] slab: remove colouroff in struct slab
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now there is no user of colouroff, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 180f532..d9f81a0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -219,7 +219,6 @@ struct slab {
 	union {
 		struct {
 			struct list_head list;
-			unsigned long colouroff;
 			void *s_mem;		/* including colour offset */
 			unsigned int inuse;	/* num of objs active in slab */
 			kmem_bufctl_t free;
@@ -2624,7 +2623,6 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 		colour_off += cachep->slab_size;
 	}
 	slabp->inuse = 0;
-	slabp->colouroff = colour_off;
 	slabp->s_mem = addr + colour_off;
 	slabp->nodeid = nodeid;
 	slabp->free = 0;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 04/16] slab: remove nodeid in struct slab
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

We can get the nodeid via address translation, so this field is not
needed. Therefore, remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index d9f81a0..69dc25a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -222,7 +222,6 @@ struct slab {
 			void *s_mem;		/* including colour offset */
 			unsigned int inuse;	/* num of objs active in slab */
 			kmem_bufctl_t free;
-			unsigned short nodeid;
 		};
 		struct slab_rcu __slab_cover_slab_rcu;
 	};
@@ -1099,8 +1098,7 @@ static void drain_alien_cache(struct kmem_cache *cachep,
 
 static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 {
-	struct slab *slabp = virt_to_slab(objp);
-	int nodeid = slabp->nodeid;
+	int nodeid = page_to_nid(virt_to_page(objp));
 	struct kmem_cache_node *n;
 	struct array_cache *alien = NULL;
 	int node;
@@ -1111,7 +1109,7 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 	 * Make sure we are not freeing a object from another node to the array
 	 * cache on this cpu.
 	 */
-	if (likely(slabp->nodeid == node))
+	if (likely(nodeid == node))
 		return 0;
 
 	n = cachep->node[node];
@@ -2624,7 +2622,6 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 	}
 	slabp->inuse = 0;
 	slabp->s_mem = addr + colour_off;
-	slabp->nodeid = nodeid;
 	slabp->free = 0;
 	return slabp;
 }
@@ -2701,7 +2698,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 	next = slab_bufctl(slabp)[slabp->free];
 #if DEBUG
 	slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
-	WARN_ON(slabp->nodeid != nodeid);
+	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
 	slabp->free = next;
 
@@ -2715,7 +2712,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 
 #if DEBUG
 	/* Verify that the slab belongs to the intended node */
-	WARN_ON(slabp->nodeid != nodeid);
+	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
 	if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
 		printk(KERN_ERR "slab: double free detected in cache "
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

We can get cachep from the page stored in struct slab_rcu, so remove the
cachep field.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 69dc25a..b378f91 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -204,7 +204,6 @@ typedef unsigned int kmem_bufctl_t;
  */
 struct slab_rcu {
 	struct rcu_head head;
-	struct kmem_cache *cachep;
 	struct page *page;
 };
 
@@ -1818,7 +1817,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 static void kmem_rcu_free(struct rcu_head *head)
 {
 	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
-	struct kmem_cache *cachep = slab_rcu->cachep;
+	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
 
 	kmem_freepages(cachep, slab_rcu->page);
 	if (OFF_SLAB(cachep))
@@ -2046,7 +2045,6 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 		struct slab_rcu *slab_rcu;
 
 		slab_rcu = (struct slab_rcu *)slabp;
-		slab_rcu->cachep = cachep;
 		slab_rcu->page = page;
 		call_rcu(&slab_rcu->head, kmem_rcu_free);
 	} else {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 06/16] slab: put forward freeing slab management object
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

We don't need to free the slab management object in RCU context, because
at this point we no longer manage this slab at all.
So move the freeing forward, out of the RCU callback.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index b378f91..607a9b8 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1820,8 +1820,6 @@ static void kmem_rcu_free(struct rcu_head *head)
 	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
 
 	kmem_freepages(cachep, slab_rcu->page);
-	if (OFF_SLAB(cachep))
-		kmem_cache_free(cachep->slabp_cache, slab_rcu);
 }
 
 #if DEBUG
@@ -2047,11 +2045,16 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 		slab_rcu = (struct slab_rcu *)slabp;
 		slab_rcu->page = page;
 		call_rcu(&slab_rcu->head, kmem_rcu_free);
-	} else {
+
+	} else
 		kmem_freepages(cachep, page);
-		if (OFF_SLAB(cachep))
-			kmem_cache_free(cachep->slabp_cache, slabp);
-	}
+
+	/*
+	 * From now on, we don't use slab management
+	 * although actual page will be freed in rcu context.
+	 */
+	if (OFF_SLAB(cachep))
+		kmem_cache_free(cachep->slabp_cache, slabp);
 }
 
 /**
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

With a build-time size check, we can overload the RCU head over the LRU
field of struct page to free the pages of a slab in RCU context. This
really helps overload struct slab onto struct page, which eventually
reduces memory usage and cache footprint of SLAB.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c62175..b8d19b1 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -51,7 +51,14 @@
  *  }
  *  rcu_read_unlock();
  *
- * See also the comment on struct slab_rcu in mm/slab.c.
+ * This is useful if we need to approach a kernel structure obliquely,
+ * from its address obtained without the usual locking. We can lock
+ * the structure to stabilize it and check it's still at the given address,
+ * only if we can be sure that the memory has not been meanwhile reused
+ * for some other kind of object (which our subsystem's lock might corrupt).
+ *
+ * rcu_read_lock before reading the address, then rcu_read_unlock after
+ * taking the spinlock within the structure expected at that address.
  */
 #define SLAB_DESTROY_BY_RCU	0x00080000UL	/* Defer freeing slabs to RCU */
 #define SLAB_MEM_SPREAD		0x00100000UL	/* Spread some memory over cpuset */
diff --git a/mm/slab.c b/mm/slab.c
index 607a9b8..9e98ee0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -189,25 +189,6 @@ typedef unsigned int kmem_bufctl_t;
 #define	SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-3)
 
 /*
- * struct slab_rcu
- *
- * slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
- * arrange for kmem_freepages to be called via RCU.  This is useful if
- * we need to approach a kernel structure obliquely, from its address
- * obtained without the usual locking.  We can lock the structure to
- * stabilize it and check it's still at the given address, only if we
- * can be sure that the memory has not been meanwhile reused for some
- * other kind of object (which our subsystem's lock might corrupt).
- *
- * rcu_read_lock before reading the address, then rcu_read_unlock after
- * taking the spinlock within the structure expected at that address.
- */
-struct slab_rcu {
-	struct rcu_head head;
-	struct page *page;
-};
-
-/*
  * struct slab
  *
  * Manages the objs in a slab. Placed either at the beginning of mem allocated
@@ -215,14 +196,11 @@ struct slab_rcu {
  * Slabs are chained into three list: fully used, partial, fully free slabs.
  */
 struct slab {
-	union {
-		struct {
-			struct list_head list;
-			void *s_mem;		/* including colour offset */
-			unsigned int inuse;	/* num of objs active in slab */
-			kmem_bufctl_t free;
-		};
-		struct slab_rcu __slab_cover_slab_rcu;
+	struct {
+		struct list_head list;
+		void *s_mem;		/* including colour offset */
+		unsigned int inuse;	/* num of objs active in slab */
+		kmem_bufctl_t free;
 	};
 };
 
@@ -1503,6 +1481,8 @@ void __init kmem_cache_init(void)
 {
 	int i;
 
+	BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
+					sizeof(struct rcu_head));
 	kmem_cache = &kmem_cache_boot;
 	setup_node_pointer(kmem_cache);
 
@@ -1816,10 +1796,13 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 
 static void kmem_rcu_free(struct rcu_head *head)
 {
-	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
-	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
+	struct kmem_cache *cachep;
+	struct page *page;
 
-	kmem_freepages(cachep, slab_rcu->page);
+	page = container_of((struct list_head *)head, struct page, lru);
+	cachep = page->slab_cache;
+
+	kmem_freepages(cachep, page);
 }
 
 #if DEBUG
@@ -2040,11 +2023,11 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 
 	slab_destroy_debugcheck(cachep, slabp);
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
-		struct slab_rcu *slab_rcu;
+		struct rcu_head *head;
 
-		slab_rcu = (struct slab_rcu *)slabp;
-		slab_rcu->page = page;
-		call_rcu(&slab_rcu->head, kmem_rcu_free);
+		/* RCU free overloads the RCU head over the LRU */
+		head = (void *)&page->lru;
+		call_rcu(head, kmem_rcu_free);
 
 	} else
 		kmem_freepages(cachep, page);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 08/16] slab: use well-defined macro, virt_to_slab()
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

This is a trivial change; just use the well-defined macro.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 9e98ee0..ee03eba 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2853,7 +2853,6 @@ static inline void verify_redzone_free(struct kmem_cache *cache, void *obj)
 static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 				   unsigned long caller)
 {
-	struct page *page;
 	unsigned int objnr;
 	struct slab *slabp;
 
@@ -2861,9 +2860,7 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 
 	objp -= obj_offset(cachep);
 	kfree_debugcheck(objp);
-	page = virt_to_head_page(objp);
-
-	slabp = page->slab_page;
+	slabp = virt_to_slab(objp);
 
 	if (cachep->flags & SLAB_RED_ZONE) {
 		verify_redzone_free(cachep, objp);
@@ -3075,7 +3072,7 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		struct slab *slabp;
 		unsigned objnr;
 
-		slabp = virt_to_head_page(objp)->slab_page;
+		slabp = virt_to_slab(objp);
 		objnr = (unsigned)(objp - slabp->s_mem) / cachep->size;
 		slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

If we use the 'struct page' of the first page as the 'struct slab', there
is no advantage in avoiding __GFP_COMP. So use the __GFP_COMP flag in all
cases.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index ee03eba..855f481 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1712,15 +1712,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 {
 	struct page *page;
 	int nr_pages;
-	int i;
-
-#ifndef CONFIG_MMU
-	/*
-	 * Nommu uses slab's for process anonymous memory allocations, and thus
-	 * requires __GFP_COMP to properly refcount higher order allocations
-	 */
-	flags |= __GFP_COMP;
-#endif
 
 	flags |= cachep->allocflags;
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
@@ -1744,12 +1735,9 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	else
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++) {
-		__SetPageSlab(page + i);
-
-		if (page->pfmemalloc)
-			SetPageSlabPfmemalloc(page);
-	}
+	__SetPageSlab(page);
+	if (page->pfmemalloc)
+		SetPageSlabPfmemalloc(page);
 	memcg_bind_pages(cachep, cachep->gfporder);
 
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
@@ -1769,8 +1757,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
  */
 static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 {
-	unsigned long i = (1 << cachep->gfporder);
-	const unsigned long nr_freed = i;
+	const unsigned long nr_freed = (1 << cachep->gfporder);
 
 	kmemcheck_free_shadow(page, cachep->gfporder);
 
@@ -1781,12 +1768,9 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 		sub_zone_page_state(page_zone(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
 
+	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
-	while (i--) {
-		BUG_ON(!PageSlab(page));
-		__ClearPageSlab(page);
-		page++;
-	}
+	__ClearPageSlab(page);
 
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
@@ -2350,7 +2334,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->colour = left_over / cachep->colour_off;
 	cachep->slab_size = slab_size;
 	cachep->flags = flags;
-	cachep->allocflags = 0;
+	cachep->allocflags = __GFP_COMP;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
 		cachep->allocflags |= GFP_DMA;
 	cachep->size = size;
@@ -2717,17 +2701,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
 			   struct page *page)
 {
-	int nr_pages;
-
-	nr_pages = 1;
-	if (likely(!PageCompound(page)))
-		nr_pages <<= cache->gfporder;
-
-	do {
-		page->slab_cache = cache;
-		page->slab_page = slab;
-		page++;
-	} while (--nr_pages);
+	page->slab_cache = cache;
+	page->slab_page = slab;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

If we use 'struct page' of first page as 'struct slab', there is no
advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index ee03eba..855f481 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1712,15 +1712,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 {
 	struct page *page;
 	int nr_pages;
-	int i;
-
-#ifndef CONFIG_MMU
-	/*
-	 * Nommu uses slab's for process anonymous memory allocations, and thus
-	 * requires __GFP_COMP to properly refcount higher order allocations
-	 */
-	flags |= __GFP_COMP;
-#endif
 
 	flags |= cachep->allocflags;
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
@@ -1744,12 +1735,9 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	else
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++) {
-		__SetPageSlab(page + i);
-
-		if (page->pfmemalloc)
-			SetPageSlabPfmemalloc(page);
-	}
+	__SetPageSlab(page);
+	if (page->pfmemalloc)
+		SetPageSlabPfmemalloc(page);
 	memcg_bind_pages(cachep, cachep->gfporder);
 
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
@@ -1769,8 +1757,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
  */
 static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 {
-	unsigned long i = (1 << cachep->gfporder);
-	const unsigned long nr_freed = i;
+	const unsigned long nr_freed = (1 << cachep->gfporder);
 
 	kmemcheck_free_shadow(page, cachep->gfporder);
 
@@ -1781,12 +1768,9 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 		sub_zone_page_state(page_zone(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
 
+	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
-	while (i--) {
-		BUG_ON(!PageSlab(page));
-		__ClearPageSlab(page);
-		page++;
-	}
+	__ClearPageSlab(page);
 
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
@@ -2350,7 +2334,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->colour = left_over / cachep->colour_off;
 	cachep->slab_size = slab_size;
 	cachep->flags = flags;
-	cachep->allocflags = 0;
+	cachep->allocflags = __GFP_COMP;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
 		cachep->allocflags |= GFP_DMA;
 	cachep->size = size;
@@ -2717,17 +2701,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
 			   struct page *page)
 {
-	int nr_pages;
-
-	nr_pages = 1;
-	if (likely(!PageCompound(page)))
-		nr_pages <<= cache->gfporder;
-
-	do {
-		page->slab_cache = cache;
-		page->slab_page = slab;
-		page++;
-	} while (--nr_pages);
+	page->slab_cache = cache;
+	page->slab_page = slab;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 10/16] slab: change the management method of free objects of the slab
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

The current way the slab manages its free objects is awkward: it touches
random positions in the kmem_bufctl_t array whenever we try to get a free
object. See the following example.

struct slab's free = 6
kmem_bufctl_t array: 1 END 5 7 0 4 3 2

To get free objects, we access this array with the following pattern.
6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

If we have many objects, this array becomes larger and no longer fits in a
single cache line, which is not good for performance.

We can do the same thing in a simpler way, by treating the array as a
stack. The only thing we have to do is maintain a stack top that points at
the next free object; I use the free field of struct slab for this purpose.
When we need an object, we take the one at the stack top and advance the
top pointer. That's all. This method is already used for array_cache
management. The following is the access pattern when we use this method.

struct slab's free = 0
kmem_bufctl_t array: 6 3 7 2 5 4 0 1

To get free objects, we access this array with the following pattern.
0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

This may improve the cache line footprint when a slab has many objects
and, in addition, it makes the code much simpler.
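
As a rough illustration of the stack scheme (a self-contained toy, not the
kernel code; the toy_slab names are made up), allocation pops the next
object index off the array and freeing pushes it back:

/*
 * Illustrative sketch only.  Entries bufctl[free..num-1] hold the indices
 * of the currently free objects; no bounds checks for brevity.
 */
struct toy_slab {
	unsigned int free;		/* stack top: first free entry */
	unsigned int num;		/* objects per slab */
	unsigned int bufctl[8];		/* initialised to 0, 1, ..., num-1 */
};

static unsigned int toy_get_obj(struct toy_slab *s)
{
	return s->bufctl[s->free++];	/* pop: strictly sequential access */
}

static void toy_put_obj(struct toy_slab *s, unsigned int objnr)
{
	s->bufctl[--s->free] = objnr;	/* push the freed object's index */
}

A slab is full exactly when free == num, which is what the
slabs_full/slabs_partial checks in the diff below test for.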

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 855f481..4551d57 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -183,9 +183,6 @@ static bool pfmemalloc_active __read_mostly;
  */
 
 typedef unsigned int kmem_bufctl_t;
-#define BUFCTL_END	(((kmem_bufctl_t)(~0U))-0)
-#define BUFCTL_FREE	(((kmem_bufctl_t)(~0U))-1)
-#define	BUFCTL_ACTIVE	(((kmem_bufctl_t)(~0U))-2)
 #define	SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-3)
 
 /*
@@ -2641,9 +2638,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_bufctl(slabp)[i] = i + 1;
+		slab_bufctl(slabp)[i] = i;
 	}
-	slab_bufctl(slabp)[i - 1] = BUFCTL_END;
 }
 
 static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
@@ -2659,16 +2655,14 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 				int nodeid)
 {
-	void *objp = index_to_obj(cachep, slabp, slabp->free);
-	kmem_bufctl_t next;
+	void *objp;
 
 	slabp->inuse++;
-	next = slab_bufctl(slabp)[slabp->free];
+	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->free]);
 #if DEBUG
-	slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
-	slabp->free = next;
+	slabp->free++;
 
 	return objp;
 }
@@ -2677,19 +2671,23 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 				void *objp, int nodeid)
 {
 	unsigned int objnr = obj_to_index(cachep, slabp, objp);
-
 #if DEBUG
+	kmem_bufctl_t i;
+
 	/* Verify that the slab belongs to the intended node */
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
-	if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
-		printk(KERN_ERR "slab: double free detected in cache "
-				"'%s', objp %p\n", cachep->name, objp);
-		BUG();
+	/* Verify double free bug */
+	for (i = slabp->free; i < cachep->num; i++) {
+		if (slab_bufctl(slabp)[i] == objnr) {
+			printk(KERN_ERR "slab: double free detected in cache "
+					"'%s', objp %p\n", cachep->name, objp);
+			BUG();
+		}
 	}
 #endif
-	slab_bufctl(slabp)[objnr] = slabp->free;
-	slabp->free = objnr;
+	slabp->free--;
+	slab_bufctl(slabp)[slabp->free] = objnr;
 	slabp->inuse--;
 }
 
@@ -2850,9 +2848,6 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	BUG_ON(objnr >= cachep->num);
 	BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
 
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	slab_bufctl(slabp)[objnr] = BUFCTL_FREE;
-#endif
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
 		if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
@@ -2869,33 +2864,9 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	return objp;
 }
 
-static void check_slabp(struct kmem_cache *cachep, struct slab *slabp)
-{
-	kmem_bufctl_t i;
-	int entries = 0;
-
-	/* Check slab's freelist to see if this obj is there. */
-	for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) {
-		entries++;
-		if (entries > cachep->num || i >= cachep->num)
-			goto bad;
-	}
-	if (entries != cachep->num - slabp->inuse) {
-bad:
-		printk(KERN_ERR "slab: Internal list corruption detected in "
-			"cache '%s'(%d), slabp %p(%d). Tainted(%s). Hexdump:\n",
-			cachep->name, cachep->num, slabp, slabp->inuse,
-			print_tainted());
-		print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1, slabp,
-			sizeof(*slabp) + cachep->num * sizeof(kmem_bufctl_t),
-			1);
-		BUG();
-	}
-}
 #else
 #define kfree_debugcheck(x) do { } while(0)
 #define cache_free_debugcheck(x,objp,z) (objp)
-#define check_slabp(x,y) do { } while(0)
 #endif
 
 static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
@@ -2945,7 +2916,6 @@ retry:
 		}
 
 		slabp = list_entry(entry, struct slab, list);
-		check_slabp(cachep, slabp);
 		check_spinlock_acquired(cachep);
 
 		/*
@@ -2963,11 +2933,10 @@ retry:
 			ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
 									node));
 		}
-		check_slabp(cachep, slabp);
 
 		/* move slabp to correct slabp list: */
 		list_del(&slabp->list);
-		if (slabp->free == BUFCTL_END)
+		if (slabp->free == cachep->num)
 			list_add(&slabp->list, &n->slabs_full);
 		else
 			list_add(&slabp->list, &n->slabs_partial);
@@ -3042,16 +3011,6 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	{
-		struct slab *slabp;
-		unsigned objnr;
-
-		slabp = virt_to_slab(objp);
-		objnr = (unsigned)(objp - slabp->s_mem) / cachep->size;
-		slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
-	}
-#endif
 	objp += obj_offset(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON)
 		cachep->ctor(objp);
@@ -3257,7 +3216,6 @@ retry:
 
 	slabp = list_entry(entry, struct slab, list);
 	check_spinlock_acquired_node(cachep, nodeid);
-	check_slabp(cachep, slabp);
 
 	STATS_INC_NODEALLOCS(cachep);
 	STATS_INC_ACTIVE(cachep);
@@ -3266,12 +3224,11 @@ retry:
 	BUG_ON(slabp->inuse == cachep->num);
 
 	obj = slab_get_obj(cachep, slabp, nodeid);
-	check_slabp(cachep, slabp);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
 	list_del(&slabp->list);
 
-	if (slabp->free == BUFCTL_END)
+	if (slabp->free == cachep->num)
 		list_add(&slabp->list, &n->slabs_full);
 	else
 		list_add(&slabp->list, &n->slabs_partial);
@@ -3445,11 +3402,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 		n = cachep->node[node];
 		list_del(&slabp->list);
 		check_spinlock_acquired_node(cachep, node);
-		check_slabp(cachep, slabp);
 		slab_put_obj(cachep, slabp, objp, node);
 		STATS_DEC_ACTIVE(cachep);
 		n->free_objects++;
-		check_slabp(cachep, slabp);
 
 		/* fixup slab chains */
 		if (slabp->inuse == 0) {
@@ -4297,12 +4252,23 @@ static inline int add_caller(unsigned long *n, unsigned long v)
 static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
 {
 	void *p;
-	int i;
+	int i, j;
+
 	if (n[0] == n[1])
 		return;
 	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
-		if (slab_bufctl(s)[i] != BUFCTL_ACTIVE)
+		bool active = true;
+
+		for (j = s->free; j < c->num; j++) {
+			/* Skip freed item */
+			if (slab_bufctl(s)[j] == i) {
+				active = false;
+				break;
+			}
+		}
+		if (!active)
 			continue;
+
 		if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))
 			return;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 10/16] slab: change the management method of free objects of the slab
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

The current way the slab manages its free objects is awkward: it touches
random positions in the kmem_bufctl_t array whenever we try to get a free
object. See the following example.

struct slab's free = 6
kmem_bufctl_t array: 1 END 5 7 0 4 3 2

To get free objects, we access this array with the following pattern.
6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

If we have many objects, this array becomes larger and no longer fits in a
single cache line, which is not good for performance.

We can do the same thing in a simpler way, by treating the array as a
stack. The only thing we have to do is maintain a stack top that points at
the next free object; I use the free field of struct slab for this purpose.
When we need an object, we take the one at the stack top and advance the
top pointer. That's all. This method is already used for array_cache
management. The following is the access pattern when we use this method.

struct slab's free = 0
kmem_bufctl_t array: 6 3 7 2 5 4 0 1

To get free objects, we access this array with the following pattern.
0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

This may improve the cache line footprint when a slab has many objects
and, in addition, it makes the code much simpler.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 855f481..4551d57 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -183,9 +183,6 @@ static bool pfmemalloc_active __read_mostly;
  */
 
 typedef unsigned int kmem_bufctl_t;
-#define BUFCTL_END	(((kmem_bufctl_t)(~0U))-0)
-#define BUFCTL_FREE	(((kmem_bufctl_t)(~0U))-1)
-#define	BUFCTL_ACTIVE	(((kmem_bufctl_t)(~0U))-2)
 #define	SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-3)
 
 /*
@@ -2641,9 +2638,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_bufctl(slabp)[i] = i + 1;
+		slab_bufctl(slabp)[i] = i;
 	}
-	slab_bufctl(slabp)[i - 1] = BUFCTL_END;
 }
 
 static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
@@ -2659,16 +2655,14 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 				int nodeid)
 {
-	void *objp = index_to_obj(cachep, slabp, slabp->free);
-	kmem_bufctl_t next;
+	void *objp;
 
 	slabp->inuse++;
-	next = slab_bufctl(slabp)[slabp->free];
+	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->free]);
 #if DEBUG
-	slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
-	slabp->free = next;
+	slabp->free++;
 
 	return objp;
 }
@@ -2677,19 +2671,23 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 				void *objp, int nodeid)
 {
 	unsigned int objnr = obj_to_index(cachep, slabp, objp);
-
 #if DEBUG
+	kmem_bufctl_t i;
+
 	/* Verify that the slab belongs to the intended node */
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
-	if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
-		printk(KERN_ERR "slab: double free detected in cache "
-				"'%s', objp %p\n", cachep->name, objp);
-		BUG();
+	/* Verify double free bug */
+	for (i = slabp->free; i < cachep->num; i++) {
+		if (slab_bufctl(slabp)[i] == objnr) {
+			printk(KERN_ERR "slab: double free detected in cache "
+					"'%s', objp %p\n", cachep->name, objp);
+			BUG();
+		}
 	}
 #endif
-	slab_bufctl(slabp)[objnr] = slabp->free;
-	slabp->free = objnr;
+	slabp->free--;
+	slab_bufctl(slabp)[slabp->free] = objnr;
 	slabp->inuse--;
 }
 
@@ -2850,9 +2848,6 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	BUG_ON(objnr >= cachep->num);
 	BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
 
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	slab_bufctl(slabp)[objnr] = BUFCTL_FREE;
-#endif
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
 		if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
@@ -2869,33 +2864,9 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	return objp;
 }
 
-static void check_slabp(struct kmem_cache *cachep, struct slab *slabp)
-{
-	kmem_bufctl_t i;
-	int entries = 0;
-
-	/* Check slab's freelist to see if this obj is there. */
-	for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) {
-		entries++;
-		if (entries > cachep->num || i >= cachep->num)
-			goto bad;
-	}
-	if (entries != cachep->num - slabp->inuse) {
-bad:
-		printk(KERN_ERR "slab: Internal list corruption detected in "
-			"cache '%s'(%d), slabp %p(%d). Tainted(%s). Hexdump:\n",
-			cachep->name, cachep->num, slabp, slabp->inuse,
-			print_tainted());
-		print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1, slabp,
-			sizeof(*slabp) + cachep->num * sizeof(kmem_bufctl_t),
-			1);
-		BUG();
-	}
-}
 #else
 #define kfree_debugcheck(x) do { } while(0)
 #define cache_free_debugcheck(x,objp,z) (objp)
-#define check_slabp(x,y) do { } while(0)
 #endif
 
 static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
@@ -2945,7 +2916,6 @@ retry:
 		}
 
 		slabp = list_entry(entry, struct slab, list);
-		check_slabp(cachep, slabp);
 		check_spinlock_acquired(cachep);
 
 		/*
@@ -2963,11 +2933,10 @@ retry:
 			ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
 									node));
 		}
-		check_slabp(cachep, slabp);
 
 		/* move slabp to correct slabp list: */
 		list_del(&slabp->list);
-		if (slabp->free == BUFCTL_END)
+		if (slabp->free == cachep->num)
 			list_add(&slabp->list, &n->slabs_full);
 		else
 			list_add(&slabp->list, &n->slabs_partial);
@@ -3042,16 +3011,6 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
-#ifdef CONFIG_DEBUG_SLAB_LEAK
-	{
-		struct slab *slabp;
-		unsigned objnr;
-
-		slabp = virt_to_slab(objp);
-		objnr = (unsigned)(objp - slabp->s_mem) / cachep->size;
-		slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
-	}
-#endif
 	objp += obj_offset(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON)
 		cachep->ctor(objp);
@@ -3257,7 +3216,6 @@ retry:
 
 	slabp = list_entry(entry, struct slab, list);
 	check_spinlock_acquired_node(cachep, nodeid);
-	check_slabp(cachep, slabp);
 
 	STATS_INC_NODEALLOCS(cachep);
 	STATS_INC_ACTIVE(cachep);
@@ -3266,12 +3224,11 @@ retry:
 	BUG_ON(slabp->inuse == cachep->num);
 
 	obj = slab_get_obj(cachep, slabp, nodeid);
-	check_slabp(cachep, slabp);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
 	list_del(&slabp->list);
 
-	if (slabp->free == BUFCTL_END)
+	if (slabp->free == cachep->num)
 		list_add(&slabp->list, &n->slabs_full);
 	else
 		list_add(&slabp->list, &n->slabs_partial);
@@ -3445,11 +3402,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 		n = cachep->node[node];
 		list_del(&slabp->list);
 		check_spinlock_acquired_node(cachep, node);
-		check_slabp(cachep, slabp);
 		slab_put_obj(cachep, slabp, objp, node);
 		STATS_DEC_ACTIVE(cachep);
 		n->free_objects++;
-		check_slabp(cachep, slabp);
 
 		/* fixup slab chains */
 		if (slabp->inuse == 0) {
@@ -4297,12 +4252,23 @@ static inline int add_caller(unsigned long *n, unsigned long v)
 static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
 {
 	void *p;
-	int i;
+	int i, j;
+
 	if (n[0] == n[1])
 		return;
 	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
-		if (slab_bufctl(s)[i] != BUFCTL_ACTIVE)
+		bool active = true;
+
+		for (j = s->free; j < c->num; j++) {
+			/* Skip freed item */
+			if (slab_bufctl(s)[j] == i) {
+				active = false;
+				break;
+			}
+		}
+		if (!active)
 			continue;
+
 		if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))
 			return;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 11/16] slab: remove kmem_bufctl_t
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now that we have changed the way a slab's free objects are managed, there
is no need for the special values BUFCTL_END, BUFCTL_FREE and
BUFCTL_ACTIVE. So remove them.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 4551d57..7216ebe 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -163,27 +163,7 @@
  */
 static bool pfmemalloc_active __read_mostly;
 
-/*
- * kmem_bufctl_t:
- *
- * Bufctl's are used for linking objs within a slab
- * linked offsets.
- *
- * This implementation relies on "struct page" for locating the cache &
- * slab an object belongs to.
- * This allows the bufctl structure to be small (one int), but limits
- * the number of objects a slab (not a cache) can contain when off-slab
- * bufctls are used. The limit is the size of the largest general cache
- * that does not use off-slab slabs.
- * For 32bit archs with 4 kB pages, is this 56.
- * This is not serious, as it is only for large objects, when it is unwise
- * to have too many per slab.
- * Note: This limit can be raised by introducing a general cache whose size
- * is less than 512 (PAGE_SIZE<<3), but greater than 256.
- */
-
-typedef unsigned int kmem_bufctl_t;
-#define	SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-3)
+#define	SLAB_LIMIT	(((unsigned int)(~0U))-1)
 
 /*
  * struct slab
@@ -197,7 +177,7 @@ struct slab {
 		struct list_head list;
 		void *s_mem;		/* including colour offset */
 		unsigned int inuse;	/* num of objs active in slab */
-		kmem_bufctl_t free;
+		unsigned int free;
 	};
 };
 
@@ -613,7 +593,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 
 static size_t slab_mgmt_size(size_t nr_objs, size_t align)
 {
-	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(kmem_bufctl_t), align);
+	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
 }
 
 /*
@@ -633,7 +613,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * slab is used for:
 	 *
 	 * - The struct slab
-	 * - One kmem_bufctl_t for each object
+	 * - One unsigned int for each object
 	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
 	 *
@@ -658,7 +638,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		 * into account.
 		 */
 		nr_objs = (slab_size - sizeof(struct slab)) /
-			  (buffer_size + sizeof(kmem_bufctl_t));
+			  (buffer_size + sizeof(unsigned int));
 
 		/*
 		 * This calculated number will be either the right
@@ -2056,7 +2036,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			 * looping condition in cache_grow().
 			 */
 			offslab_limit = size - sizeof(struct slab);
-			offslab_limit /= sizeof(kmem_bufctl_t);
+			offslab_limit /= sizeof(unsigned int);
 
  			if (num > offslab_limit)
 				break;
@@ -2297,7 +2277,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!cachep->num)
 		return -E2BIG;
 
-	slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
+	slab_size = ALIGN(cachep->num * sizeof(unsigned int)
 			  + sizeof(struct slab), cachep->align);
 
 	/*
@@ -2312,7 +2292,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
 		slab_size =
-		    cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
+		    cachep->num * sizeof(unsigned int) + sizeof(struct slab);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2591,9 +2571,9 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 	return slabp;
 }
 
-static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct slab *slabp)
 {
-	return (kmem_bufctl_t *) (slabp + 1);
+	return (unsigned int *) (slabp + 1);
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2672,7 +2652,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 {
 	unsigned int objnr = obj_to_index(cachep, slabp, objp);
 #if DEBUG
-	kmem_bufctl_t i;
+	unsigned int i;
 
 	/* Verify that the slab belongs to the intended node */
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 11/16] slab: remove kmem_bufctl_t
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now that we have changed the way a slab's free objects are managed, there
is no need for the special values BUFCTL_END, BUFCTL_FREE and
BUFCTL_ACTIVE. So remove them.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 4551d57..7216ebe 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -163,27 +163,7 @@
  */
 static bool pfmemalloc_active __read_mostly;
 
-/*
- * kmem_bufctl_t:
- *
- * Bufctl's are used for linking objs within a slab
- * linked offsets.
- *
- * This implementation relies on "struct page" for locating the cache &
- * slab an object belongs to.
- * This allows the bufctl structure to be small (one int), but limits
- * the number of objects a slab (not a cache) can contain when off-slab
- * bufctls are used. The limit is the size of the largest general cache
- * that does not use off-slab slabs.
- * For 32bit archs with 4 kB pages, is this 56.
- * This is not serious, as it is only for large objects, when it is unwise
- * to have too many per slab.
- * Note: This limit can be raised by introducing a general cache whose size
- * is less than 512 (PAGE_SIZE<<3), but greater than 256.
- */
-
-typedef unsigned int kmem_bufctl_t;
-#define	SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-3)
+#define	SLAB_LIMIT	(((unsigned int)(~0U))-1)
 
 /*
  * struct slab
@@ -197,7 +177,7 @@ struct slab {
 		struct list_head list;
 		void *s_mem;		/* including colour offset */
 		unsigned int inuse;	/* num of objs active in slab */
-		kmem_bufctl_t free;
+		unsigned int free;
 	};
 };
 
@@ -613,7 +593,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 
 static size_t slab_mgmt_size(size_t nr_objs, size_t align)
 {
-	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(kmem_bufctl_t), align);
+	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
 }
 
 /*
@@ -633,7 +613,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * slab is used for:
 	 *
 	 * - The struct slab
-	 * - One kmem_bufctl_t for each object
+	 * - One unsigned int for each object
 	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
 	 *
@@ -658,7 +638,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		 * into account.
 		 */
 		nr_objs = (slab_size - sizeof(struct slab)) /
-			  (buffer_size + sizeof(kmem_bufctl_t));
+			  (buffer_size + sizeof(unsigned int));
 
 		/*
 		 * This calculated number will be either the right
@@ -2056,7 +2036,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			 * looping condition in cache_grow().
 			 */
 			offslab_limit = size - sizeof(struct slab);
-			offslab_limit /= sizeof(kmem_bufctl_t);
+			offslab_limit /= sizeof(unsigned int);
 
  			if (num > offslab_limit)
 				break;
@@ -2297,7 +2277,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!cachep->num)
 		return -E2BIG;
 
-	slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
+	slab_size = ALIGN(cachep->num * sizeof(unsigned int)
 			  + sizeof(struct slab), cachep->align);
 
 	/*
@@ -2312,7 +2292,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
 		slab_size =
-		    cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
+		    cachep->num * sizeof(unsigned int) + sizeof(struct slab);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2591,9 +2571,9 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 	return slabp;
 }
 
-static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct slab *slabp)
 {
-	return (kmem_bufctl_t *) (slabp + 1);
+	return (unsigned int *) (slabp + 1);
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2672,7 +2652,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 {
 	unsigned int objnr = obj_to_index(cachep, slabp, objp);
 #if DEBUG
-	kmem_bufctl_t i;
+	unsigned int i;
 
 	/* Verify that the slab belongs to the intended node */
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 12/16] slab: remove SLAB_LIMIT
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

SLAB_LIMIT is useless now, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 7216ebe..98257e4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -163,8 +163,6 @@
  */
 static bool pfmemalloc_active __read_mostly;
 
-#define	SLAB_LIMIT	(((unsigned int)(~0U))-1)
-
 /*
  * struct slab
  *
@@ -626,8 +624,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		mgmt_size = 0;
 		nr_objs = slab_size / buffer_size;
 
-		if (nr_objs > SLAB_LIMIT)
-			nr_objs = SLAB_LIMIT;
 	} else {
 		/*
 		 * Ignore padding for the initial guess. The padding
@@ -648,9 +644,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		       > slab_size)
 			nr_objs--;
 
-		if (nr_objs > SLAB_LIMIT)
-			nr_objs = SLAB_LIMIT;
-
 		mgmt_size = slab_mgmt_size(nr_objs, align);
 	}
 	*num = nr_objs;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 12/16] slab: remove SLAB_LIMIT
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

SLAB_LIMIT is useless now, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 7216ebe..98257e4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -163,8 +163,6 @@
  */
 static bool pfmemalloc_active __read_mostly;
 
-#define	SLAB_LIMIT	(((unsigned int)(~0U))-1)
-
 /*
  * struct slab
  *
@@ -626,8 +624,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		mgmt_size = 0;
 		nr_objs = slab_size / buffer_size;
 
-		if (nr_objs > SLAB_LIMIT)
-			nr_objs = SLAB_LIMIT;
 	} else {
 		/*
 		 * Ignore padding for the initial guess. The padding
@@ -648,9 +644,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		       > slab_size)
 			nr_objs--;
 
-		if (nr_objs > SLAB_LIMIT)
-			nr_objs = SLAB_LIMIT;
-
 		mgmt_size = slab_mgmt_size(nr_objs, align);
 	}
 	*num = nr_objs;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 13/16] slab: replace free and inuse in struct slab with newly introduced active
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, free in struct slab has the same meaning as inuse.
So remove both and replace them with a single field, active.
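
To see why the two fields are redundant: with the stack scheme from the
previous patches, the stack top (free) advances by one on every allocation
and retreats by one on every free, so it always equals the number of
allocated objects (inuse). In other words, every slab now satisfies the
following assertion (illustrative only, not added by the patch):

	BUG_ON(slabp->free != slabp->inuse);	/* stack top == allocated count */

so a single counter, active, is enough.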

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 98257e4..9dcbb22 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -174,8 +174,7 @@ struct slab {
 	struct {
 		struct list_head list;
 		void *s_mem;		/* including colour offset */
-		unsigned int inuse;	/* num of objs active in slab */
-		unsigned int free;
+		unsigned int active;	/* num of objs active in slab */
 	};
 };
 
@@ -1652,7 +1651,7 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			active_objs += slabp->inuse;
+			active_objs += slabp->active;
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_free, list)
@@ -2439,7 +2438,7 @@ static int drain_freelist(struct kmem_cache *cache,
 
 		slabp = list_entry(p, struct slab, list);
 #if DEBUG
-		BUG_ON(slabp->inuse);
+		BUG_ON(slabp->active);
 #endif
 		list_del(&slabp->list);
 		/*
@@ -2558,9 +2557,8 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 		slabp = addr + colour_off;
 		colour_off += cachep->slab_size;
 	}
-	slabp->inuse = 0;
+	slabp->active = 0;
 	slabp->s_mem = addr + colour_off;
-	slabp->free = 0;
 	return slabp;
 }
 
@@ -2630,12 +2628,11 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 {
 	void *objp;
 
-	slabp->inuse++;
-	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->free]);
+	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
+	slabp->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
-	slabp->free++;
 
 	return objp;
 }
@@ -2651,7 +2648,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
 	/* Verify double free bug */
-	for (i = slabp->free; i < cachep->num; i++) {
+	for (i = slabp->active; i < cachep->num; i++) {
 		if (slab_bufctl(slabp)[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
@@ -2659,9 +2656,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 		}
 	}
 #endif
-	slabp->free--;
-	slab_bufctl(slabp)[slabp->free] = objnr;
-	slabp->inuse--;
+	slabp->active--;
+	slab_bufctl(slabp)[slabp->active] = objnr;
 }
 
 /*
@@ -2896,9 +2892,9 @@ retry:
 		 * there must be at least one object available for
 		 * allocation.
 		 */
-		BUG_ON(slabp->inuse >= cachep->num);
+		BUG_ON(slabp->active >= cachep->num);
 
-		while (slabp->inuse < cachep->num && batchcount--) {
+		while (slabp->active < cachep->num && batchcount--) {
 			STATS_INC_ALLOCED(cachep);
 			STATS_INC_ACTIVE(cachep);
 			STATS_SET_HIGH(cachep);
@@ -2909,7 +2905,7 @@ retry:
 
 		/* move slabp to correct slabp list: */
 		list_del(&slabp->list);
-		if (slabp->free == cachep->num)
+		if (slabp->active == cachep->num)
 			list_add(&slabp->list, &n->slabs_full);
 		else
 			list_add(&slabp->list, &n->slabs_partial);
@@ -3194,14 +3190,14 @@ retry:
 	STATS_INC_ACTIVE(cachep);
 	STATS_SET_HIGH(cachep);
 
-	BUG_ON(slabp->inuse == cachep->num);
+	BUG_ON(slabp->active == cachep->num);
 
 	obj = slab_get_obj(cachep, slabp, nodeid);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
 	list_del(&slabp->list);
 
-	if (slabp->free == cachep->num)
+	if (slabp->active == cachep->num)
 		list_add(&slabp->list, &n->slabs_full);
 	else
 		list_add(&slabp->list, &n->slabs_partial);
@@ -3380,7 +3376,7 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 		n->free_objects++;
 
 		/* fixup slab chains */
-		if (slabp->inuse == 0) {
+		if (slabp->active == 0) {
 			if (n->free_objects > n->free_limit) {
 				n->free_objects -= cachep->num;
 				/* No need to drop any previously held
@@ -3441,7 +3437,7 @@ free_done:
 			struct slab *slabp;
 
 			slabp = list_entry(p, struct slab, list);
-			BUG_ON(slabp->inuse);
+			BUG_ON(slabp->active);
 
 			i++;
 			p = p->next;
@@ -4055,22 +4051,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
 		spin_lock_irq(&n->list_lock);
 
 		list_for_each_entry(slabp, &n->slabs_full, list) {
-			if (slabp->inuse != cachep->num && !error)
+			if (slabp->active != cachep->num && !error)
 				error = "slabs_full accounting error";
 			active_objs += cachep->num;
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			if (slabp->inuse == cachep->num && !error)
-				error = "slabs_partial inuse accounting error";
-			if (!slabp->inuse && !error)
-				error = "slabs_partial/inuse accounting error";
-			active_objs += slabp->inuse;
+			if (slabp->active == cachep->num && !error)
+				error = "slabs_partial accounting error";
+			if (!slabp->active && !error)
+				error = "slabs_partial accounting error";
+			active_objs += slabp->active;
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_free, list) {
-			if (slabp->inuse && !error)
-				error = "slabs_free/inuse accounting error";
+			if (slabp->active && !error)
+				error = "slabs_free accounting error";
 			num_slabs++;
 		}
 		free_objects += n->free_objects;
@@ -4232,7 +4228,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
 	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
 		bool active = true;
 
-		for (j = s->free; j < c->num; j++) {
+		for (j = s->active; j < c->num; j++) {
 			/* Skip freed item */
 			if (slab_bufctl(s)[j] == i) {
 				active = false;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 13/16] slab: replace free and inuse in struct slab with newly introduced active
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, free in struct slab has the same meaning as inuse.
So remove both and replace them with a single field, active.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 98257e4..9dcbb22 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -174,8 +174,7 @@ struct slab {
 	struct {
 		struct list_head list;
 		void *s_mem;		/* including colour offset */
-		unsigned int inuse;	/* num of objs active in slab */
-		unsigned int free;
+		unsigned int active;	/* num of objs active in slab */
 	};
 };
 
@@ -1652,7 +1651,7 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			active_objs += slabp->inuse;
+			active_objs += slabp->active;
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_free, list)
@@ -2439,7 +2438,7 @@ static int drain_freelist(struct kmem_cache *cache,
 
 		slabp = list_entry(p, struct slab, list);
 #if DEBUG
-		BUG_ON(slabp->inuse);
+		BUG_ON(slabp->active);
 #endif
 		list_del(&slabp->list);
 		/*
@@ -2558,9 +2557,8 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 		slabp = addr + colour_off;
 		colour_off += cachep->slab_size;
 	}
-	slabp->inuse = 0;
+	slabp->active = 0;
 	slabp->s_mem = addr + colour_off;
-	slabp->free = 0;
 	return slabp;
 }
 
@@ -2630,12 +2628,11 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 {
 	void *objp;
 
-	slabp->inuse++;
-	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->free]);
+	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
+	slabp->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
-	slabp->free++;
 
 	return objp;
 }
@@ -2651,7 +2648,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
 	/* Verify double free bug */
-	for (i = slabp->free; i < cachep->num; i++) {
+	for (i = slabp->active; i < cachep->num; i++) {
 		if (slab_bufctl(slabp)[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
@@ -2659,9 +2656,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 		}
 	}
 #endif
-	slabp->free--;
-	slab_bufctl(slabp)[slabp->free] = objnr;
-	slabp->inuse--;
+	slabp->active--;
+	slab_bufctl(slabp)[slabp->active] = objnr;
 }
 
 /*
@@ -2896,9 +2892,9 @@ retry:
 		 * there must be at least one object available for
 		 * allocation.
 		 */
-		BUG_ON(slabp->inuse >= cachep->num);
+		BUG_ON(slabp->active >= cachep->num);
 
-		while (slabp->inuse < cachep->num && batchcount--) {
+		while (slabp->active < cachep->num && batchcount--) {
 			STATS_INC_ALLOCED(cachep);
 			STATS_INC_ACTIVE(cachep);
 			STATS_SET_HIGH(cachep);
@@ -2909,7 +2905,7 @@ retry:
 
 		/* move slabp to correct slabp list: */
 		list_del(&slabp->list);
-		if (slabp->free == cachep->num)
+		if (slabp->active == cachep->num)
 			list_add(&slabp->list, &n->slabs_full);
 		else
 			list_add(&slabp->list, &n->slabs_partial);
@@ -3194,14 +3190,14 @@ retry:
 	STATS_INC_ACTIVE(cachep);
 	STATS_SET_HIGH(cachep);
 
-	BUG_ON(slabp->inuse == cachep->num);
+	BUG_ON(slabp->active == cachep->num);
 
 	obj = slab_get_obj(cachep, slabp, nodeid);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
 	list_del(&slabp->list);
 
-	if (slabp->free == cachep->num)
+	if (slabp->active == cachep->num)
 		list_add(&slabp->list, &n->slabs_full);
 	else
 		list_add(&slabp->list, &n->slabs_partial);
@@ -3380,7 +3376,7 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 		n->free_objects++;
 
 		/* fixup slab chains */
-		if (slabp->inuse == 0) {
+		if (slabp->active == 0) {
 			if (n->free_objects > n->free_limit) {
 				n->free_objects -= cachep->num;
 				/* No need to drop any previously held
@@ -3441,7 +3437,7 @@ free_done:
 			struct slab *slabp;
 
 			slabp = list_entry(p, struct slab, list);
-			BUG_ON(slabp->inuse);
+			BUG_ON(slabp->active);
 
 			i++;
 			p = p->next;
@@ -4055,22 +4051,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
 		spin_lock_irq(&n->list_lock);
 
 		list_for_each_entry(slabp, &n->slabs_full, list) {
-			if (slabp->inuse != cachep->num && !error)
+			if (slabp->active != cachep->num && !error)
 				error = "slabs_full accounting error";
 			active_objs += cachep->num;
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			if (slabp->inuse == cachep->num && !error)
-				error = "slabs_partial inuse accounting error";
-			if (!slabp->inuse && !error)
-				error = "slabs_partial/inuse accounting error";
-			active_objs += slabp->inuse;
+			if (slabp->active == cachep->num && !error)
+				error = "slabs_partial accounting error";
+			if (!slabp->active && !error)
+				error = "slabs_partial accounting error";
+			active_objs += slabp->active;
 			active_slabs++;
 		}
 		list_for_each_entry(slabp, &n->slabs_free, list) {
-			if (slabp->inuse && !error)
-				error = "slabs_free/inuse accounting error";
+			if (slabp->active && !error)
+				error = "slabs_free accounting error";
 			num_slabs++;
 		}
 		free_objects += n->free_objects;
@@ -4232,7 +4228,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
 	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
 		bool active = true;
 
-		for (j = s->free; j < c->num; j++) {
+		for (j = s->active; j < c->num; j++) {
 			/* Skip freed item */
 			if (slab_bufctl(s)[j] == i) {
 				active = false;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 14/16] slab: use struct page for slab management
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now only a few fields are left in struct slab, so we can overlay them onto
struct page. This saves some memory and reduces the cache footprint.

After this change, slabp_cache and slab_size are no longer related to a
struct slab, so rename them to freelist_cache and freelist_size.

These changes are purely mechanical; there is no functional change.
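
For orientation, a minimal sketch of how the slab metadata is reached once
it lives in struct page (illustrative only; slab_freelist_of() and
nth_object() are made-up names mirroring the slab_bufctl() and
index_to_obj() helpers in the diff below):

/* The object-index array now hangs off page->freelist. */
static inline unsigned int *slab_freelist_of(struct page *page)
{
	return (unsigned int *)page->freelist;
}

/* Objects still start at page->s_mem, cachep->size bytes apart. */
static inline void *nth_object(struct kmem_cache *cachep,
			       struct page *page, unsigned int idx)
{
	return page->s_mem + cachep->size * idx;
}

page->lru takes over the old list linkage and page->active the
allocated-object counter, as the mm_types.h hunk shows.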

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ace9a5f..66ee577 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -42,18 +42,22 @@ struct page {
 	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
-	struct address_space *mapping;	/* If low bit clear, points to
-					 * inode address_space, or NULL.
-					 * If page mapped as anonymous
-					 * memory, low bit is set, and
-					 * it points to anon_vma object:
-					 * see PAGE_MAPPING_ANON below.
-					 */
+	union {
+		struct address_space *mapping;	/* If low bit clear, points to
+						 * inode address_space, or NULL.
+						 * If page mapped as anonymous
+						 * memory, low bit is set, and
+						 * it points to anon_vma object:
+						 * see PAGE_MAPPING_ANON below.
+						 */
+		void *s_mem;			/* slab first object */
+	};
+
 	/* Second double word */
 	struct {
 		union {
 			pgoff_t index;		/* Our offset within mapping. */
-			void *freelist;		/* slub/slob first free object */
+			void *freelist;		/* sl[aou]b first free object */
 			bool pfmemalloc;	/* If set by the page allocator,
 						 * ALLOC_NO_WATERMARKS was set
 						 * and the low watermark was not
@@ -109,6 +113,7 @@ struct page {
 				};
 				atomic_t _count;		/* Usage count, see below. */
 			};
+			unsigned int active;	/* SLAB */
 		};
 	};
 
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index cd40158..ca82e8f 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -41,8 +41,8 @@ struct kmem_cache {
 
 	size_t colour;			/* cache colouring range */
 	unsigned int colour_off;	/* colour offset */
-	struct kmem_cache *slabp_cache;
-	unsigned int slab_size;
+	struct kmem_cache *freelist_cache;
+	unsigned int freelist_size;
 
 	/* constructor func */
 	void (*ctor)(void *obj);
diff --git a/mm/slab.c b/mm/slab.c
index 9dcbb22..cf39309 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -164,21 +164,6 @@
 static bool pfmemalloc_active __read_mostly;
 
 /*
- * struct slab
- *
- * Manages the objs in a slab. Placed either at the beginning of mem allocated
- * for a slab, or allocated from an general cache.
- * Slabs are chained into three list: fully used, partial, fully free slabs.
- */
-struct slab {
-	struct {
-		struct list_head list;
-		void *s_mem;		/* including colour offset */
-		unsigned int active;	/* num of objs active in slab */
-	};
-};
-
-/*
  * struct array_cache
  *
  * Purpose:
@@ -405,18 +390,10 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)
 	return page->slab_cache;
 }
 
-static inline struct slab *virt_to_slab(const void *obj)
-{
-	struct page *page = virt_to_head_page(obj);
-
-	VM_BUG_ON(!PageSlab(page));
-	return page->slab_page;
-}
-
-static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
+static inline void *index_to_obj(struct kmem_cache *cache, struct page *page,
 				 unsigned int idx)
 {
-	return slab->s_mem + cache->size * idx;
+	return page->s_mem + cache->size * idx;
 }
 
 /*
@@ -426,9 +403,9 @@ static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
  *   reciprocal_divide(offset, cache->reciprocal_buffer_size)
  */
 static inline unsigned int obj_to_index(const struct kmem_cache *cache,
-					const struct slab *slab, void *obj)
+					const struct page *page, void *obj)
 {
-	u32 offset = (obj - slab->s_mem);
+	u32 offset = (obj - page->s_mem);
 	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
 }
 
@@ -590,7 +567,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 
 static size_t slab_mgmt_size(size_t nr_objs, size_t align)
 {
-	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
+	return ALIGN(nr_objs * sizeof(unsigned int), align);
 }
 
 /*
@@ -609,7 +586,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * on it. For the latter case, the memory allocated for a
 	 * slab is used for:
 	 *
-	 * - The struct slab
 	 * - One unsigned int for each object
 	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
@@ -632,8 +608,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		 * into the memory allocation when taking the padding
 		 * into account.
 		 */
-		nr_objs = (slab_size - sizeof(struct slab)) /
-			  (buffer_size + sizeof(unsigned int));
+		nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));
 
 		/*
 		 * This calculated number will be either the right
@@ -773,11 +748,11 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 	return nc;
 }
 
-static inline bool is_slab_pfmemalloc(struct slab *slabp)
+static inline bool is_slab_pfmemalloc(struct page *page)
 {
-	struct page *page = virt_to_page(slabp->s_mem);
+	struct page *mem_page = virt_to_page(page->s_mem);
 
-	return PageSlabPfmemalloc(page);
+	return PageSlabPfmemalloc(mem_page);
 }
 
 /* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -785,23 +760,23 @@ static void recheck_pfmemalloc_active(struct kmem_cache *cachep,
 						struct array_cache *ac)
 {
 	struct kmem_cache_node *n = cachep->node[numa_mem_id()];
-	struct slab *slabp;
+	struct page *page;
 	unsigned long flags;
 
 	if (!pfmemalloc_active)
 		return;
 
 	spin_lock_irqsave(&n->list_lock, flags);
-	list_for_each_entry(slabp, &n->slabs_full, list)
-		if (is_slab_pfmemalloc(slabp))
+	list_for_each_entry(page, &n->slabs_full, lru)
+		if (is_slab_pfmemalloc(page))
 			goto out;
 
-	list_for_each_entry(slabp, &n->slabs_partial, list)
-		if (is_slab_pfmemalloc(slabp))
+	list_for_each_entry(page, &n->slabs_partial, lru)
+		if (is_slab_pfmemalloc(page))
 			goto out;
 
-	list_for_each_entry(slabp, &n->slabs_free, list)
-		if (is_slab_pfmemalloc(slabp))
+	list_for_each_entry(page, &n->slabs_free, lru)
+		if (is_slab_pfmemalloc(page))
 			goto out;
 
 	pfmemalloc_active = false;
@@ -841,8 +816,8 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		 */
 		n = cachep->node[numa_mem_id()];
 		if (!list_empty(&n->slabs_free) && force_refill) {
-			struct slab *slabp = virt_to_slab(objp);
-			ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
+			struct page *page = virt_to_head_page(objp);
+			ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
@@ -874,9 +849,9 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct slab *slabp = virt_to_slab(objp);
-		struct page *page = virt_to_head_page(slabp->s_mem);
-		if (PageSlabPfmemalloc(page))
+		struct page *page = virt_to_head_page(objp);
+		struct page *mem_page = virt_to_head_page(page->s_mem);
+		if (PageSlabPfmemalloc(mem_page))
 			set_obj_pfmemalloc(&objp);
 	}
 
@@ -1627,7 +1602,7 @@ static noinline void
 slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
 {
 	struct kmem_cache_node *n;
-	struct slab *slabp;
+	struct page *page;
 	unsigned long flags;
 	int node;
 
@@ -1646,15 +1621,15 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
 			continue;
 
 		spin_lock_irqsave(&n->list_lock, flags);
-		list_for_each_entry(slabp, &n->slabs_full, list) {
+		list_for_each_entry(page, &n->slabs_full, lru) {
 			active_objs += cachep->num;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			active_objs += slabp->active;
+		list_for_each_entry(page, &n->slabs_partial, lru) {
+			active_objs += page->active;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_free, list)
+		list_for_each_entry(page, &n->slabs_free, lru)
 			num_slabs++;
 
 		free_objects += n->free_objects;
@@ -1740,6 +1715,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
+	page_mapcount_reset(page);
+	page->mapping = NULL;
 
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
@@ -1904,19 +1881,19 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 		/* Print some data about the neighboring objects, if they
 		 * exist:
 		 */
-		struct slab *slabp = virt_to_slab(objp);
+		struct page *page = virt_to_head_page(objp);
 		unsigned int objnr;
 
-		objnr = obj_to_index(cachep, slabp, objp);
+		objnr = obj_to_index(cachep, page, objp);
 		if (objnr) {
-			objp = index_to_obj(cachep, slabp, objnr - 1);
+			objp = index_to_obj(cachep, page, objnr - 1);
 			realobj = (char *)objp + obj_offset(cachep);
 			printk(KERN_ERR "Prev obj: start=%p, len=%d\n",
 			       realobj, size);
 			print_objinfo(cachep, objp, 2);
 		}
 		if (objnr + 1 < cachep->num) {
-			objp = index_to_obj(cachep, slabp, objnr + 1);
+			objp = index_to_obj(cachep, page, objnr + 1);
 			realobj = (char *)objp + obj_offset(cachep);
 			printk(KERN_ERR "Next obj: start=%p, len=%d\n",
 			       realobj, size);
@@ -1927,11 +1904,12 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 #endif
 
 #if DEBUG
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+						struct page *page)
 {
 	int i;
 	for (i = 0; i < cachep->num; i++) {
-		void *objp = index_to_obj(cachep, slabp, i);
+		void *objp = index_to_obj(cachep, page, i);
 
 		if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1956,7 +1934,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
 	}
 }
 #else
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+						struct page *page)
 {
 }
 #endif
@@ -1970,11 +1949,12 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
  * Before calling the slab must have been unlinked from the cache.  The
  * cache-lock is not held/needed.
  */
-static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy(struct kmem_cache *cachep, struct page *page)
 {
-	struct page *page = virt_to_head_page(slabp->s_mem);
+	struct freelist *freelist;
 
-	slab_destroy_debugcheck(cachep, slabp);
+	freelist = page->freelist;
+	slab_destroy_debugcheck(cachep, page);
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
 		struct rcu_head *head;
 
@@ -1986,11 +1966,11 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 		kmem_freepages(cachep, page);
 
 	/*
-	 * From now on, we don't use slab management
+	 * From now on, we don't use freelist
 	 * although actual page will be freed in rcu context.
 	 */
 	if (OFF_SLAB(cachep))
-		kmem_cache_free(cachep->slabp_cache, slabp);
+		kmem_cache_free(cachep->freelist_cache, freelist);
 }
 
 /**
@@ -2027,7 +2007,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			 * use off-slab slabs. Needed to avoid a possible
 			 * looping condition in cache_grow().
 			 */
-			offslab_limit = size - sizeof(struct slab);
+			offslab_limit = size;
 			offslab_limit /= sizeof(unsigned int);
 
  			if (num > offslab_limit)
@@ -2150,7 +2130,7 @@ static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
 int
 __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 {
-	size_t left_over, slab_size, ralign;
+	size_t left_over, freelist_size, ralign;
 	gfp_t gfp;
 	int err;
 	size_t size = cachep->size;
@@ -2269,22 +2249,21 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!cachep->num)
 		return -E2BIG;
 
-	slab_size = ALIGN(cachep->num * sizeof(unsigned int)
-			  + sizeof(struct slab), cachep->align);
+	freelist_size =
+		ALIGN(cachep->num * sizeof(unsigned int), cachep->align);
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
 	 * move it on-slab. This is at the expense of any extra colouring.
 	 */
-	if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
+	if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
 		flags &= ~CFLGS_OFF_SLAB;
-		left_over -= slab_size;
+		left_over -= freelist_size;
 	}
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		slab_size =
-		    cachep->num * sizeof(unsigned int) + sizeof(struct slab);
+		freelist_size = cachep->num * sizeof(unsigned int);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2301,7 +2280,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (cachep->colour_off < cachep->align)
 		cachep->colour_off = cachep->align;
 	cachep->colour = left_over / cachep->colour_off;
-	cachep->slab_size = slab_size;
+	cachep->freelist_size = freelist_size;
 	cachep->flags = flags;
 	cachep->allocflags = __GFP_COMP;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
@@ -2310,7 +2289,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
 	if (flags & CFLGS_OFF_SLAB) {
-		cachep->slabp_cache = kmalloc_slab(slab_size, 0u);
+		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
 		/*
 		 * This is a possibility for one of the malloc_sizes caches.
 		 * But since we go off slab only for object size greater than
@@ -2318,7 +2297,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		 * this should not happen at all.
 		 * But leave a BUG_ON for some lucky dude.
 		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
+		BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
 	}
 
 	err = setup_cpu_cache(cachep, gfp);
@@ -2424,7 +2403,7 @@ static int drain_freelist(struct kmem_cache *cache,
 {
 	struct list_head *p;
 	int nr_freed;
-	struct slab *slabp;
+	struct page *page;
 
 	nr_freed = 0;
 	while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
@@ -2436,18 +2415,18 @@ static int drain_freelist(struct kmem_cache *cache,
 			goto out;
 		}
 
-		slabp = list_entry(p, struct slab, list);
+		page = list_entry(p, struct page, lru);
 #if DEBUG
-		BUG_ON(slabp->active);
+		BUG_ON(page->active);
 #endif
-		list_del(&slabp->list);
+		list_del(&page->lru);
 		/*
 		 * Safe to drop the lock. The slab is no longer linked
 		 * to the cache.
 		 */
 		n->free_objects -= cache->num;
 		spin_unlock_irq(&n->list_lock);
-		slab_destroy(cache, slabp);
+		slab_destroy(cache, page);
 		nr_freed++;
 	}
 out:
@@ -2530,18 +2509,18 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
  * descriptors in kmem_cache_create, we search through the malloc_sizes array.
  * If we are creating a malloc_sizes cache here it would not be visible to
  * kmem_find_general_cachep till the initialization is complete.
- * Hence we cannot have slabp_cache same as the original cache.
+ * Hence we cannot have freelist_cache same as the original cache.
  */
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
+static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 				   struct page *page, int colour_off,
 				   gfp_t local_flags, int nodeid)
 {
-	struct slab *slabp;
+	struct freelist *freelist;
 	void *addr = page_address(page);
 
 	if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
-		slabp = kmem_cache_alloc_node(cachep->slabp_cache,
+		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
 					      local_flags, nodeid);
 		/*
 		 * If the first object in the slab is leaked (it's allocated
@@ -2549,31 +2528,31 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 		 * kmemleak does not treat the ->s_mem pointer as a reference
 		 * to the object. Otherwise we will not report the leak.
 		 */
-		kmemleak_scan_area(&slabp->list, sizeof(struct list_head),
+		kmemleak_scan_area(&page->lru, sizeof(struct list_head),
 				   local_flags);
-		if (!slabp)
+		if (!freelist)
 			return NULL;
 	} else {
-		slabp = addr + colour_off;
-		colour_off += cachep->slab_size;
+		freelist = addr + colour_off;
+		colour_off += cachep->freelist_size;
 	}
-	slabp->active = 0;
-	slabp->s_mem = addr + colour_off;
-	return slabp;
+	page->active = 0;
+	page->s_mem = addr + colour_off;
+	return freelist;
 }
 
-static inline unsigned int *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct page *page)
 {
-	return (unsigned int *) (slabp + 1);
+	return (unsigned int *)(page->freelist);
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
-			    struct slab *slabp)
+			    struct page *page)
 {
 	int i;
 
 	for (i = 0; i < cachep->num; i++) {
-		void *objp = index_to_obj(cachep, slabp, i);
+		void *objp = index_to_obj(cachep, page, i);
 #if DEBUG
 		/* need to poison the objs? */
 		if (cachep->flags & SLAB_POISON)
@@ -2609,7 +2588,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_bufctl(slabp)[i] = i;
+		slab_bufctl(page)[i] = i;
 	}
 }
 
@@ -2623,13 +2602,13 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 	}
 }
 
-static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 				int nodeid)
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
-	slabp->active++;
+	objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
@@ -2637,10 +2616,10 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 	return objp;
 }
 
-static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 				void *objp, int nodeid)
 {
-	unsigned int objnr = obj_to_index(cachep, slabp, objp);
+	unsigned int objnr = obj_to_index(cachep, page, objp);
 #if DEBUG
 	unsigned int i;
 
@@ -2648,16 +2627,16 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
 	/* Verify double free bug */
-	for (i = slabp->active; i < cachep->num; i++) {
-		if (slab_bufctl(slabp)[i] == objnr) {
+	for (i = page->active; i < cachep->num; i++) {
+		if (slab_bufctl(page)[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
 		}
 	}
 #endif
-	slabp->active--;
-	slab_bufctl(slabp)[slabp->active] = objnr;
+	page->active--;
+	slab_bufctl(page)[page->active] = objnr;
 }
 
 /*
@@ -2665,11 +2644,11 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
  * for the slab allocator to be able to lookup the cache and slab of a
  * virtual address for kfree, ksize, and slab debugging.
  */
-static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
-			   struct page *page)
+static void slab_map_pages(struct kmem_cache *cache, struct page *page,
+			   struct freelist *freelist)
 {
 	page->slab_cache = cache;
-	page->slab_page = slab;
+	page->freelist = freelist;
 }
 
 /*
@@ -2679,7 +2658,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
 static int cache_grow(struct kmem_cache *cachep,
 		gfp_t flags, int nodeid, struct page *page)
 {
-	struct slab *slabp;
+	struct freelist *freelist;
 	size_t offset;
 	gfp_t local_flags;
 	struct kmem_cache_node *n;
@@ -2726,14 +2705,14 @@ static int cache_grow(struct kmem_cache *cachep,
 		goto failed;
 
 	/* Get slab management. */
-	slabp = alloc_slabmgmt(cachep, page, offset,
+	freelist = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
-	if (!slabp)
+	if (!freelist)
 		goto opps1;
 
-	slab_map_pages(cachep, slabp, page);
+	slab_map_pages(cachep, page, freelist);
 
-	cache_init_objs(cachep, slabp);
+	cache_init_objs(cachep, page);
 
 	if (local_flags & __GFP_WAIT)
 		local_irq_disable();
@@ -2741,7 +2720,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	spin_lock(&n->list_lock);
 
 	/* Make slab active. */
-	list_add_tail(&slabp->list, &(n->slabs_free));
+	list_add_tail(&page->lru, &(n->slabs_free));
 	STATS_INC_GROWN(cachep);
 	n->free_objects += cachep->num;
 	spin_unlock(&n->list_lock);
@@ -2796,13 +2775,13 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 				   unsigned long caller)
 {
 	unsigned int objnr;
-	struct slab *slabp;
+	struct page *page;
 
 	BUG_ON(virt_to_cache(objp) != cachep);
 
 	objp -= obj_offset(cachep);
 	kfree_debugcheck(objp);
-	slabp = virt_to_slab(objp);
+	page = virt_to_head_page(objp);
 
 	if (cachep->flags & SLAB_RED_ZONE) {
 		verify_redzone_free(cachep, objp);
@@ -2812,10 +2791,10 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	if (cachep->flags & SLAB_STORE_USER)
 		*dbg_userword(cachep, objp) = (void *)caller;
 
-	objnr = obj_to_index(cachep, slabp, objp);
+	objnr = obj_to_index(cachep, page, objp);
 
 	BUG_ON(objnr >= cachep->num);
-	BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
+	BUG_ON(objp != index_to_obj(cachep, page, objnr));
 
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
@@ -2874,7 +2853,7 @@ retry:
 
 	while (batchcount > 0) {
 		struct list_head *entry;
-		struct slab *slabp;
+		struct page *page;
 		/* Get slab alloc is to come from. */
 		entry = n->slabs_partial.next;
 		if (entry == &n->slabs_partial) {
@@ -2884,7 +2863,7 @@ retry:
 				goto must_grow;
 		}
 
-		slabp = list_entry(entry, struct slab, list);
+		page = list_entry(entry, struct page, lru);
 		check_spinlock_acquired(cachep);
 
 		/*
@@ -2892,23 +2871,23 @@ retry:
 		 * there must be at least one object available for
 		 * allocation.
 		 */
-		BUG_ON(slabp->active >= cachep->num);
+		BUG_ON(page->active >= cachep->num);
 
-		while (slabp->active < cachep->num && batchcount--) {
+		while (page->active < cachep->num && batchcount--) {
 			STATS_INC_ALLOCED(cachep);
 			STATS_INC_ACTIVE(cachep);
 			STATS_SET_HIGH(cachep);
 
-			ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
+			ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
 									node));
 		}
 
 		/* move slabp to correct slabp list: */
-		list_del(&slabp->list);
-		if (slabp->active == cachep->num)
-			list_add(&slabp->list, &n->slabs_full);
+		list_del(&page->lru);
+		if (page->active == cachep->num)
+			list_add(&page->lru, &n->slabs_full);
 		else
-			list_add(&slabp->list, &n->slabs_partial);
+			list_add(&page->lru, &n->slabs_partial);
 	}
 
 must_grow:
@@ -3163,7 +3142,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 				int nodeid)
 {
 	struct list_head *entry;
-	struct slab *slabp;
+	struct page *page;
 	struct kmem_cache_node *n;
 	void *obj;
 	int x;
@@ -3183,24 +3162,24 @@ retry:
 			goto must_grow;
 	}
 
-	slabp = list_entry(entry, struct slab, list);
+	page = list_entry(entry, struct page, lru);
 	check_spinlock_acquired_node(cachep, nodeid);
 
 	STATS_INC_NODEALLOCS(cachep);
 	STATS_INC_ACTIVE(cachep);
 	STATS_SET_HIGH(cachep);
 
-	BUG_ON(slabp->active == cachep->num);
+	BUG_ON(page->active == cachep->num);
 
-	obj = slab_get_obj(cachep, slabp, nodeid);
+	obj = slab_get_obj(cachep, page, nodeid);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
-	list_del(&slabp->list);
+	list_del(&page->lru);
 
-	if (slabp->active == cachep->num)
-		list_add(&slabp->list, &n->slabs_full);
+	if (page->active == cachep->num)
+		list_add(&page->lru, &n->slabs_full);
 	else
-		list_add(&slabp->list, &n->slabs_partial);
+		list_add(&page->lru, &n->slabs_partial);
 
 	spin_unlock(&n->list_lock);
 	goto done;
@@ -3362,21 +3341,21 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 
 	for (i = 0; i < nr_objects; i++) {
 		void *objp;
-		struct slab *slabp;
+		struct page *page;
 
 		clear_obj_pfmemalloc(&objpp[i]);
 		objp = objpp[i];
 
-		slabp = virt_to_slab(objp);
+		page = virt_to_head_page(objp);
 		n = cachep->node[node];
-		list_del(&slabp->list);
+		list_del(&page->lru);
 		check_spinlock_acquired_node(cachep, node);
-		slab_put_obj(cachep, slabp, objp, node);
+		slab_put_obj(cachep, page, objp, node);
 		STATS_DEC_ACTIVE(cachep);
 		n->free_objects++;
 
 		/* fixup slab chains */
-		if (slabp->active == 0) {
+		if (page->active == 0) {
 			if (n->free_objects > n->free_limit) {
 				n->free_objects -= cachep->num;
 				/* No need to drop any previously held
@@ -3385,16 +3364,16 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 				 * a different cache, refer to comments before
 				 * alloc_slabmgmt.
 				 */
-				slab_destroy(cachep, slabp);
+				slab_destroy(cachep, page);
 			} else {
-				list_add(&slabp->list, &n->slabs_free);
+				list_add(&page->lru, &n->slabs_free);
 			}
 		} else {
 			/* Unconditionally move a slab to the end of the
 			 * partial list on free - maximum time for the
 			 * other objects to be freed, too.
 			 */
-			list_add_tail(&slabp->list, &n->slabs_partial);
+			list_add_tail(&page->lru, &n->slabs_partial);
 		}
 	}
 }
@@ -3434,10 +3413,10 @@ free_done:
 
 		p = n->slabs_free.next;
 		while (p != &(n->slabs_free)) {
-			struct slab *slabp;
+			struct page *page;
 
-			slabp = list_entry(p, struct slab, list);
-			BUG_ON(slabp->active);
+			page = list_entry(p, struct page, lru);
+			BUG_ON(page->active);
 
 			i++;
 			p = p->next;
@@ -4030,7 +4009,7 @@ out:
 #ifdef CONFIG_SLABINFO
 void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
 {
-	struct slab *slabp;
+	struct page *page;
 	unsigned long active_objs;
 	unsigned long num_objs;
 	unsigned long active_slabs = 0;
@@ -4050,22 +4029,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
 		check_irq_on();
 		spin_lock_irq(&n->list_lock);
 
-		list_for_each_entry(slabp, &n->slabs_full, list) {
-			if (slabp->active != cachep->num && !error)
+		list_for_each_entry(page, &n->slabs_full, lru) {
+			if (page->active != cachep->num && !error)
 				error = "slabs_full accounting error";
 			active_objs += cachep->num;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			if (slabp->active == cachep->num && !error)
+		list_for_each_entry(page, &n->slabs_partial, lru) {
+			if (page->active == cachep->num && !error)
 				error = "slabs_partial accounting error";
-			if (!slabp->active && !error)
+			if (!page->active && !error)
 				error = "slabs_partial accounting error";
-			active_objs += slabp->active;
+			active_objs += page->active;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_free, list) {
-			if (slabp->active && !error)
+		list_for_each_entry(page, &n->slabs_free, lru) {
+			if (page->active && !error)
 				error = "slabs_free accounting error";
 			num_slabs++;
 		}
@@ -4218,19 +4197,20 @@ static inline int add_caller(unsigned long *n, unsigned long v)
 	return 1;
 }
 
-static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
+static void handle_slab(unsigned long *n, struct kmem_cache *c,
+						struct page *page)
 {
 	void *p;
 	int i, j;
 
 	if (n[0] == n[1])
 		return;
-	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
+	for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
 		bool active = true;
 
-		for (j = s->active; j < c->num; j++) {
+		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_bufctl(s)[j] == i) {
+			if (slab_bufctl(page)[j] == i) {
 				active = false;
 				break;
 			}
@@ -4262,7 +4242,7 @@ static void show_symbol(struct seq_file *m, unsigned long address)
 static int leaks_show(struct seq_file *m, void *p)
 {
 	struct kmem_cache *cachep = list_entry(p, struct kmem_cache, list);
-	struct slab *slabp;
+	struct page *page;
 	struct kmem_cache_node *n;
 	const char *name;
 	unsigned long *x = m->private;
@@ -4286,10 +4266,10 @@ static int leaks_show(struct seq_file *m, void *p)
 		check_irq_on();
 		spin_lock_irq(&n->list_lock);
 
-		list_for_each_entry(slabp, &n->slabs_full, list)
-			handle_slab(x, cachep, slabp);
-		list_for_each_entry(slabp, &n->slabs_partial, list)
-			handle_slab(x, cachep, slabp);
+		list_for_each_entry(page, &n->slabs_full, lru)
+			handle_slab(x, cachep, page);
+		list_for_each_entry(page, &n->slabs_partial, lru)
+			handle_slab(x, cachep, page);
 		spin_unlock_irq(&n->list_lock);
 	}
 	name = cachep->name;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 14/16] slab: use struct page for slab management
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, only a few fields are left in struct slab, so we can overload them
onto struct page. This saves some memory and reduces the cache footprint.

After this change, slabp_cache and slab_size are no longer related to
struct slab, so rename them to freelist_cache and freelist_size.

These changes are purely mechanical; there is no functional change.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
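
For readers skimming the diff, the reused struct page fields can be
summarised as follows. This is an editorial sketch only (the helper name
is made up; the authoritative layout is in the mm_types.h hunk below):

/*
 * Sketch: struct page fields as used by SLAB after this change.
 *
 *   page->slab_cache : kmem_cache that owns this slab
 *   page->s_mem      : address of the first object (colour offset included)
 *   page->freelist   : array of free object indices, on-slab or off-slab
 *   page->active     : number of objects currently in use in this slab
 *   page->lru        : linkage into the slabs_full/partial/free lists
 */
static inline void *sketch_index_to_obj(struct kmem_cache *cache,
					struct page *page, unsigned int idx)
{
	/* mirrors index_to_obj() in the diff below */
	return page->s_mem + cache->size * idx;
}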

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ace9a5f..66ee577 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -42,18 +42,22 @@ struct page {
 	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
-	struct address_space *mapping;	/* If low bit clear, points to
-					 * inode address_space, or NULL.
-					 * If page mapped as anonymous
-					 * memory, low bit is set, and
-					 * it points to anon_vma object:
-					 * see PAGE_MAPPING_ANON below.
-					 */
+	union {
+		struct address_space *mapping;	/* If low bit clear, points to
+						 * inode address_space, or NULL.
+						 * If page mapped as anonymous
+						 * memory, low bit is set, and
+						 * it points to anon_vma object:
+						 * see PAGE_MAPPING_ANON below.
+						 */
+		void *s_mem;			/* slab first object */
+	};
+
 	/* Second double word */
 	struct {
 		union {
 			pgoff_t index;		/* Our offset within mapping. */
-			void *freelist;		/* slub/slob first free object */
+			void *freelist;		/* sl[aou]b first free object */
 			bool pfmemalloc;	/* If set by the page allocator,
 						 * ALLOC_NO_WATERMARKS was set
 						 * and the low watermark was not
@@ -109,6 +113,7 @@ struct page {
 				};
 				atomic_t _count;		/* Usage count, see below. */
 			};
+			unsigned int active;	/* SLAB */
 		};
 	};
 
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index cd40158..ca82e8f 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -41,8 +41,8 @@ struct kmem_cache {
 
 	size_t colour;			/* cache colouring range */
 	unsigned int colour_off;	/* colour offset */
-	struct kmem_cache *slabp_cache;
-	unsigned int slab_size;
+	struct kmem_cache *freelist_cache;
+	unsigned int freelist_size;
 
 	/* constructor func */
 	void (*ctor)(void *obj);
diff --git a/mm/slab.c b/mm/slab.c
index 9dcbb22..cf39309 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -164,21 +164,6 @@
 static bool pfmemalloc_active __read_mostly;
 
 /*
- * struct slab
- *
- * Manages the objs in a slab. Placed either at the beginning of mem allocated
- * for a slab, or allocated from an general cache.
- * Slabs are chained into three list: fully used, partial, fully free slabs.
- */
-struct slab {
-	struct {
-		struct list_head list;
-		void *s_mem;		/* including colour offset */
-		unsigned int active;	/* num of objs active in slab */
-	};
-};
-
-/*
  * struct array_cache
  *
  * Purpose:
@@ -405,18 +390,10 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)
 	return page->slab_cache;
 }
 
-static inline struct slab *virt_to_slab(const void *obj)
-{
-	struct page *page = virt_to_head_page(obj);
-
-	VM_BUG_ON(!PageSlab(page));
-	return page->slab_page;
-}
-
-static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
+static inline void *index_to_obj(struct kmem_cache *cache, struct page *page,
 				 unsigned int idx)
 {
-	return slab->s_mem + cache->size * idx;
+	return page->s_mem + cache->size * idx;
 }
 
 /*
@@ -426,9 +403,9 @@ static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
  *   reciprocal_divide(offset, cache->reciprocal_buffer_size)
  */
 static inline unsigned int obj_to_index(const struct kmem_cache *cache,
-					const struct slab *slab, void *obj)
+					const struct page *page, void *obj)
 {
-	u32 offset = (obj - slab->s_mem);
+	u32 offset = (obj - page->s_mem);
 	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
 }
 
@@ -590,7 +567,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 
 static size_t slab_mgmt_size(size_t nr_objs, size_t align)
 {
-	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
+	return ALIGN(nr_objs * sizeof(unsigned int), align);
 }
 
 /*
@@ -609,7 +586,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * on it. For the latter case, the memory allocated for a
 	 * slab is used for:
 	 *
-	 * - The struct slab
 	 * - One unsigned int for each object
 	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
@@ -632,8 +608,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		 * into the memory allocation when taking the padding
 		 * into account.
 		 */
-		nr_objs = (slab_size - sizeof(struct slab)) /
-			  (buffer_size + sizeof(unsigned int));
+		nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));
 
 		/*
 		 * This calculated number will be either the right
@@ -773,11 +748,11 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 	return nc;
 }
 
-static inline bool is_slab_pfmemalloc(struct slab *slabp)
+static inline bool is_slab_pfmemalloc(struct page *page)
 {
-	struct page *page = virt_to_page(slabp->s_mem);
+	struct page *mem_page = virt_to_page(page->s_mem);
 
-	return PageSlabPfmemalloc(page);
+	return PageSlabPfmemalloc(mem_page);
 }
 
 /* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -785,23 +760,23 @@ static void recheck_pfmemalloc_active(struct kmem_cache *cachep,
 						struct array_cache *ac)
 {
 	struct kmem_cache_node *n = cachep->node[numa_mem_id()];
-	struct slab *slabp;
+	struct page *page;
 	unsigned long flags;
 
 	if (!pfmemalloc_active)
 		return;
 
 	spin_lock_irqsave(&n->list_lock, flags);
-	list_for_each_entry(slabp, &n->slabs_full, list)
-		if (is_slab_pfmemalloc(slabp))
+	list_for_each_entry(page, &n->slabs_full, lru)
+		if (is_slab_pfmemalloc(page))
 			goto out;
 
-	list_for_each_entry(slabp, &n->slabs_partial, list)
-		if (is_slab_pfmemalloc(slabp))
+	list_for_each_entry(page, &n->slabs_partial, lru)
+		if (is_slab_pfmemalloc(page))
 			goto out;
 
-	list_for_each_entry(slabp, &n->slabs_free, list)
-		if (is_slab_pfmemalloc(slabp))
+	list_for_each_entry(page, &n->slabs_free, lru)
+		if (is_slab_pfmemalloc(page))
 			goto out;
 
 	pfmemalloc_active = false;
@@ -841,8 +816,8 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		 */
 		n = cachep->node[numa_mem_id()];
 		if (!list_empty(&n->slabs_free) && force_refill) {
-			struct slab *slabp = virt_to_slab(objp);
-			ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
+			struct page *page = virt_to_head_page(objp);
+			ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
@@ -874,9 +849,9 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct slab *slabp = virt_to_slab(objp);
-		struct page *page = virt_to_head_page(slabp->s_mem);
-		if (PageSlabPfmemalloc(page))
+		struct page *page = virt_to_head_page(objp);
+		struct page *mem_page = virt_to_head_page(page->s_mem);
+		if (PageSlabPfmemalloc(mem_page))
 			set_obj_pfmemalloc(&objp);
 	}
 
@@ -1627,7 +1602,7 @@ static noinline void
 slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
 {
 	struct kmem_cache_node *n;
-	struct slab *slabp;
+	struct page *page;
 	unsigned long flags;
 	int node;
 
@@ -1646,15 +1621,15 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
 			continue;
 
 		spin_lock_irqsave(&n->list_lock, flags);
-		list_for_each_entry(slabp, &n->slabs_full, list) {
+		list_for_each_entry(page, &n->slabs_full, lru) {
 			active_objs += cachep->num;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			active_objs += slabp->active;
+		list_for_each_entry(page, &n->slabs_partial, lru) {
+			active_objs += page->active;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_free, list)
+		list_for_each_entry(page, &n->slabs_free, lru)
 			num_slabs++;
 
 		free_objects += n->free_objects;
@@ -1740,6 +1715,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
+	page_mapcount_reset(page);
+	page->mapping = NULL;
 
 	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
@@ -1904,19 +1881,19 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 		/* Print some data about the neighboring objects, if they
 		 * exist:
 		 */
-		struct slab *slabp = virt_to_slab(objp);
+		struct page *page = virt_to_head_page(objp);
 		unsigned int objnr;
 
-		objnr = obj_to_index(cachep, slabp, objp);
+		objnr = obj_to_index(cachep, page, objp);
 		if (objnr) {
-			objp = index_to_obj(cachep, slabp, objnr - 1);
+			objp = index_to_obj(cachep, page, objnr - 1);
 			realobj = (char *)objp + obj_offset(cachep);
 			printk(KERN_ERR "Prev obj: start=%p, len=%d\n",
 			       realobj, size);
 			print_objinfo(cachep, objp, 2);
 		}
 		if (objnr + 1 < cachep->num) {
-			objp = index_to_obj(cachep, slabp, objnr + 1);
+			objp = index_to_obj(cachep, page, objnr + 1);
 			realobj = (char *)objp + obj_offset(cachep);
 			printk(KERN_ERR "Next obj: start=%p, len=%d\n",
 			       realobj, size);
@@ -1927,11 +1904,12 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 #endif
 
 #if DEBUG
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+						struct page *page)
 {
 	int i;
 	for (i = 0; i < cachep->num; i++) {
-		void *objp = index_to_obj(cachep, slabp, i);
+		void *objp = index_to_obj(cachep, page, i);
 
 		if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1956,7 +1934,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
 	}
 }
 #else
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+						struct page *page)
 {
 }
 #endif
@@ -1970,11 +1949,12 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
  * Before calling the slab must have been unlinked from the cache.  The
  * cache-lock is not held/needed.
  */
-static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy(struct kmem_cache *cachep, struct page *page)
 {
-	struct page *page = virt_to_head_page(slabp->s_mem);
+	struct freelist *freelist;
 
-	slab_destroy_debugcheck(cachep, slabp);
+	freelist = page->freelist;
+	slab_destroy_debugcheck(cachep, page);
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
 		struct rcu_head *head;
 
@@ -1986,11 +1966,11 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
 		kmem_freepages(cachep, page);
 
 	/*
-	 * From now on, we don't use slab management
+	 * From now on, we don't use freelist
 	 * although actual page will be freed in rcu context.
 	 */
 	if (OFF_SLAB(cachep))
-		kmem_cache_free(cachep->slabp_cache, slabp);
+		kmem_cache_free(cachep->freelist_cache, freelist);
 }
 
 /**
@@ -2027,7 +2007,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			 * use off-slab slabs. Needed to avoid a possible
 			 * looping condition in cache_grow().
 			 */
-			offslab_limit = size - sizeof(struct slab);
+			offslab_limit = size;
 			offslab_limit /= sizeof(unsigned int);
 
  			if (num > offslab_limit)
@@ -2150,7 +2130,7 @@ static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
 int
 __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 {
-	size_t left_over, slab_size, ralign;
+	size_t left_over, freelist_size, ralign;
 	gfp_t gfp;
 	int err;
 	size_t size = cachep->size;
@@ -2269,22 +2249,21 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!cachep->num)
 		return -E2BIG;
 
-	slab_size = ALIGN(cachep->num * sizeof(unsigned int)
-			  + sizeof(struct slab), cachep->align);
+	freelist_size =
+		ALIGN(cachep->num * sizeof(unsigned int), cachep->align);
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
 	 * move it on-slab. This is at the expense of any extra colouring.
 	 */
-	if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
+	if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
 		flags &= ~CFLGS_OFF_SLAB;
-		left_over -= slab_size;
+		left_over -= freelist_size;
 	}
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		slab_size =
-		    cachep->num * sizeof(unsigned int) + sizeof(struct slab);
+		freelist_size = cachep->num * sizeof(unsigned int);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2301,7 +2280,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (cachep->colour_off < cachep->align)
 		cachep->colour_off = cachep->align;
 	cachep->colour = left_over / cachep->colour_off;
-	cachep->slab_size = slab_size;
+	cachep->freelist_size = freelist_size;
 	cachep->flags = flags;
 	cachep->allocflags = __GFP_COMP;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
@@ -2310,7 +2289,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
 	if (flags & CFLGS_OFF_SLAB) {
-		cachep->slabp_cache = kmalloc_slab(slab_size, 0u);
+		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
 		/*
 		 * This is a possibility for one of the malloc_sizes caches.
 		 * But since we go off slab only for object size greater than
@@ -2318,7 +2297,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		 * this should not happen at all.
 		 * But leave a BUG_ON for some lucky dude.
 		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
+		BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
 	}
 
 	err = setup_cpu_cache(cachep, gfp);
@@ -2424,7 +2403,7 @@ static int drain_freelist(struct kmem_cache *cache,
 {
 	struct list_head *p;
 	int nr_freed;
-	struct slab *slabp;
+	struct page *page;
 
 	nr_freed = 0;
 	while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
@@ -2436,18 +2415,18 @@ static int drain_freelist(struct kmem_cache *cache,
 			goto out;
 		}
 
-		slabp = list_entry(p, struct slab, list);
+		page = list_entry(p, struct page, lru);
 #if DEBUG
-		BUG_ON(slabp->active);
+		BUG_ON(page->active);
 #endif
-		list_del(&slabp->list);
+		list_del(&page->lru);
 		/*
 		 * Safe to drop the lock. The slab is no longer linked
 		 * to the cache.
 		 */
 		n->free_objects -= cache->num;
 		spin_unlock_irq(&n->list_lock);
-		slab_destroy(cache, slabp);
+		slab_destroy(cache, page);
 		nr_freed++;
 	}
 out:
@@ -2530,18 +2509,18 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
  * descriptors in kmem_cache_create, we search through the malloc_sizes array.
  * If we are creating a malloc_sizes cache here it would not be visible to
  * kmem_find_general_cachep till the initialization is complete.
- * Hence we cannot have slabp_cache same as the original cache.
+ * Hence we cannot have freelist_cache same as the original cache.
  */
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
+static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 				   struct page *page, int colour_off,
 				   gfp_t local_flags, int nodeid)
 {
-	struct slab *slabp;
+	struct freelist *freelist;
 	void *addr = page_address(page);
 
 	if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
-		slabp = kmem_cache_alloc_node(cachep->slabp_cache,
+		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
 					      local_flags, nodeid);
 		/*
 		 * If the first object in the slab is leaked (it's allocated
@@ -2549,31 +2528,31 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
 		 * kmemleak does not treat the ->s_mem pointer as a reference
 		 * to the object. Otherwise we will not report the leak.
 		 */
-		kmemleak_scan_area(&slabp->list, sizeof(struct list_head),
+		kmemleak_scan_area(&page->lru, sizeof(struct list_head),
 				   local_flags);
-		if (!slabp)
+		if (!freelist)
 			return NULL;
 	} else {
-		slabp = addr + colour_off;
-		colour_off += cachep->slab_size;
+		freelist = addr + colour_off;
+		colour_off += cachep->freelist_size;
 	}
-	slabp->active = 0;
-	slabp->s_mem = addr + colour_off;
-	return slabp;
+	page->active = 0;
+	page->s_mem = addr + colour_off;
+	return freelist;
 }
 
-static inline unsigned int *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct page *page)
 {
-	return (unsigned int *) (slabp + 1);
+	return (unsigned int *)(page->freelist);
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
-			    struct slab *slabp)
+			    struct page *page)
 {
 	int i;
 
 	for (i = 0; i < cachep->num; i++) {
-		void *objp = index_to_obj(cachep, slabp, i);
+		void *objp = index_to_obj(cachep, page, i);
 #if DEBUG
 		/* need to poison the objs? */
 		if (cachep->flags & SLAB_POISON)
@@ -2609,7 +2588,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_bufctl(slabp)[i] = i;
+		slab_bufctl(page)[i] = i;
 	}
 }
 
@@ -2623,13 +2602,13 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 	}
 }
 
-static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 				int nodeid)
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
-	slabp->active++;
+	objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 #endif
@@ -2637,10 +2616,10 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
 	return objp;
 }
 
-static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 				void *objp, int nodeid)
 {
-	unsigned int objnr = obj_to_index(cachep, slabp, objp);
+	unsigned int objnr = obj_to_index(cachep, page, objp);
 #if DEBUG
 	unsigned int i;
 
@@ -2648,16 +2627,16 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
 
 	/* Verify double free bug */
-	for (i = slabp->active; i < cachep->num; i++) {
-		if (slab_bufctl(slabp)[i] == objnr) {
+	for (i = page->active; i < cachep->num; i++) {
+		if (slab_bufctl(page)[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
 		}
 	}
 #endif
-	slabp->active--;
-	slab_bufctl(slabp)[slabp->active] = objnr;
+	page->active--;
+	slab_bufctl(page)[page->active] = objnr;
 }
 
 /*
@@ -2665,11 +2644,11 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
  * for the slab allocator to be able to lookup the cache and slab of a
  * virtual address for kfree, ksize, and slab debugging.
  */
-static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
-			   struct page *page)
+static void slab_map_pages(struct kmem_cache *cache, struct page *page,
+			   struct freelist *freelist)
 {
 	page->slab_cache = cache;
-	page->slab_page = slab;
+	page->freelist = freelist;
 }
 
 /*
@@ -2679,7 +2658,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
 static int cache_grow(struct kmem_cache *cachep,
 		gfp_t flags, int nodeid, struct page *page)
 {
-	struct slab *slabp;
+	struct freelist *freelist;
 	size_t offset;
 	gfp_t local_flags;
 	struct kmem_cache_node *n;
@@ -2726,14 +2705,14 @@ static int cache_grow(struct kmem_cache *cachep,
 		goto failed;
 
 	/* Get slab management. */
-	slabp = alloc_slabmgmt(cachep, page, offset,
+	freelist = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
-	if (!slabp)
+	if (!freelist)
 		goto opps1;
 
-	slab_map_pages(cachep, slabp, page);
+	slab_map_pages(cachep, page, freelist);
 
-	cache_init_objs(cachep, slabp);
+	cache_init_objs(cachep, page);
 
 	if (local_flags & __GFP_WAIT)
 		local_irq_disable();
@@ -2741,7 +2720,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	spin_lock(&n->list_lock);
 
 	/* Make slab active. */
-	list_add_tail(&slabp->list, &(n->slabs_free));
+	list_add_tail(&page->lru, &(n->slabs_free));
 	STATS_INC_GROWN(cachep);
 	n->free_objects += cachep->num;
 	spin_unlock(&n->list_lock);
@@ -2796,13 +2775,13 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 				   unsigned long caller)
 {
 	unsigned int objnr;
-	struct slab *slabp;
+	struct page *page;
 
 	BUG_ON(virt_to_cache(objp) != cachep);
 
 	objp -= obj_offset(cachep);
 	kfree_debugcheck(objp);
-	slabp = virt_to_slab(objp);
+	page = virt_to_head_page(objp);
 
 	if (cachep->flags & SLAB_RED_ZONE) {
 		verify_redzone_free(cachep, objp);
@@ -2812,10 +2791,10 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	if (cachep->flags & SLAB_STORE_USER)
 		*dbg_userword(cachep, objp) = (void *)caller;
 
-	objnr = obj_to_index(cachep, slabp, objp);
+	objnr = obj_to_index(cachep, page, objp);
 
 	BUG_ON(objnr >= cachep->num);
-	BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
+	BUG_ON(objp != index_to_obj(cachep, page, objnr));
 
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
@@ -2874,7 +2853,7 @@ retry:
 
 	while (batchcount > 0) {
 		struct list_head *entry;
-		struct slab *slabp;
+		struct page *page;
 		/* Get slab alloc is to come from. */
 		entry = n->slabs_partial.next;
 		if (entry == &n->slabs_partial) {
@@ -2884,7 +2863,7 @@ retry:
 				goto must_grow;
 		}
 
-		slabp = list_entry(entry, struct slab, list);
+		page = list_entry(entry, struct page, lru);
 		check_spinlock_acquired(cachep);
 
 		/*
@@ -2892,23 +2871,23 @@ retry:
 		 * there must be at least one object available for
 		 * allocation.
 		 */
-		BUG_ON(slabp->active >= cachep->num);
+		BUG_ON(page->active >= cachep->num);
 
-		while (slabp->active < cachep->num && batchcount--) {
+		while (page->active < cachep->num && batchcount--) {
 			STATS_INC_ALLOCED(cachep);
 			STATS_INC_ACTIVE(cachep);
 			STATS_SET_HIGH(cachep);
 
-			ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
+			ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
 									node));
 		}
 
 		/* move slabp to correct slabp list: */
-		list_del(&slabp->list);
-		if (slabp->active == cachep->num)
-			list_add(&slabp->list, &n->slabs_full);
+		list_del(&page->lru);
+		if (page->active == cachep->num)
+			list_add(&page->lru, &n->slabs_full);
 		else
-			list_add(&slabp->list, &n->slabs_partial);
+			list_add(&page->lru, &n->slabs_partial);
 	}
 
 must_grow:
@@ -3163,7 +3142,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 				int nodeid)
 {
 	struct list_head *entry;
-	struct slab *slabp;
+	struct page *page;
 	struct kmem_cache_node *n;
 	void *obj;
 	int x;
@@ -3183,24 +3162,24 @@ retry:
 			goto must_grow;
 	}
 
-	slabp = list_entry(entry, struct slab, list);
+	page = list_entry(entry, struct page, lru);
 	check_spinlock_acquired_node(cachep, nodeid);
 
 	STATS_INC_NODEALLOCS(cachep);
 	STATS_INC_ACTIVE(cachep);
 	STATS_SET_HIGH(cachep);
 
-	BUG_ON(slabp->active == cachep->num);
+	BUG_ON(page->active == cachep->num);
 
-	obj = slab_get_obj(cachep, slabp, nodeid);
+	obj = slab_get_obj(cachep, page, nodeid);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
-	list_del(&slabp->list);
+	list_del(&page->lru);
 
-	if (slabp->active == cachep->num)
-		list_add(&slabp->list, &n->slabs_full);
+	if (page->active == cachep->num)
+		list_add(&page->lru, &n->slabs_full);
 	else
-		list_add(&slabp->list, &n->slabs_partial);
+		list_add(&page->lru, &n->slabs_partial);
 
 	spin_unlock(&n->list_lock);
 	goto done;
@@ -3362,21 +3341,21 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 
 	for (i = 0; i < nr_objects; i++) {
 		void *objp;
-		struct slab *slabp;
+		struct page *page;
 
 		clear_obj_pfmemalloc(&objpp[i]);
 		objp = objpp[i];
 
-		slabp = virt_to_slab(objp);
+		page = virt_to_head_page(objp);
 		n = cachep->node[node];
-		list_del(&slabp->list);
+		list_del(&page->lru);
 		check_spinlock_acquired_node(cachep, node);
-		slab_put_obj(cachep, slabp, objp, node);
+		slab_put_obj(cachep, page, objp, node);
 		STATS_DEC_ACTIVE(cachep);
 		n->free_objects++;
 
 		/* fixup slab chains */
-		if (slabp->active == 0) {
+		if (page->active == 0) {
 			if (n->free_objects > n->free_limit) {
 				n->free_objects -= cachep->num;
 				/* No need to drop any previously held
@@ -3385,16 +3364,16 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 				 * a different cache, refer to comments before
 				 * alloc_slabmgmt.
 				 */
-				slab_destroy(cachep, slabp);
+				slab_destroy(cachep, page);
 			} else {
-				list_add(&slabp->list, &n->slabs_free);
+				list_add(&page->lru, &n->slabs_free);
 			}
 		} else {
 			/* Unconditionally move a slab to the end of the
 			 * partial list on free - maximum time for the
 			 * other objects to be freed, too.
 			 */
-			list_add_tail(&slabp->list, &n->slabs_partial);
+			list_add_tail(&page->lru, &n->slabs_partial);
 		}
 	}
 }
@@ -3434,10 +3413,10 @@ free_done:
 
 		p = n->slabs_free.next;
 		while (p != &(n->slabs_free)) {
-			struct slab *slabp;
+			struct page *page;
 
-			slabp = list_entry(p, struct slab, list);
-			BUG_ON(slabp->active);
+			page = list_entry(p, struct page, lru);
+			BUG_ON(page->active);
 
 			i++;
 			p = p->next;
@@ -4030,7 +4009,7 @@ out:
 #ifdef CONFIG_SLABINFO
 void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
 {
-	struct slab *slabp;
+	struct page *page;
 	unsigned long active_objs;
 	unsigned long num_objs;
 	unsigned long active_slabs = 0;
@@ -4050,22 +4029,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
 		check_irq_on();
 		spin_lock_irq(&n->list_lock);
 
-		list_for_each_entry(slabp, &n->slabs_full, list) {
-			if (slabp->active != cachep->num && !error)
+		list_for_each_entry(page, &n->slabs_full, lru) {
+			if (page->active != cachep->num && !error)
 				error = "slabs_full accounting error";
 			active_objs += cachep->num;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_partial, list) {
-			if (slabp->active == cachep->num && !error)
+		list_for_each_entry(page, &n->slabs_partial, lru) {
+			if (page->active == cachep->num && !error)
 				error = "slabs_partial accounting error";
-			if (!slabp->active && !error)
+			if (!page->active && !error)
 				error = "slabs_partial accounting error";
-			active_objs += slabp->active;
+			active_objs += page->active;
 			active_slabs++;
 		}
-		list_for_each_entry(slabp, &n->slabs_free, list) {
-			if (slabp->active && !error)
+		list_for_each_entry(page, &n->slabs_free, lru) {
+			if (page->active && !error)
 				error = "slabs_free accounting error";
 			num_slabs++;
 		}
@@ -4218,19 +4197,20 @@ static inline int add_caller(unsigned long *n, unsigned long v)
 	return 1;
 }
 
-static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
+static void handle_slab(unsigned long *n, struct kmem_cache *c,
+						struct page *page)
 {
 	void *p;
 	int i, j;
 
 	if (n[0] == n[1])
 		return;
-	for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
+	for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
 		bool active = true;
 
-		for (j = s->active; j < c->num; j++) {
+		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_bufctl(s)[j] == i) {
+			if (slab_bufctl(page)[j] == i) {
 				active = false;
 				break;
 			}
@@ -4262,7 +4242,7 @@ static void show_symbol(struct seq_file *m, unsigned long address)
 static int leaks_show(struct seq_file *m, void *p)
 {
 	struct kmem_cache *cachep = list_entry(p, struct kmem_cache, list);
-	struct slab *slabp;
+	struct page *page;
 	struct kmem_cache_node *n;
 	const char *name;
 	unsigned long *x = m->private;
@@ -4286,10 +4266,10 @@ static int leaks_show(struct seq_file *m, void *p)
 		check_irq_on();
 		spin_lock_irq(&n->list_lock);
 
-		list_for_each_entry(slabp, &n->slabs_full, list)
-			handle_slab(x, cachep, slabp);
-		list_for_each_entry(slabp, &n->slabs_partial, list)
-			handle_slab(x, cachep, slabp);
+		list_for_each_entry(page, &n->slabs_full, lru)
+			handle_slab(x, cachep, page);
+		list_for_each_entry(page, &n->slabs_partial, lru)
+			handle_slab(x, cachep, page);
 		spin_unlock_irq(&n->list_lock);
 	}
 	name = cachep->name;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 15/16] slab: remove useless statement for checking pfmemalloc
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, virt_to_page(page->s_mem) is the same as the page itself,
because the slab uses that page structure for its management.
So remove the now-useless statements.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
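
As a quick editorial illustration of the invariant this patch relies on
(the helper name is made up and not part of the patch):

/*
 * Once s_mem lives in struct page, it always points into the slab's own
 * compound page, so looking it up only leads back to where we started.
 */
static inline void sketch_check_s_mem(struct page *page)
{
	VM_BUG_ON(virt_to_head_page(page->s_mem) != page);
}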

diff --git a/mm/slab.c b/mm/slab.c
index cf39309..6abc069 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -750,9 +750,7 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 
 static inline bool is_slab_pfmemalloc(struct page *page)
 {
-	struct page *mem_page = virt_to_page(page->s_mem);
-
-	return PageSlabPfmemalloc(mem_page);
+	return PageSlabPfmemalloc(page);
 }
 
 /* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -817,7 +815,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		n = cachep->node[numa_mem_id()];
 		if (!list_empty(&n->slabs_free) && force_refill) {
 			struct page *page = virt_to_head_page(objp);
-			ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
+			ClearPageSlabPfmemalloc(page);
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
@@ -850,8 +848,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
 		struct page *page = virt_to_head_page(objp);
-		struct page *mem_page = virt_to_head_page(page->s_mem);
-		if (PageSlabPfmemalloc(mem_page))
+		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 15/16] slab: remove useless statement for checking pfmemalloc
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, virt_to_page(page->s_mem) is the same as the page itself,
because the slab uses that page structure for its management.
So remove the now-useless statements.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index cf39309..6abc069 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -750,9 +750,7 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 
 static inline bool is_slab_pfmemalloc(struct page *page)
 {
-	struct page *mem_page = virt_to_page(page->s_mem);
-
-	return PageSlabPfmemalloc(mem_page);
+	return PageSlabPfmemalloc(page);
 }
 
 /* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -817,7 +815,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		n = cachep->node[numa_mem_id()];
 		if (!list_empty(&n->slabs_free) && force_refill) {
 			struct page *page = virt_to_head_page(objp);
-			ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
+			ClearPageSlabPfmemalloc(page);
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
@@ -850,8 +848,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
 		struct page *page = virt_to_head_page(objp);
-		struct page *mem_page = virt_to_head_page(page->s_mem);
-		if (PageSlabPfmemalloc(mem_page))
+		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 16/16] slab: rename slab_bufctl to slab_freelist
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22  8:44   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, bufctl is no longer a proper name for this array.
So rename it to slab_freelist.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
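
As an editorial aside, the renamed array behaves as a plain LIFO stack of
free object indices, with page->active as the stack pointer. A sketch
(helper names are made up; they mirror slab_get_obj()/slab_put_obj() in
the diff below):

/* Entries at index >= page->active hold the object numbers that are free. */
static inline unsigned int sketch_freelist_pop(struct page *page)
{
	return slab_freelist(page)[page->active++];	/* allocate */
}

static inline void sketch_freelist_push(struct page *page, unsigned int objnr)
{
	slab_freelist(page)[--page->active] = objnr;	/* free */
}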

diff --git a/mm/slab.c b/mm/slab.c
index 6abc069..e8ec4c5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2538,7 +2538,7 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 	return freelist;
 }
 
-static inline unsigned int *slab_bufctl(struct page *page)
+static inline unsigned int *slab_freelist(struct page *page)
 {
 	return (unsigned int *)(page->freelist);
 }
@@ -2585,7 +2585,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_bufctl(page)[i] = i;
+		slab_freelist(page)[i] = i;
 	}
 }
 
@@ -2604,7 +2604,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+	objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2625,7 +2625,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (slab_bufctl(page)[i] == objnr) {
+		if (slab_freelist(page)[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2633,7 +2633,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 	}
 #endif
 	page->active--;
-	slab_bufctl(page)[page->active] = objnr;
+	slab_freelist(page)[page->active] = objnr;
 }
 
 /*
@@ -4207,7 +4207,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_bufctl(page)[j] == i) {
+			if (slab_freelist(page)[j] == i) {
 				active = false;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 16/16] slab: rename slab_bufctl to slab_freelist
@ 2013-08-22  8:44   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-22  8:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now, bufctl is no longer a proper name for this array.
So rename it to slab_freelist.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 6abc069..e8ec4c5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2538,7 +2538,7 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 	return freelist;
 }
 
-static inline unsigned int *slab_bufctl(struct page *page)
+static inline unsigned int *slab_freelist(struct page *page)
 {
 	return (unsigned int *)(page->freelist);
 }
@@ -2585,7 +2585,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_bufctl(page)[i] = i;
+		slab_freelist(page)[i] = i;
 	}
 }
 
@@ -2604,7 +2604,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+	objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2625,7 +2625,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (slab_bufctl(page)[i] == objnr) {
+		if (slab_freelist(page)[i] == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2633,7 +2633,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 	}
 #endif
 	page->active--;
-	slab_bufctl(page)[page->active] = objnr;
+	slab_freelist(page)[page->active] = objnr;
 }
 
 /*
@@ -4207,7 +4207,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_bufctl(page)[j] == i) {
+			if (slab_freelist(page)[j] == i) {
 				active = false;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
  2013-08-22  8:44 ` Joonsoo Kim
@ 2013-08-22 16:47   ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 16:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> And this patchset change a management method of free objects of a slab.
> Current free objects management method of the slab is weird, because
> it touch random position of the array of kmem_bufctl_t when we try to
> get free object. See following example.

The ordering is intentional so that the most cache hot objects are removed
first.

> To get free objects, we access this array with following pattern.
> 6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

Because that is the inverse order of the objects being freed.

The cache-hot effect may not be that significant since per-cpu and
per-node queues have been added on top. So maybe we do not need to be so
cache-aware anymore when actually touching struct slab.
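
[Editorial trace of this point, not from the original mail: if the
objects were freed in the order 1, 0, 4, 5, 2, 7, 3, 6, each free pushes
its index at the list head, and popping the head on allocation then walks
6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -- exactly the pattern quoted above
and exactly the reverse of the free order -- so the most recently freed,
and therefore most likely cache-hot, object is handed out first.]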

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
@ 2013-08-22 16:47   ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 16:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> And this patchset change a management method of free objects of a slab.
> Current free objects management method of the slab is weird, because
> it touch random position of the array of kmem_bufctl_t when we try to
> get free object. See following example.

The ordering is intentional so that the most cache hot objects are removed
first.

> To get free objects, we access this array with following pattern.
> 6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

Because that is the inverse order of the objects being freed.

The cache-hot effect may not be that significant since per-cpu and
per-node queues have been added on top. So maybe we do not need to be so
cache-aware anymore when actually touching struct slab.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/16] slab: change return type of kmem_getpages() to struct page
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-08-22 17:49     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 17:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> @@ -2042,7 +2042,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
>   */
>  static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
>  {
> -	void *addr = slabp->s_mem - slabp->colouroff;
> +	struct page *page = virt_to_head_page(slabp->s_mem);
>
>  	slab_destroy_debugcheck(cachep, slabp);
>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {

Ok, so this removes the slab offset management. Using a struct page
pointer therefore means that coloring support is no longer possible.

I would suggest a separate patch for the coloring removal before this
one. It seems that the support is removed across two different patches now.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/16] slab: change return type of kmem_getpages() to struct page
@ 2013-08-22 17:49     ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 17:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> @@ -2042,7 +2042,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
>   */
>  static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
>  {
> -	void *addr = slabp->s_mem - slabp->colouroff;
> +	struct page *page = virt_to_head_page(slabp->s_mem);
>
>  	slab_destroy_debugcheck(cachep, slabp);
>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {

Ok so this removes slab offset management. The use of a struct page
pointer therefore results in coloring support to be not possible anymore.

I would suggest to have a separate patch for coloring removal before this
patch. It seems that the support is removed in two different patches now.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 04/16] slab: remove nodeid in struct slab
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-08-22 17:51     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 17:51 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> @@ -1099,8 +1098,7 @@ static void drain_alien_cache(struct kmem_cache *cachep,
>
>  static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
>  {
> -	struct slab *slabp = virt_to_slab(objp);
> -	int nodeid = slabp->nodeid;
> +	int nodeid = page_to_nid(virt_to_page(objp));
>  	struct kmem_cache_node *n;
>  	struct array_cache *alien = NULL;
>  	int node;

virt_to_page is a relatively expensive operation. How does this affect
performance?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 04/16] slab: remove nodeid in struct slab
@ 2013-08-22 17:51     ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 17:51 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> @@ -1099,8 +1098,7 @@ static void drain_alien_cache(struct kmem_cache *cachep,
>
>  static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
>  {
> -	struct slab *slabp = virt_to_slab(objp);
> -	int nodeid = slabp->nodeid;
> +	int nodeid = page_to_nid(virt_to_page(objp));
>  	struct kmem_cache_node *n;
>  	struct array_cache *alien = NULL;
>  	int node;

virt_to_page is a relatively expensive operation. How does this affect
performance?


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-08-22 17:53     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 17:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> We can get cachep using page in struct slab_rcu, so remove it.

Ok but this means that we need to touch struct page. Additional cacheline
in cache footprint.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
@ 2013-08-22 17:53     ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 17:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> We can get cachep using page in struct slab_rcu, so remove it.

Ok but this means that we need to touch struct page. Additional cacheline
in cache footprint.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-08-22 18:00     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 18:00 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> If we use 'struct page' of first page as 'struct slab', there is no
> advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.

Ok that brings it in line with SLUB and SLOB.

> @@ -2717,17 +2701,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
>  static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
>  			   struct page *page)
>  {
> -	int nr_pages;
> -
> -	nr_pages = 1;
> -	if (likely(!PageCompound(page)))
> -		nr_pages <<= cache->gfporder;
> -
> -	do {
> -		page->slab_cache = cache;
> -		page->slab_page = slab;
> -		page++;
> -	} while (--nr_pages);
> +	page->slab_cache = cache;
> +	page->slab_page = slab;
>  }

And saves some processing.
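
For illustration, this is the kind of lookup that makes the per-tail-page
assignments unnecessary once the slab pages form a compound page (a sketch
with a made-up helper name, not code from the patch):

static inline struct kmem_cache *obj_to_cache(const void *obj)
{
	/* with __GFP_COMP, any tail page resolves to the head page */
	struct page *page = virt_to_head_page(obj);

	/* so only the head page has to carry the back-pointers */
	return page->slab_cache;
}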


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages
@ 2013-08-22 18:00     ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-22 18:00 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> If we use 'struct page' of first page as 'struct slab', there is no
> advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.

Ok that brings it in line with SLUB and SLOB.

> @@ -2717,17 +2701,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
>  static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
>  			   struct page *page)
>  {
> -	int nr_pages;
> -
> -	nr_pages = 1;
> -	if (likely(!PageCompound(page)))
> -		nr_pages <<= cache->gfporder;
> -
> -	do {
> -		page->slab_cache = cache;
> -		page->slab_page = slab;
> -		page++;
> -	} while (--nr_pages);
> +	page->slab_cache = cache;
> +	page->slab_page = slab;
>  }

And saves some processing.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
  2013-08-22 16:47   ` Christoph Lameter
@ 2013-08-23  6:35     ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 04:47:25PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > And this patchset change a management method of free objects of a slab.
> > Current free objects management method of the slab is weird, because
> > it touch random position of the array of kmem_bufctl_t when we try to
> > get free object. See following example.
> 
> The ordering is intentional so that the most cache hot objects are removed
> first.

Yes, I know.

> 
> > To get free objects, we access this array with following pattern.
> > 6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END
> 
> Because that is the inverse order of the objects being freed.
> 
> The cache hot effect may not be that significant since per cpu and per
> node queues have been added on top. So maybe we need not be so cache aware
> anymore when actually touching struct slab.

I don't change the ordering; I just change how we store that order to
reduce the cache footprint. We can simply implement this order with a stack.

Assume the indexes of the free order are 1 -> 0 -> 4.
Currently, this order is stored in a very complex way, like below.

struct slab's free = 4
kmem_bufctl_t array: 1 END ACTIVE ACTIVE 0

If we allocate one object, we access slab's free and index 4 of
kmem_bufctl_t array.

struct slab's free = 0
kmem_bufctl_t array: 1 END ACTIVE ACTIVE ACTIVE
<we get object at index 4>

And then,

struct slab's free = 1
kmem_bufctl_t array: ACTIVE END ACTIVE ACTIVE ACTIVE
<we get object at index 0>

And then,

struct slab's free = END
kmem_bufctl_t array: ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE
<we get object at index 1>

The following is the new implementation (a stack) in the same situation.

struct slab's free = 0
kmem_bufctl_t array: 4 0 1

To get one object,

struct slab's free = 1
kmem_bufctl_t array: dummy 0 1
<we get object at index 4>

And then,

struct slab's free = 2
kmem_bufctl_t array: dummy dummy 1
<we get object at index 0>

struct slab's free = 3
kmem_bufctl_t array: dummy dummy dummy
<we get object at index 1>

The order of returned objects is the same as in the previous algorithm.
However, this algorithm accesses the kmem_bufctl_t array sequentially
instead of randomly. This is an advantage of this patch.
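
For illustration, a minimal userspace sketch of the two walks above (the
array contents and variable names are made up for the demo, not taken from
mm/slab.c); both loops hand out objects in the same order, 4 -> 0 -> 1:

#include <stdio.h>

#define END	0xffffffffu	/* stands in for BUFCTL_END */

int main(void)
{
	/*
	 * Old scheme: 'free' heads a list threaded through the bufctl
	 * array; entries 2 and 3 belong to ACTIVE (allocated) objects.
	 */
	unsigned int bufctl[5] = { [0] = 1, [1] = END, [4] = 0 };
	unsigned int free_head = 4;

	printf("linked bufctl order:");
	while (free_head != END) {
		printf(" %u", free_head);	/* object we hand out */
		free_head = bufctl[free_head];	/* jump to a random slot */
	}
	printf("\n");

	/*
	 * New scheme: the same allocation order is simply stored as a
	 * stack of object indexes, consumed front to back.
	 */
	unsigned int stack[3] = { 4, 0, 1 };
	unsigned int top;

	printf("stack order:");
	for (top = 0; top < 3; top++)
		printf(" %u", stack[top]);	/* sequential access only */
	printf("\n");

	return 0;
}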

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
@ 2013-08-23  6:35     ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 04:47:25PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > And this patchset change a management method of free objects of a slab.
> > Current free objects management method of the slab is weird, because
> > it touch random position of the array of kmem_bufctl_t when we try to
> > get free object. See following example.
> 
> The ordering is intentional so that the most cache hot objects are removed
> first.

Yes, I know.

> 
> > To get free objects, we access this array with following pattern.
> > 6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END
> 
> Because that is the inverse order of the objects being freed.
> 
> The cache hot effect may not be that significant since per cpu and per
> node queues have been added on top. So maybe we need not be so cache aware
> anymore when actually touching struct slab.

I don't change the ordering; I just change how we store that order to
reduce the cache footprint. We can simply implement this order with a stack.

Assume the indexes of the free order are 1 -> 0 -> 4.
Currently, this order is stored in a very complex way, like below.

struct slab's free = 4
kmem_bufctl_t array: 1 END ACTIVE ACTIVE 0

If we allocate one object, we access slab's free and index 4 of
kmem_bufctl_t array.

struct slab's free = 0
kmem_bufctl_t array: 1 END ACTIVE ACTIVE ACTIVE
<we get object at index 4>

And then,

struct slab's free = 1
kmem_bufctl_t array: ACTIVE END ACTIVE ACTIVE ACTIVE
<we get object at index 0>

And then,

struct slab's free = END
kmem_bufctl_t array: ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE
<we get object at index 1>

The following is the new implementation (a stack) in the same situation.

struct slab's free = 0
kmem_bufctl_t array: 4 0 1

To get one object,

struct slab's free = 1
kmem_bufctl_t array: dummy 0 1
<we get object at index 4>

And then,

struct slab's free = 2
kmem_bufctl_t array: dummy dummy 1
<we get object at index 0>

struct slab's free = 3
kmem_bufctl_t array: dummy dummy dummy
<we get object at index 1>

The order of returned objects is the same as in the previous algorithm.
However, this algorithm accesses the kmem_bufctl_t array sequentially
instead of randomly. This is an advantage of this patch.

Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/16] slab: change return type of kmem_getpages() to struct page
  2013-08-22 17:49     ` Christoph Lameter
@ 2013-08-23  6:40       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 05:49:43PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > @@ -2042,7 +2042,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
> >   */
> >  static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
> >  {
> > -	void *addr = slabp->s_mem - slabp->colouroff;
> > +	struct page *page = virt_to_head_page(slabp->s_mem);
> >
> >  	slab_destroy_debugcheck(cachep, slabp);
> >  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
> 
> Ok so this removes slab offset management. The use of a struct page
> pointer therefore results in coloring support to be not possible anymore.

No, slab offset management is done by colour_off in struct kmem_cache.
The colouroff in struct slab is just for getting the start address of the page
at free time. If we can get the start address properly, we can remove it without
any side effect. This patch implements that.
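
In code form, the change at free time is roughly this (a sketch of the two
alternatives side by side, not a hunk from the patch):

	/* before: struct slab remembers the colour offset and subtracts it */
	void *addr = slabp->s_mem - slabp->colouroff;

	/* after: derive the page, and hence its start address, from s_mem itself */
	struct page *page = virt_to_head_page(slabp->s_mem);
	void *start = page_address(page);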

Thanks.

> 
> I would suggest to have a separate patch for coloring removal before this
> patch. It seems that the support is removed in two different patches now.
> 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/16] slab: change return type of kmem_getpages() to struct page
@ 2013-08-23  6:40       ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 05:49:43PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > @@ -2042,7 +2042,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
> >   */
> >  static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
> >  {
> > -	void *addr = slabp->s_mem - slabp->colouroff;
> > +	struct page *page = virt_to_head_page(slabp->s_mem);
> >
> >  	slab_destroy_debugcheck(cachep, slabp);
> >  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
> 
> Ok so this removes slab offset management. The use of a struct page
> pointer therefore results in coloring support to be not possible anymore.

No, slab offset management is done by colour_off in struct kmem_cache.
The colouroff in struct slab is just for getting the start address of the page
at free time. If we can get the start address properly, we can remove it without
any side effect. This patch implements that.

Thanks.

> 
> I would suggest to have a separate patch for coloring removal before this
> patch. It seems that the support is removed in two different patches now.
> 


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 04/16] slab: remove nodeid in struct slab
  2013-08-22 17:51     ` Christoph Lameter
@ 2013-08-23  6:49       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 05:51:58PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > @@ -1099,8 +1098,7 @@ static void drain_alien_cache(struct kmem_cache *cachep,
> >
> >  static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
> >  {
> > -	struct slab *slabp = virt_to_slab(objp);
> > -	int nodeid = slabp->nodeid;
> > +	int nodeid = page_to_nid(virt_to_page(objp));
> >  	struct kmem_cache_node *n;
> >  	struct array_cache *alien = NULL;
> >  	int node;
> 
> virt_to_page is a relatively expensive operation. How does this affect
> performance?

The previous code, that is, virt_to_slab(), already does virt_to_page().
So this doesn't matter at all.
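
For reference, virt_to_slab() at that point was roughly the following
(paraphrased, so treat it as a sketch rather than a verbatim quote); the
virt-to-page translation is already paid for there:

static inline struct slab *virt_to_slab(const void *obj)
{
	/* virt_to_head_page() is virt_to_page() plus compound_head() */
	struct page *page = virt_to_head_page(obj);

	return page->slab_page;
}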

Thanks.

> 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 04/16] slab: remove nodeid in struct slab
@ 2013-08-23  6:49       ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 05:51:58PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > @@ -1099,8 +1098,7 @@ static void drain_alien_cache(struct kmem_cache *cachep,
> >
> >  static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
> >  {
> > -	struct slab *slabp = virt_to_slab(objp);
> > -	int nodeid = slabp->nodeid;
> > +	int nodeid = page_to_nid(virt_to_page(objp));
> >  	struct kmem_cache_node *n;
> >  	struct array_cache *alien = NULL;
> >  	int node;
> 
> virt_to_page is a relatively expensive operation. How does this affect
> performance?

The previous code, that is, virt_to_slab(), already does virt_to_page().
So this doesn't matter at all.

Thanks.

> 


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-22 17:53     ` Christoph Lameter
@ 2013-08-23  6:53       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 05:53:00PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > We can get cachep using page in struct slab_rcu, so remove it.
> 
> Ok but this means that we need to touch struct page. Additional cacheline
> in cache footprint.

In following patch, we overload RCU_HEAD to LRU of struct page and
also overload struct slab to struct page. So there is no
additional cacheline footprint at final stage.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
@ 2013-08-23  6:53       ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 05:53:00PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > We can get cachep using page in struct slab_rcu, so remove it.
> 
> Ok but this means that we need to touch struct page. Additional cacheline
> in cache footprint.

In following patch, we overload RCU_HEAD to LRU of struct page and
also overload struct slab to struct page. So there is no
additional cacheline footprint at final stage.

Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages
  2013-08-22 18:00     ` Christoph Lameter
@ 2013-08-23  6:55       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 06:00:56PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > If we use 'struct page' of first page as 'struct slab', there is no
> > advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.
> 
> Ok that brings it in line with SLUB and SLOB.

Yes!

> 
> > @@ -2717,17 +2701,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
> >  static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
> >  			   struct page *page)
> >  {
> > -	int nr_pages;
> > -
> > -	nr_pages = 1;
> > -	if (likely(!PageCompound(page)))
> > -		nr_pages <<= cache->gfporder;
> > -
> > -	do {
> > -		page->slab_cache = cache;
> > -		page->slab_page = slab;
> > -		page++;
> > -	} while (--nr_pages);
> > +	page->slab_cache = cache;
> > +	page->slab_page = slab;
> >  }
> 
> And saves some processing.

Yes!


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages
@ 2013-08-23  6:55       ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-23  6:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Aug 22, 2013 at 06:00:56PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > If we use 'struct page' of first page as 'struct slab', there is no
> > advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.
> 
> Ok that brings it in line with SLUB and SLOB.

Yes!

> 
> > @@ -2717,17 +2701,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
> >  static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
> >  			   struct page *page)
> >  {
> > -	int nr_pages;
> > -
> > -	nr_pages = 1;
> > -	if (likely(!PageCompound(page)))
> > -		nr_pages <<= cache->gfporder;
> > -
> > -	do {
> > -		page->slab_cache = cache;
> > -		page->slab_page = slab;
> > -		page++;
> > -	} while (--nr_pages);
> > +	page->slab_cache = cache;
> > +	page->slab_page = slab;
> >  }
> 
> And saves some processing.

Yes!


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-23  6:53       ` Joonsoo Kim
@ 2013-08-23 13:42         ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-23 13:42 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Fri, 23 Aug 2013, Joonsoo Kim wrote:

> On Thu, Aug 22, 2013 at 05:53:00PM +0000, Christoph Lameter wrote:
> > On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> >
> > > We can get cachep using page in struct slab_rcu, so remove it.
> >
> > Ok but this means that we need to touch struct page. Additional cacheline
> > in cache footprint.
>
> In following patch, we overload RCU_HEAD to LRU of struct page and
> also overload struct slab to struct page. So there is no
> additional cacheline footprint at final stage.

If you do not use rcu (standard case) then you have an additional
cacheline.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
@ 2013-08-23 13:42         ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-23 13:42 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Fri, 23 Aug 2013, Joonsoo Kim wrote:

> On Thu, Aug 22, 2013 at 05:53:00PM +0000, Christoph Lameter wrote:
> > On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> >
> > > We can get cachep using page in struct slab_rcu, so remove it.
> >
> > Ok but this means that we need to touch struct page. Additional cacheline
> > in cache footprint.
>
> In following patch, we overload RCU_HEAD to LRU of struct page and
> also overload struct slab to struct page. So there is no
> additional cacheline footprint at final stage.

If you do not use rcu (standard case) then you have an additional
cacheline.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-23 13:42         ` Christoph Lameter
@ 2013-08-23 14:24           ` JoonSoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: JoonSoo Kim @ 2013-08-23 14:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Pekka Enberg, Andrew Morton, David Rientjes,
	Linux Memory Management List, LKML

2013/8/23 Christoph Lameter <cl@linux.com>:
> On Fri, 23 Aug 2013, Joonsoo Kim wrote:
>
>> On Thu, Aug 22, 2013 at 05:53:00PM +0000, Christoph Lameter wrote:
>> > On Thu, 22 Aug 2013, Joonsoo Kim wrote:
>> >
>> > > We can get cachep using page in struct slab_rcu, so remove it.
>> >
>> > Ok but this means that we need to touch struct page. Additional cacheline
>> > in cache footprint.
>>
>> In following patch, we overload RCU_HEAD to LRU of struct page and
>> also overload struct slab to struct page. So there is no
>> additional cacheline footprint at final stage.
>
> If you do not use rcu (standard case) then you have an additional
> cacheline.
>

I don't get it. This patch only affects the rcu case, because it changes
the code in kmem_rcu_free(). It doesn't touch anything in the standard case.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
@ 2013-08-23 14:24           ` JoonSoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: JoonSoo Kim @ 2013-08-23 14:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Pekka Enberg, Andrew Morton, David Rientjes,
	Linux Memory Management List, LKML

2013/8/23 Christoph Lameter <cl@linux.com>:
> On Fri, 23 Aug 2013, Joonsoo Kim wrote:
>
>> On Thu, Aug 22, 2013 at 05:53:00PM +0000, Christoph Lameter wrote:
>> > On Thu, 22 Aug 2013, Joonsoo Kim wrote:
>> >
>> > > We can get cachep using page in struct slab_rcu, so remove it.
>> >
>> > Ok but this means that we need to touch struct page. Additional cacheline
>> > in cache footprint.
>>
>> In following patch, we overload RCU_HEAD to LRU of struct page and
>> also overload struct slab to struct page. So there is no
>> additional cacheline footprint at final stage.
>
> If you do not use rcu (standard case) then you have an additional
> cacheline.
>

I don't get it. This patch only affects the rcu case, because it changes
the code in kmem_rcu_free(). It doesn't touch anything in the standard case.

Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-23 14:24           ` JoonSoo Kim
@ 2013-08-23 15:41             ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-23 15:41 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Joonsoo Kim, Pekka Enberg, Andrew Morton, David Rientjes,
	Linux Memory Management List, LKML

On Fri, 23 Aug 2013, JoonSoo Kim wrote:

> I don't get it. This patch only affects the rcu case, because it changes
> the code in kmem_rcu_free(). It doesn't touch anything in the standard case.

In general this patchset moves struct slab to overlay struct page. The
design of SLAB was (at least at some point in the past) to avoid struct
page references. The freelist was kept close to struct slab so that the
contents are in the same cache line. Moving fields to struct page will add
another cacheline to be referenced.

The freelist (bufctl_t) was dimensioned in such a way as to be small
and close cache wise to struct slab. Somehow bufctl_t grew to
unsigned int and therefore the table became a bit large. Fundamentally
these are indexes into the objects in page. They really could be sized
again to just be single bytes as also explained in the comments in slab.c:

/*
 * kmem_bufctl_t:
 *
 * Bufctl's are used for linking objs within a slab
 * linked offsets.
 *
 * This implementation relies on "struct page" for locating the cache &
 * slab an object belongs to.
 * This allows the bufctl structure to be small (one int), but limits
 * the number of objects a slab (not a cache) can contain when off-slab
 * bufctls are used. The limit is the size of the largest general cache
 * that does not use off-slab slabs.
 * For 32bit archs with 4 kB pages, is this 56.
 * This is not serious, as it is only for large objects, when it is unwise
 * to have too many per slab.
 * Note: This limit can be raised by introducing a general cache whose size
 * is less than 512 (PAGE_SIZE<<3), but greater than 256.
 */

For 56 objects the bufctl_t could really be reduced to an 8 bit integer
which would shrink the size of the table significantly and improve speed
by reducing cache footprint.



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
@ 2013-08-23 15:41             ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-08-23 15:41 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Joonsoo Kim, Pekka Enberg, Andrew Morton, David Rientjes,
	Linux Memory Management List, LKML

On Fri, 23 Aug 2013, JoonSoo Kim wrote:

> I don't get it. This patch only affects the rcu case, because it changes
> the code in kmem_rcu_free(). It doesn't touch anything in the standard case.

In general this patchset moves struct slab to overlay struct page. The
design of SLAB was (at least at some point in the past) to avoid struct
page references. The freelist was kept close to struct slab so that the
contents are in the same cache line. Moving fields to struct page will add
another cacheline to be referenced.

The freelist (bufctl_t) was dimensioned in such a way as to be small
and close cache wise to struct slab. Somehow bufctl_t grew to
unsigned int and therefore the table became a bit large. Fundamentally
these are indexes into the objects in page. They really could be sized
again to just be single bytes as also explained in the comments in slab.c:

/*
 * kmem_bufctl_t:
 *
 * Bufctl's are used for linking objs within a slab
 * linked offsets.
 *
 * This implementation relies on "struct page" for locating the cache &
 * slab an object belongs to.
 * This allows the bufctl structure to be small (one int), but limits
 * the number of objects a slab (not a cache) can contain when off-slab
 * bufctls are used. The limit is the size of the largest general cache
 * that does not use off-slab slabs.
 * For 32bit archs with 4 kB pages, is this 56.
 * This is not serious, as it is only for large objects, when it is unwise
 * to have too many per slab.
 * Note: This limit can be raised by introducing a general cache whose size
 * is less than 512 (PAGE_SIZE<<3), but greater than 256.
 */

For 56 objects the bufctl_t could really be reduced to an 8 bit integer
which would shrink the size of the table significantly and improve speed
by reducing cache footprint.



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-23 15:41             ` Christoph Lameter
@ 2013-08-23 16:12               ` JoonSoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: JoonSoo Kim @ 2013-08-23 16:12 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Pekka Enberg, Andrew Morton, David Rientjes,
	Linux Memory Management List, LKML

2013/8/24 Christoph Lameter <cl@linux.com>:
> On Fri, 23 Aug 2013, JoonSoo Kim wrote:
>
>> I don't get it. This patch only affects the rcu case, because it changes
>> the code in kmem_rcu_free(). It doesn't touch anything in the standard case.
>
> In general this patchset moves struct slab to overlay struct page. The
> design of SLAB was (at least at some point in the past) to avoid struct
> page references. The freelist was kept close to struct slab so that the
> contents are in the same cache line. Moving fields to struct page will add
> another cacheline to be referenced.

I don't think so.
We should touch the struct page in order to get the struct slab, so there is
no additional cacheline reference.

And if the size of the (slab + freelist) decreases due to this patchset,
there is a better chance of being on-slab, which means that the freelist is in
the pages of the slab itself. I think that this also helps cache usage.

> The freelist (bufctl_t) was dimensioned in such a way as to be small
> and close cache wise to struct slab.

I think that my patchset doesn't harm anything related to this.
As I said, we should access the struct page before getting the struct slab,
so the fact that the freelist is far from the struct slab doesn't mean additional
cache overhead.

* Before patchset
struct page -> struct slab (far from struct page)
   -> the freelist (near struct slab)

* After patchset
struct page (overloaded by struct slab) -> the freelist (far from struct page)
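
Spelled out as code (the 'before' names are from v3.11, the 'after' field is
what this patchset uses; a sketch, not a patch hunk):

	/* before: two dependent loads, and struct slab may even live off-slab */
	struct slab *slabp = virt_to_head_page(obj)->slab_page;
	kmem_bufctl_t *bufctl = slab_bufctl(slabp);	/* array placed right after *slabp */

	/* after: the head struct page itself carries the freelist pointer */
	void *freelist = virt_to_head_page(obj)->freelist;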

> Somehow bufctl_t grew to
> unsigned int and therefore the table became a bit large. Fundamentally
> these are indexes into the objects in page. They really could be sized
> again to just be single bytes as also explained in the comments in slab.c:
> /*
>  * kmem_bufctl_t:
>  *
>  * Bufctl's are used for linking objs within a slab
>  * linked offsets.
>  *
>  * This implementation relies on "struct page" for locating the cache &
>  * slab an object belongs to.
>  * This allows the bufctl structure to be small (one int), but limits
>  * the number of objects a slab (not a cache) can contain when off-slab
>  * bufctls are used. The limit is the size of the largest general cache
>  * that does not use off-slab slabs.
>  * For 32bit archs with 4 kB pages, is this 56.
>  * This is not serious, as it is only for large objects, when it is unwise
>  * to have too many per slab.
>  * Note: This limit can be raised by introducing a general cache whose size
>  * is less than 512 (PAGE_SIZE<<3), but greater than 256.
>  */
>
> For 56 objects the bufctl_t could really be reduced to an 8 bit integer
> which would shrink the size of the table significantly and improve speed
> by reducing cache footprint.
>

Yes, that's very good. However this is not related to this patchset.
It can be implemented independently :)

Please let me know what I am missing.
Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
@ 2013-08-23 16:12               ` JoonSoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: JoonSoo Kim @ 2013-08-23 16:12 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Pekka Enberg, Andrew Morton, David Rientjes,
	Linux Memory Management List, LKML

2013/8/24 Christoph Lameter <cl@linux.com>:
> On Fri, 23 Aug 2013, JoonSoo Kim wrote:
>
>> I don't get it. This patch only affects the rcu case, because it changes
>> the code in kmem_rcu_free(). It doesn't touch anything in the standard case.
>
> In general this patchset moves struct slab to overlay struct page. The
> design of SLAB was (at least at some point in the past) to avoid struct
> page references. The freelist was kept close to struct slab so that the
> contents are in the same cache line. Moving fields to struct page will add
> another cacheline to be referenced.

I don't think so.
We should touch the struct page in order to get the struct slab, so there is
no additional cacheline reference.

And if the size of the (slab + freelist) decreases due to this patchset,
there is a better chance of being on-slab, which means that the freelist is in
the pages of the slab itself. I think that this also helps cache usage.

> The freelist (bufctl_t) was dimensioned in such a way as to be small
> and close cache wise to struct slab.

I think that my patchset doesn't harm anything related to this.
As I said, we should access the struct page before getting the struct slab,
so the fact that the freelist is far from the struct slab doesn't mean additional
cache overhead.

* Before patchset
struct page -> struct slab (far from struct page)
   -> the freelist (near struct slab)

* After patchset
struct page (overloaded by struct slab) -> the freelist (far from struct page)

> Somehow bufctl_t grew to
> unsigned int and therefore the table became a bit large. Fundamentally
> these are indexes into the objects in page. They really could be sized
> again to just be single bytes as also explained in the comments in slab.c:
> /*
>  * kmem_bufctl_t:
>  *
>  * Bufctl's are used for linking objs within a slab
>  * linked offsets.
>  *
>  * This implementation relies on "struct page" for locating the cache &
>  * slab an object belongs to.
>  * This allows the bufctl structure to be small (one int), but limits
>  * the number of objects a slab (not a cache) can contain when off-slab
>  * bufctls are used. The limit is the size of the largest general cache
>  * that does not use off-slab slabs.
>  * For 32bit archs with 4 kB pages, is this 56.
>  * This is not serious, as it is only for large objects, when it is unwise
>  * to have too many per slab.
>  * Note: This limit can be raised by introducing a general cache whose size
>  * is less than 512 (PAGE_SIZE<<3), but greater than 256.
>  */
>
> For 56 objects the bufctl_t could really be reduced to an 8 bit integer
> which would shrink the size of the table significantly and improve speed
> by reducing cache footprint.
>

Yes, that's very good. However this is not related to this patchset.
It can be implemented independently :)

Please let me know what I am missing.
Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-08-27 22:06     ` Jonathan Corbet
  -1 siblings, 0 replies; 114+ messages in thread
From: Jonathan Corbet @ 2013-08-27 22:06 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, Joonsoo Kim,
	David Rientjes, linux-mm, linux-kernel

On Thu, 22 Aug 2013 17:44:16 +0900
Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> With build-time size checking, we can overload the RCU head over the LRU
> of struct page to free pages of a slab in rcu context. This really help to
> implement to overload the struct slab over the struct page and this
> eventually reduce memory usage and cache footprint of the SLAB.

So I'm taking a look at this, trying to figure out what's actually in
struct page while this stuff is going on without my head exploding.  A
couple of questions come to mind.

>  static void kmem_rcu_free(struct rcu_head *head)
>  {
> -	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
> -	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
> +	struct kmem_cache *cachep;
> +	struct page *page;
>  
> -	kmem_freepages(cachep, slab_rcu->page);
> +	page = container_of((struct list_head *)head, struct page, lru);
> +	cachep = page->slab_cache;
> +
> +	kmem_freepages(cachep, page);
>  }

Is there a reason why you don't add the rcu_head structure as another field
in that union alongside lru rather than playing casting games here?  This
stuff is hard enough to follow as it is without adding that into the mix.

The other question I had is: this field also overlays slab_page.  I guess
that, by the time RCU comes into play, there will be no further use of
slab_page?  It might be nice to document that somewhere if it's the case.

Thanks,

jon

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
@ 2013-08-27 22:06     ` Jonathan Corbet
  0 siblings, 0 replies; 114+ messages in thread
From: Jonathan Corbet @ 2013-08-27 22:06 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, Joonsoo Kim,
	David Rientjes, linux-mm, linux-kernel

On Thu, 22 Aug 2013 17:44:16 +0900
Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> With build-time size checking, we can overload the RCU head over the LRU
> of struct page to free pages of a slab in rcu context. This really help to
> implement to overload the struct slab over the struct page and this
> eventually reduce memory usage and cache footprint of the SLAB.

So I'm taking a look at this, trying to figure out what's actually in
struct page while this stuff is going on without my head exploding.  A
couple of questions come to mind.

>  static void kmem_rcu_free(struct rcu_head *head)
>  {
> -	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
> -	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
> +	struct kmem_cache *cachep;
> +	struct page *page;
>  
> -	kmem_freepages(cachep, slab_rcu->page);
> +	page = container_of((struct list_head *)head, struct page, lru);
> +	cachep = page->slab_cache;
> +
> +	kmem_freepages(cachep, page);
>  }

Is there a reason why you don't add the rcu_head structure as another field
in that union alongside lru rather than playing casting games here?  This
stuff is hard enough to follow as it is without adding that into the mix.

The other question I had is: this field also overlays slab_page.  I guess
that, by the time RCU comes into play, there will be no further use of
slab_page?  It might be nice to document that somewhere if it's the case.

Thanks,

jon


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
  2013-08-27 22:06     ` Jonathan Corbet
@ 2013-08-28  6:36       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-28  6:36 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

Hello,

On Tue, Aug 27, 2013 at 04:06:04PM -0600, Jonathan Corbet wrote:
> On Thu, 22 Aug 2013 17:44:16 +0900
> Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> 
> > With build-time size checking, we can overload the RCU head over the LRU
> > of struct page to free pages of a slab in rcu context. This really help to
> > implement to overload the struct slab over the struct page and this
> > eventually reduce memory usage and cache footprint of the SLAB.
> 
> So I'm taking a look at this, trying to figure out what's actually in
> struct page while this stuff is going on without my head exploding.  A
> couple of questions come to mind.
> 
> >  static void kmem_rcu_free(struct rcu_head *head)
> >  {
> > -	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
> > -	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
> > +	struct kmem_cache *cachep;
> > +	struct page *page;
> >  
> > -	kmem_freepages(cachep, slab_rcu->page);
> > +	page = container_of((struct list_head *)head, struct page, lru);
> > +	cachep = page->slab_cache;
> > +
> > +	kmem_freepages(cachep, page);
> >  }
> 
> Is there a reason why you don't add the rcu_head structure as another field
> in that union alongside lru rather than playing casting games here?  This
> stuff is hard enough to follow as it is without adding that into the mix.

One reason is that SLUB is already playing this game :)
And struct page shouldn't be enlarged unintentionally when the size of
the rcu_head is changed.
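
A sketch of the call_rcu side under this scheme (the exact size check in the
patch may be written differently; this just shows the shape of it):

	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
		struct rcu_head *head;

		/*
		 * Fail the build if rcu_head ever outgrows page->lru, so
		 * overloading it can never silently enlarge struct page.
		 */
		BUILD_BUG_ON(sizeof(struct rcu_head) >
			     sizeof(((struct page *)NULL)->lru));
		head = (struct rcu_head *)&page->lru;
		call_rcu(head, kmem_rcu_free);
	}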

> 
> The other question I had is: this field also overlays slab_page.  I guess
> that, by the time RCU comes into play, there will be no further use of
> slab_page?  It might be nice to document that somewhere if it's the case.

Ah..... I made a mistake in the previous patch (06/16). We should leave an object
on slab_page until rcu finishes the work, since the rcu_head is overloaded over it.

If I remove that patch, this patch has the problem you mentioned. But I think
that a fix is simple. Moving slab_page to another union field in
struct slab prior to this patch solves the problem you mentioned.

Thanks for pointing that out!


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
@ 2013-08-28  6:36       ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-08-28  6:36 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

Hello,

On Tue, Aug 27, 2013 at 04:06:04PM -0600, Jonathan Corbet wrote:
> On Thu, 22 Aug 2013 17:44:16 +0900
> Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> 
> > With build-time size checking, we can overload the RCU head over the LRU
> > of struct page to free pages of a slab in rcu context. This really help to
> > implement to overload the struct slab over the struct page and this
> > eventually reduce memory usage and cache footprint of the SLAB.
> 
> So I'm taking a look at this, trying to figure out what's actually in
> struct page while this stuff is going on without my head exploding.  A
> couple of questions come to mind.
> 
> >  static void kmem_rcu_free(struct rcu_head *head)
> >  {
> > -	struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
> > -	struct kmem_cache *cachep = slab_rcu->page->slab_cache;
> > +	struct kmem_cache *cachep;
> > +	struct page *page;
> >  
> > -	kmem_freepages(cachep, slab_rcu->page);
> > +	page = container_of((struct list_head *)head, struct page, lru);
> > +	cachep = page->slab_cache;
> > +
> > +	kmem_freepages(cachep, page);
> >  }
> 
> Is there a reason why you don't add the rcu_head structure as another field
> in that union alongside lru rather than playing casting games here?  This
> stuff is hard enough to follow as it is without adding that into the mix.

One reason is that SLUB is already playing this game :)
And struct page shouldn't be enlarged unintentionally when the size of
the rcu_head is changed.

> 
> The other question I had is: this field also overlays slab_page.  I guess
> that, by the time RCU comes into play, there will be no further use of
> slab_page?  It might be nice to document that somewhere if it's the case.

Ah..... I made a mistake in the previous patch (06/16). We should leave an object
on slab_page until rcu finishes the work, since the rcu_head is overloaded over it.

If I remove that patch, this patch has the problem you mentioned. But I think
that a fix is simple. Moving slab_page to another union field in
struct slab prior to this patch solves the problem you mentioned.

Thanks for pointing that out!


^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-08-23 16:12               ` JoonSoo Kim
@ 2013-09-02  8:38                 ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

This patchset implements byte sized indexes for the freelist of a slab.

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, so much space is wasted.
To reduce this overhead, this patchset implements byte sized indexes for
the freelist of a slab. With it, we can save 3 bytes for each object.

This introduces one likely branch into the functions used for setting/getting
objects to/from the freelist, but we may get more benefits from
this change.
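
The get/set helpers could look roughly like this (the names get_free_obj,
set_free_obj and SLAB_OBJ_MAX_NUM_FOR_BYTE are placeholders of mine, not
necessarily what the patches use):

typedef unsigned int freelist_idx_t;

static inline freelist_idx_t get_free_obj(struct kmem_cache *cachep,
					  void *freelist, unsigned int idx)
{
	/* byte sized index when every object index fits in one byte */
	if (likely(cachep->num <= SLAB_OBJ_MAX_NUM_FOR_BYTE))
		return ((unsigned char *)freelist)[idx];
	return ((freelist_idx_t *)freelist)[idx];
}

static inline void set_free_obj(struct kmem_cache *cachep, void *freelist,
				unsigned int idx, freelist_idx_t val)
{
	if (likely(cachep->num <= SLAB_OBJ_MAX_NUM_FOR_BYTE))
		((unsigned char *)freelist)[idx] = val;
	else
		((freelist_idx_t *)freelist)[idx] = val;
}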

Below are some numbers from 'cat /proc/slabinfo' related to my previous posting
and this patchset.


* Before *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0   
kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0   
kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0   
kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0   
kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0   
kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   

* After my previous posting(overload struct slab over struct page) *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          527    600    512    8    1 : tunables   54   27    0 : slabdata     75     75      0   
kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
kmalloc-192         1040   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
kmalloc-96           750    750    128   30    1 : tunables  120   60    0 : slabdata     25     25      0   
kmalloc-64          2773   2773     64   59    1 : tunables  120   60    0 : slabdata     47     47      0   
kmalloc-128          660    690    128   30    1 : tunables  120   60    0 : slabdata     23     23      0   
kmalloc-32         11200  11200     32  112    1 : tunables  120   60    0 : slabdata    100    100      0   
kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   

kmem_caches consisting of objects less than or equal to 128 bytes have one more
object in a slab. You can see it in the objperslab column.

We can improve further with this patchset.

* My previous posting + this patchset *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          521    648    512    8    1 : tunables   54   27    0 : slabdata     81     81      0
kmalloc-256          208    208    256   16    1 : tunables  120   60    0 : slabdata     13     13      0
kmalloc-192         1029   1029    192   21    1 : tunables  120   60    0 : slabdata     49     49      0
kmalloc-96           529    589    128   31    1 : tunables  120   60    0 : slabdata     19     19      0
kmalloc-64          2142   2142     64   63    1 : tunables  120   60    0 : slabdata     34     34      0
kmalloc-128          660    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
kmalloc-32         11716  11780     32  124    1 : tunables  120   60    0 : slabdata     95     95      0
kmem_cache           197    210    192   21    1 : tunables  120   60    0 : slabdata     10     10      0

kmem_caches consisting of objects less than or equal to 256 bytes have
one or more objects per slab than before. In the case of kmalloc-32, we have 12
more objects, so 384 bytes (12 * 32) are saved, which is roughly a 9% saving of
memory. Of course, this percentage decreases as the number of objects
in a slab decreases.

Please let me know the experts' opinions :)
Thanks.

This patchset comes from Christoph's idea.
https://lkml.org/lkml/2013/8/23/315

Patches are on top of my previous posting.
https://lkml.org/lkml/2013/8/22/137

Joonsoo Kim (4):
  slab: factor out calculate nr objects in cache_estimate
  slab: introduce helper functions to get/set free object
  slab: introduce byte sized index for the freelist of a slab
  slab: make more slab management structure off the slab

 mm/slab.c |  138 +++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 103 insertions(+), 35 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
@ 2013-09-02  8:38                 ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

This patchset implements byte sized indexes for the freelist of a slab.

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, so much space is wasted.
To reduce this overhead, this patchset implements byte sized indexes for
the freelist of a slab. With it, we can save 3 bytes for each object.

This introduces one likely branch into the functions used for setting/getting
objects to/from the freelist, but we may get more benefits from
this change.

Below are some numbers from 'cat /proc/slabinfo' related to my previous posting
and this patchset.


* Before *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0   
kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0   
kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0   
kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0   
kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0   
kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   

* After my previous posting(overload struct slab over struct page) *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          527    600    512    8    1 : tunables   54   27    0 : slabdata     75     75      0   
kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
kmalloc-192         1040   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
kmalloc-96           750    750    128   30    1 : tunables  120   60    0 : slabdata     25     25      0   
kmalloc-64          2773   2773     64   59    1 : tunables  120   60    0 : slabdata     47     47      0   
kmalloc-128          660    690    128   30    1 : tunables  120   60    0 : slabdata     23     23      0   
kmalloc-32         11200  11200     32  112    1 : tunables  120   60    0 : slabdata    100    100      0   
kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   

kmem_caches consisting of objects less than or equal to 128 bytes have one more
object in a slab. You can see it at objperslab.

We can improve further with this patchset.

* My previous posting + this patchset *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512          521    648    512    8    1 : tunables   54   27    0 : slabdata     81     81      0
kmalloc-256          208    208    256   16    1 : tunables  120   60    0 : slabdata     13     13      0
kmalloc-192         1029   1029    192   21    1 : tunables  120   60    0 : slabdata     49     49      0
kmalloc-96           529    589    128   31    1 : tunables  120   60    0 : slabdata     19     19      0
kmalloc-64          2142   2142     64   63    1 : tunables  120   60    0 : slabdata     34     34      0
kmalloc-128          660    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
kmalloc-32         11716  11780     32  124    1 : tunables  120   60    0 : slabdata     95     95      0
kmem_cache           197    210    192   21    1 : tunables  120   60    0 : slabdata     10     10      0

kmem_caches consisting of objects less than or equal to 256 bytes have
one or more extra objects per slab compared to before. In the case of
kmalloc-32, we get 12 more objects per slab, so 384 bytes (12 * 32) are
saved, which is roughly a 9% saving of memory. Of course, this percentage
decreases as the number of objects in a slab decreases.

Please let me know the experts' opinions :)
Thanks.

This patchset comes from Christoph's idea.
https://lkml.org/lkml/2013/8/23/315

Patches are on top of my previous posting.
https://lkml.org/lkml/2013/8/22/137

Joonsoo Kim (4):
  slab: factor out calculate nr objects in cache_estimate
  slab: introduce helper functions to get/set free object
  slab: introduce byte sized index for the freelist of a slab
  slab: make more slab management structure off the slab

 mm/slab.c |  138 +++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 103 insertions(+), 35 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 1/4] slab: factor out calculate nr objects in cache_estimate
  2013-09-02  8:38                 ` Joonsoo Kim
@ 2013-09-02  8:38                   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

This logic is not simple to understand, so factor it out into a separate
function to help readability. Additionally, we can reuse this change in a
following patch which lets the freelist use a differently sized index
according to nr objects.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
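
As an illustration (not part of the patch), plugging assumed kmalloc-32
parameters into the factored-out calculation -- a 4096 byte slab, 32 byte
objects, 4 byte indexes and 32 byte alignment -- reproduces the 113 objects
per slab seen in slabinfo:

#include <stdio.h>

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long slab_size = 4096, buffer_size = 32, idx_size = 4, align = 32;
	unsigned long nr_objs, freelist_size;

	/* Initial guess, ignoring the alignment padding of the freelist. */
	nr_objs = slab_size / (buffer_size + idx_size);		/* 113 */

	/* Space left for the freelist after placing that many objects. */
	freelist_size = slab_size - nr_objs * buffer_size;	/* 480 bytes */

	/* The aligned freelist (480 bytes) still fits, so no decrement. */
	if (freelist_size < ALIGN(nr_objs * idx_size, align))
		nr_objs--;

	printf("objects per slab: %lu\n", nr_objs);		/* 113 */
	return 0;
}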

diff --git a/mm/slab.c b/mm/slab.c
index f3868fe..9d4bad5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,9 +565,31 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 	return cachep->array[smp_processor_id()];
 }
 
-static size_t slab_mgmt_size(size_t nr_objs, size_t align)
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+				size_t idx_size, size_t align)
 {
-	return ALIGN(nr_objs * sizeof(unsigned int), align);
+	int nr_objs;
+	size_t freelist_size;
+
+	/*
+	 * Ignore padding for the initial guess. The padding
+	 * is at most @align-1 bytes, and @buffer_size is at
+	 * least @align. In the worst case, this result will
+	 * be one greater than the number of objects that fit
+	 * into the memory allocation when taking the padding
+	 * into account.
+	 */
+	nr_objs = slab_size / (buffer_size + idx_size);
+
+	/*
+	 * This calculated number will be either the right
+	 * amount, or one greater than what we want.
+	 */
+	freelist_size = slab_size - nr_objs * buffer_size;
+	if (freelist_size < ALIGN(nr_objs * idx_size, align))
+		nr_objs--;
+
+	return nr_objs;
 }
 
 /*
@@ -600,28 +622,12 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		nr_objs = slab_size / buffer_size;
 
 	} else {
-		/*
-		 * Ignore padding for the initial guess. The padding
-		 * is at most @align-1 bytes, and @buffer_size is at
-		 * least @align. In the worst case, this result will
-		 * be one greater than the number of objects that fit
-		 * into the memory allocation when taking the padding
-		 * into account.
-		 */
-		nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));
-
-		/*
-		 * This calculated number will be either the right
-		 * amount, or one greater than what we want.
-		 */
-		if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
-		       > slab_size)
-			nr_objs--;
-
-		mgmt_size = slab_mgmt_size(nr_objs, align);
+		nr_objs = calculate_nr_objs(slab_size, buffer_size,
+					sizeof(unsigned int), align);
+		mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
 	}
 	*num = nr_objs;
-	*left_over = slab_size - nr_objs*buffer_size - mgmt_size;
+	*left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
 }
 
 #if DEBUG
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 1/4] slab: factor out calculate nr objects in cache_estimate
@ 2013-09-02  8:38                   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

This logic is not simple to understand, so factor it out into a separate
function to help readability. Additionally, we can reuse this change in a
following patch which lets the freelist use a differently sized index
according to nr objects.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index f3868fe..9d4bad5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,9 +565,31 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 	return cachep->array[smp_processor_id()];
 }
 
-static size_t slab_mgmt_size(size_t nr_objs, size_t align)
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+				size_t idx_size, size_t align)
 {
-	return ALIGN(nr_objs * sizeof(unsigned int), align);
+	int nr_objs;
+	size_t freelist_size;
+
+	/*
+	 * Ignore padding for the initial guess. The padding
+	 * is at most @align-1 bytes, and @buffer_size is at
+	 * least @align. In the worst case, this result will
+	 * be one greater than the number of objects that fit
+	 * into the memory allocation when taking the padding
+	 * into account.
+	 */
+	nr_objs = slab_size / (buffer_size + idx_size);
+
+	/*
+	 * This calculated number will be either the right
+	 * amount, or one greater than what we want.
+	 */
+	freelist_size = slab_size - nr_objs * buffer_size;
+	if (freelist_size < ALIGN(nr_objs * idx_size, align))
+		nr_objs--;
+
+	return nr_objs;
 }
 
 /*
@@ -600,28 +622,12 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 		nr_objs = slab_size / buffer_size;
 
 	} else {
-		/*
-		 * Ignore padding for the initial guess. The padding
-		 * is at most @align-1 bytes, and @buffer_size is at
-		 * least @align. In the worst case, this result will
-		 * be one greater than the number of objects that fit
-		 * into the memory allocation when taking the padding
-		 * into account.
-		 */
-		nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));
-
-		/*
-		 * This calculated number will be either the right
-		 * amount, or one greater than what we want.
-		 */
-		if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
-		       > slab_size)
-			nr_objs--;
-
-		mgmt_size = slab_mgmt_size(nr_objs, align);
+		nr_objs = calculate_nr_objs(slab_size, buffer_size,
+					sizeof(unsigned int), align);
+		mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
 	}
 	*num = nr_objs;
-	*left_over = slab_size - nr_objs*buffer_size - mgmt_size;
+	*left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
 }
 
 #if DEBUG
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 2/4] slab: introduce helper functions to get/set free object
  2013-09-02  8:38                 ` Joonsoo Kim
@ 2013-09-02  8:38                   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

In the following patches, the way we get/set free objects from the freelist
is changed so that a simple cast no longer works. Therefore, introduce
helper functions.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
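
To show what these helpers operate on, here is a toy userspace model
(illustrative only, heavily simplified from slab_get_obj()/slab_put_obj())
of the freelist used as a stack indexed by page->active:

#include <stdio.h>

#define NR_OBJS	4

/* Toy model, not kernel code: freelist[] holds object indexes and
 * 'active' counts allocated objects; entries at and above 'active'
 * are the currently free ones. */
static unsigned int freelist[NR_OBJS] = { 0, 1, 2, 3 };
static unsigned int active;

static unsigned int toy_get_obj(void)
{
	return freelist[active++];		/* pop the next free index */
}

static void toy_put_obj(unsigned int objnr)
{
	freelist[--active] = objnr;		/* push it back */
}

int main(void)
{
	unsigned int a = toy_get_obj();		/* gets index 0 */
	unsigned int b = toy_get_obj();		/* gets index 1 */

	toy_put_obj(a);				/* index 0 is free again */
	printf("next allocation gets index %u\n", freelist[active]);	/* 0 */
	(void)b;
	return 0;
}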

diff --git a/mm/slab.c b/mm/slab.c
index 9d4bad5..a0e49bb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2545,9 +2545,15 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 	return freelist;
 }
 
-static inline unsigned int *slab_freelist(struct page *page)
+static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
 {
-	return (unsigned int *)(page->freelist);
+	return ((unsigned int *)page->freelist)[idx];
+}
+
+static inline void set_free_obj(struct page *page,
+					unsigned int idx, unsigned int val)
+{
+	((unsigned int *)(page->freelist))[idx] = val;
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2592,7 +2598,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_freelist(page)[i] = i;
+		set_free_obj(page, i, i);
 	}
 }
 
@@ -2611,7 +2617,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
+	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2632,7 +2638,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (slab_freelist(page)[i] == objnr) {
+		if (get_free_obj(page, i) == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2640,7 +2646,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 	}
 #endif
 	page->active--;
-	slab_freelist(page)[page->active] = objnr;
+	set_free_obj(page, page->active, objnr);
 }
 
 /*
@@ -4214,7 +4220,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_freelist(page)[j] == i) {
+			if (get_free_obj(page, j) == i) {
 				active = false;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 2/4] slab: introduce helper functions to get/set free object
@ 2013-09-02  8:38                   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

In the following patches, the way we get/set free objects from the freelist
is changed so that a simple cast no longer works. Therefore, introduce
helper functions.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index 9d4bad5..a0e49bb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2545,9 +2545,15 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 	return freelist;
 }
 
-static inline unsigned int *slab_freelist(struct page *page)
+static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
 {
-	return (unsigned int *)(page->freelist);
+	return ((unsigned int *)page->freelist)[idx];
+}
+
+static inline void set_free_obj(struct page *page,
+					unsigned int idx, unsigned int val)
+{
+	((unsigned int *)(page->freelist))[idx] = val;
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2592,7 +2598,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		slab_freelist(page)[i] = i;
+		set_free_obj(page, i, i);
 	}
 }
 
@@ -2611,7 +2617,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 {
 	void *objp;
 
-	objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
+	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2632,7 +2638,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (slab_freelist(page)[i] == objnr) {
+		if (get_free_obj(page, i) == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2640,7 +2646,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 	}
 #endif
 	page->active--;
-	slab_freelist(page)[page->active] = objnr;
+	set_free_obj(page, page->active, objnr);
 }
 
 /*
@@ -4214,7 +4220,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (slab_freelist(page)[j] == i) {
+			if (get_free_obj(page, j) == i) {
 				active = false;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 3/4] slab: introduce byte sized index for the freelist of a slab
  2013-09-02  8:38                 ` Joonsoo Kim
@ 2013-09-02  8:38                   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, since the page order is restricted
to at most 1 in the default configuration. For example, consider a slab
consisting of 32 byte sized objects on two contiguous pages. In this case,
256 objects are possible and this number fits in byte sized indexes.
256 objects is the maximum possible value in the default configuration,
since 32 bytes is the minimum object size in the SLAB (8192 / 32 = 256).
Therefore, if we use byte sized indexes, we can save 3 bytes for each
object.

This introduces one likely branch in the functions used for setting/getting
objects to/from the freelist, but we may get more benefit from this change
than that branch costs.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
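
A quick sanity check of those limits (illustrative arithmetic only; it
assumes 4KB pages, at most an order-1 slab and a 32 byte minimum object
size):

#include <stdio.h>

int main(void)
{
	unsigned int max_slab_bytes = 2 * 4096;	/* order-1 slab */
	unsigned int min_obj_size = 32;
	unsigned int max_objs = max_slab_bytes / min_obj_size;

	printf("max objects per slab : %u\n", max_objs);	/* 256 */
	printf("fits a byte index    : %s\n",
	       max_objs <= 256 ? "yes" : "no");			/* indexes 0..255 */
	printf("bytes saved per obj  : %zu\n",
	       sizeof(unsigned int) - sizeof(unsigned char));	/* 3 */
	return 0;
}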

diff --git a/mm/slab.c b/mm/slab.c
index a0e49bb..bd366e5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,8 +565,16 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 	return cachep->array[smp_processor_id()];
 }
 
-static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
-				size_t idx_size, size_t align)
+static inline bool can_byte_index(int nr_objs)
+{
+	if (likely(nr_objs <= (sizeof(unsigned char) << 8)))
+		return true;
+
+	return false;
+}
+
+static int __calculate_nr_objs(size_t slab_size, size_t buffer_size,
+				unsigned int idx_size, size_t align)
 {
 	int nr_objs;
 	size_t freelist_size;
@@ -592,6 +600,29 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 	return nr_objs;
 }
 
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+							size_t align)
+{
+	int nr_objs;
+	int byte_nr_objs;
+
+	nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+					sizeof(unsigned int), align);
+	if (!can_byte_index(nr_objs))
+		return nr_objs;
+
+	byte_nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+					sizeof(unsigned char), align);
+	/*
+	 * nr_objs can be larger when using byte index,
+	 * so that it cannot be indexed by byte index.
+	 */
+	if (can_byte_index(byte_nr_objs))
+		return byte_nr_objs;
+	else
+		return nr_objs;
+}
+
 /*
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
@@ -618,13 +649,18 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * correct alignment when allocated.
 	 */
 	if (flags & CFLGS_OFF_SLAB) {
-		mgmt_size = 0;
 		nr_objs = slab_size / buffer_size;
+		mgmt_size = 0;
 
 	} else {
-		nr_objs = calculate_nr_objs(slab_size, buffer_size,
-					sizeof(unsigned int), align);
-		mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
+		nr_objs = calculate_nr_objs(slab_size, buffer_size, align);
+		if (can_byte_index(nr_objs)) {
+			mgmt_size =
+				ALIGN(nr_objs * sizeof(unsigned char), align);
+		} else {
+			mgmt_size =
+				ALIGN(nr_objs * sizeof(unsigned int), align);
+		}
 	}
 	*num = nr_objs;
 	*left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
@@ -2012,7 +2048,10 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			 * looping condition in cache_grow().
 			 */
 			offslab_limit = size;
-			offslab_limit /= sizeof(unsigned int);
+			if (can_byte_index(num))
+				offslab_limit /= sizeof(unsigned char);
+			else
+				offslab_limit /= sizeof(unsigned int);
 
  			if (num > offslab_limit)
 				break;
@@ -2253,8 +2292,13 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!cachep->num)
 		return -E2BIG;
 
-	freelist_size =
-		ALIGN(cachep->num * sizeof(unsigned int), cachep->align);
+	if (can_byte_index(cachep->num)) {
+		freelist_size = ALIGN(cachep->num * sizeof(unsigned char),
+								cachep->align);
+	} else {
+		freelist_size = ALIGN(cachep->num * sizeof(unsigned int),
+								cachep->align);
+	}
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
@@ -2267,7 +2311,10 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		freelist_size = cachep->num * sizeof(unsigned int);
+		if (can_byte_index(cachep->num))
+			freelist_size = cachep->num * sizeof(unsigned char);
+		else
+			freelist_size = cachep->num * sizeof(unsigned int);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2545,15 +2592,22 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 	return freelist;
 }
 
-static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
+static inline unsigned int get_free_obj(struct kmem_cache *cachep,
+					struct page *page, unsigned int idx)
 {
-	return ((unsigned int *)page->freelist)[idx];
+	if (likely(can_byte_index(cachep->num)))
+		return ((unsigned char *)page->freelist)[idx];
+	else
+		return ((unsigned int *)page->freelist)[idx];
 }
 
-static inline void set_free_obj(struct page *page,
+static inline void set_free_obj(struct kmem_cache *cachep, struct page *page,
 					unsigned int idx, unsigned int val)
 {
-	((unsigned int *)(page->freelist))[idx] = val;
+	if (likely(can_byte_index(cachep->num)))
+		((unsigned char *)(page->freelist))[idx] = (unsigned char)val;
+	else
+		((unsigned int *)(page->freelist))[idx] = val;
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2598,7 +2652,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		set_free_obj(page, i, i);
+		set_free_obj(cachep, page, i, i);
 	}
 }
 
@@ -2615,9 +2669,11 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 				int nodeid)
 {
+	unsigned int index;
 	void *objp;
 
-	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
+	index = get_free_obj(cachep, page, page->active);
+	objp = index_to_obj(cachep, page, index);
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2638,7 +2694,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (get_free_obj(page, i) == objnr) {
+		if (get_free_obj(cachep, page, i) == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2646,7 +2702,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 	}
 #endif
 	page->active--;
-	set_free_obj(page, page->active, objnr);
+	set_free_obj(cachep, page, page->active, objnr);
 }
 
 /*
@@ -4220,7 +4276,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (get_free_obj(page, j) == i) {
+			if (get_free_obj(c, page, j) == i) {
 				active = false;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 3/4] slab: introduce byte sized index for the freelist of a slab
@ 2013-09-02  8:38                   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, since the page order is restricted
to at most 1 in the default configuration. For example, consider a slab
consisting of 32 byte sized objects on two contiguous pages. In this case,
256 objects are possible and this number fits in byte sized indexes.
256 objects is the maximum possible value in the default configuration,
since 32 bytes is the minimum object size in the SLAB (8192 / 32 = 256).
Therefore, if we use byte sized indexes, we can save 3 bytes for each
object.

This introduces one likely branch in the functions used for setting/getting
objects to/from the freelist, but we may get more benefit from this change
than that branch costs.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index a0e49bb..bd366e5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,8 +565,16 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 	return cachep->array[smp_processor_id()];
 }
 
-static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
-				size_t idx_size, size_t align)
+static inline bool can_byte_index(int nr_objs)
+{
+	if (likely(nr_objs <= (sizeof(unsigned char) << 8)))
+		return true;
+
+	return false;
+}
+
+static int __calculate_nr_objs(size_t slab_size, size_t buffer_size,
+				unsigned int idx_size, size_t align)
 {
 	int nr_objs;
 	size_t freelist_size;
@@ -592,6 +600,29 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 	return nr_objs;
 }
 
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+							size_t align)
+{
+	int nr_objs;
+	int byte_nr_objs;
+
+	nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+					sizeof(unsigned int), align);
+	if (!can_byte_index(nr_objs))
+		return nr_objs;
+
+	byte_nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+					sizeof(unsigned char), align);
+	/*
+	 * nr_objs can be larger when using byte index,
+	 * so that it cannot be indexed by byte index.
+	 */
+	if (can_byte_index(byte_nr_objs))
+		return byte_nr_objs;
+	else
+		return nr_objs;
+}
+
 /*
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
@@ -618,13 +649,18 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * correct alignment when allocated.
 	 */
 	if (flags & CFLGS_OFF_SLAB) {
-		mgmt_size = 0;
 		nr_objs = slab_size / buffer_size;
+		mgmt_size = 0;
 
 	} else {
-		nr_objs = calculate_nr_objs(slab_size, buffer_size,
-					sizeof(unsigned int), align);
-		mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
+		nr_objs = calculate_nr_objs(slab_size, buffer_size, align);
+		if (can_byte_index(nr_objs)) {
+			mgmt_size =
+				ALIGN(nr_objs * sizeof(unsigned char), align);
+		} else {
+			mgmt_size =
+				ALIGN(nr_objs * sizeof(unsigned int), align);
+		}
 	}
 	*num = nr_objs;
 	*left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
@@ -2012,7 +2048,10 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			 * looping condition in cache_grow().
 			 */
 			offslab_limit = size;
-			offslab_limit /= sizeof(unsigned int);
+			if (can_byte_index(num))
+				offslab_limit /= sizeof(unsigned char);
+			else
+				offslab_limit /= sizeof(unsigned int);
 
  			if (num > offslab_limit)
 				break;
@@ -2253,8 +2292,13 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!cachep->num)
 		return -E2BIG;
 
-	freelist_size =
-		ALIGN(cachep->num * sizeof(unsigned int), cachep->align);
+	if (can_byte_index(cachep->num)) {
+		freelist_size = ALIGN(cachep->num * sizeof(unsigned char),
+								cachep->align);
+	} else {
+		freelist_size = ALIGN(cachep->num * sizeof(unsigned int),
+								cachep->align);
+	}
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
@@ -2267,7 +2311,10 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		freelist_size = cachep->num * sizeof(unsigned int);
+		if (can_byte_index(cachep->num))
+			freelist_size = cachep->num * sizeof(unsigned char);
+		else
+			freelist_size = cachep->num * sizeof(unsigned int);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2545,15 +2592,22 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
 	return freelist;
 }
 
-static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
+static inline unsigned int get_free_obj(struct kmem_cache *cachep,
+					struct page *page, unsigned int idx)
 {
-	return ((unsigned int *)page->freelist)[idx];
+	if (likely(can_byte_index(cachep->num)))
+		return ((unsigned char *)page->freelist)[idx];
+	else
+		return ((unsigned int *)page->freelist)[idx];
 }
 
-static inline void set_free_obj(struct page *page,
+static inline void set_free_obj(struct kmem_cache *cachep, struct page *page,
 					unsigned int idx, unsigned int val)
 {
-	((unsigned int *)(page->freelist))[idx] = val;
+	if (likely(can_byte_index(cachep->num)))
+		((unsigned char *)(page->freelist))[idx] = (unsigned char)val;
+	else
+		((unsigned int *)(page->freelist))[idx] = val;
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2598,7 +2652,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		set_free_obj(page, i, i);
+		set_free_obj(cachep, page, i, i);
 	}
 }
 
@@ -2615,9 +2669,11 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 				int nodeid)
 {
+	unsigned int index;
 	void *objp;
 
-	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
+	index = get_free_obj(cachep, page, page->active);
+	objp = index_to_obj(cachep, page, index);
 	page->active++;
 #if DEBUG
 	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2638,7 +2694,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
-		if (get_free_obj(page, i) == objnr) {
+		if (get_free_obj(cachep, page, i) == objnr) {
 			printk(KERN_ERR "slab: double free detected in cache "
 					"'%s', objp %p\n", cachep->name, objp);
 			BUG();
@@ -2646,7 +2702,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 	}
 #endif
 	page->active--;
-	set_free_obj(page, page->active, objnr);
+	set_free_obj(cachep, page, page->active, objnr);
 }
 
 /*
@@ -4220,7 +4276,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
 		for (j = page->active; j < c->num; j++) {
 			/* Skip freed item */
-			if (get_free_obj(page, j) == i) {
+			if (get_free_obj(c, page, j) == i) {
 				active = false;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 4/4] slab: make more slab management structure off the slab
  2013-09-02  8:38                 ` Joonsoo Kim
@ 2013-09-02  8:38                   ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now that the size of the freelist for slab management has diminished,
the on-slab management structure can waste a lot of space if the objects
of the slab are large.

Consider a cache of 128 byte sized objects. If on-slab management is used,
31 objects can fit in the slab. The size of the freelist in this case
would be 31 bytes, so 97 bytes, that is, more than 75% of the object size,
are wasted.

In the 64 byte sized object case, no space is wasted if we use on-slab
management. So set the off-slab determining constraint to 128 bytes.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
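
To put rough numbers on it (illustration only, using the assumed values
from the description above: a 4096 byte slab, 128 byte objects and a one
byte freelist index per object):

#include <stdio.h>

int main(void)
{
	unsigned int slab_size = 4096, obj_size = 128, nr_objs = 31;

	unsigned int objs_bytes = nr_objs * obj_size;			/* 3968 */
	unsigned int freelist_bytes = nr_objs * 1;			/* 31 */
	unsigned int wasted = slab_size - objs_bytes - freelist_bytes;	/* 97 */

	/* 97 bytes is more than 75% of one 128 byte object; going off-slab
	 * avoids it. The patch lowers the off-slab threshold from
	 * PAGE_SIZE >> 3 (512) to PAGE_SIZE >> 5 (128) for this reason. */
	printf("wasted: %u bytes (%u%% of one object)\n",
	       wasted, wasted * 100 / obj_size);
	return 0;
}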

diff --git a/mm/slab.c b/mm/slab.c
index bd366e5..d01a2f0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2277,7 +2277,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * it too early on. Always use on-slab management when
 	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
 	 */
-	if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init &&
+	if ((size >= (PAGE_SIZE >> 5)) && !slab_early_init &&
 	    !(flags & SLAB_NOLEAKTRACE))
 		/*
 		 * Size is large, assume best to place the slab management obj
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 4/4] slab: make more slab management structure off the slab
@ 2013-09-02  8:38                   ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-02  8:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Joonsoo Kim

Now that the size of the freelist for slab management has diminished,
the on-slab management structure can waste a lot of space if the objects
of the slab are large.

Consider a cache of 128 byte sized objects. If on-slab management is used,
31 objects can fit in the slab. The size of the freelist in this case
would be 31 bytes, so 97 bytes, that is, more than 75% of the object size,
are wasted.

In the 64 byte sized object case, no space is wasted if we use on-slab
management. So set the off-slab determining constraint to 128 bytes.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slab.c b/mm/slab.c
index bd366e5..d01a2f0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2277,7 +2277,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * it too early on. Always use on-slab management when
 	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
 	 */
-	if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init &&
+	if ((size >= (PAGE_SIZE >> 5)) && !slab_early_init &&
 	    !(flags & SLAB_NOLEAKTRACE))
 		/*
 		 * Size is large, assume best to place the slab management obj
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-02  8:38                 ` Joonsoo Kim
@ 2013-09-03 14:15                   ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-03 14:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Mon, 2 Sep 2013, Joonsoo Kim wrote:

> This patchset implements byte sized indexes for the freelist of a slab.
>
> Currently, the freelist of a slab consist of unsigned int sized indexes.
> Most of slabs have less number of objects than 256, so much space is wasted.
> To reduce this overhead, this patchset implements byte sized indexes for
> the freelist of a slab. With it, we can save 3 bytes for each objects.
>
> This introduce one likely branch to functions used for setting/getting
> objects to/from the freelist, but we may get more benefits from
> this change.
>
> Below is some numbers of 'cat /proc/slabinfo' related to my previous posting
> and this patchset.

You  may also want to run some performance tests. The cache footprint
should also be reduced with this patchset and therefore performance should
be better.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
@ 2013-09-03 14:15                   ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-03 14:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Mon, 2 Sep 2013, Joonsoo Kim wrote:

> This patchset implements byte sized indexes for the freelist of a slab.
>
> Currently, the freelist of a slab consist of unsigned int sized indexes.
> Most of slabs have less number of objects than 256, so much space is wasted.
> To reduce this overhead, this patchset implements byte sized indexes for
> the freelist of a slab. With it, we can save 3 bytes for each objects.
>
> This introduce one likely branch to functions used for setting/getting
> objects to/from the freelist, but we may get more benefits from
> this change.
>
> Below is some numbers of 'cat /proc/slabinfo' related to my previous posting
> and this patchset.

You  may also want to run some performance tests. The cache footprint
should also be reduced with this patchset and therefore performance should
be better.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-02  8:38                 ` Joonsoo Kim
                                   ` (6 preceding siblings ...)
  (?)
@ 2013-09-04  2:17                 ` Wanpeng Li
  -1 siblings, 0 replies; 114+ messages in thread
From: Wanpeng Li @ 2013-09-04  2:17 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, Joonsoo Kim,
	David Rientjes, linux-mm, linux-kernel

Hi Joonsoo,
On Mon, Sep 02, 2013 at 05:38:54PM +0900, Joonsoo Kim wrote:
>This patchset implements byte sized indexes for the freelist of a slab.
>
>Currently, the freelist of a slab consist of unsigned int sized indexes.
>Most of slabs have less number of objects than 256, so much space is wasted.
>To reduce this overhead, this patchset implements byte sized indexes for
>the freelist of a slab. With it, we can save 3 bytes for each objects.
>
>This introduce one likely branch to functions used for setting/getting
>objects to/from the freelist, but we may get more benefits from
>this change.
>
>Below is some numbers of 'cat /proc/slabinfo' related to my previous posting
>and this patchset.
>
>
>* Before *
># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
>kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0   
>kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
>kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
>kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0   
>kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0   
>kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0   
>kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0   
>kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   
>
>* After my previous posting(overload struct slab over struct page) *
># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
>kmalloc-512          527    600    512    8    1 : tunables   54   27    0 : slabdata     75     75      0   
>kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
>kmalloc-192         1040   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
>kmalloc-96           750    750    128   30    1 : tunables  120   60    0 : slabdata     25     25      0   
>kmalloc-64          2773   2773     64   59    1 : tunables  120   60    0 : slabdata     47     47      0   
>kmalloc-128          660    690    128   30    1 : tunables  120   60    0 : slabdata     23     23      0   
>kmalloc-32         11200  11200     32  112    1 : tunables  120   60    0 : slabdata    100    100      0   
>kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   
>
>kmem_caches consisting of objects less than or equal to 128 byte have one more
>objects in a slab. You can see it at objperslab.

I think there is one less objects in a slab after observing objperslab.

Regards,
Wanpeng Li 

>
>We can improve further with this patchset.
>
>* My previous posting + this patchset *
># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
>kmalloc-512          521    648    512    8    1 : tunables   54   27    0 : slabdata     81     81      0
>kmalloc-256          208    208    256   16    1 : tunables  120   60    0 : slabdata     13     13      0
>kmalloc-192         1029   1029    192   21    1 : tunables  120   60    0 : slabdata     49     49      0
>kmalloc-96           529    589    128   31    1 : tunables  120   60    0 : slabdata     19     19      0
>kmalloc-64          2142   2142     64   63    1 : tunables  120   60    0 : slabdata     34     34      0
>kmalloc-128          660    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
>kmalloc-32         11716  11780     32  124    1 : tunables  120   60    0 : slabdata     95     95      0
>kmem_cache           197    210    192   21    1 : tunables  120   60    0 : slabdata     10     10      0
>
>kmem_caches consisting of objects less than or equal to 256 byte have
>one or more objects than before. In the case of kmalloc-32, we have 12 more
>objects, so 384 bytes (12 * 32) are saved and this is roughly 9% saving of
>memory. Of couse, this percentage decreases as the number of objects
>in a slab decreases.
>
>Please let me know expert's opions :)
>Thanks.
>
>This patchset comes from a Christoph's idea.
>https://lkml.org/lkml/2013/8/23/315
>
>Patches are on top of my previous posting.
>https://lkml.org/lkml/2013/8/22/137
>
>Joonsoo Kim (4):
>  slab: factor out calculate nr objects in cache_estimate
>  slab: introduce helper functions to get/set free object
>  slab: introduce byte sized index for the freelist of a slab
>  slab: make more slab management structure off the slab
>
> mm/slab.c |  138 +++++++++++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 103 insertions(+), 35 deletions(-)
>
>-- 
>1.7.9.5
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-02  8:38                 ` Joonsoo Kim
                                   ` (5 preceding siblings ...)
  (?)
@ 2013-09-04  2:17                 ` Wanpeng Li
  -1 siblings, 0 replies; 114+ messages in thread
From: Wanpeng Li @ 2013-09-04  2:17 UTC (permalink / raw)
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, Joonsoo Kim,
	David Rientjes, linux-mm, linux-kernel, Joonsoo Kim

Hi Joonsoo,
On Mon, Sep 02, 2013 at 05:38:54PM +0900, Joonsoo Kim wrote:
>This patchset implements byte sized indexes for the freelist of a slab.
>
>Currently, the freelist of a slab consist of unsigned int sized indexes.
>Most of slabs have less number of objects than 256, so much space is wasted.
>To reduce this overhead, this patchset implements byte sized indexes for
>the freelist of a slab. With it, we can save 3 bytes for each objects.
>
>This introduce one likely branch to functions used for setting/getting
>objects to/from the freelist, but we may get more benefits from
>this change.
>
>Below is some numbers of 'cat /proc/slabinfo' related to my previous posting
>and this patchset.
>
>
>* Before *
># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
>kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0   
>kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
>kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
>kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0   
>kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0   
>kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0   
>kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0   
>kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   
>
>* After my previous posting(overload struct slab over struct page) *
># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
>kmalloc-512          527    600    512    8    1 : tunables   54   27    0 : slabdata     75     75      0   
>kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
>kmalloc-192         1040   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
>kmalloc-96           750    750    128   30    1 : tunables  120   60    0 : slabdata     25     25      0   
>kmalloc-64          2773   2773     64   59    1 : tunables  120   60    0 : slabdata     47     47      0   
>kmalloc-128          660    690    128   30    1 : tunables  120   60    0 : slabdata     23     23      0   
>kmalloc-32         11200  11200     32  112    1 : tunables  120   60    0 : slabdata    100    100      0   
>kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   
>
>kmem_caches consisting of objects less than or equal to 128 byte have one more
>objects in a slab. You can see it at objperslab.

I think there is one less objects in a slab after observing objperslab.

Regards,
Wanpeng Li 

>
>We can improve further with this patchset.
>
>* My previous posting + this patchset *
># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
>kmalloc-512          521    648    512    8    1 : tunables   54   27    0 : slabdata     81     81      0
>kmalloc-256          208    208    256   16    1 : tunables  120   60    0 : slabdata     13     13      0
>kmalloc-192         1029   1029    192   21    1 : tunables  120   60    0 : slabdata     49     49      0
>kmalloc-96           529    589    128   31    1 : tunables  120   60    0 : slabdata     19     19      0
>kmalloc-64          2142   2142     64   63    1 : tunables  120   60    0 : slabdata     34     34      0
>kmalloc-128          660    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
>kmalloc-32         11716  11780     32  124    1 : tunables  120   60    0 : slabdata     95     95      0
>kmem_cache           197    210    192   21    1 : tunables  120   60    0 : slabdata     10     10      0
>
>kmem_caches consisting of objects less than or equal to 256 byte have
>one or more objects than before. In the case of kmalloc-32, we have 12 more
>objects, so 384 bytes (12 * 32) are saved and this is roughly 9% saving of
>memory. Of couse, this percentage decreases as the number of objects
>in a slab decreases.
>
>Please let me know expert's opions :)
>Thanks.
>
>This patchset comes from a Christoph's idea.
>https://lkml.org/lkml/2013/8/23/315
>
>Patches are on top of my previous posting.
>https://lkml.org/lkml/2013/8/22/137
>
>Joonsoo Kim (4):
>  slab: factor out calculate nr objects in cache_estimate
>  slab: introduce helper functions to get/set free object
>  slab: introduce byte sized index for the freelist of a slab
>  slab: make more slab management structure off the slab
>
> mm/slab.c |  138 +++++++++++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 103 insertions(+), 35 deletions(-)
>
>-- 
>1.7.9.5
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
  2013-08-23  6:35     ` Joonsoo Kim
  (?)
@ 2013-09-04  3:38     ` Wanpeng Li
  -1 siblings, 0 replies; 114+ messages in thread
From: Wanpeng Li @ 2013-09-04  3:38 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

Hi Joonsoo,
On Fri, Aug 23, 2013 at 03:35:39PM +0900, Joonsoo Kim wrote:
>On Thu, Aug 22, 2013 at 04:47:25PM +0000, Christoph Lameter wrote:
>> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
>
[...]
>struct slab's free = END
>kmem_bufctl_t array: ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE
><we get object at index 0>
>

Is there a real item for END in kmem_bufctl_t array as you mentioned above?
I think the kmem_bufctl_t array doesn't include that and the last step is 
not present. 

Regards,
Wanpeng Li 

[...]

>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
  2013-08-23  6:35     ` Joonsoo Kim
  (?)
  (?)
@ 2013-09-04  3:38     ` Wanpeng Li
  -1 siblings, 0 replies; 114+ messages in thread
From: Wanpeng Li @ 2013-09-04  3:38 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

Hi Joonsoo,
On Fri, Aug 23, 2013 at 03:35:39PM +0900, Joonsoo Kim wrote:
>On Thu, Aug 22, 2013 at 04:47:25PM +0000, Christoph Lameter wrote:
>> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
>
[...]
>struct slab's free = END
>kmem_bufctl_t array: ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE
><we get object at index 0>
>

Is there a real item for END in kmem_bufctl_t array as you mentioned above?
I think the kmem_bufctl_t array doesn't include that and the last step is 
not present. 

Regards,
Wanpeng Li 

[...]

>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
       [not found]     ` <5226ab2c.02092b0a.5eed.ffffd7e4SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-09-04  8:25         ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-04  8:25 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

On Wed, Sep 04, 2013 at 11:38:04AM +0800, Wanpeng Li wrote:
> Hi Joonsoo,
> On Fri, Aug 23, 2013 at 03:35:39PM +0900, Joonsoo Kim wrote:
> >On Thu, Aug 22, 2013 at 04:47:25PM +0000, Christoph Lameter wrote:
> >> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> >
> [...]
> >struct slab's free = END
> >kmem_bufctl_t array: ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE
> ><we get object at index 0>
> >
> 
> Is there a real item for END in kmem_bufctl_t array as you mentioned above?
> I think the kmem_bufctl_t array doesn't include that and the last step is 
> not present. 

Yes, there is. BUFCTL_END is what I meant by END. A slab is initialized in
cache_init_objs(), and the last step in that function is to set the last
entry of the slab's free array to BUFCTL_END. This value remains for the
whole life cycle of the slab.
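
Roughly, it works like this (a simplified userspace model, not the kernel
code itself; the BUFCTL_END value here is only illustrative):

#include <stdio.h>

#define NR_OBJS		5
#define BUFCTL_END	((unsigned int)~0U)	/* sentinel, value illustrative */

static unsigned int bufctl[NR_OBJS];

int main(void)
{
	int i;

	/* Model of the init done in cache_init_objs(): each entry points
	 * to the next free object, and the last entry terminates the chain. */
	for (i = 0; i < NR_OBJS; i++)
		bufctl[i] = i + 1;
	bufctl[NR_OBJS - 1] = BUFCTL_END;

	for (i = 0; i < NR_OBJS; i++)
		printf("bufctl[%d] = %u\n", i, bufctl[i]);
	return 0;
}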

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage
@ 2013-09-04  8:25         ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-04  8:25 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

On Wed, Sep 04, 2013 at 11:38:04AM +0800, Wanpeng Li wrote:
> Hi Joonsoo,
> On Fri, Aug 23, 2013 at 03:35:39PM +0900, Joonsoo Kim wrote:
> >On Thu, Aug 22, 2013 at 04:47:25PM +0000, Christoph Lameter wrote:
> >> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> >
> [...]
> >struct slab's free = END
> >kmem_bufctl_t array: ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE
> ><we get object at index 0>
> >
> 
> Is there a real item for END in kmem_bufctl_t array as you mentioned above?
> I think the kmem_bufctl_t array doesn't include that and the last step is 
> not present. 

Yes, there is. BUFCTL_END is what I meant by END. A slab is initialized in
cache_init_objs(), and the last step in that function is to set the last
entry of the slab's free array to BUFCTL_END. This value remains for the
whole life cycle of the slab.

Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
       [not found]                 ` <5226985f.4475320a.1c61.2623SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-09-04  8:28                     ` Joonsoo Kim
  0 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-04  8:28 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Pekka Enberg, Christoph Lameter, Andrew Morton, David Rientjes,
	linux-mm, linux-kernel

On Wed, Sep 04, 2013 at 10:17:46AM +0800, Wanpeng Li wrote:
> Hi Joonsoo,
> On Mon, Sep 02, 2013 at 05:38:54PM +0900, Joonsoo Kim wrote:
> >This patchset implements byte sized indexes for the freelist of a slab.
> >
> >Currently, the freelist of a slab consist of unsigned int sized indexes.
> >Most of slabs have less number of objects than 256, so much space is wasted.
> >To reduce this overhead, this patchset implements byte sized indexes for
> >the freelist of a slab. With it, we can save 3 bytes for each objects.
> >
> >This introduce one likely branch to functions used for setting/getting
> >objects to/from the freelist, but we may get more benefits from
> >this change.
> >
> >Below is some numbers of 'cat /proc/slabinfo' related to my previous posting
> >and this patchset.
> >
> >
> >* Before *
> ># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
> >kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0   
> >kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
> >kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
> >kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0   
> >kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0   
> >kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0   
> >kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0   
> >kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   
> >
> >* After my previous posting(overload struct slab over struct page) *
> ># name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
> >kmalloc-512          527    600    512    8    1 : tunables   54   27    0 : slabdata     75     75      0   
> >kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0   
> >kmalloc-192         1040   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0   
> >kmalloc-96           750    750    128   30    1 : tunables  120   60    0 : slabdata     25     25      0   
> >kmalloc-64          2773   2773     64   59    1 : tunables  120   60    0 : slabdata     47     47      0   
> >kmalloc-128          660    690    128   30    1 : tunables  120   60    0 : slabdata     23     23      0   
> >kmalloc-32         11200  11200     32  112    1 : tunables  120   60    0 : slabdata    100    100      0   
> >kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0   
> >
> >kmem_caches whose objects are less than or equal to 128 bytes have one more
> >object per slab. You can see it in objperslab.
> 
> I think there is one less object per slab, judging from objperslab.

Yes :)
I made a mistake when I attached the data for this patchset.
The results of *Before* and *After* should be swapped.
Thanks for pointing that out.

Thanks.
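
For reference, a minimal sketch of the byte sized index helpers described in
the quoted posting (purely illustrative, with hypothetical names; the likely
branch selects the index width per cache):

	static inline void set_free_obj(void *freelist, unsigned int idx,
					unsigned int objnr, bool byte_index)
	{
		if (likely(byte_index))
			((unsigned char *)freelist)[idx] = (unsigned char)objnr;
		else
			((unsigned int *)freelist)[idx] = objnr;
	}

	static inline unsigned int get_free_obj(void *freelist, unsigned int idx,
						bool byte_index)
	{
		if (likely(byte_index))
			return ((unsigned char *)freelist)[idx];
		return ((unsigned int *)freelist)[idx];
	}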

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-03 14:15                   ` Christoph Lameter
@ 2013-09-04  8:33                     ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-04  8:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Tue, Sep 03, 2013 at 02:15:42PM +0000, Christoph Lameter wrote:
> On Mon, 2 Sep 2013, Joonsoo Kim wrote:
> 
> > This patchset implements byte sized indexes for the freelist of a slab.
> >
> > Currently, the freelist of a slab consists of unsigned int sized indexes.
> > Most slabs have fewer than 256 objects, so much space is wasted.
> > To reduce this overhead, this patchset implements byte sized indexes for
> > the freelist of a slab. With it, we can save 3 bytes for each object.
> >
> > This introduces one likely branch in the functions used for setting/getting
> > objects to/from the freelist, but we may get more benefit from
> > this change.
> >
> > Below are some numbers from 'cat /proc/slabinfo' for my previous posting
> > and this patchset.
> 
> You  may also want to run some performance tests. The cache footprint
> should also be reduced with this patchset and therefore performance should
> be better.

Yes, I ran a hackbench test today, but I'm not ready to post it yet.
Performance is improved by my previous posting, and further improvement is
found with this patchset. Perhaps I will post it tomorrow.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-04  8:33                     ` Joonsoo Kim
@ 2013-09-05  6:55                       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-05  6:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Wed, Sep 04, 2013 at 05:33:05PM +0900, Joonsoo Kim wrote:
> On Tue, Sep 03, 2013 at 02:15:42PM +0000, Christoph Lameter wrote:
> > On Mon, 2 Sep 2013, Joonsoo Kim wrote:
> > 
> > > This patchset implements byte sized indexes for the freelist of a slab.
> > >
> > > Currently, the freelist of a slab consists of unsigned int sized indexes.
> > > Most slabs have fewer than 256 objects, so much space is wasted.
> > > To reduce this overhead, this patchset implements byte sized indexes for
> > > the freelist of a slab. With it, we can save 3 bytes for each object.
> > >
> > > This introduces one likely branch in the functions used for setting/getting
> > > objects to/from the freelist, but we may get more benefit from
> > > this change.
> > >
> > > Below are some numbers from 'cat /proc/slabinfo' for my previous posting
> > > and this patchset.
> > 
> > You  may also want to run some performance tests. The cache footprint
> > should also be reduced with this patchset and therefore performance should
> > be better.
> 
> Yes, I ran a hackbench test today, but I'm not ready to post it yet.
> Performance is improved by my previous posting, and further improvement is
> found with this patchset. Perhaps I will post it tomorrow.
> 

Here are the results for both patchsets on my 4-CPU machine.

* Before *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       238,309,671 cache-misses                                                  ( +-  0.40% )

      12.010172090 seconds time elapsed                                          ( +-  0.21% )

* After my previous posting *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       229,945,138 cache-misses                                                  ( +-  0.23% )

      11.627897174 seconds time elapsed                                          ( +-  0.14% )


* After my previous posting + this patchset *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       218,640,472 cache-misses                                                  ( +-  0.42% )

      11.504999837 seconds time elapsed                                          ( +-  0.21% )
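
For reference, numbers like the above can be gathered with something like the
following (event availability may vary by perf version and hardware):

	perf stat -e cache-misses -r 10 -- perf bench sched messaging -g 50 -l 1000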



Cache misses are reduced at each step, by roughly 3.5% and 4.9% respectively,
and elapsed times improve by 3.1% and 4.2% relative to the baseline.

I think both patchsets deserve to be merged, since they reduce memory usage and
also improve performance. :)

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-05  6:55                       ` Joonsoo Kim
@ 2013-09-05 14:33                         ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-05 14:33 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, 5 Sep 2013, Joonsoo Kim wrote:

> I think both patchsets deserve to be merged, since they reduce memory usage and
> also improve performance. :)

Could you clean things up etc. and then repost the patchset? This time, do
*not* post it as a response to an earlier email; start the patchset with a
new thread id. I think some people are not seeing this patchset.

There is a tool called "quilt" that can help you send the patchset.

	quilt mail

Tools for git to do the same also exist.
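
For example, an equivalent workflow with git could look roughly like this
(the addresses and patch count below are placeholders):

	git format-patch --cover-letter -v2 -o outgoing/ HEAD~4
	git send-email --to=linux-mm@kvack.org --cc=linux-kernel@vger.kernel.org outgoing/*.patch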

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab
  2013-09-05 14:33                         ` Christoph Lameter
@ 2013-09-06  5:58                           ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-06  5:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, Sep 05, 2013 at 02:33:56PM +0000, Christoph Lameter wrote:
> On Thu, 5 Sep 2013, Joonsoo Kim wrote:
> 
> > I think both patchsets deserve to be merged, since they reduce memory usage and
> > also improve performance. :)
> 
> Could you clean things up etc. and then repost the patchset? This time, do
> *not* post it as a response to an earlier email; start the patchset with a
> new thread id. I think some people are not seeing this patchset.

Okay. I just did that.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/16] slab: correct pfmemalloc check
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:30     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel, Mel Gorman

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> And therefore we should check pfmemalloc in the page flags of the first page,
> but the current implementation doesn't do that. virt_to_head_page(obj) just
> returns the 'struct page' of that object, not the one of the first page, since
> SLAB doesn't use __GFP_COMP when CONFIG_MMU. To get the 'struct page' of the
> first page, we first get the slab and reach it via virt_to_head_page(slab->s_mem).

Maybe using __GFP_COMP would make it consistent across all allocators and
avoid the issue? Then we only have to set the flags on the first page.
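
For context, a minimal sketch of the check described in the quoted text
(helper names are taken as assumptions from the mm/slab.c of that era;
illustrative only, not the exact patch):

	static bool check_slab_pfmemalloc(void *obj)
	{
		/*
		 * Without __GFP_COMP, virt_to_head_page(obj) does not reach the
		 * first page of a multi-page slab, so go through the slab and
		 * check the flag on the page that s_mem points into.
		 */
		struct slab *slabp = virt_to_slab(obj);
		struct page *head = virt_to_head_page(slabp->s_mem);

		return PageSlabPfmemalloc(head);
	}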

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/16] slab: change return type of kmem_getpages() to struct page
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:31     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:31 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> It is more understandable for kmem_getpages() to return a struct page.
> And, with this, we can remove one translation from virtual address to page and
> generate better code than before. Below is the change made by this patch.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 03/16] slab: remove colouroff in struct slab
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:32     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:32 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> Now there is no user of colouroff, so remove it.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/16] slab: remove cachep in struct slab_rcu
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:33     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:33 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> We can get cachep using the page in struct slab_rcu, so remove it.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 06/16] slab: put forward freeing slab management object
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:35     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:35 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> We don't need to free the slab management object in RCU context,
> because, from this point on, we don't manage this slab anymore.
> So move the freeing forward.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:39     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:39 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> With build-time size checking, we can overload the RCU head over the LRU
> of struct page to free the pages of a slab in RCU context. This really helps
> to overload struct slab over struct page, which eventually reduces memory
> usage and the cache footprint of the SLAB.

Looks fine to me. Can you add the rcu_head to the struct page union? This
kind of overload is used frequently elsewhere as well. Then cleanup other
cases of such uses (such as in SLUB).

Acked-by: Christoph Lameter <cl@linux.com>
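
A minimal sketch of the overloading being discussed (assuming the struct page
layout of that era; the function names here are illustrative):

	static void kmem_rcu_free(struct rcu_head *head)
	{
		struct page *page = container_of((struct list_head *)head,
						 struct page, lru);

		/* ... hand the slab's pages back to the page allocator ... */
	}

	static void slab_destroy_by_rcu(struct page *page)
	{
		struct rcu_head *head;

		/* build-time check that an rcu_head fits into page->lru */
		BUILD_BUG_ON(sizeof(*head) > sizeof(page->lru));
		head = (struct rcu_head *)&page->lru;
		call_rcu(head, kmem_rcu_free);
	}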


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/16] slab: use well-defined macro, virt_to_slab()
  2013-08-22  8:44   ` Joonsoo Kim
@ 2013-09-11 14:40     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-11 14:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, Joonsoo Kim, David Rientjes,
	linux-mm, linux-kernel

On Thu, 22 Aug 2013, Joonsoo Kim wrote:

> This is a trivial change; just use the well-defined macro.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/16] slab: correct pfmemalloc check
  2013-09-11 14:30     ` Christoph Lameter
@ 2013-09-12  6:51       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-12  6:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm,
	linux-kernel, Mel Gorman

On Wed, Sep 11, 2013 at 02:30:03PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > And therefore we should check pfmemalloc in the page flags of the first page,
> > but the current implementation doesn't do that. virt_to_head_page(obj) just
> > returns the 'struct page' of that object, not the one of the first page, since
> > SLAB doesn't use __GFP_COMP when CONFIG_MMU. To get the 'struct page' of the
> > first page, we first get the slab and reach it via virt_to_head_page(slab->s_mem).
> 
> Maybe using __GFP_COMP would make it consistent across all allocators and
> avoid the issue? Then we only have to set the flags on the first page.

Yes, you are right. It can be solved by using __GFP_COMP.
But I made this patch to clarify the problem in the current code and so that
it can be merged separately.

If I solved the problem with __GFP_COMP, which is implemented in [09/16]
of this patchset, it would also weaken the purpose of that patch.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
  2013-09-11 14:39     ` Christoph Lameter
@ 2013-09-12  6:55       ` Joonsoo Kim
  -1 siblings, 0 replies; 114+ messages in thread
From: Joonsoo Kim @ 2013-09-12  6:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Wed, Sep 11, 2013 at 02:39:22PM +0000, Christoph Lameter wrote:
> On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> 
> > With build-time size checking, we can overload the RCU head over the LRU
> > of struct page to free the pages of a slab in RCU context. This really helps
> > to overload struct slab over struct page, which eventually reduces memory
> > usage and the cache footprint of the SLAB.
> 
> Looks fine to me. Can you add the rcu_head to the struct page union? This
> kind of overload is used frequently elsewhere as well. Then cleanup other
> cases of such uses (such as in SLUB).

Okay. But I will implement it separately because I don't know where those
cases are now, and some investigation would be needed.

> 
> Acked-by: Christoph Lameter <cl@linux.com>

Thanks!

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free
  2013-09-12  6:55       ` Joonsoo Kim
@ 2013-09-12 14:21         ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2013-09-12 14:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Andrew Morton, David Rientjes, linux-mm, linux-kernel

On Thu, 12 Sep 2013, Joonsoo Kim wrote:

> On Wed, Sep 11, 2013 at 02:39:22PM +0000, Christoph Lameter wrote:
> > On Thu, 22 Aug 2013, Joonsoo Kim wrote:
> >
> > > With build-time size checking, we can overload the RCU head over the LRU
> > > of struct page to free the pages of a slab in RCU context. This really helps
> > > to overload struct slab over struct page, which eventually reduces memory
> > > usage and the cache footprint of the SLAB.
> >
> > Looks fine to me. Can you add the rcu_head to the struct page union? This
> > kind of overload is used frequently elsewhere as well. Then cleanup other
> > cases of such uses (such as in SLUB).
>
> Okay. But I will implement it separately because I don't know where those
> cases are now, and some investigation would be needed.

Do it just for this case. The others can be done later.
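
The suggested change amounts to something like this in the existing union in
struct page (a sketch of the idea, not the final merged layout):

	union {
		struct list_head lru;		/* page-cache / LRU list */
		struct rcu_head rcu_head;	/* used by SLAB for RCU frees */
		/* ... other existing members ... */
	};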


^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2013-09-12 14:21 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-22  8:44 [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage Joonsoo Kim
2013-08-22  8:44 ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 01/16] slab: correct pfmemalloc check Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-09-11 14:30   ` Christoph Lameter
2013-09-11 14:30     ` Christoph Lameter
2013-09-12  6:51     ` Joonsoo Kim
2013-09-12  6:51       ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 02/16] slab: change return type of kmem_getpages() to struct page Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22 17:49   ` Christoph Lameter
2013-08-22 17:49     ` Christoph Lameter
2013-08-23  6:40     ` Joonsoo Kim
2013-08-23  6:40       ` Joonsoo Kim
2013-09-11 14:31   ` Christoph Lameter
2013-09-11 14:31     ` Christoph Lameter
2013-08-22  8:44 ` [PATCH 03/16] slab: remove colouroff in struct slab Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-09-11 14:32   ` Christoph Lameter
2013-09-11 14:32     ` Christoph Lameter
2013-08-22  8:44 ` [PATCH 04/16] slab: remove nodeid " Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22 17:51   ` Christoph Lameter
2013-08-22 17:51     ` Christoph Lameter
2013-08-23  6:49     ` Joonsoo Kim
2013-08-23  6:49       ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 05/16] slab: remove cachep in struct slab_rcu Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22 17:53   ` Christoph Lameter
2013-08-22 17:53     ` Christoph Lameter
2013-08-23  6:53     ` Joonsoo Kim
2013-08-23  6:53       ` Joonsoo Kim
2013-08-23 13:42       ` Christoph Lameter
2013-08-23 13:42         ` Christoph Lameter
2013-08-23 14:24         ` JoonSoo Kim
2013-08-23 14:24           ` JoonSoo Kim
2013-08-23 15:41           ` Christoph Lameter
2013-08-23 15:41             ` Christoph Lameter
2013-08-23 16:12             ` JoonSoo Kim
2013-08-23 16:12               ` JoonSoo Kim
2013-09-02  8:38               ` [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab Joonsoo Kim
2013-09-02  8:38                 ` Joonsoo Kim
2013-09-02  8:38                 ` [PATCH 1/4] slab: factor out calculate nr objects in cache_estimate Joonsoo Kim
2013-09-02  8:38                   ` Joonsoo Kim
2013-09-02  8:38                 ` [PATCH 2/4] slab: introduce helper functions to get/set free object Joonsoo Kim
2013-09-02  8:38                   ` Joonsoo Kim
2013-09-02  8:38                 ` [PATCH 3/4] slab: introduce byte sized index for the freelist of a slab Joonsoo Kim
2013-09-02  8:38                   ` Joonsoo Kim
2013-09-02  8:38                 ` [PATCH 4/4] slab: make more slab management structure off the slab Joonsoo Kim
2013-09-02  8:38                   ` Joonsoo Kim
2013-09-03 14:15                 ` [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab Christoph Lameter
2013-09-03 14:15                   ` Christoph Lameter
2013-09-04  8:33                   ` Joonsoo Kim
2013-09-04  8:33                     ` Joonsoo Kim
2013-09-05  6:55                     ` Joonsoo Kim
2013-09-05  6:55                       ` Joonsoo Kim
2013-09-05 14:33                       ` Christoph Lameter
2013-09-05 14:33                         ` Christoph Lameter
2013-09-06  5:58                         ` Joonsoo Kim
2013-09-06  5:58                           ` Joonsoo Kim
2013-09-04  2:17                 ` Wanpeng Li
2013-09-04  2:17                 ` Wanpeng Li
     [not found]                 ` <5226985f.4475320a.1c61.2623SMTPIN_ADDED_BROKEN@mx.google.com>
2013-09-04  8:28                   ` Joonsoo Kim
2013-09-04  8:28                     ` Joonsoo Kim
2013-09-11 14:33   ` [PATCH 05/16] slab: remove cachep in struct slab_rcu Christoph Lameter
2013-09-11 14:33     ` Christoph Lameter
2013-08-22  8:44 ` [PATCH 06/16] slab: put forward freeing slab management object Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-09-11 14:35   ` Christoph Lameter
2013-09-11 14:35     ` Christoph Lameter
2013-08-22  8:44 ` [PATCH 07/16] slab: overloading the RCU head over the LRU for RCU free Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-27 22:06   ` Jonathan Corbet
2013-08-27 22:06     ` Jonathan Corbet
2013-08-28  6:36     ` Joonsoo Kim
2013-08-28  6:36       ` Joonsoo Kim
2013-09-11 14:39   ` Christoph Lameter
2013-09-11 14:39     ` Christoph Lameter
2013-09-12  6:55     ` Joonsoo Kim
2013-09-12  6:55       ` Joonsoo Kim
2013-09-12 14:21       ` Christoph Lameter
2013-09-12 14:21         ` Christoph Lameter
2013-08-22  8:44 ` [PATCH 08/16] slab: use well-defined macro, virt_to_slab() Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-09-11 14:40   ` Christoph Lameter
2013-09-11 14:40     ` Christoph Lameter
2013-08-22  8:44 ` [PATCH 09/16] slab: use __GFP_COMP flag for allocating slab pages Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22 18:00   ` Christoph Lameter
2013-08-22 18:00     ` Christoph Lameter
2013-08-23  6:55     ` Joonsoo Kim
2013-08-23  6:55       ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 10/16] slab: change the management method of free objects of the slab Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 11/16] slab: remove kmem_bufctl_t Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 12/16] slab: remove SLAB_LIMIT Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 13/16] slab: replace free and inuse in struct slab with newly introduced active Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 14/16] slab: use struct page for slab management Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 15/16] slab: remove useless statement for checking pfmemalloc Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22  8:44 ` [PATCH 16/16] slab: rename slab_bufctl to slab_freelist Joonsoo Kim
2013-08-22  8:44   ` Joonsoo Kim
2013-08-22 16:47 ` [PATCH 00/16] slab: overload struct slab over struct page to reduce memory usage Christoph Lameter
2013-08-22 16:47   ` Christoph Lameter
2013-08-23  6:35   ` Joonsoo Kim
2013-08-23  6:35     ` Joonsoo Kim
2013-09-04  3:38     ` Wanpeng Li
2013-09-04  3:38     ` Wanpeng Li
     [not found]     ` <5226ab2c.02092b0a.5eed.ffffd7e4SMTPIN_ADDED_BROKEN@mx.google.com>
2013-09-04  8:25       ` Joonsoo Kim
2013-09-04  8:25         ` Joonsoo Kim
