linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB
@ 2016-01-14  5:24 Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 01/16] mm/slab: fix stale code comment Joonsoo Kim
                   ` (16 more replies)
  0 siblings, 17 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Hello,

This patchset implements a new way of managing freed objects, called
OBJFREELIST_SLAB. Its purpose is to reduce memory overhead in SLAB.

SLAB needs an array to manage freed objects in a slab. If there is
leftover space after objects are packed into a slab, we can use it for
the management array and, in that case, no memory is wasted. Otherwise,
we need to allocate extra memory for the management array or dedicate
some internal memory of the slab to it. Both cases waste memory, so
it's not good.

With this patchset, a freed object itself can be used as the management
array, so memory waste can be reduced. The detailed idea and numbers
are described in the last patch's commit description; please refer to it.
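
To illustrate the idea, here is a rough, self-contained userspace toy
model, not the kernel implementation (which is in the last patch and
differs in detail); NUM_OBJS, OBJ_SIZE, slab_alloc() and slab_free()
are illustrative names only. The freelist index array lives inside one
of the free objects, and that object is simply handed out last:

#include <stdio.h>

#define NUM_OBJS  8
#define OBJ_SIZE  32	/* must be >= NUM_OBJS * sizeof(unsigned char) */

static unsigned char slab[NUM_OBJS * OBJ_SIZE];	/* object area */
static unsigned char *freelist;	/* index array, hosted in a free object */
static unsigned int active;	/* number of objects handed out */

static void *obj(unsigned int idx) { return slab + idx * OBJ_SIZE; }

static void slab_init(void)
{
	unsigned int i;

	/* Borrow the last object for the freelist: no extra memory needed. */
	freelist = obj(NUM_OBJS - 1);
	for (i = 0; i < NUM_OBJS; i++)
		freelist[i] = i;
	active = 0;
}

static void *slab_alloc(void)
{
	if (active == NUM_OBJS)
		return NULL;
	/*
	 * The hosting object sits at the tail of the freelist, so it is
	 * handed out last, when the freelist is no longer needed.
	 */
	return obj(freelist[active++]);
}

static void slab_free(void *p)
{
	unsigned int idx = (unsigned int)(((unsigned char *)p - slab) / OBJ_SIZE);

	/* If the slab was full, re-home the freelist in the freed object. */
	if (active == NUM_OBJS)
		freelist = p;
	freelist[--active] = idx;
}

int main(void)
{
	void *a, *b;

	slab_init();
	a = slab_alloc();
	b = slab_alloc();
	slab_free(b);
	slab_free(a);
	printf("toy slab: %u objects active\n", active);
	return 0;
}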

In fact, I also tested another idea that implements OBJFREELIST_SLAB
with an extendable linked array threaded through other freed objects.
It can remove the memory waste completely, but it adds computational
overhead on the critical lock path, and that overhead seems to outweigh
the benefit, so this patchset doesn't include it. I will attach the
prototype just for reference.

This patchset is based on next-20151231.

Thanks.

Joonsoo Kim (16):
  mm/slab: fix stale code comment
  mm/slab: remove useless structure define
  mm/slab: remove the checks for slab implementation bug
  mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  mm/slab: use more appropriate condition check for debug_pagealloc
  mm/slab: clean-up DEBUG_PAGEALLOC processing code
  mm/slab: alternative implementation for DEBUG_SLAB_LEAK
  mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
  mm/slab: put the freelist at the end of slab page
  mm/slab: align cache size first before determination of OFF_SLAB
    candidate
  mm/slab: clean-up cache type determination
  mm/slab: do not change cache size if debug pagealloc isn't possible
  mm/slab: make criteria for off slab determination robust and simple
  mm/slab: factor out slab list fixup code
  mm/slab: factor out debugging initialization in cache_init_objs()
  mm/slab: introduce new slab management type, OBJFREELIST_SLAB

 include/linux/slab_def.h |   3 +
 mm/slab.c                | 620 ++++++++++++++++++++++++++---------------------
 2 files changed, 350 insertions(+), 273 deletions(-)

-- 
1.9.1

* [PATCH 01/16] mm/slab: fix stale code comment
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14 15:22   ` Christoph Lameter
  2016-01-14  5:24 ` [PATCH 02/16] mm/slab: remove useless structure define Joonsoo Kim
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

We use the freelist_idx_t type for free object management, and its size
can be smaller than that of unsigned int. Fix the stale comment to match.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 6ecc697..c8f9c3a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -537,7 +537,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * on it. For the latter case, the memory allocated for a
 	 * slab is used for:
 	 *
-	 * - One unsigned int for each object
+	 * - One freelist_idx_t for each object
 	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
 	 *
-- 
1.9.1

* [PATCH 02/16] mm/slab: remove useless structure define
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 01/16] mm/slab: fix stale code comment Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14 15:22   ` Christoph Lameter
  2016-01-14  5:24 ` [PATCH 03/16] mm/slab: remove the checks for slab implementation bug Joonsoo Kim
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

struct arraycache_init is obsolete, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index c8f9c3a..1bc6294 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -225,16 +225,6 @@ static inline void clear_obj_pfmemalloc(void **objp)
 }
 
 /*
- * bootstrap: The caches do not work without cpuarrays anymore, but the
- * cpuarrays are allocated from the generic caches...
- */
-#define BOOT_CPUCACHE_ENTRIES	1
-struct arraycache_init {
-	struct array_cache cache;
-	void *entries[BOOT_CPUCACHE_ENTRIES];
-};
-
-/*
  * Need this for bootstrapping a per node allocator.
  */
 #define NUM_INIT_LISTS (2 * MAX_NUMNODES)
@@ -457,6 +447,7 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache,
 	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
 }
 
+#define BOOT_CPUCACHE_ENTRIES	1
 /* internal cache of cache description objs */
 static struct kmem_cache kmem_cache_boot = {
 	.batchcount = 1,
-- 
1.9.1

* [PATCH 03/16] mm/slab: remove the checks for slab implementation bug
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 01/16] mm/slab: fix stale code comment Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 02/16] mm/slab: remove useless structure define Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14 15:23   ` Christoph Lameter
  2016-01-14  5:24 ` [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled Joonsoo Kim
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Some of "#if DEBUG" are for reporting slab implementation bug
rather than user usecase bug. It's not really needed because slab
is stable for a quite long time and it makes code too dirty. This
patch remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 1bc6294..bbe4df2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2110,8 +2110,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!(flags & SLAB_DESTROY_BY_RCU))
 		flags |= SLAB_POISON;
 #endif
-	if (flags & SLAB_DESTROY_BY_RCU)
-		BUG_ON(flags & SLAB_POISON);
 #endif
 
 	/*
@@ -2368,9 +2366,6 @@ static int drain_freelist(struct kmem_cache *cache,
 		}
 
 		page = list_entry(p, struct page, lru);
-#if DEBUG
-		BUG_ON(page->active);
-#endif
 		list_del(&page->lru);
 		/*
 		 * Safe to drop the lock. The slab is no longer linked
@@ -2528,30 +2523,23 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 	}
 }
 
-static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
-				int nodeid)
+static void *slab_get_obj(struct kmem_cache *cachep, struct page *page)
 {
 	void *objp;
 
 	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
 	page->active++;
-#if DEBUG
-	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
-#endif
 
 	return objp;
 }
 
-static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
-				void *objp, int nodeid)
+static void slab_put_obj(struct kmem_cache *cachep,
+			struct page *page, void *objp)
 {
 	unsigned int objnr = obj_to_index(cachep, page, objp);
 #if DEBUG
 	unsigned int i;
 
-	/* Verify that the slab belongs to the intended node */
-	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
-
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
 		if (get_free_obj(page, i) == objnr) {
@@ -2817,8 +2805,7 @@ retry:
 			STATS_INC_ACTIVE(cachep);
 			STATS_SET_HIGH(cachep);
 
-			ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
-									node));
+			ac_put_obj(cachep, ac, slab_get_obj(cachep, page));
 		}
 
 		/* move slabp to correct slabp list: */
@@ -3109,7 +3096,7 @@ retry:
 
 	BUG_ON(page->active == cachep->num);
 
-	obj = slab_get_obj(cachep, page, nodeid);
+	obj = slab_get_obj(cachep, page);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
 	list_del(&page->lru);
@@ -3278,7 +3265,7 @@ static void free_block(struct kmem_cache *cachep, void **objpp,
 		page = virt_to_head_page(objp);
 		list_del(&page->lru);
 		check_spinlock_acquired_node(cachep, node);
-		slab_put_obj(cachep, page, objp, node);
+		slab_put_obj(cachep, page, objp);
 		STATS_DEC_ACTIVE(cachep);
 		n->free_objects++;
 
@@ -3308,9 +3295,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
 	LIST_HEAD(list);
 
 	batchcount = ac->batchcount;
-#if DEBUG
-	BUG_ON(!batchcount || batchcount > ac->avail);
-#endif
+
 	check_irq_off();
 	n = get_node(cachep, node);
 	spin_lock(&n->list_lock);
-- 
1.9.1

* [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (2 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 03/16] mm/slab: remove the checks for slab implementation bug Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14 12:09   ` Jesper Dangaard Brouer
  2016-01-14  5:24 ` [PATCH 05/16] mm/slab: use more appropriate condition check for debug_pagealloc Joonsoo Kim
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index bbe4df2..4b55516 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1838,7 +1838,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
 
 		if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
-			if (cachep->size % PAGE_SIZE == 0 &&
+			if (debug_pagealloc_enabled() &&
+				cachep->size % PAGE_SIZE == 0 &&
 					OFF_SLAB(cachep))
 				kernel_map_pages(virt_to_page(objp),
 					cachep->size / PAGE_SIZE, 1);
@@ -2176,7 +2177,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * to check size >= 256. It guarantees that all necessary small
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
-	if (!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
+	if (debug_pagealloc_enabled() &&
+		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size() &&
 		ALIGN(size, cachep->align) < PAGE_SIZE) {
 		cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
@@ -2232,7 +2234,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		 * poisoning, then it's going to smash the contents of
 		 * the redzone and userword anyhow, so switch them off.
 		 */
-		if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
+		if (debug_pagealloc_enabled() &&
+			size % PAGE_SIZE == 0 && flags & SLAB_POISON)
 			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
 #endif
 	}
@@ -2716,7 +2719,8 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
-		if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
+		if (debug_pagealloc_enabled() &&
+			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep)) {
 			store_stackinfo(cachep, objp, caller);
 			kernel_map_pages(virt_to_page(objp),
 					 cachep->size / PAGE_SIZE, 0);
@@ -2861,7 +2865,8 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
-		if ((cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
+		if (debug_pagealloc_enabled() &&
+			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
 			kernel_map_pages(virt_to_page(objp),
 					 cachep->size / PAGE_SIZE, 1);
 		else
-- 
1.9.1

* [PATCH 05/16] mm/slab: use more appropriate condition check for debug_pagealloc
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (3 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 06/16] mm/slab: clean-up DEBUG_PAGEALLOC processing code Joonsoo Kim
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

debug_pagealloc debugging is tied to the SLAB_POISON flag rather than to
the FORCED_DEBUG option, although FORCED_DEBUG will enable SLAB_POISON.
Fix the condition check accordingly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 4b55516..1fcc338 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2169,7 +2169,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		else
 			size += BYTES_PER_WORD;
 	}
-#if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
 	/*
 	 * To activate debug pagealloc, off-slab management is necessary
 	 * requirement. In early phase of initialization, small sized slab
@@ -2177,7 +2176,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * to check size >= 256. It guarantees that all necessary small
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
-	if (debug_pagealloc_enabled() &&
+	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
 		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size() &&
 		ALIGN(size, cachep->align) < PAGE_SIZE) {
@@ -2185,7 +2184,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		size = PAGE_SIZE;
 	}
 #endif
-#endif
 
 	/*
 	 * Determine if the slab management is 'on' or 'off' slab.
-- 
1.9.1

* [PATCH 06/16] mm/slab: clean-up DEBUG_PAGEALLOC processing code
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (4 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 05/16] mm/slab: use more appropriate condition check for debug_pagealloc Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 07/16] mm/slab: alternative implementation for DEBUG_SLAB_LEAK Joonsoo Kim
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Currently, open-coded checks for a DEBUG_PAGEALLOC cache are spread
across several sites. This makes the code hard to read and hard to
change. This patch cleans them up. A following patch will change the
criteria for a DEBUG_PAGEALLOC cache, so this clean-up helps with that, too.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 97 ++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 1fcc338..7a10e18 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1661,6 +1661,14 @@ static void kmem_rcu_free(struct rcu_head *head)
 }
 
 #if DEBUG
+static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
+{
+	if (debug_pagealloc_enabled() && OFF_SLAB(cachep) &&
+		(cachep->size % PAGE_SIZE) == 0)
+		return true;
+
+	return false;
+}
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
@@ -1694,6 +1702,23 @@ static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 	}
 	*addr++ = 0x87654321;
 }
+
+static void slab_kernel_map(struct kmem_cache *cachep, void *objp,
+				int map, unsigned long caller)
+{
+	if (!is_debug_pagealloc_cache(cachep))
+		return;
+
+	if (caller)
+		store_stackinfo(cachep, objp, caller);
+
+	kernel_map_pages(virt_to_page(objp), cachep->size / PAGE_SIZE, map);
+}
+
+#else
+static inline void slab_kernel_map(struct kmem_cache *cachep, void *objp,
+				int map, unsigned long caller) {}
+
 #endif
 
 static void poison_obj(struct kmem_cache *cachep, void *addr, unsigned char val)
@@ -1772,6 +1797,9 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 	int size, i;
 	int lines = 0;
 
+	if (is_debug_pagealloc_cache(cachep))
+		return;
+
 	realobj = (char *)objp + obj_offset(cachep);
 	size = cachep->object_size;
 
@@ -1837,17 +1865,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
 		void *objp = index_to_obj(cachep, page, i);
 
 		if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-			if (debug_pagealloc_enabled() &&
-				cachep->size % PAGE_SIZE == 0 &&
-					OFF_SLAB(cachep))
-				kernel_map_pages(virt_to_page(objp),
-					cachep->size / PAGE_SIZE, 1);
-			else
-				check_poison_obj(cachep, objp);
-#else
 			check_poison_obj(cachep, objp);
-#endif
+			slab_kernel_map(cachep, objp, 1, 0);
 		}
 		if (cachep->flags & SLAB_RED_ZONE) {
 			if (*dbg_redzone1(cachep, objp) != RED_INACTIVE)
@@ -2226,16 +2245,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
 		freelist_size = calculate_freelist_size(cachep->num, 0);
-
-#ifdef CONFIG_PAGE_POISONING
-		/* If we're going to use the generic kernel_map_pages()
-		 * poisoning, then it's going to smash the contents of
-		 * the redzone and userword anyhow, so switch them off.
-		 */
-		if (debug_pagealloc_enabled() &&
-			size % PAGE_SIZE == 0 && flags & SLAB_POISON)
-			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
-#endif
 	}
 
 	cachep->colour_off = cache_line_size();
@@ -2251,7 +2260,19 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->size = size;
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
-	if (flags & CFLGS_OFF_SLAB) {
+#if DEBUG
+	/*
+	 * If we're going to use the generic kernel_map_pages()
+	 * poisoning, then it's going to smash the contents of
+	 * the redzone and userword anyhow, so switch them off.
+	 */
+	if (IS_ENABLED(CONFIG_PAGE_POISONING) &&
+		(cachep->flags & SLAB_POISON) &&
+		is_debug_pagealloc_cache(cachep))
+		cachep->flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
+#endif
+
+	if (OFF_SLAB(cachep)) {
 		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
 		/*
 		 * This is a possibility for one of the kmalloc_{dma,}_caches.
@@ -2475,9 +2496,6 @@ static void cache_init_objs(struct kmem_cache *cachep,
 	for (i = 0; i < cachep->num; i++) {
 		void *objp = index_to_obj(cachep, page, i);
 #if DEBUG
-		/* need to poison the objs? */
-		if (cachep->flags & SLAB_POISON)
-			poison_obj(cachep, objp, POISON_FREE);
 		if (cachep->flags & SLAB_STORE_USER)
 			*dbg_userword(cachep, objp) = NULL;
 
@@ -2501,10 +2519,11 @@ static void cache_init_objs(struct kmem_cache *cachep,
 				slab_error(cachep, "constructor overwrote the"
 					   " start of an object");
 		}
-		if ((cachep->size % PAGE_SIZE) == 0 &&
-			    OFF_SLAB(cachep) && cachep->flags & SLAB_POISON)
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->size / PAGE_SIZE, 0);
+		/* need to poison the objs? */
+		if (cachep->flags & SLAB_POISON) {
+			poison_obj(cachep, objp, POISON_FREE);
+			slab_kernel_map(cachep, objp, 0, 0);
+		}
 #else
 		if (cachep->ctor)
 			cachep->ctor(objp);
@@ -2716,18 +2735,8 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 
 	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-		if (debug_pagealloc_enabled() &&
-			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep)) {
-			store_stackinfo(cachep, objp, caller);
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->size / PAGE_SIZE, 0);
-		} else {
-			poison_obj(cachep, objp, POISON_FREE);
-		}
-#else
 		poison_obj(cachep, objp, POISON_FREE);
-#endif
+		slab_kernel_map(cachep, objp, 0, caller);
 	}
 	return objp;
 }
@@ -2862,16 +2871,8 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 	if (!objp)
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-		if (debug_pagealloc_enabled() &&
-			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->size / PAGE_SIZE, 1);
-		else
-			check_poison_obj(cachep, objp);
-#else
 		check_poison_obj(cachep, objp);
-#endif
+		slab_kernel_map(cachep, objp, 1, 0);
 		poison_obj(cachep, objp, POISON_INUSE);
 	}
 	if (cachep->flags & SLAB_STORE_USER)
-- 
1.9.1

* [PATCH 07/16] mm/slab: alternative implementation for DEBUG_SLAB_LEAK
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (5 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 06/16] mm/slab: clean-up DEBUG_PAGEALLOC processing code Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 08/16] mm/slab: remove object status buffer " Joonsoo Kim
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

DEBUG_SLAB_LEAK is a debug option. Its current implementation requires
a status buffer, so we need more memory to use it, and it makes the
kmem_cache initialization step more complex.

To remove this extra memory usage and to simplify the initialization
step, this patch implements the feature in another way.

When a user requests slab object owner information, we first mark that
the collection has started. Then, all free objects in the caches are
flushed to their corresponding slab pages. Now we can distinguish every
freed object, so we also know every allocated object. After collecting
owner information for the allocated objects, the mark is checked to
confirm that no free happened during the processing. If so, we can be
sure the collected information is correct, and it is returned to the user.

Although this way is rather complex, it has the two important benefits
mentioned above, so I think it is worth changing.

One drawback is that it takes more time to collect slab object owner
information, but this is just a debug option, so it doesn't matter at all.

To help review, this patch implements only the new way. A following
patch will remove the now-useless code.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/slab_def.h |  3 ++
 mm/slab.c                | 85 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index cf139d3..e878ba3 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -60,6 +60,9 @@ struct kmem_cache {
 	atomic_t allocmiss;
 	atomic_t freehit;
 	atomic_t freemiss;
+#ifdef CONFIG_DEBUG_SLAB_LEAK
+	atomic_t store_user_clean;
+#endif
 
 	/*
 	 * If debugging is enabled, then the allocator can add additional
diff --git a/mm/slab.c b/mm/slab.c
index 7a10e18..9a64d8f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -396,20 +396,25 @@ static void set_obj_status(struct page *page, int idx, int val)
 	status[idx] = val;
 }
 
-static inline unsigned int get_obj_status(struct page *page, int idx)
+static inline bool is_store_user_clean(struct kmem_cache *cachep)
 {
-	int freelist_size;
-	char *status;
-	struct kmem_cache *cachep = page->slab_cache;
+	return atomic_read(&cachep->store_user_clean) == 1;
+}
 
-	freelist_size = cachep->num * sizeof(freelist_idx_t);
-	status = (char *)page->freelist + freelist_size;
+static inline void set_store_user_clean(struct kmem_cache *cachep)
+{
+	atomic_set(&cachep->store_user_clean, 1);
+}
 
-	return status[idx];
+static inline void set_store_user_dirty(struct kmem_cache *cachep)
+{
+	if (is_store_user_clean(cachep))
+		atomic_set(&cachep->store_user_clean, 0);
 }
 
 #else
 static inline void set_obj_status(struct page *page, int idx, int val) {}
+static inline void set_store_user_dirty(struct kmem_cache *cachep) {}
 
 #endif
 
@@ -2550,6 +2555,11 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page)
 	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
 	page->active++;
 
+#if DEBUG
+	if (cachep->flags & SLAB_STORE_USER)
+		set_store_user_dirty(cachep);
+#endif
+
 	return objp;
 }
 
@@ -2725,8 +2735,10 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 		*dbg_redzone1(cachep, objp) = RED_INACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_INACTIVE;
 	}
-	if (cachep->flags & SLAB_STORE_USER)
+	if (cachep->flags & SLAB_STORE_USER) {
+		set_store_user_dirty(cachep);
 		*dbg_userword(cachep, objp) = (void *)caller;
+	}
 
 	objnr = obj_to_index(cachep, page, objp);
 
@@ -4082,15 +4094,34 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 						struct page *page)
 {
 	void *p;
-	int i;
+	int i, j;
+	unsigned long v;
 
 	if (n[0] == n[1])
 		return;
 	for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
-		if (get_obj_status(page, i) != OBJECT_ACTIVE)
+		bool active = true;
+
+		for (j = page->active; j < c->num; j++) {
+			if (get_free_obj(page, j) == i) {
+				active = false;
+				break;
+			}
+		}
+
+		if (!active)
 			continue;
 
-		if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))
+		/*
+		 * probe_kernel_read() is used for DEBUG_PAGEALLOC. page table
+		 * mapping is established when actual object allocation and
+		 * we could mistakenly access the unmapped object in the cpu
+		 * cache.
+		 */
+		if (probe_kernel_read(&v, dbg_userword(c, p), sizeof(v)))
+			continue;
+
+		if (!add_caller(n, v))
 			return;
 	}
 }
@@ -4126,21 +4157,31 @@ static int leaks_show(struct seq_file *m, void *p)
 	if (!(cachep->flags & SLAB_RED_ZONE))
 		return 0;
 
-	/* OK, we can do it */
+	/*
+	 * Set store_user_clean and start to grab stored user information
+	 * for all objects on this cache. If some alloc/free requests comes
+	 * during the processing, information would be wrong so restart
+	 * whole processing.
+	 */
+	do {
+		set_store_user_clean(cachep);
+		drain_cpu_caches(cachep);
+
+		x[1] = 0;
 
-	x[1] = 0;
+		for_each_kmem_cache_node(cachep, node, n) {
 
-	for_each_kmem_cache_node(cachep, node, n) {
+			check_irq_on();
+			spin_lock_irq(&n->list_lock);
 
-		check_irq_on();
-		spin_lock_irq(&n->list_lock);
+			list_for_each_entry(page, &n->slabs_full, lru)
+				handle_slab(x, cachep, page);
+			list_for_each_entry(page, &n->slabs_partial, lru)
+				handle_slab(x, cachep, page);
+			spin_unlock_irq(&n->list_lock);
+		}
+	} while (!is_store_user_clean(cachep));
 
-		list_for_each_entry(page, &n->slabs_full, lru)
-			handle_slab(x, cachep, page);
-		list_for_each_entry(page, &n->slabs_partial, lru)
-			handle_slab(x, cachep, page);
-		spin_unlock_irq(&n->list_lock);
-	}
 	name = cachep->name;
 	if (x[0] == x[1]) {
 		/* Increase the buffer size */
-- 
1.9.1

* [PATCH 08/16] mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (6 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 07/16] mm/slab: alternative implementation for DEBUG_SLAB_LEAK Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 09/16] mm/slab: put the freelist at the end of slab page Joonsoo Kim
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Now, the object status buffer is not used in any configuration, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 34 ++--------------------------------
 1 file changed, 2 insertions(+), 32 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9a64d8f..02bdb91 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -380,22 +380,8 @@ static void **dbg_userword(struct kmem_cache *cachep, void *objp)
 
 #endif
 
-#define OBJECT_FREE (0)
-#define OBJECT_ACTIVE (1)
-
 #ifdef CONFIG_DEBUG_SLAB_LEAK
 
-static void set_obj_status(struct page *page, int idx, int val)
-{
-	int freelist_size;
-	char *status;
-	struct kmem_cache *cachep = page->slab_cache;
-
-	freelist_size = cachep->num * sizeof(freelist_idx_t);
-	status = (char *)page->freelist + freelist_size;
-	status[idx] = val;
-}
-
 static inline bool is_store_user_clean(struct kmem_cache *cachep)
 {
 	return atomic_read(&cachep->store_user_clean) == 1;
@@ -413,7 +399,6 @@ static inline void set_store_user_dirty(struct kmem_cache *cachep)
 }
 
 #else
-static inline void set_obj_status(struct page *page, int idx, int val) {}
 static inline void set_store_user_dirty(struct kmem_cache *cachep) {}
 
 #endif
@@ -476,9 +461,6 @@ static size_t calculate_freelist_size(int nr_objs, size_t align)
 	size_t freelist_size;
 
 	freelist_size = nr_objs * sizeof(freelist_idx_t);
-	if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
-		freelist_size += nr_objs * sizeof(char);
-
 	if (align)
 		freelist_size = ALIGN(freelist_size, align);
 
@@ -491,10 +473,7 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 	int nr_objs;
 	size_t remained_size;
 	size_t freelist_size;
-	int extra_space = 0;
 
-	if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
-		extra_space = sizeof(char);
 	/*
 	 * Ignore padding for the initial guess. The padding
 	 * is at most @align-1 bytes, and @buffer_size is at
@@ -503,7 +482,7 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 	 * into the memory allocation when taking the padding
 	 * into account.
 	 */
-	nr_objs = slab_size / (buffer_size + idx_size + extra_space);
+	nr_objs = slab_size / (buffer_size + idx_size);
 
 	/*
 	 * This calculated number will be either the right
@@ -1961,16 +1940,13 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			break;
 
 		if (flags & CFLGS_OFF_SLAB) {
-			size_t freelist_size_per_obj = sizeof(freelist_idx_t);
 			/*
 			 * Max number of objs-per-slab for caches which
 			 * use off-slab slabs. Needed to avoid a possible
 			 * looping condition in cache_grow().
 			 */
-			if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
-				freelist_size_per_obj += sizeof(char);
 			offslab_limit = size;
-			offslab_limit /= freelist_size_per_obj;
+			offslab_limit /= sizeof(freelist_idx_t);
 
  			if (num > offslab_limit)
 				break;
@@ -2533,7 +2509,6 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		set_obj_status(page, i, OBJECT_FREE);
 		set_free_obj(page, i, i);
 	}
 }
@@ -2745,7 +2720,6 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	BUG_ON(objnr >= cachep->num);
 	BUG_ON(objp != index_to_obj(cachep, page, objnr));
 
-	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
 		poison_obj(cachep, objp, POISON_FREE);
 		slab_kernel_map(cachep, objp, 0, caller);
@@ -2878,8 +2852,6 @@ static inline void cache_alloc_debugcheck_before(struct kmem_cache *cachep,
 static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 				gfp_t flags, void *objp, unsigned long caller)
 {
-	struct page *page;
-
 	if (!objp)
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
@@ -2904,8 +2876,6 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
 
-	page = virt_to_head_page(objp);
-	set_obj_status(page, obj_to_index(cachep, page, objp), OBJECT_ACTIVE);
 	objp += obj_offset(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON)
 		cachep->ctor(objp);
-- 
1.9.1

* [PATCH 09/16] mm/slab: put the freelist at the end of slab page
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (7 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 08/16] mm/slab: remove object status buffer " Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14 15:26   ` Christoph Lameter
  2016-01-14 17:13   ` Christoph Lameter
  2016-01-14  5:24 ` [PATCH 10/16] mm/slab: align cache size first before determination of OFF_SLAB candidate Joonsoo Kim
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Currently, the freelist is at the front of the slab page. This requires
extra space to meet the object alignment requirement. If we put the
freelist at the end of the slab page, objects can start at the page
boundary and will be at the correct alignment. This is possible because
the freelist has no alignment constraint of its own.

This gives us two benefits: it removes the extra memory space needed for
freelist alignment, and it removes the complex calculation at the cache
initialization step. I can't think of a notable drawback here.

I mentioned that this would reduce the extra memory space, but this
benefit is rather theoretical because it applies to very few cases.
The following are example cache types that can benefit from this change.

size  align  num  before  after
  32      8  124    4100   4092
  64      8   63    4103   4095
  88      8   46    4102   4094
 272      8   15    4103   4095
 408      8   10    4098   4090
  32     16  124    4108   4092
  64     16   63    4111   4095
  32     32  124    4124   4092
  64     32   63    4127   4095
  96     32   42    4106   4074

'before' means the whole size for the objects and the aligned freelist
before applying the patch, and 'after' shows the result of this patch.

Since 'before' is more than 4096, the number of objects per slab would
have to decrease and memory would be wasted, while the 'after' size
fits within the page.
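
As a worked example (my reading of how the rows above are derived: with
byte-sized freelist indexes, 'after' is num * (size + 1), and 'before'
additionally counts align bytes of padding so the objects stay aligned
behind the front freelist):

  size 32, align 8,  num 124: after = 124 * (32 + 1) = 4092, before = 4092 + 8  = 4100
  size 64, align 32, num  63: after =  63 * (64 + 1) = 4095, before = 4095 + 32 = 4127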

Anyway, this patch removes the complex calculation, so it looks beneficial to me.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 89 ++++++++++++++++-----------------------------------------------
 1 file changed, 22 insertions(+), 67 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 02bdb91..fe17acc 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -456,55 +456,12 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 	return this_cpu_ptr(cachep->cpu_cache);
 }
 
-static size_t calculate_freelist_size(int nr_objs, size_t align)
-{
-	size_t freelist_size;
-
-	freelist_size = nr_objs * sizeof(freelist_idx_t);
-	if (align)
-		freelist_size = ALIGN(freelist_size, align);
-
-	return freelist_size;
-}
-
-static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
-				size_t idx_size, size_t align)
-{
-	int nr_objs;
-	size_t remained_size;
-	size_t freelist_size;
-
-	/*
-	 * Ignore padding for the initial guess. The padding
-	 * is at most @align-1 bytes, and @buffer_size is at
-	 * least @align. In the worst case, this result will
-	 * be one greater than the number of objects that fit
-	 * into the memory allocation when taking the padding
-	 * into account.
-	 */
-	nr_objs = slab_size / (buffer_size + idx_size);
-
-	/*
-	 * This calculated number will be either the right
-	 * amount, or one greater than what we want.
-	 */
-	remained_size = slab_size - nr_objs * buffer_size;
-	freelist_size = calculate_freelist_size(nr_objs, align);
-	if (remained_size < freelist_size)
-		nr_objs--;
-
-	return nr_objs;
-}
-
 /*
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
 static void cache_estimate(unsigned long gfporder, size_t buffer_size,
-			   size_t align, int flags, size_t *left_over,
-			   unsigned int *num)
+		unsigned long flags, size_t *left_over, unsigned int *num)
 {
-	int nr_objs;
-	size_t mgmt_size;
 	size_t slab_size = PAGE_SIZE << gfporder;
 
 	/*
@@ -512,9 +469,12 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * on it. For the latter case, the memory allocated for a
 	 * slab is used for:
 	 *
-	 * - One freelist_idx_t for each object
-	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
+	 * - One freelist_idx_t for each object
+	 *
+	 * We don't need to consider alignment of freelist because
+	 * freelist will be at the end of slab page. The objects will be
+	 * at the correct alignment.
 	 *
 	 * If the slab management structure is off the slab, then the
 	 * alignment will already be calculated into the size. Because
@@ -522,16 +482,13 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * correct alignment when allocated.
 	 */
 	if (flags & CFLGS_OFF_SLAB) {
-		mgmt_size = 0;
-		nr_objs = slab_size / buffer_size;
-
+		*num = slab_size / buffer_size;
+		*left_over = slab_size % buffer_size;
 	} else {
-		nr_objs = calculate_nr_objs(slab_size, buffer_size,
-					sizeof(freelist_idx_t), align);
-		mgmt_size = calculate_freelist_size(nr_objs, align);
+		*num = slab_size / (buffer_size + sizeof(freelist_idx_t));
+		*left_over = slab_size %
+			(buffer_size + sizeof(freelist_idx_t));
 	}
-	*num = nr_objs;
-	*left_over = slab_size - nr_objs*buffer_size - mgmt_size;
 }
 
 #if DEBUG
@@ -1921,7 +1878,7 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
  * towards high-order requests, this should be changed.
  */
 static size_t calculate_slab_order(struct kmem_cache *cachep,
-			size_t size, size_t align, unsigned long flags)
+				size_t size, unsigned long flags)
 {
 	unsigned long offslab_limit;
 	size_t left_over = 0;
@@ -1931,7 +1888,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 		unsigned int num;
 		size_t remainder;
 
-		cache_estimate(gfporder, size, align, flags, &remainder, &num);
+		cache_estimate(gfporder, size, flags, &remainder, &num);
 		if (!num)
 			continue;
 
@@ -2207,12 +2164,12 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
 		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
 
-	left_over = calculate_slab_order(cachep, size, cachep->align, flags);
+	left_over = calculate_slab_order(cachep, size, flags);
 
 	if (!cachep->num)
 		return -E2BIG;
 
-	freelist_size = calculate_freelist_size(cachep->num, cachep->align);
+	freelist_size = cachep->num * sizeof(freelist_idx_t);
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
@@ -2223,11 +2180,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		left_over -= freelist_size;
 	}
 
-	if (flags & CFLGS_OFF_SLAB) {
-		/* really off slab. No need for manual alignment */
-		freelist_size = calculate_freelist_size(cachep->num, 0);
-	}
-
 	cachep->colour_off = cache_line_size();
 	/* Offset must be a multiple of the alignment. */
 	if (cachep->colour_off < cachep->align)
@@ -2443,6 +2395,9 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 	void *freelist;
 	void *addr = page_address(page);
 
+	page->s_mem = addr + colour_off;
+	page->active = 0;
+
 	if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
 		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
@@ -2450,11 +2405,11 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 		if (!freelist)
 			return NULL;
 	} else {
-		freelist = addr + colour_off;
-		colour_off += cachep->freelist_size;
+		/* We will use last bytes at the slab for freelist */
+		freelist = addr + (PAGE_SIZE << cachep->gfporder) -
+				cachep->freelist_size;
 	}
-	page->active = 0;
-	page->s_mem = addr + colour_off;
+
 	return freelist;
 }
 
-- 
1.9.1

* [PATCH 10/16] mm/slab: align cache size first before determination of OFF_SLAB candidate
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (8 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 09/16] mm/slab: put the freelist at the end of slab page Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 11/16] mm/slab: clean-up cache type determination Joonsoo Kim
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

Finding a suitable OFF_SLAB candidate depends on the aligned cache size
rather than on the original size. The same reasoning applies to the
debug pagealloc candidate. So, this patch moves the alignment fixup up
to the proper position.

From that point on, the size is aligned, so we can remove some of the
alignment fixups.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index fe17acc..3a319f5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2126,6 +2126,17 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		else
 			size += BYTES_PER_WORD;
 	}
+#endif
+
+	size = ALIGN(size, cachep->align);
+	/*
+	 * We should restrict the number of objects in a slab to implement
+	 * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
+	 */
+	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
+		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
+
+#if DEBUG
 	/*
 	 * To activate debug pagealloc, off-slab management is necessary
 	 * requirement. In early phase of initialization, small sized slab
@@ -2136,8 +2147,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
 		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size() &&
-		ALIGN(size, cachep->align) < PAGE_SIZE) {
-		cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
+		size < PAGE_SIZE) {
+		cachep->obj_offset += PAGE_SIZE - size;
 		size = PAGE_SIZE;
 	}
 #endif
@@ -2149,20 +2160,13 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
 	 */
 	if (size >= OFF_SLAB_MIN_SIZE && !slab_early_init &&
-	    !(flags & SLAB_NOLEAKTRACE))
+	    !(flags & SLAB_NOLEAKTRACE)) {
 		/*
 		 * Size is large, assume best to place the slab management obj
 		 * off-slab (should allow better packing of objs).
 		 */
 		flags |= CFLGS_OFF_SLAB;
-
-	size = ALIGN(size, cachep->align);
-	/*
-	 * We should restrict the number of objects in a slab to implement
-	 * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
-	 */
-	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
-		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
+	}
 
 	left_over = calculate_slab_order(cachep, size, flags);
 
-- 
1.9.1

* [PATCH 11/16] mm/slab: clean-up cache type determination
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (9 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 10/16] mm/slab: align cache size first before determination of OFF_SLAB candidate Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 12/16] mm/slab: do not change cache size if debug pagealloc isn't possible Joonsoo Kim
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

The current cache type determination code is open-coded and hard to
understand. A following patch will introduce one more cache type, which
would make this code even more complex. So, before that happens, this
patch abstracts it into helper functions.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 105 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 71 insertions(+), 34 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3a319f5..7a253a9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2024,6 +2024,64 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 	return cachep;
 }
 
+static bool set_off_slab_cache(struct kmem_cache *cachep,
+			size_t size, unsigned long flags)
+{
+	size_t left;
+
+	cachep->num = 0;
+
+	/*
+	 * Determine if the slab management is 'on' or 'off' slab.
+	 * (bootstrapping cannot cope with offslab caches so don't do
+	 * it too early on. Always use on-slab management when
+	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
+	 */
+	if (size < OFF_SLAB_MIN_SIZE)
+		return false;
+
+	if (slab_early_init)
+		return false;
+
+	if (flags & SLAB_NOLEAKTRACE)
+		return false;
+
+	/*
+	 * Size is large, assume best to place the slab management obj
+	 * off-slab (should allow better packing of objs).
+	 */
+	left = calculate_slab_order(cachep, size, flags | CFLGS_OFF_SLAB);
+	if (!cachep->num)
+		return false;
+
+	/*
+	 * If the slab has been placed off-slab, and we have enough space then
+	 * move it on-slab. This is at the expense of any extra colouring.
+	 */
+	if (left >= cachep->num * sizeof(freelist_idx_t))
+		return false;
+
+	cachep->colour = left / cachep->colour_off;
+
+	return true;
+}
+
+static bool set_on_slab_cache(struct kmem_cache *cachep,
+			size_t size, unsigned long flags)
+{
+	size_t left;
+
+	cachep->num = 0;
+
+	left = calculate_slab_order(cachep, size, flags);
+	if (!cachep->num)
+		return false;
+
+	cachep->colour = left / cachep->colour_off;
+
+	return true;
+}
+
 /**
  * __kmem_cache_create - Create a cache.
  * @cachep: cache management descriptor
@@ -2048,7 +2106,6 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 int
 __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 {
-	size_t left_over, freelist_size;
 	size_t ralign = BYTES_PER_WORD;
 	gfp_t gfp;
 	int err;
@@ -2099,6 +2156,10 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * 4) Store it.
 	 */
 	cachep->align = ralign;
+	cachep->colour_off = cache_line_size();
+	/* Offset must be a multiple of the alignment. */
+	if (cachep->colour_off < cachep->align)
+		cachep->colour_off = cachep->align;
 
 	if (slab_is_available())
 		gfp = GFP_KERNEL;
@@ -2153,43 +2214,18 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	}
 #endif
 
-	/*
-	 * Determine if the slab management is 'on' or 'off' slab.
-	 * (bootstrapping cannot cope with offslab caches so don't do
-	 * it too early on. Always use on-slab management when
-	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
-	 */
-	if (size >= OFF_SLAB_MIN_SIZE && !slab_early_init &&
-	    !(flags & SLAB_NOLEAKTRACE)) {
-		/*
-		 * Size is large, assume best to place the slab management obj
-		 * off-slab (should allow better packing of objs).
-		 */
+	if (set_off_slab_cache(cachep, size, flags)) {
 		flags |= CFLGS_OFF_SLAB;
+		goto done;
 	}
 
-	left_over = calculate_slab_order(cachep, size, flags);
-
-	if (!cachep->num)
-		return -E2BIG;
+	if (set_on_slab_cache(cachep, size, flags))
+		goto done;
 
-	freelist_size = cachep->num * sizeof(freelist_idx_t);
+	return -E2BIG;
 
-	/*
-	 * If the slab has been placed off-slab, and we have enough space then
-	 * move it on-slab. This is at the expense of any extra colouring.
-	 */
-	if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
-		flags &= ~CFLGS_OFF_SLAB;
-		left_over -= freelist_size;
-	}
-
-	cachep->colour_off = cache_line_size();
-	/* Offset must be a multiple of the alignment. */
-	if (cachep->colour_off < cachep->align)
-		cachep->colour_off = cachep->align;
-	cachep->colour = left_over / cachep->colour_off;
-	cachep->freelist_size = freelist_size;
+done:
+	cachep->freelist_size = cachep->num * sizeof(freelist_idx_t);
 	cachep->flags = flags;
 	cachep->allocflags = __GFP_COMP;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
@@ -2210,7 +2246,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 #endif
 
 	if (OFF_SLAB(cachep)) {
-		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
+		cachep->freelist_cache =
+			kmalloc_slab(cachep->freelist_size, 0u);
 		/*
 		 * This is a possibility for one of the kmalloc_{dma,}_caches.
 		 * But since we go off slab only for object size greater than
-- 
1.9.1

* [PATCH 12/16] mm/slab: do not change cache size if debug pagealloc isn't possible
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (10 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 11/16] mm/slab: clean-up cache type determination Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 13/16] mm/slab: make criteria for off slab determination robust and simple Joonsoo Kim
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

We can fail to set up an off-slab cache under some conditions. Even in
that case, debug pagealloc increases the cache size to PAGE_SIZE in
advance, which is a waste because debug pagealloc cannot work for the
cache when it isn't off-slab. To improve this situation, this patch
first checks whether the cache with the increased size is suitable for
off-slab. The cache size is actually increased only when it is suitable
for off-slab, so the possible waste is removed.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 7a253a9..b0f6eda 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2207,10 +2207,17 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 */
 	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
 		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
-		size >= 256 && cachep->object_size > cache_line_size() &&
-		size < PAGE_SIZE) {
-		cachep->obj_offset += PAGE_SIZE - size;
-		size = PAGE_SIZE;
+		size >= 256 && cachep->object_size > cache_line_size()) {
+		if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
+			size_t tmp_size = ALIGN(size, PAGE_SIZE);
+
+			if (set_off_slab_cache(cachep, tmp_size, flags)) {
+				flags |= CFLGS_OFF_SLAB;
+				cachep->obj_offset += tmp_size - size;
+				size = tmp_size;
+				goto done;
+			}
+		}
 	}
 #endif
 
-- 
1.9.1

* [PATCH 13/16] mm/slab: make criteria for off slab determination robust and simple
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (11 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 12/16] mm/slab: do not change cache size if debug pagealloc isn't possible Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 14/16] mm/slab: factor out slab list fixup code Joonsoo Kim
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

To become an off-slab cache, there are some constraints to avoid
bootstrapping problems and recursive calls. These can be avoided
differently, by simply checking that the corresponding kmalloc cache is
ready and is not itself off-slab. This is more robust because static
size checking can be affected by cache size changes or the architecture
type, while dynamic checking isn't.

One check, 'freelist_cache->size > cachep->size / 2', is added to
evaluate the benefit of choosing off-slab, because there is now no size
constraint that ensures enough advantage when selecting off-slab.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 45 +++++++++++++++++----------------------------
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index b0f6eda..e86977e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -272,7 +272,6 @@ static void kmem_cache_node_init(struct kmem_cache_node *parent)
 
 #define CFLGS_OFF_SLAB		(0x80000000UL)
 #define	OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
-#define OFF_SLAB_MIN_SIZE (max_t(size_t, PAGE_SIZE >> 5, KMALLOC_MIN_SIZE + 1))
 
 #define BATCHREFILL_LIMIT	16
 /*
@@ -1880,7 +1879,6 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
 static size_t calculate_slab_order(struct kmem_cache *cachep,
 				size_t size, unsigned long flags)
 {
-	unsigned long offslab_limit;
 	size_t left_over = 0;
 	int gfporder;
 
@@ -1897,16 +1895,24 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			break;
 
 		if (flags & CFLGS_OFF_SLAB) {
+			struct kmem_cache *freelist_cache;
+			size_t freelist_size;
+
+			freelist_size = num * sizeof(freelist_idx_t);
+			freelist_cache = kmalloc_slab(freelist_size, 0u);
+			if (!freelist_cache)
+				continue;
+
 			/*
-			 * Max number of objs-per-slab for caches which
-			 * use off-slab slabs. Needed to avoid a possible
-			 * looping condition in cache_grow().
+			 * Needed to avoid possible looping condition
+			 * in cache_grow()
 			 */
-			offslab_limit = size;
-			offslab_limit /= sizeof(freelist_idx_t);
+			if (OFF_SLAB(freelist_cache))
+				continue;
 
- 			if (num > offslab_limit)
-				break;
+			/* check if off slab has enough benefit */
+			if (freelist_cache->size > cachep->size / 2)
+				continue;
 		}
 
 		/* Found something acceptable - save it away */
@@ -2032,17 +2038,9 @@ static bool set_off_slab_cache(struct kmem_cache *cachep,
 	cachep->num = 0;
 
 	/*
-	 * Determine if the slab management is 'on' or 'off' slab.
-	 * (bootstrapping cannot cope with offslab caches so don't do
-	 * it too early on. Always use on-slab management when
-	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
+	 * Always use on-slab management when SLAB_NOLEAKTRACE
+	 * to avoid recursive calls into kmemleak.
 	 */
-	if (size < OFF_SLAB_MIN_SIZE)
-		return false;
-
-	if (slab_early_init)
-		return false;
-
 	if (flags & SLAB_NOLEAKTRACE)
 		return false;
 
@@ -2206,7 +2204,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
 	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
-		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size()) {
 		if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
 			size_t tmp_size = ALIGN(size, PAGE_SIZE);
@@ -2255,14 +2252,6 @@ done:
 	if (OFF_SLAB(cachep)) {
 		cachep->freelist_cache =
 			kmalloc_slab(cachep->freelist_size, 0u);
-		/*
-		 * This is a possibility for one of the kmalloc_{dma,}_caches.
-		 * But since we go off slab only for object size greater than
-		 * OFF_SLAB_MIN_SIZE, and kmalloc_{dma,}_caches get created
-		 * in ascending order,this should not happen at all.
-		 * But leave a BUG_ON for some lucky dude.
-		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
 	}
 
 	err = setup_cpu_cache(cachep, gfp);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 14/16] mm/slab: factor out slab list fixup code
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (12 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 13/16] mm/slab: make criteria for off slab determination robust and simple Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 15/16] mm/slab: factor out debugging initialization in cache_init_objs() Joonsoo Kim
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

The slab list should be fixed up after an object is detached from the
slab, and this happens in two places. They do exactly the same thing
and will be changed in the following patch, so, to reduce code
duplication, this patch factors them out into a common function.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index e86977e..dbf18ed 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2724,6 +2724,17 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 #define cache_free_debugcheck(x,objp,z) (objp)
 #endif
 
+static inline void fixup_slab_list(struct kmem_cache *cachep,
+				struct kmem_cache_node *n, struct page *page)
+{
+	/* move slabp to correct slabp list: */
+	list_del(&page->lru);
+	if (page->active == cachep->num)
+		list_add(&page->lru, &n->slabs_full);
+	else
+		list_add(&page->lru, &n->slabs_partial);
+}
+
 static struct page *get_first_slab(struct kmem_cache_node *n)
 {
 	struct page *page;
@@ -2797,12 +2808,7 @@ retry:
 			ac_put_obj(cachep, ac, slab_get_obj(cachep, page));
 		}
 
-		/* move slabp to correct slabp list: */
-		list_del(&page->lru);
-		if (page->active == cachep->num)
-			list_add(&page->lru, &n->slabs_full);
-		else
-			list_add(&page->lru, &n->slabs_partial);
+		fixup_slab_list(cachep, n, page);
 	}
 
 must_grow:
@@ -3076,13 +3082,8 @@ retry:
 
 	obj = slab_get_obj(cachep, page);
 	n->free_objects--;
-	/* move slabp to correct slabp list: */
-	list_del(&page->lru);
 
-	if (page->active == cachep->num)
-		list_add(&page->lru, &n->slabs_full);
-	else
-		list_add(&page->lru, &n->slabs_partial);
+	fixup_slab_list(cachep, n, page);
 
 	spin_unlock(&n->list_lock);
 	goto done;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 15/16] mm/slab: factor out debugging initialization in cache_init_objs()
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (13 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 14/16] mm/slab: factor out slab list fixup code Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14  5:24 ` [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB Joonsoo Kim
  2016-01-27  4:40 ` [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Andrew Morton
  16 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

cache_init_objs() will be changed in the following patch and its
current form doesn't fit that change well. So, before doing it, this
patch separates out the debugging initialization. This causes two loop
iterations when debugging is enabled, but that overhead is negligible
compared to the debug features themselves, so the effect should not be
visible. This greatly simplifies the changes to cache_init_objs() in
the following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index dbf18ed..a9807c3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2461,14 +2461,14 @@ static inline void set_free_obj(struct page *page,
 	((freelist_idx_t *)(page->freelist))[idx] = val;
 }
 
-static void cache_init_objs(struct kmem_cache *cachep,
-			    struct page *page)
+static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 {
+#if DEBUG
 	int i;
 
 	for (i = 0; i < cachep->num; i++) {
 		void *objp = index_to_obj(cachep, page, i);
-#if DEBUG
+
 		if (cachep->flags & SLAB_STORE_USER)
 			*dbg_userword(cachep, objp) = NULL;
 
@@ -2497,10 +2497,22 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			poison_obj(cachep, objp, POISON_FREE);
 			slab_kernel_map(cachep, objp, 0, 0);
 		}
-#else
-		if (cachep->ctor)
-			cachep->ctor(objp);
+	}
 #endif
+}
+
+static void cache_init_objs(struct kmem_cache *cachep,
+			    struct page *page)
+{
+	int i;
+
+	cache_init_objs_debug(cachep, page);
+
+	for (i = 0; i < cachep->num; i++) {
+		/* constructor could break poison info */
+		if (DEBUG == 0 && cachep->ctor)
+			cachep->ctor(index_to_obj(cachep, page, i));
+
 		set_free_obj(page, i, i);
 	}
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (14 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 15/16] mm/slab: factor out debugging initialization in cache_init_objs() Joonsoo Kim
@ 2016-01-14  5:24 ` Joonsoo Kim
  2016-01-14 15:32   ` Christoph Lameter
  2016-01-27 13:35   ` Vlastimil Babka
  2016-01-27  4:40 ` [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Andrew Morton
  16 siblings, 2 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14  5:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

SLAB needs an array to manage freed objects in a slab. It is only used
once some objects have been freed, so we can use a free object itself
as this array. This requires an additional branch in a somewhat
critical lock path to check whether it is the first freed object or
not, but that's all we need. The benefit is that we can save extra
memory and avoid some of the computational overhead of allocating a
management array when a new slab is created.

The code change is more complex than the idea itself suggests, in
order to handle the debugging features efficiently. If you want to see
the core idea only, please ignore the '#if DEBUG' blocks in the patch.
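
The core idea can be modelled in a few lines of user-space C. The
sketch below is not the kernel code; the slab layout, sizes and names
are invented purely for illustration, and debugging, colouring and
NUMA are ignored:

#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_OBJS 8
#define OBJ_SIZE 32			/* >= NUM_OBJS * sizeof(idx_t) */

typedef uint8_t idx_t;

struct toy_slab {
	unsigned char mem[NUM_OBJS][OBJ_SIZE];
	idx_t *freelist;		/* NULL while every object is in use */
	unsigned int active;		/* objects currently handed out */
};

static void toy_init(struct toy_slab *s)
{
	memset(s, 0, sizeof(*s));
	/* All objects are free, so borrow the last one as the index array. */
	s->freelist = (idx_t *)s->mem[NUM_OBJS - 1];
	for (unsigned int i = 0; i < NUM_OBJS; i++)
		s->freelist[i] = (idx_t)i;
}

static void *toy_alloc(struct toy_slab *s)
{
	if (s->active == NUM_OBJS)
		return NULL;
	void *p = s->mem[s->freelist[s->active++]];
	if (s->active == NUM_OBJS)
		s->freelist = NULL;	/* the array object is now in use too */
	return p;
}

static void toy_free(struct toy_slab *s, void *p)
{
	unsigned int i = (unsigned int)
		(((unsigned char *)p - &s->mem[0][0]) / OBJ_SIZE);

	s->active--;
	if (!s->freelist)		/* first free: this object becomes the array */
		s->freelist = (idx_t *)p;
	s->freelist[s->active] = (idx_t)i;
}

int main(void)
{
	struct toy_slab s;
	void *p[NUM_OBJS];

	toy_init(&s);
	for (int i = 0; i < NUM_OBJS; i++)
		p[i] = toy_alloc(&s);
	assert(toy_alloc(&s) == NULL && s.freelist == NULL);
	for (int i = 0; i < NUM_OBJS; i++)
		toy_free(&s, p[i]);
	assert(s.active == 0);
	return 0;
}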

Although this idea could apply to all caches whose size is larger than
the management array, it is not applied to caches which have a
constructor. If such a cache's object were used as the management
array, the constructor would have to be called on it again before that
object is returned to the user. I guess the overhead would outweigh
the benefit in that case, so the idea isn't applied to them, at least
for now.

In summary, from now on, the slab management type is determined by the
following logic (a condensed sketch follows the list):

1) if the management array is smaller than the object size and there
is no ctor, it becomes OBJFREELIST_SLAB.
2) if the management array is smaller than the leftover, it becomes
NORMAL_SLAB, which uses the leftover as the array.
3) if OFF_SLAB saves more memory than way 4), it becomes OFF_SLAB.
It allocates the management array from another cache, so some memory
is wasted.
4) everything else becomes NORMAL_SLAB. It uses dedicated internal
memory in the slab as the management array, so it wastes memory.
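
As pseudocode (the first two helpers are the ones added in this
series; the on-slab fallback comes from an earlier patch in the
series, and the exact control flow and error value are a sketch that
may differ in detail from __kmem_cache_create()):

	/* Condensed selection order in __kmem_cache_create() */
	if (set_objfreelist_slab_cache(cachep, size, flags))
		flags |= CFLGS_OBJFREELIST_SLAB;		/* case 1) */
	else if (set_off_slab_cache(cachep, size, flags))
		flags |= CFLGS_OFF_SLAB;			/* case 3) */
	else if (!set_on_slab_cache(cachep, size, flags))	/* cases 2) and 4) */
		return -E2BIG;					/* nothing fits */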

In my system, without CONFIG_DEBUG_SLAB enabled, almost all caches
become OBJFREELIST_SLAB or NORMAL_SLAB (using leftover), which doesn't
waste memory. The following is the number of caches with each slab
management type.

TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF

/Before/
126 = 0 + 60 + 25 + 41

/After/
126 = 97 + 12 + 15 + 2

The result shows that the number of caches that don't waste memory
increases from 60 to 109.

I did some benchmarking and it looks like the benefit outweighs the loss.

Kmalloc: Repeatedly allocate then free test

/Before/
[    0.286809] 1. Kmalloc: Repeatedly allocate then free test
[    1.143674] 100000 times kmalloc(32) -> 116 cycles kfree -> 78 cycles
[    1.441726] 100000 times kmalloc(64) -> 121 cycles kfree -> 80 cycles
[    1.815734] 100000 times kmalloc(128) -> 168 cycles kfree -> 85 cycles
[    2.380709] 100000 times kmalloc(256) -> 287 cycles kfree -> 95 cycles
[    3.101153] 100000 times kmalloc(512) -> 370 cycles kfree -> 117 cycles
[    3.942432] 100000 times kmalloc(1024) -> 413 cycles kfree -> 156 cycles
[    5.227396] 100000 times kmalloc(2048) -> 622 cycles kfree -> 248 cycles
[    7.519793] 100000 times kmalloc(4096) -> 1102 cycles kfree -> 452 cycles

/After/
[    1.205313] 100000 times kmalloc(32) -> 117 cycles kfree -> 78 cycles
[    1.510526] 100000 times kmalloc(64) -> 124 cycles kfree -> 81 cycles
[    1.827382] 100000 times kmalloc(128) -> 130 cycles kfree -> 84 cycles
[    2.226073] 100000 times kmalloc(256) -> 177 cycles kfree -> 92 cycles
[    2.814747] 100000 times kmalloc(512) -> 286 cycles kfree -> 112 cycles
[    3.532952] 100000 times kmalloc(1024) -> 344 cycles kfree -> 141 cycles
[    4.608777] 100000 times kmalloc(2048) -> 519 cycles kfree -> 210 cycles
[    6.350105] 100000 times kmalloc(4096) -> 789 cycles kfree -> 391 cycles

In fact, I tested another idea implementing OBJFREELIST_SLAB with
extendable linked array through another freed object. It can remove
memory waste completely but it causes more computational overhead
in critical lock path and it seems that overhead outweigh benefit.
So, this patch doesn't include it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 86 insertions(+), 8 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a9807c3..f75f569 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -270,7 +270,9 @@ static void kmem_cache_node_init(struct kmem_cache_node *parent)
 	MAKE_LIST((cachep), (&(ptr)->slabs_free), slabs_free, nodeid);	\
 	} while (0)
 
+#define CFLGS_OBJFREELIST_SLAB	(0x40000000UL)
 #define CFLGS_OFF_SLAB		(0x80000000UL)
+#define	OBJFREELIST_SLAB(x)	((x)->flags & CFLGS_OBJFREELIST_SLAB)
 #define	OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
 
 #define BATCHREFILL_LIMIT	16
@@ -480,7 +482,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * the slabs are all pages aligned, the objects will be at the
 	 * correct alignment when allocated.
 	 */
-	if (flags & CFLGS_OFF_SLAB) {
+	if (flags & (CFLGS_OBJFREELIST_SLAB | CFLGS_OFF_SLAB)) {
 		*num = slab_size / buffer_size;
 		*left_over = slab_size % buffer_size;
 	} else {
@@ -1801,6 +1803,12 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
 						struct page *page)
 {
 	int i;
+
+	if (OBJFREELIST_SLAB(cachep) && cachep->flags & SLAB_POISON) {
+		poison_obj(cachep, page->freelist - obj_offset(cachep),
+			POISON_FREE);
+	}
+
 	for (i = 0; i < cachep->num; i++) {
 		void *objp = index_to_obj(cachep, page, i);
 
@@ -2030,6 +2038,29 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 	return cachep;
 }
 
+static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
+			size_t size, unsigned long flags)
+{
+	size_t left;
+
+	cachep->num = 0;
+
+	if (cachep->ctor)
+		return false;
+
+	left = calculate_slab_order(cachep, size,
+			flags | CFLGS_OBJFREELIST_SLAB);
+	if (!cachep->num)
+		return false;
+
+	if (cachep->num * sizeof(freelist_idx_t) > cachep->object_size)
+		return false;
+
+	cachep->colour = left / cachep->colour_off;
+
+	return true;
+}
+
 static bool set_off_slab_cache(struct kmem_cache *cachep,
 			size_t size, unsigned long flags)
 {
@@ -2218,6 +2249,11 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	}
 #endif
 
+	if (set_objfreelist_slab_cache(cachep, size, flags)) {
+		flags |= CFLGS_OBJFREELIST_SLAB;
+		goto done;
+	}
+
 	if (set_off_slab_cache(cachep, size, flags)) {
 		flags |= CFLGS_OFF_SLAB;
 		goto done;
@@ -2435,7 +2471,9 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 	page->s_mem = addr + colour_off;
 	page->active = 0;
 
-	if (OFF_SLAB(cachep)) {
+	if (OBJFREELIST_SLAB(cachep))
+		freelist = NULL;
+	else if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
 		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
 					      local_flags, nodeid);
@@ -2508,6 +2546,11 @@ static void cache_init_objs(struct kmem_cache *cachep,
 
 	cache_init_objs_debug(cachep, page);
 
+	if (OBJFREELIST_SLAB(cachep)) {
+		page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
+						obj_offset(cachep);
+	}
+
 	for (i = 0; i < cachep->num; i++) {
 		/* constructor could break poison info */
 		if (DEBUG == 0 && cachep->ctor)
@@ -2559,6 +2602,9 @@ static void slab_put_obj(struct kmem_cache *cachep,
 	}
 #endif
 	page->active--;
+	if (!page->freelist)
+		page->freelist = objp + obj_offset(cachep);
+
 	set_free_obj(page, page->active, objnr);
 }
 
@@ -2633,7 +2679,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	/* Get slab management. */
 	freelist = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
-	if (!freelist)
+	if (OFF_SLAB(cachep) && !freelist)
 		goto opps1;
 
 	slab_map_pages(cachep, page, freelist);
@@ -2736,14 +2782,42 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 #define cache_free_debugcheck(x,objp,z) (objp)
 #endif
 
+static inline void fixup_objfreelist_debug(struct kmem_cache *cachep,
+						void **list)
+{
+#if DEBUG
+	void *next = *list;
+	void *objp;
+
+	while (next) {
+		objp = next - obj_offset(cachep);
+		next = *(void **)next;
+		poison_obj(cachep, objp, POISON_FREE);
+	}
+#endif
+}
+
 static inline void fixup_slab_list(struct kmem_cache *cachep,
-				struct kmem_cache_node *n, struct page *page)
+				struct kmem_cache_node *n, struct page *page,
+				void **list)
 {
 	/* move slabp to correct slabp list: */
 	list_del(&page->lru);
-	if (page->active == cachep->num)
+	if (page->active == cachep->num) {
 		list_add(&page->lru, &n->slabs_full);
-	else
+		if (OBJFREELIST_SLAB(cachep)) {
+#if DEBUG
+			/* Poisoning will be done without holding the lock */
+			if (cachep->flags & SLAB_POISON) {
+				void **objp = page->freelist;
+
+				*objp = *list;
+				*list = objp;
+			}
+#endif
+			page->freelist = NULL;
+		}
+	} else
 		list_add(&page->lru, &n->slabs_partial);
 }
 
@@ -2769,6 +2843,7 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
 	struct kmem_cache_node *n;
 	struct array_cache *ac;
 	int node;
+	void *list = NULL;
 
 	check_irq_off();
 	node = numa_mem_id();
@@ -2820,13 +2895,14 @@ retry:
 			ac_put_obj(cachep, ac, slab_get_obj(cachep, page));
 		}
 
-		fixup_slab_list(cachep, n, page);
+		fixup_slab_list(cachep, n, page, &list);
 	}
 
 must_grow:
 	n->free_objects -= ac->avail;
 alloc_done:
 	spin_unlock(&n->list_lock);
+	fixup_objfreelist_debug(cachep, &list);
 
 	if (unlikely(!ac->avail)) {
 		int x;
@@ -3071,6 +3147,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	struct kmem_cache_node *n;
 	void *obj;
+	void *list = NULL;
 	int x;
 
 	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
@@ -3095,9 +3172,10 @@ retry:
 	obj = slab_get_obj(cachep, page);
 	n->free_objects--;
 
-	fixup_slab_list(cachep, n, page);
+	fixup_slab_list(cachep, n, page, &list);
 
 	spin_unlock(&n->list_lock);
+	fixup_objfreelist_debug(cachep, &list);
 	goto done;
 
 must_grow:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  2016-01-14  5:24 ` [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled Joonsoo Kim
@ 2016-01-14 12:09   ` Jesper Dangaard Brouer
  2016-01-14 16:16     ` Joonsoo Kim
  0 siblings, 1 reply; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2016-01-14 12:09 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, linux-mm, linux-kernel, brouer

On Thu, 14 Jan 2016 14:24:17 +0900
Joonsoo Kim <js1304@gmail.com> wrote:

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/slab.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index bbe4df2..4b55516 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1838,7 +1838,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
>  
>  		if (cachep->flags & SLAB_POISON) {
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> -			if (cachep->size % PAGE_SIZE == 0 &&
> +			if (debug_pagealloc_enabled() &&
> +				cachep->size % PAGE_SIZE == 0 &&
>  					OFF_SLAB(cachep))
>  				kernel_map_pages(virt_to_page(objp),
>  					cachep->size / PAGE_SIZE, 1);
> @@ -2176,7 +2177,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
>  	 * to check size >= 256. It guarantees that all necessary small
>  	 * sized slab is initialized in current slab initialization sequence.
>  	 */
> -	if (!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
> +	if (debug_pagealloc_enabled() &&
> +		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
>  		size >= 256 && cachep->object_size > cache_line_size() &&
>  		ALIGN(size, cachep->align) < PAGE_SIZE) {
>  		cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
> @@ -2232,7 +2234,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
>  		 * poisoning, then it's going to smash the contents of
>  		 * the redzone and userword anyhow, so switch them off.
>  		 */
> -		if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
> +		if (debug_pagealloc_enabled() &&
> +			size % PAGE_SIZE == 0 && flags & SLAB_POISON)
>  			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);

Sorry, but I dislike the indentation style here (when the if covers
several lines). The same goes for other changes in this patch. Looking,
there are several examples of this indentation style in the existing
mm/slab.c. Thus, I don't know whether this is accepted in the MM area
(it is definitely not accepted in the NET area).


>  #endif
>  	}
> @@ -2716,7 +2719,8 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
>  	set_obj_status(page, objnr, OBJECT_FREE);
>  	if (cachep->flags & SLAB_POISON) {
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> -		if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
> +		if (debug_pagealloc_enabled() &&
> +			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep)) {
>  			store_stackinfo(cachep, objp, caller);
>  			kernel_map_pages(virt_to_page(objp),
>  					 cachep->size / PAGE_SIZE, 0);
> @@ -2861,7 +2865,8 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
>  		return objp;
>  	if (cachep->flags & SLAB_POISON) {
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> -		if ((cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
> +		if (debug_pagealloc_enabled() &&
> +			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
>  			kernel_map_pages(virt_to_page(objp),
>  					 cachep->size / PAGE_SIZE, 1);
>  		else



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/16] mm/slab: fix stale code comment
  2016-01-14  5:24 ` [PATCH 01/16] mm/slab: fix stale code comment Joonsoo Kim
@ 2016-01-14 15:22   ` Christoph Lameter
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2016-01-14 15:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016, Joonsoo Kim wrote:

> We use freelist_idx_t type for free object management whose size
> would be smaller than size of unsigned int. Fix it.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/16] mm/slab: remove useless structure define
  2016-01-14  5:24 ` [PATCH 02/16] mm/slab: remove useless structure define Joonsoo Kim
@ 2016-01-14 15:22   ` Christoph Lameter
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2016-01-14 15:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016, Joonsoo Kim wrote:

> It is obsolete so remove it.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/16] mm/slab: remove the checks for slab implementation bug
  2016-01-14  5:24 ` [PATCH 03/16] mm/slab: remove the checks for slab implementation bug Joonsoo Kim
@ 2016-01-14 15:23   ` Christoph Lameter
  2016-01-14 16:20     ` Joonsoo Kim
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2016-01-14 15:23 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016, Joonsoo Kim wrote:

> Some of "#if DEBUG" are for reporting slab implementation bug
> rather than user usecase bug. It's not really needed because slab
> is stable for a quite long time and it makes code too dirty. This
> patch remove it.

Maybe better convert them to VM_BUG_ON() or so?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/16] mm/slab: put the freelist at the end of slab page
  2016-01-14  5:24 ` [PATCH 09/16] mm/slab: put the freelist at the end of slab page Joonsoo Kim
@ 2016-01-14 15:26   ` Christoph Lameter
  2016-01-14 16:21     ` Joonsoo Kim
  2016-01-14 17:13   ` Christoph Lameter
  1 sibling, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2016-01-14 15:26 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016, Joonsoo Kim wrote:

> Currently, the freelist is at the front of slab page. This requires
> extra space to meet object alignment requirement. If we put the freelist
> at the end of slab page, object could start at page boundary and will
> be at correct alignment. This is possible because freelist has
> no alignment constraint itself.
>
> This gives us two benefits. It removes extra memory space
> for the freelist alignment and remove complex calculation
> at cache initialization step. I can't think notable drawback here.


The third one is that the padding space at the end of the slab could
actually be used for the freelist if it fits.

The drawback may be that the location of the freelist at the beginning of
the page is more cache effective because the cache prefetcher may be able
to get the following cachelines and effectively hit the first object.
However, this is rather dubious speculation.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-14  5:24 ` [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB Joonsoo Kim
@ 2016-01-14 15:32   ` Christoph Lameter
  2016-01-14 16:24     ` Joonsoo Kim
  2016-01-27 13:35   ` Vlastimil Babka
  1 sibling, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2016-01-14 15:32 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016, Joonsoo Kim wrote:

> SLAB needs a array to manage freed objects in a slab. It is only used
> if some objects are freed so we can use free object itself as this array.
> This requires additional branch in somewhat critical lock path to check
> if it is first freed object or not but that's all we need. Benefits is
> that we can save extra memory usage and reduce some computational
> overhead by allocating a management array when new slab is created.

Hmmm... But then you need to have an offset in the page struct to
figure out where the freelist starts. One additional level of indirection.
Seems to have some negative impact on performance.

> In my system, without enabling CONFIG_DEBUG_SLAB, Almost caches become
> OBJFREELIST_SLAB and NORMAL_SLAB (using leftover) which doesn't waste
> memory. Following is the result of number of caches with specific slab
> management type.

Sounds good.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  2016-01-14 12:09   ` Jesper Dangaard Brouer
@ 2016-01-14 16:16     ` Joonsoo Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14 16:16 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Linux Memory Management List, LKML

2016-01-14 21:09 GMT+09:00 Jesper Dangaard Brouer <brouer@redhat.com>:
> On Thu, 14 Jan 2016 14:24:17 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> ---
>>  mm/slab.c | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index bbe4df2..4b55516 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -1838,7 +1838,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
>>
>>               if (cachep->flags & SLAB_POISON) {
>>  #ifdef CONFIG_DEBUG_PAGEALLOC
>> -                     if (cachep->size % PAGE_SIZE == 0 &&
>> +                     if (debug_pagealloc_enabled() &&
>> +                             cachep->size % PAGE_SIZE == 0 &&
>>                                       OFF_SLAB(cachep))
>>                               kernel_map_pages(virt_to_page(objp),
>>                                       cachep->size / PAGE_SIZE, 1);
>> @@ -2176,7 +2177,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
>>        * to check size >= 256. It guarantees that all necessary small
>>        * sized slab is initialized in current slab initialization sequence.
>>        */
>> -     if (!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
>> +     if (debug_pagealloc_enabled() &&
>> +             !slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
>>               size >= 256 && cachep->object_size > cache_line_size() &&
>>               ALIGN(size, cachep->align) < PAGE_SIZE) {
>>               cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
>> @@ -2232,7 +2234,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
>>                * poisoning, then it's going to smash the contents of
>>                * the redzone and userword anyhow, so switch them off.
>>                */
>> -             if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
>> +             if (debug_pagealloc_enabled() &&
>> +                     size % PAGE_SIZE == 0 && flags & SLAB_POISON)
>>                       flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
>
> Sorry, but I dislike the indention style here (when the if covers
> several lines). Same goes for other changes in this patch.  Looking,
> there are several example of this indention style in the existing
> mm/slab.c. Thus, I don't know if this is accepted in the MM area (it is
> definitely not accepted in the NET-area).

I guess it is acceptable in MM. :)
Moreover, it is cleaned up in a following patch.
But I would like to know how it is handled in the NET area.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/16] mm/slab: remove the checks for slab implementation bug
  2016-01-14 15:23   ` Christoph Lameter
@ 2016-01-14 16:20     ` Joonsoo Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14 16:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, Linux Memory Management List, LKML

2016-01-15 0:23 GMT+09:00 Christoph Lameter <cl@linux.com>:
> On Thu, 14 Jan 2016, Joonsoo Kim wrote:
>
>> Some of "#if DEBUG" are for reporting slab implementation bug
>> rather than user usecase bug. It's not really needed because slab
>> is stable for a quite long time and it makes code too dirty. This
>> patch remove it.
>
> Maybe better convert them to VM_BUG_ON() or so?

It's one possible solution, but I'd like to make slab.c as clean as
possible. Nowadays, the SLAB code doesn't change much, so I don't
think we need to keep them.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/16] mm/slab: put the freelist at the end of slab page
  2016-01-14 15:26   ` Christoph Lameter
@ 2016-01-14 16:21     ` Joonsoo Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14 16:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, Linux Memory Management List, LKML

2016-01-15 0:26 GMT+09:00 Christoph Lameter <cl@linux.com>:
> On Thu, 14 Jan 2016, Joonsoo Kim wrote:
>
>> Currently, the freelist is at the front of slab page. This requires
>> extra space to meet object alignment requirement. If we put the freelist
>> at the end of slab page, object could start at page boundary and will
>> be at correct alignment. This is possible because freelist has
>> no alignment constraint itself.
>>
>> This gives us two benefits. It removes extra memory space
>> for the freelist alignment and remove complex calculation
>> at cache initialization step. I can't think notable drawback here.
>
>
> The third one is that the padding space at the end of the slab could
> actually be used for the freelist if it fits.

Yes.

> The drawback may be that the location of the freelist at the beginning of
> the page is more cache effective because the cache prefetcher may be able
> to get the following cachelines and effectively hit the first object.
> However, this is rather dubious speculation.

I think so, too. :)
If so, could you give me an ack?

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-14 15:32   ` Christoph Lameter
@ 2016-01-14 16:24     ` Joonsoo Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-14 16:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, Linux Memory Management List, LKML

2016-01-15 0:32 GMT+09:00 Christoph Lameter <cl@linux.com>:
> On Thu, 14 Jan 2016, Joonsoo Kim wrote:
>
>> SLAB needs a array to manage freed objects in a slab. It is only used
>> if some objects are freed so we can use free object itself as this array.
>> This requires additional branch in somewhat critical lock path to check
>> if it is first freed object or not but that's all we need. Benefits is
>> that we can save extra memory usage and reduce some computational
>> overhead by allocating a management array when new slab is created.
>
> Hmmm... But then you need to have an offset in the page struct to
> figure out where the freelist starts. One additional level of indirection.
> Seems to have some negative impact on performance.

SLAB already keeps the pointer to where the freelist starts in the
page struct. So there is no *additional* negative impact on performance.

>> In my system, without enabling CONFIG_DEBUG_SLAB, Almost caches become
>> OBJFREELIST_SLAB and NORMAL_SLAB (using leftover) which doesn't waste
>> memory. Following is the result of number of caches with specific slab
>> management type.
>
> Sounds good.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/16] mm/slab: put the freelist at the end of slab page
  2016-01-14  5:24 ` [PATCH 09/16] mm/slab: put the freelist at the end of slab page Joonsoo Kim
  2016-01-14 15:26   ` Christoph Lameter
@ 2016-01-14 17:13   ` Christoph Lameter
  1 sibling, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2016-01-14 17:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016, Joonsoo Kim wrote:

>  /*
>   * Calculate the number of objects and left-over bytes for a given buffer size.
>   */
>  static void cache_estimate(unsigned long gfporder, size_t buffer_size,
> -			   size_t align, int flags, size_t *left_over,
> -			   unsigned int *num)
> +		unsigned long flags, size_t *left_over, unsigned int *num)
>  {

Return the number of objects from the function? Avoid returning values by
reference. left_over is already bad enough.
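
For illustration, the suggested change might look roughly like this (a
sketch only, not a patch; the body mirrors cache_estimate() at this
point in the series, and whether left_over stays by reference is a
separate question):

static unsigned int cache_estimate(unsigned long gfporder, size_t buffer_size,
				   unsigned long flags, size_t *left_over)
{
	unsigned int num;
	size_t slab_size = PAGE_SIZE << gfporder;

	if (flags & CFLGS_OFF_SLAB) {
		num = slab_size / buffer_size;
		*left_over = slab_size % buffer_size;
	} else {
		num = slab_size / (buffer_size + sizeof(freelist_idx_t));
		*left_over = slab_size %
			(buffer_size + sizeof(freelist_idx_t));
	}

	return num;
}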

Otherwise

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB
  2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
                   ` (15 preceding siblings ...)
  2016-01-14  5:24 ` [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB Joonsoo Kim
@ 2016-01-27  4:40 ` Andrew Morton
  2016-01-27  4:46   ` Joonsoo Kim
  16 siblings, 1 reply; 36+ messages in thread
From: Andrew Morton @ 2016-01-27  4:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Thu, 14 Jan 2016 14:24:13 +0900 Joonsoo Kim <js1304@gmail.com> wrote:

> This patchset implements new freed object management way, that is,
> OBJFREELIST_SLAB. Purpose of it is to reduce memory overhead in SLAB.
> 
> SLAB needs a array to manage freed objects in a slab. If there is
> leftover after objects are packed into a slab, we can use it as
> a management array, and, in this case, there is no memory waste.
> But, in the other cases, we need to allocate extra memory for
> a management array or utilize dedicated internal memory in a slab for it.
> Both cases causes memory waste so it's not good.
> 
> With this patchset, freed object itself can be used for a management
> array. So, memory waste could be reduced. Detailed idea and numbers
> are described in last patch's commit description. Please refer it.
> 
> In fact, I tested another idea implementing OBJFREELIST_SLAB with
> extendable linked array through another freed object. It can remove
> memory waste completely but it causes more computational overhead
> in critical lock path and it seems that overhead outweigh benefit.
> So, this patchset doesn't include it. I will attach prototype just for
> a reference.

It appears that this patchset is perhaps due a couple of touchups from
Christoph's comments.  I'll grab it as-is as I want to get an mmotm
into linux-next tomorrow then vanish for a few days.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB
  2016-01-27  4:40 ` [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Andrew Morton
@ 2016-01-27  4:46   ` Joonsoo Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-27  4:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Tue, Jan 26, 2016 at 08:40:13PM -0800, Andrew Morton wrote:
> On Thu, 14 Jan 2016 14:24:13 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
> 
> > This patchset implements new freed object management way, that is,
> > OBJFREELIST_SLAB. Purpose of it is to reduce memory overhead in SLAB.
> > 
> > SLAB needs a array to manage freed objects in a slab. If there is
> > leftover after objects are packed into a slab, we can use it as
> > a management array, and, in this case, there is no memory waste.
> > But, in the other cases, we need to allocate extra memory for
> > a management array or utilize dedicated internal memory in a slab for it.
> > Both cases causes memory waste so it's not good.
> > 
> > With this patchset, freed object itself can be used for a management
> > array. So, memory waste could be reduced. Detailed idea and numbers
> > are described in last patch's commit description. Please refer it.
> > 
> > In fact, I tested another idea implementing OBJFREELIST_SLAB with
> > extendable linked array through another freed object. It can remove
> > memory waste completely but it causes more computational overhead
> > in critical lock path and it seems that overhead outweigh benefit.
> > So, this patchset doesn't include it. I will attach prototype just for
> > a reference.
> 
> It appears that this patchset is perhaps due a couple of touchups from
> Christoph's comments.  I'll grab it as-is as I want to get an mmotm
> into linux-next tomorrow then vanish for a few days.

Hello, Andrew.

Could you add just one small fix below to 16/16 "mm/slab: introduce
new slab management type, OBJFREELIST_SLAB"?

Thanks.


------------>8-------------
>From f2afad62904fe3129c1a296987954076fb8f6b5a Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Wed, 27 Jan 2016 10:36:05 +0900
Subject: [PATCH] fix SLAB_DESTROY_BY_RCU

---
 mm/slab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 56777a7..ee44d78 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2019,7 +2019,7 @@ static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
 
 	cachep->num = 0;
 
-	if (cachep->ctor)
+	if (cachep->ctor || flags & SLAB_DESTROY_BY_RCU)
 		return false;
 
 	left = calculate_slab_order(cachep, size,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-14  5:24 ` [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB Joonsoo Kim
  2016-01-14 15:32   ` Christoph Lameter
@ 2016-01-27 13:35   ` Vlastimil Babka
  2016-01-27 16:48     ` Christoph Lameter
  2016-01-28  4:51     ` Joonsoo Kim
  1 sibling, 2 replies; 36+ messages in thread
From: Vlastimil Babka @ 2016-01-27 13:35 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On 01/14/2016 06:24 AM, Joonsoo Kim wrote:
> In fact, I tested another idea implementing OBJFREELIST_SLAB with
> extendable linked array through another freed object. It can remove
> memory waste completely but it causes more computational overhead
> in critical lock path and it seems that overhead outweigh benefit.
> So, this patch doesn't include it.

Can you elaborate? Do we actually need an extendable linked array? Why not just
store the pointer to the next free object into the object, NULL for the last
one? I.e. a singly-linked list. We should never need to actually traverse it?

freeing object obj:
*obj = page->freelist;
page->freelist = obj;

allocating object:
obj = page->freelist;
page->freelist = *obj;
*obj = NULL;

That means two writes, but if we omit managing page->active, it's not an
increase. For counting free objects, we would need to traverse the list, but
that's only needed for debugging?

Also during bulk operations, page->freelist could be updated just once at the
very end.
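
Spelled out as C (just a sketch of the suggestion above, not existing
kernel code; the helper names are invented and page->active handling
is omitted):

/* Free: push the object onto a singly-linked per-page list. */
static inline void push_free_obj(struct page *page, void *objp)
{
	*(void **)objp = page->freelist;
	page->freelist = objp;
}

/* Alloc: pop the list head; NULL means no free object on this page. */
static inline void *pop_free_obj(struct page *page)
{
	void *objp = page->freelist;

	if (objp) {
		page->freelist = *(void **)objp;
		*(void **)objp = NULL;
	}
	return objp;
}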

Vlastimil

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-27 13:35   ` Vlastimil Babka
@ 2016-01-27 16:48     ` Christoph Lameter
  2016-01-27 17:18       ` Vlastimil Babka
  2016-01-28  4:51     ` Joonsoo Kim
  1 sibling, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2016-01-27 16:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Jesper Dangaard Brouer, linux-mm, linux-kernel

On Wed, 27 Jan 2016, Vlastimil Babka wrote:

> On 01/14/2016 06:24 AM, Joonsoo Kim wrote:
> > In fact, I tested another idea implementing OBJFREELIST_SLAB with
> > extendable linked array through another freed object. It can remove
> > memory waste completely but it causes more computational overhead
> > in critical lock path and it seems that overhead outweigh benefit.
> > So, this patch doesn't include it.
>
> Can you elaborate? Do we actually need an extendable linked array? Why not just
> store the pointer to the next free object into the object, NULL for the last
> one? I.e. a singly-linked list. We should never need to actually traverse it?
>
> freeing object obj:
> *obj = page->freelist;
> page->freelist = obj;
>
> allocating object:
> obj = page->freelist;
> page->freelist = *obj;
> *obj = NULL;

Well, singly-linked freelists are a concept from another slab
allocator. At what point do we rename SLAB to SLUB2?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-27 16:48     ` Christoph Lameter
@ 2016-01-27 17:18       ` Vlastimil Babka
  2016-01-27 17:55         ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2016-01-27 17:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Andrew Morton, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Jesper Dangaard Brouer, linux-mm, linux-kernel

On 01/27/2016 05:48 PM, Christoph Lameter wrote:
> On Wed, 27 Jan 2016, Vlastimil Babka wrote:
> 
>>
>> Can you elaborate? Do we actually need an extendable linked array? Why not just
>> store the pointer to the next free object into the object, NULL for the last
>> one? I.e. a singly-linked list. We should never need to actually traverse it?
>>
>> freeing object obj:
>> *obj = page->freelist;
>> page->freelist = obj;
>>
>> allocating object:
>> obj = page->freelist;
>> page->freelist = *obj;
>> *obj = NULL;
> 
> Well the single linked lists are a concept of another slab allocator. At
> what point do we rename SLAB to SLUB2?

OK. Perhaps an LSF/MM topic then, to discuss whether we need both?
What are the remaining cases where SLAB is the better choice, and can
something be done about them in SLUB?

(I can imagine there were such discussions in the past, and I came to
kernel development only in 2013. In that case, maybe enough time has
passed to revisit this?)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-27 17:18       ` Vlastimil Babka
@ 2016-01-27 17:55         ` Christoph Lameter
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2016-01-27 17:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Jesper Dangaard Brouer, linux-mm, linux-kernel

On Wed, 27 Jan 2016, Vlastimil Babka wrote:

> OK. Perhaps a LSF/MM topic then to discuss whether we need both? What are the
> remaining cases where SLAB is better choice, and can there be something done
> about them in SLUB?

Right now one is driving the other which is good I think. So you may just
ignore my cynical comment.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-27 13:35   ` Vlastimil Babka
  2016-01-27 16:48     ` Christoph Lameter
@ 2016-01-28  4:51     ` Joonsoo Kim
  2016-01-29 15:21       ` Vlastimil Babka
  1 sibling, 1 reply; 36+ messages in thread
From: Joonsoo Kim @ 2016-01-28  4:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On Wed, Jan 27, 2016 at 02:35:04PM +0100, Vlastimil Babka wrote:
> On 01/14/2016 06:24 AM, Joonsoo Kim wrote:
> > In fact, I tested another idea implementing OBJFREELIST_SLAB with
> > extendable linked array through another freed object. It can remove
> > memory waste completely but it causes more computational overhead
> > in critical lock path and it seems that overhead outweigh benefit.
> > So, this patch doesn't include it.
> 
> Can you elaborate? Do we actually need an extendable linked array? Why not just
> store the pointer to the next free object into the object, NULL for the last
> one? I.e. a singly-linked list. We should never need to actually traverse it?

As Christoph explained, that's the way SLUB manages freed objects. In
the SLAB case, we don't want to touch the object itself; it's one of
the main differences between SLAB and SLUB. These objects are
cache-cold by now, so touching the object itself would add to the
cache footprint.

> 
> freeing object obj:
> *obj = page->freelist;
> page->freelist = obj;
> 
> allocating object:
> obj = page->freelist;
> page->freelist = *obj;
> *obj = NULL;
> 
> That means two writes, but if we omit managing page->active, it's not an

It's not just a matter of the number of instructions, as explained
above. Touching more cache lines should also be avoided.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-01-28  4:51     ` Joonsoo Kim
@ 2016-01-29 15:21       ` Vlastimil Babka
  0 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2016-01-29 15:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, linux-mm, linux-kernel

On 01/28/2016 05:51 AM, Joonsoo Kim wrote:
> On Wed, Jan 27, 2016 at 02:35:04PM +0100, Vlastimil Babka wrote:
>> On 01/14/2016 06:24 AM, Joonsoo Kim wrote:
>> > In fact, I tested another idea implementing OBJFREELIST_SLAB with
>> > extendable linked array through another freed object. It can remove
>> > memory waste completely but it causes more computational overhead
>> > in critical lock path and it seems that overhead outweigh benefit.
>> > So, this patch doesn't include it.
>> 
>> Can you elaborate? Do we actually need an extendable linked array? Why not just
>> store the pointer to the next free object into the object, NULL for the last
>> one? I.e. a singly-linked list. We should never need to actually traverse it?
> 
> As Christoph explained, it's the way SLUB manages freed objects. In SLAB
> case, it doesn't want to touch object itself. It's one of main difference
> between SLAB and SLUB. These objects are cache-cold now so touching object itself
> could cause more cache footprint.

Hm I see. Although I wouldn't bet on whether the now-freed object is more or
less cold than the freelist array itself (regardless of its placement) :)

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2016-01-29 15:21 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-14  5:24 [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Joonsoo Kim
2016-01-14  5:24 ` [PATCH 01/16] mm/slab: fix stale code comment Joonsoo Kim
2016-01-14 15:22   ` Christoph Lameter
2016-01-14  5:24 ` [PATCH 02/16] mm/slab: remove useless structure define Joonsoo Kim
2016-01-14 15:22   ` Christoph Lameter
2016-01-14  5:24 ` [PATCH 03/16] mm/slab: remove the checks for slab implementation bug Joonsoo Kim
2016-01-14 15:23   ` Christoph Lameter
2016-01-14 16:20     ` Joonsoo Kim
2016-01-14  5:24 ` [PATCH 04/16] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled Joonsoo Kim
2016-01-14 12:09   ` Jesper Dangaard Brouer
2016-01-14 16:16     ` Joonsoo Kim
2016-01-14  5:24 ` [PATCH 05/16] mm/slab: use more appropriate condition check for debug_pagealloc Joonsoo Kim
2016-01-14  5:24 ` [PATCH 06/16] mm/slab: clean-up DEBUG_PAGEALLOC processing code Joonsoo Kim
2016-01-14  5:24 ` [PATCH 07/16] mm/slab: alternative implementation for DEBUG_SLAB_LEAK Joonsoo Kim
2016-01-14  5:24 ` [PATCH 08/16] mm/slab: remove object status buffer " Joonsoo Kim
2016-01-14  5:24 ` [PATCH 09/16] mm/slab: put the freelist at the end of slab page Joonsoo Kim
2016-01-14 15:26   ` Christoph Lameter
2016-01-14 16:21     ` Joonsoo Kim
2016-01-14 17:13   ` Christoph Lameter
2016-01-14  5:24 ` [PATCH 10/16] mm/slab: align cache size first before determination of OFF_SLAB candidate Joonsoo Kim
2016-01-14  5:24 ` [PATCH 11/16] mm/slab: clean-up cache type determination Joonsoo Kim
2016-01-14  5:24 ` [PATCH 12/16] mm/slab: do not change cache size if debug pagealloc isn't possible Joonsoo Kim
2016-01-14  5:24 ` [PATCH 13/16] mm/slab: make criteria for off slab determination robust and simple Joonsoo Kim
2016-01-14  5:24 ` [PATCH 14/16] mm/slab: factor out slab list fixup code Joonsoo Kim
2016-01-14  5:24 ` [PATCH 15/16] mm/slab: factor out debugging initialization in cache_init_objs() Joonsoo Kim
2016-01-14  5:24 ` [PATCH 16/16] mm/slab: introduce new slab management type, OBJFREELIST_SLAB Joonsoo Kim
2016-01-14 15:32   ` Christoph Lameter
2016-01-14 16:24     ` Joonsoo Kim
2016-01-27 13:35   ` Vlastimil Babka
2016-01-27 16:48     ` Christoph Lameter
2016-01-27 17:18       ` Vlastimil Babka
2016-01-27 17:55         ` Christoph Lameter
2016-01-28  4:51     ` Joonsoo Kim
2016-01-29 15:21       ` Vlastimil Babka
2016-01-27  4:40 ` [PATCH 00/16] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB Andrew Morton
2016-01-27  4:46   ` Joonsoo Kim
