* [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB
@ 2016-02-26  6:01 js1304
  2016-02-26  6:01 ` [PATCH v2 01/17] mm/slab: fix stale code comment js1304
                   ` (16 more replies)
  0 siblings, 17 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

This submission is intended to gather more review if possible.
This patchset has already been in mmotm for a month without any
reported problems. There are no changes from the mmotm version.

Changes from v1:
o Fold fixes into the corresponding patches (no difference from mmotm)
o Add a clean-up patch at the end, per Christoph's comment

Hello,

This patchset implements a new way of managing freed objects,
OBJFREELIST_SLAB. Its purpose is to reduce memory overhead in SLAB.

SLAB needs an array to manage the freed objects in a slab. If there is
leftover space after objects are packed into a slab, we can use it for
this management array and, in that case, no memory is wasted. In the
other cases, however, we need to allocate extra memory for the
management array or set aside dedicated memory inside the slab for it.
Both cases waste memory, so neither is good.

With this patchset, a freed object itself can be used as the management
array, so the memory waste can be reduced. The detailed idea and numbers
are described in the commit description of the OBJFREELIST_SLAB patch.
Please refer to it.
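
For orientation, here is a rough userspace sketch (not kernel code) of
the bookkeeping cost being targeted. Each slab keeps an array of object
indices; unless that array happens to fit into leftover space, it costs
dedicated memory either inside the slab or in a separate allocation. The
4096-byte slab and 2-byte index below are illustrative assumptions only.

#include <stdio.h>

#define SLAB_BYTES	4096
#define IDX_BYTES	2	/* illustrative sizeof(freelist_idx_t) */

int main(void)
{
	unsigned int obj_size = 256;

	/* Dedicated on-slab freelist: every object also pays for one index. */
	unsigned int num = SLAB_BYTES / (obj_size + IDX_BYTES);
	unsigned int mgmt = num * IDX_BYTES;
	unsigned int leftover = SLAB_BYTES - num * obj_size - mgmt;

	printf("on-slab freelist: %u objects, %u mgmt bytes, %u leftover\n",
	       num, mgmt, leftover);

	/*
	 * OBJFREELIST_SLAB idea: keep the index array inside a free object,
	 * so no dedicated space is reserved for it (details are in the
	 * OBJFREELIST_SLAB patch later in the series).
	 */
	num = SLAB_BYTES / obj_size;
	printf("objfreelist:      %u objects, 0 mgmt bytes\n", num);
	return 0;
}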

Thanks.

Joonsoo Kim (17):
  mm/slab: fix stale code comment
  mm/slab: remove useless structure define
  mm/slab: remove the checks for slab implementation bug
  mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  mm/slab: use more appropriate condition check for debug_pagealloc
  mm/slab: clean up DEBUG_PAGEALLOC processing code
  mm/slab: alternative implementation for DEBUG_SLAB_LEAK
  mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
  mm/slab: put the freelist at the end of slab page
  mm/slab: align cache size first before determination of OFF_SLAB
    candidate
  mm/slab: clean up cache type determination
  mm/slab: do not change cache size if debug pagealloc isn't possible
  mm/slab: make criteria for off slab determination robust and simple
  mm/slab: factor out slab list fixup code
  mm/slab: factor out debugging initialization in cache_init_objs()
  mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  mm/slab: avoid returning values by reference

 include/linux/mm.h       |  12 +-
 include/linux/slab_def.h |   3 +
 mm/slab.c                | 626 ++++++++++++++++++++++++++---------------------
 3 files changed, 362 insertions(+), 279 deletions(-)

-- 
1.9.1


* [PATCH v2 01/17] mm/slab: fix stale code comment
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 02/17] mm/slab: remove useless structure define js1304
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

This patchset implements a new way of managing freed objects,
OBJFREELIST_SLAB.  Its purpose is to reduce memory overhead in SLAB.

SLAB needs an array to manage the freed objects in a slab.  If there is
leftover space after objects are packed into a slab, we can use it for
this management array and, in that case, no memory is wasted.  In the
other cases, however, we need to allocate extra memory for the
management array or set aside dedicated memory inside the slab for it.
Both cases waste memory, so neither is good.

With this patchset, a freed object itself can be used as the management
array, so the memory waste can be reduced.  The detailed idea and
numbers are described in the commit description of the OBJFREELIST_SLAB
patch.  Please refer to it.

In fact, I tested another idea that implements OBJFREELIST_SLAB with an
extendable linked array threaded through other freed objects.  It can
remove the memory waste completely, but it adds computational overhead
in the critical lock path, and that overhead seems to outweigh the
benefit.  So, this patchset doesn't include it.  I will attach the
prototype just for reference.

This patch (of 16):

We use the freelist_idx_t type for free object management, and its size
can be smaller than that of an unsigned int.  Fix the stale comment
accordingly.
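
For reference, the index type in question is defined in mm/slab.c
roughly as below (paraphrased from the source at the time of this
series): a byte-sized index when objects are large enough that a slab
can never hold more than 255 of them, and an unsigned short otherwise.
Either way it is smaller than the "unsigned int" the old comment claims.

#if FREELIST_BYTE_INDEX
typedef unsigned char freelist_idx_t;
#else
typedef unsigned short freelist_idx_t;
#endif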

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index b3eca03..23fc4b0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -537,7 +537,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * on it. For the latter case, the memory allocated for a
 	 * slab is used for:
 	 *
-	 * - One unsigned int for each object
+	 * - One freelist_idx_t for each object
 	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
 	 *
-- 
1.9.1


* [PATCH v2 02/17] mm/slab: remove useless structure define
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
  2016-02-26  6:01 ` [PATCH v2 01/17] mm/slab: fix stale code comment js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 03/17] mm/slab: remove the checks for slab implementation bug js1304
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

The arraycache_init structure definition is obsolete, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 23fc4b0..3634dc1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -225,16 +225,6 @@ static inline void clear_obj_pfmemalloc(void **objp)
 }
 
 /*
- * bootstrap: The caches do not work without cpuarrays anymore, but the
- * cpuarrays are allocated from the generic caches...
- */
-#define BOOT_CPUCACHE_ENTRIES	1
-struct arraycache_init {
-	struct array_cache cache;
-	void *entries[BOOT_CPUCACHE_ENTRIES];
-};
-
-/*
  * Need this for bootstrapping a per node allocator.
  */
 #define NUM_INIT_LISTS (2 * MAX_NUMNODES)
@@ -457,6 +447,7 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache,
 	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
 }
 
+#define BOOT_CPUCACHE_ENTRIES	1
 /* internal cache of cache description objs */
 static struct kmem_cache kmem_cache_boot = {
 	.batchcount = 1,
-- 
1.9.1


* [PATCH v2 03/17] mm/slab: remove the checks for slab implementation bug
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
  2016-02-26  6:01 ` [PATCH v2 01/17] mm/slab: fix stale code comment js1304
  2016-02-26  6:01 ` [PATCH v2 02/17] mm/slab: remove useless structure define js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26 16:05   ` Christoph Lameter
  2016-02-26  6:01 ` [PATCH v2 04/17] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled js1304
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Some of the "#if DEBUG" checks are there to report slab implementation
bugs rather than user usage bugs.  They are not really needed because
slab has been stable for quite a long time, and they make the code
messy.  This patch removes them.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3634dc1..14c3f9c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2110,8 +2110,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (!(flags & SLAB_DESTROY_BY_RCU))
 		flags |= SLAB_POISON;
 #endif
-	if (flags & SLAB_DESTROY_BY_RCU)
-		BUG_ON(flags & SLAB_POISON);
 #endif
 
 	/*
@@ -2368,9 +2366,6 @@ static int drain_freelist(struct kmem_cache *cache,
 		}
 
 		page = list_entry(p, struct page, lru);
-#if DEBUG
-		BUG_ON(page->active);
-#endif
 		list_del(&page->lru);
 		/*
 		 * Safe to drop the lock. The slab is no longer linked
@@ -2528,30 +2523,23 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 	}
 }
 
-static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
-				int nodeid)
+static void *slab_get_obj(struct kmem_cache *cachep, struct page *page)
 {
 	void *objp;
 
 	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
 	page->active++;
-#if DEBUG
-	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
-#endif
 
 	return objp;
 }
 
-static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
-				void *objp, int nodeid)
+static void slab_put_obj(struct kmem_cache *cachep,
+			struct page *page, void *objp)
 {
 	unsigned int objnr = obj_to_index(cachep, page, objp);
 #if DEBUG
 	unsigned int i;
 
-	/* Verify that the slab belongs to the intended node */
-	WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
-
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
 		if (get_free_obj(page, i) == objnr) {
@@ -2817,8 +2805,7 @@ retry:
 			STATS_INC_ACTIVE(cachep);
 			STATS_SET_HIGH(cachep);
 
-			ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
-									node));
+			ac_put_obj(cachep, ac, slab_get_obj(cachep, page));
 		}
 
 		/* move slabp to correct slabp list: */
@@ -3101,7 +3088,7 @@ retry:
 
 	BUG_ON(page->active == cachep->num);
 
-	obj = slab_get_obj(cachep, page, nodeid);
+	obj = slab_get_obj(cachep, page);
 	n->free_objects--;
 	/* move slabp to correct slabp list: */
 	list_del(&page->lru);
@@ -3252,7 +3239,7 @@ static void free_block(struct kmem_cache *cachep, void **objpp,
 		page = virt_to_head_page(objp);
 		list_del(&page->lru);
 		check_spinlock_acquired_node(cachep, node);
-		slab_put_obj(cachep, page, objp, node);
+		slab_put_obj(cachep, page, objp);
 		STATS_DEC_ACTIVE(cachep);
 		n->free_objects++;
 
@@ -3282,9 +3269,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
 	LIST_HEAD(list);
 
 	batchcount = ac->batchcount;
-#if DEBUG
-	BUG_ON(!batchcount || batchcount > ac->avail);
-#endif
+
 	check_irq_off();
 	n = get_node(cachep, node);
 	spin_lock(&n->list_lock);
-- 
1.9.1


* [PATCH v2 04/17] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (2 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 03/17] mm/slab: remove the checks for slab implementation bug js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26 16:06   ` Christoph Lameter
  2016-02-26  6:01 ` [PATCH v2 05/17] mm/slab: use more appropriate condition check for debug_pagealloc js1304
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 14c3f9c..4807cf4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1838,7 +1838,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
 
 		if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
-			if (cachep->size % PAGE_SIZE == 0 &&
+			if (debug_pagealloc_enabled() &&
+				cachep->size % PAGE_SIZE == 0 &&
 					OFF_SLAB(cachep))
 				kernel_map_pages(virt_to_page(objp),
 					cachep->size / PAGE_SIZE, 1);
@@ -2176,7 +2177,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * to check size >= 256. It guarantees that all necessary small
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
-	if (!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
+	if (debug_pagealloc_enabled() &&
+		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size() &&
 		ALIGN(size, cachep->align) < PAGE_SIZE) {
 		cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
@@ -2232,7 +2234,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		 * poisoning, then it's going to smash the contents of
 		 * the redzone and userword anyhow, so switch them off.
 		 */
-		if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
+		if (debug_pagealloc_enabled() &&
+			size % PAGE_SIZE == 0 && flags & SLAB_POISON)
 			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
 #endif
 	}
@@ -2716,7 +2719,8 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
-		if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
+		if (debug_pagealloc_enabled() &&
+			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep)) {
 			store_stackinfo(cachep, objp, caller);
 			kernel_map_pages(virt_to_page(objp),
 					 cachep->size / PAGE_SIZE, 0);
@@ -2861,7 +2865,8 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
-		if ((cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
+		if (debug_pagealloc_enabled() &&
+			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
 			kernel_map_pages(virt_to_page(objp),
 					 cachep->size / PAGE_SIZE, 1);
 		else
-- 
1.9.1


* [PATCH v2 05/17] mm/slab: use more appropriate condition check for debug_pagealloc
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (3 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 04/17] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26 16:08   ` Christoph Lameter
  2016-02-26  6:01 ` [PATCH v2 06/17] mm/slab: clean up DEBUG_PAGEALLOC processing code js1304
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

debug_pagealloc debugging is related to the SLAB_POISON flag rather than
to the FORCED_DEBUG option, although FORCED_DEBUG will enable
SLAB_POISON.  Fix it.
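
For context, this is (roughly) the FORCED_DEBUG block in
__kmem_cache_create() that ends up setting SLAB_POISON in the first
place; it is also visible in the hunk context of patch 03 above.  That
is why SLAB_POISON, not FORCED_DEBUG, is the condition that matters
here.

#if DEBUG
#if FORCED_DEBUG
	/* FORCED_DEBUG poisons every cache that does not defer freeing */
	if (!(flags & SLAB_DESTROY_BY_RCU))
		flags |= SLAB_POISON;
#endif
#endif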

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 4807cf4..8bca9be 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2169,7 +2169,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		else
 			size += BYTES_PER_WORD;
 	}
-#if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
 	/*
 	 * To activate debug pagealloc, off-slab management is necessary
 	 * requirement. In early phase of initialization, small sized slab
@@ -2177,7 +2176,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * to check size >= 256. It guarantees that all necessary small
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
-	if (debug_pagealloc_enabled() &&
+	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
 		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size() &&
 		ALIGN(size, cachep->align) < PAGE_SIZE) {
@@ -2185,7 +2184,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		size = PAGE_SIZE;
 	}
 #endif
-#endif
 
 	/*
 	 * Determine if the slab management is 'on' or 'off' slab.
-- 
1.9.1


* [PATCH v2 06/17] mm/slab: clean up DEBUG_PAGEALLOC processing code
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (4 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 05/17] mm/slab: use more appropriate condition check for debug_pagealloc js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 07/17] mm/slab: alternative implementation for DEBUG_SLAB_LEAK js1304
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Currently, open-coded checks for a DEBUG_PAGEALLOC cache are spread
across several sites.  This makes the code unreadable and hard to
change.  This patch cleans up that code.  A following patch will change
the criteria for a DEBUG_PAGEALLOC cache, so this clean-up will help
there, too.

v2: fix build with CONFIG_DEBUG_PAGEALLOC=n by Andrew

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mm.h | 12 ++++---
 mm/slab.c          | 97 +++++++++++++++++++++++++++---------------------------
 2 files changed, 57 insertions(+), 52 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 516e149..7c6d47d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2194,14 +2194,18 @@ kernel_map_pages(struct page *page, int numpages, int enable)
 }
 #ifdef CONFIG_HIBERNATION
 extern bool kernel_page_present(struct page *page);
-#endif /* CONFIG_HIBERNATION */
-#else
+#endif	/* CONFIG_HIBERNATION */
+#else	/* CONFIG_DEBUG_PAGEALLOC */
 static inline void
 kernel_map_pages(struct page *page, int numpages, int enable) {}
 #ifdef CONFIG_HIBERNATION
 static inline bool kernel_page_present(struct page *page) { return true; }
-#endif /* CONFIG_HIBERNATION */
-#endif
+#endif	/* CONFIG_HIBERNATION */
+static inline bool debug_pagealloc_enabled(void)
+{
+	return false;
+}
+#endif	/* CONFIG_DEBUG_PAGEALLOC */
 
 #ifdef __HAVE_ARCH_GATE_AREA
 extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
diff --git a/mm/slab.c b/mm/slab.c
index 8bca9be..3142ec3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1661,6 +1661,14 @@ static void kmem_rcu_free(struct rcu_head *head)
 }
 
 #if DEBUG
+static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
+{
+	if (debug_pagealloc_enabled() && OFF_SLAB(cachep) &&
+		(cachep->size % PAGE_SIZE) == 0)
+		return true;
+
+	return false;
+}
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
@@ -1694,6 +1702,23 @@ static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 	}
 	*addr++ = 0x87654321;
 }
+
+static void slab_kernel_map(struct kmem_cache *cachep, void *objp,
+				int map, unsigned long caller)
+{
+	if (!is_debug_pagealloc_cache(cachep))
+		return;
+
+	if (caller)
+		store_stackinfo(cachep, objp, caller);
+
+	kernel_map_pages(virt_to_page(objp), cachep->size / PAGE_SIZE, map);
+}
+
+#else
+static inline void slab_kernel_map(struct kmem_cache *cachep, void *objp,
+				int map, unsigned long caller) {}
+
 #endif
 
 static void poison_obj(struct kmem_cache *cachep, void *addr, unsigned char val)
@@ -1772,6 +1797,9 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 	int size, i;
 	int lines = 0;
 
+	if (is_debug_pagealloc_cache(cachep))
+		return;
+
 	realobj = (char *)objp + obj_offset(cachep);
 	size = cachep->object_size;
 
@@ -1837,17 +1865,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
 		void *objp = index_to_obj(cachep, page, i);
 
 		if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-			if (debug_pagealloc_enabled() &&
-				cachep->size % PAGE_SIZE == 0 &&
-					OFF_SLAB(cachep))
-				kernel_map_pages(virt_to_page(objp),
-					cachep->size / PAGE_SIZE, 1);
-			else
-				check_poison_obj(cachep, objp);
-#else
 			check_poison_obj(cachep, objp);
-#endif
+			slab_kernel_map(cachep, objp, 1, 0);
 		}
 		if (cachep->flags & SLAB_RED_ZONE) {
 			if (*dbg_redzone1(cachep, objp) != RED_INACTIVE)
@@ -2226,16 +2245,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
 		freelist_size = calculate_freelist_size(cachep->num, 0);
-
-#ifdef CONFIG_PAGE_POISONING
-		/* If we're going to use the generic kernel_map_pages()
-		 * poisoning, then it's going to smash the contents of
-		 * the redzone and userword anyhow, so switch them off.
-		 */
-		if (debug_pagealloc_enabled() &&
-			size % PAGE_SIZE == 0 && flags & SLAB_POISON)
-			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
-#endif
 	}
 
 	cachep->colour_off = cache_line_size();
@@ -2251,7 +2260,19 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->size = size;
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
-	if (flags & CFLGS_OFF_SLAB) {
+#if DEBUG
+	/*
+	 * If we're going to use the generic kernel_map_pages()
+	 * poisoning, then it's going to smash the contents of
+	 * the redzone and userword anyhow, so switch them off.
+	 */
+	if (IS_ENABLED(CONFIG_PAGE_POISONING) &&
+		(cachep->flags & SLAB_POISON) &&
+		is_debug_pagealloc_cache(cachep))
+		cachep->flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
+#endif
+
+	if (OFF_SLAB(cachep)) {
 		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
 		/*
 		 * This is a possibility for one of the kmalloc_{dma,}_caches.
@@ -2475,9 +2496,6 @@ static void cache_init_objs(struct kmem_cache *cachep,
 	for (i = 0; i < cachep->num; i++) {
 		void *objp = index_to_obj(cachep, page, i);
 #if DEBUG
-		/* need to poison the objs? */
-		if (cachep->flags & SLAB_POISON)
-			poison_obj(cachep, objp, POISON_FREE);
 		if (cachep->flags & SLAB_STORE_USER)
 			*dbg_userword(cachep, objp) = NULL;
 
@@ -2501,10 +2519,11 @@ static void cache_init_objs(struct kmem_cache *cachep,
 				slab_error(cachep, "constructor overwrote the"
 					   " start of an object");
 		}
-		if ((cachep->size % PAGE_SIZE) == 0 &&
-			    OFF_SLAB(cachep) && cachep->flags & SLAB_POISON)
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->size / PAGE_SIZE, 0);
+		/* need to poison the objs? */
+		if (cachep->flags & SLAB_POISON) {
+			poison_obj(cachep, objp, POISON_FREE);
+			slab_kernel_map(cachep, objp, 0, 0);
+		}
 #else
 		if (cachep->ctor)
 			cachep->ctor(objp);
@@ -2716,18 +2735,8 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 
 	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-		if (debug_pagealloc_enabled() &&
-			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep)) {
-			store_stackinfo(cachep, objp, caller);
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->size / PAGE_SIZE, 0);
-		} else {
-			poison_obj(cachep, objp, POISON_FREE);
-		}
-#else
 		poison_obj(cachep, objp, POISON_FREE);
-#endif
+		slab_kernel_map(cachep, objp, 0, caller);
 	}
 	return objp;
 }
@@ -2862,16 +2871,8 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 	if (!objp)
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
-#ifdef CONFIG_DEBUG_PAGEALLOC
-		if (debug_pagealloc_enabled() &&
-			(cachep->size % PAGE_SIZE) == 0 && OFF_SLAB(cachep))
-			kernel_map_pages(virt_to_page(objp),
-					 cachep->size / PAGE_SIZE, 1);
-		else
-			check_poison_obj(cachep, objp);
-#else
 		check_poison_obj(cachep, objp);
-#endif
+		slab_kernel_map(cachep, objp, 1, 0);
 		poison_obj(cachep, objp, POISON_INUSE);
 	}
 	if (cachep->flags & SLAB_STORE_USER)
-- 
1.9.1


* [PATCH v2 07/17] mm/slab: alternative implementation for DEBUG_SLAB_LEAK
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (5 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 06/17] mm/slab: clean up DEBUG_PAGEALLOC processing code js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 08/17] mm/slab: remove object status buffer " js1304
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

DEBUG_SLAB_LEAK is a debug option.  Its current implementation requires
a status buffer, so we need more memory to use it, and it makes the
kmem_cache initialization step more complex.

To remove this extra memory usage and to simplify the initialization
step, this patch implements the feature in another way.

When a user requests slab object owner information, we mark that the
collection has started.  Then, all free objects in the caches are
flushed to their corresponding slab pages.  Now we can distinguish every
freed object, and therefore every allocated object, too.  After
collecting the owner information for the allocated objects, the mark is
checked to verify that no free happened during the processing.  If it
didn't, we can be sure the information is correct, and it is returned to
the user.

Although this way is rather complex, it has the two important benefits
mentioned above, so I think it is worth the change.

There is one drawback: it takes more time to collect the slab object
owner information.  But this is just a debug option, so it doesn't
matter much.

To ease review, this patch implements only the new way.  A following
patch will remove the now-useless code.
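
Condensed, the loop that the diff below adds to leaks_show() works like
this (a sketch only; locking, the list walk, and the buffer handling are
elided, and collect_owner_info() is a stand-in for the handle_slab()
calls):

	do {
		/* assume no alloc/free will happen while we scan */
		set_store_user_clean(cachep);
		/* push per-cpu free objects back to their slab pages */
		drain_cpu_caches(cachep);

		/*
		 * Record the stored caller of every object that is not
		 * found on a slab's freelist, i.e. every allocated object.
		 */
		collect_owner_info(cachep);

		/*
		 * An alloc or free in the meantime cleared the flag, so
		 * the data may be stale; start over.
		 */
	} while (!is_store_user_clean(cachep));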

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/slab_def.h |  3 ++
 mm/slab.c                | 85 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index cf139d3..e878ba3 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -60,6 +60,9 @@ struct kmem_cache {
 	atomic_t allocmiss;
 	atomic_t freehit;
 	atomic_t freemiss;
+#ifdef CONFIG_DEBUG_SLAB_LEAK
+	atomic_t store_user_clean;
+#endif
 
 	/*
 	 * If debugging is enabled, then the allocator can add additional
diff --git a/mm/slab.c b/mm/slab.c
index 3142ec3..907abe9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -396,20 +396,25 @@ static void set_obj_status(struct page *page, int idx, int val)
 	status[idx] = val;
 }
 
-static inline unsigned int get_obj_status(struct page *page, int idx)
+static inline bool is_store_user_clean(struct kmem_cache *cachep)
 {
-	int freelist_size;
-	char *status;
-	struct kmem_cache *cachep = page->slab_cache;
+	return atomic_read(&cachep->store_user_clean) == 1;
+}
 
-	freelist_size = cachep->num * sizeof(freelist_idx_t);
-	status = (char *)page->freelist + freelist_size;
+static inline void set_store_user_clean(struct kmem_cache *cachep)
+{
+	atomic_set(&cachep->store_user_clean, 1);
+}
 
-	return status[idx];
+static inline void set_store_user_dirty(struct kmem_cache *cachep)
+{
+	if (is_store_user_clean(cachep))
+		atomic_set(&cachep->store_user_clean, 0);
 }
 
 #else
 static inline void set_obj_status(struct page *page, int idx, int val) {}
+static inline void set_store_user_dirty(struct kmem_cache *cachep) {}
 
 #endif
 
@@ -2550,6 +2555,11 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page)
 	objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
 	page->active++;
 
+#if DEBUG
+	if (cachep->flags & SLAB_STORE_USER)
+		set_store_user_dirty(cachep);
+#endif
+
 	return objp;
 }
 
@@ -2725,8 +2735,10 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 		*dbg_redzone1(cachep, objp) = RED_INACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_INACTIVE;
 	}
-	if (cachep->flags & SLAB_STORE_USER)
+	if (cachep->flags & SLAB_STORE_USER) {
+		set_store_user_dirty(cachep);
 		*dbg_userword(cachep, objp) = (void *)caller;
+	}
 
 	objnr = obj_to_index(cachep, page, objp);
 
@@ -4119,15 +4131,34 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 						struct page *page)
 {
 	void *p;
-	int i;
+	int i, j;
+	unsigned long v;
 
 	if (n[0] == n[1])
 		return;
 	for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
-		if (get_obj_status(page, i) != OBJECT_ACTIVE)
+		bool active = true;
+
+		for (j = page->active; j < c->num; j++) {
+			if (get_free_obj(page, j) == i) {
+				active = false;
+				break;
+			}
+		}
+
+		if (!active)
 			continue;
 
-		if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))
+		/*
+		 * probe_kernel_read() is used for DEBUG_PAGEALLOC. page table
+		 * mapping is established when actual object allocation and
+		 * we could mistakenly access the unmapped object in the cpu
+		 * cache.
+		 */
+		if (probe_kernel_read(&v, dbg_userword(c, p), sizeof(v)))
+			continue;
+
+		if (!add_caller(n, v))
 			return;
 	}
 }
@@ -4163,21 +4194,31 @@ static int leaks_show(struct seq_file *m, void *p)
 	if (!(cachep->flags & SLAB_RED_ZONE))
 		return 0;
 
-	/* OK, we can do it */
+	/*
+	 * Set store_user_clean and start to grab stored user information
+	 * for all objects on this cache. If some alloc/free requests comes
+	 * during the processing, information would be wrong so restart
+	 * whole processing.
+	 */
+	do {
+		set_store_user_clean(cachep);
+		drain_cpu_caches(cachep);
+
+		x[1] = 0;
 
-	x[1] = 0;
+		for_each_kmem_cache_node(cachep, node, n) {
 
-	for_each_kmem_cache_node(cachep, node, n) {
+			check_irq_on();
+			spin_lock_irq(&n->list_lock);
 
-		check_irq_on();
-		spin_lock_irq(&n->list_lock);
+			list_for_each_entry(page, &n->slabs_full, lru)
+				handle_slab(x, cachep, page);
+			list_for_each_entry(page, &n->slabs_partial, lru)
+				handle_slab(x, cachep, page);
+			spin_unlock_irq(&n->list_lock);
+		}
+	} while (!is_store_user_clean(cachep));
 
-		list_for_each_entry(page, &n->slabs_full, lru)
-			handle_slab(x, cachep, page);
-		list_for_each_entry(page, &n->slabs_partial, lru)
-			handle_slab(x, cachep, page);
-		spin_unlock_irq(&n->list_lock);
-	}
 	name = cachep->name;
 	if (x[0] == x[1]) {
 		/* Increase the buffer size */
-- 
1.9.1


* [PATCH v2 08/17] mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (6 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 07/17] mm/slab: alternative implementation for DEBUG_SLAB_LEAK js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 09/17] mm/slab: put the freelist at the end of slab page js1304
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now we don't use the object status buffer in any configuration.
Remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 34 ++--------------------------------
 1 file changed, 2 insertions(+), 32 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 907abe9..02be9d9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -380,22 +380,8 @@ static void **dbg_userword(struct kmem_cache *cachep, void *objp)
 
 #endif
 
-#define OBJECT_FREE (0)
-#define OBJECT_ACTIVE (1)
-
 #ifdef CONFIG_DEBUG_SLAB_LEAK
 
-static void set_obj_status(struct page *page, int idx, int val)
-{
-	int freelist_size;
-	char *status;
-	struct kmem_cache *cachep = page->slab_cache;
-
-	freelist_size = cachep->num * sizeof(freelist_idx_t);
-	status = (char *)page->freelist + freelist_size;
-	status[idx] = val;
-}
-
 static inline bool is_store_user_clean(struct kmem_cache *cachep)
 {
 	return atomic_read(&cachep->store_user_clean) == 1;
@@ -413,7 +399,6 @@ static inline void set_store_user_dirty(struct kmem_cache *cachep)
 }
 
 #else
-static inline void set_obj_status(struct page *page, int idx, int val) {}
 static inline void set_store_user_dirty(struct kmem_cache *cachep) {}
 
 #endif
@@ -476,9 +461,6 @@ static size_t calculate_freelist_size(int nr_objs, size_t align)
 	size_t freelist_size;
 
 	freelist_size = nr_objs * sizeof(freelist_idx_t);
-	if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
-		freelist_size += nr_objs * sizeof(char);
-
 	if (align)
 		freelist_size = ALIGN(freelist_size, align);
 
@@ -491,10 +473,7 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 	int nr_objs;
 	size_t remained_size;
 	size_t freelist_size;
-	int extra_space = 0;
 
-	if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
-		extra_space = sizeof(char);
 	/*
 	 * Ignore padding for the initial guess. The padding
 	 * is at most @align-1 bytes, and @buffer_size is at
@@ -503,7 +482,7 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 	 * into the memory allocation when taking the padding
 	 * into account.
 	 */
-	nr_objs = slab_size / (buffer_size + idx_size + extra_space);
+	nr_objs = slab_size / (buffer_size + idx_size);
 
 	/*
 	 * This calculated number will be either the right
@@ -1961,16 +1940,13 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			break;
 
 		if (flags & CFLGS_OFF_SLAB) {
-			size_t freelist_size_per_obj = sizeof(freelist_idx_t);
 			/*
 			 * Max number of objs-per-slab for caches which
 			 * use off-slab slabs. Needed to avoid a possible
 			 * looping condition in cache_grow().
 			 */
-			if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
-				freelist_size_per_obj += sizeof(char);
 			offslab_limit = size;
-			offslab_limit /= freelist_size_per_obj;
+			offslab_limit /= sizeof(freelist_idx_t);
 
  			if (num > offslab_limit)
 				break;
@@ -2533,7 +2509,6 @@ static void cache_init_objs(struct kmem_cache *cachep,
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
-		set_obj_status(page, i, OBJECT_FREE);
 		set_free_obj(page, i, i);
 	}
 }
@@ -2745,7 +2720,6 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 	BUG_ON(objnr >= cachep->num);
 	BUG_ON(objp != index_to_obj(cachep, page, objnr));
 
-	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
 		poison_obj(cachep, objp, POISON_FREE);
 		slab_kernel_map(cachep, objp, 0, caller);
@@ -2878,8 +2852,6 @@ static inline void cache_alloc_debugcheck_before(struct kmem_cache *cachep,
 static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 				gfp_t flags, void *objp, unsigned long caller)
 {
-	struct page *page;
-
 	if (!objp)
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
@@ -2904,8 +2876,6 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
 
-	page = virt_to_head_page(objp);
-	set_obj_status(page, obj_to_index(cachep, page, objp), OBJECT_ACTIVE);
 	objp += obj_offset(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON)
 		cachep->ctor(objp);
-- 
1.9.1


* [PATCH v2 09/17] mm/slab: put the freelist at the end of slab page
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (7 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 08/17] mm/slab: remove object status buffer " js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 10/17] mm/slab: align cache size first before determination of OFF_SLAB candidate js1304
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Currently, the freelist is at the front of the slab page.  This requires
extra space to meet the object alignment requirement.  If we put the
freelist at the end of the slab page, objects can start at the page
boundary and will be at the correct alignment.  This is possible because
the freelist has no alignment constraint of its own.

This gives us two benefits: it removes the extra memory needed to align
the freelist, and it removes the complex calculation at cache
initialization.  I can't think of a notable drawback here.

I mentioned that this would reduce extra memory usage, but the benefit
is rather theoretical because it applies to very few cases.  The
following are example cache configurations that benefit from this
change.

 size  align  num  before  after
   32      8  124    4100   4092
   64      8   63    4103   4095
   88      8   46    4102   4094
  272      8   15    4103   4095
  408      8   10    4098   4090
   32     16  124    4108   4092
   64     16   63    4111   4095
   32     32  124    4124   4092
   64     32   63    4127   4095
   96     32   42    4106   4074

'before' is the total space needed for the objects and the aligned
freelist without this patch, and 'after' shows the result with this
patch applied.  Since 'before' is more than 4096, the number of objects
per slab has to decrease and memory is wasted.
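
As a sanity check on the 'after' column: with the freelist at the end of
the page there is no alignment padding, so the totals correspond to
num * size plus one byte of freelist index per object (every row has
fewer than 256 objects, so the byte-sized index applies).  A tiny
userspace sketch, not kernel code:

#include <stdio.h>

int main(void)
{
	/* size and num taken from the first three rows of the table */
	unsigned int size[] = { 32, 64, 88 };
	unsigned int num[]  = { 124, 63, 46 };

	for (int i = 0; i < 3; i++) {
		/* objects plus one byte of freelist index per object */
		unsigned int total = num[i] * size[i] + num[i];

		/* prints 4092, 4095 and 4094 (the 'after' column above) */
		printf("size %3u: %u bytes used of 4096\n", size[i], total);
	}
	return 0;
}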

Anyway, this patch removes a complex calculation, so it looks beneficial
to me.

v2: fix kerneldoc by Andrew

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 90 ++++++++++++++++-----------------------------------------------
 1 file changed, 22 insertions(+), 68 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 02be9d9..b3d91b0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -456,55 +456,12 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 	return this_cpu_ptr(cachep->cpu_cache);
 }
 
-static size_t calculate_freelist_size(int nr_objs, size_t align)
-{
-	size_t freelist_size;
-
-	freelist_size = nr_objs * sizeof(freelist_idx_t);
-	if (align)
-		freelist_size = ALIGN(freelist_size, align);
-
-	return freelist_size;
-}
-
-static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
-				size_t idx_size, size_t align)
-{
-	int nr_objs;
-	size_t remained_size;
-	size_t freelist_size;
-
-	/*
-	 * Ignore padding for the initial guess. The padding
-	 * is at most @align-1 bytes, and @buffer_size is at
-	 * least @align. In the worst case, this result will
-	 * be one greater than the number of objects that fit
-	 * into the memory allocation when taking the padding
-	 * into account.
-	 */
-	nr_objs = slab_size / (buffer_size + idx_size);
-
-	/*
-	 * This calculated number will be either the right
-	 * amount, or one greater than what we want.
-	 */
-	remained_size = slab_size - nr_objs * buffer_size;
-	freelist_size = calculate_freelist_size(nr_objs, align);
-	if (remained_size < freelist_size)
-		nr_objs--;
-
-	return nr_objs;
-}
-
 /*
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
 static void cache_estimate(unsigned long gfporder, size_t buffer_size,
-			   size_t align, int flags, size_t *left_over,
-			   unsigned int *num)
+		unsigned long flags, size_t *left_over, unsigned int *num)
 {
-	int nr_objs;
-	size_t mgmt_size;
 	size_t slab_size = PAGE_SIZE << gfporder;
 
 	/*
@@ -512,9 +469,12 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * on it. For the latter case, the memory allocated for a
 	 * slab is used for:
 	 *
-	 * - One freelist_idx_t for each object
-	 * - Padding to respect alignment of @align
 	 * - @buffer_size bytes for each object
+	 * - One freelist_idx_t for each object
+	 *
+	 * We don't need to consider alignment of freelist because
+	 * freelist will be at the end of slab page. The objects will be
+	 * at the correct alignment.
 	 *
 	 * If the slab management structure is off the slab, then the
 	 * alignment will already be calculated into the size. Because
@@ -522,16 +482,13 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * correct alignment when allocated.
 	 */
 	if (flags & CFLGS_OFF_SLAB) {
-		mgmt_size = 0;
-		nr_objs = slab_size / buffer_size;
-
+		*num = slab_size / buffer_size;
+		*left_over = slab_size % buffer_size;
 	} else {
-		nr_objs = calculate_nr_objs(slab_size, buffer_size,
-					sizeof(freelist_idx_t), align);
-		mgmt_size = calculate_freelist_size(nr_objs, align);
+		*num = slab_size / (buffer_size + sizeof(freelist_idx_t));
+		*left_over = slab_size %
+			(buffer_size + sizeof(freelist_idx_t));
 	}
-	*num = nr_objs;
-	*left_over = slab_size - nr_objs*buffer_size - mgmt_size;
 }
 
 #if DEBUG
@@ -1911,7 +1868,6 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
  * calculate_slab_order - calculate size (page order) of slabs
  * @cachep: pointer to the cache that is being created
  * @size: size of objects to be created in this cache.
- * @align: required alignment for the objects.
  * @flags: slab allocation flags
  *
  * Also calculates the number of objects per slab.
@@ -1921,7 +1877,7 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
  * towards high-order requests, this should be changed.
  */
 static size_t calculate_slab_order(struct kmem_cache *cachep,
-			size_t size, size_t align, unsigned long flags)
+				size_t size, unsigned long flags)
 {
 	unsigned long offslab_limit;
 	size_t left_over = 0;
@@ -1931,7 +1887,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 		unsigned int num;
 		size_t remainder;
 
-		cache_estimate(gfporder, size, align, flags, &remainder, &num);
+		cache_estimate(gfporder, size, flags, &remainder, &num);
 		if (!num)
 			continue;
 
@@ -2207,12 +2163,12 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
 		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
 
-	left_over = calculate_slab_order(cachep, size, cachep->align, flags);
+	left_over = calculate_slab_order(cachep, size, flags);
 
 	if (!cachep->num)
 		return -E2BIG;
 
-	freelist_size = calculate_freelist_size(cachep->num, cachep->align);
+	freelist_size = cachep->num * sizeof(freelist_idx_t);
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
@@ -2223,11 +2179,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		left_over -= freelist_size;
 	}
 
-	if (flags & CFLGS_OFF_SLAB) {
-		/* really off slab. No need for manual alignment */
-		freelist_size = calculate_freelist_size(cachep->num, 0);
-	}
-
 	cachep->colour_off = cache_line_size();
 	/* Offset must be a multiple of the alignment. */
 	if (cachep->colour_off < cachep->align)
@@ -2443,6 +2394,9 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 	void *freelist;
 	void *addr = page_address(page);
 
+	page->s_mem = addr + colour_off;
+	page->active = 0;
+
 	if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
 		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
@@ -2450,11 +2404,11 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 		if (!freelist)
 			return NULL;
 	} else {
-		freelist = addr + colour_off;
-		colour_off += cachep->freelist_size;
+		/* We will use last bytes at the slab for freelist */
+		freelist = addr + (PAGE_SIZE << cachep->gfporder) -
+				cachep->freelist_size;
 	}
-	page->active = 0;
-	page->s_mem = addr + colour_off;
+
 	return freelist;
 }
 
-- 
1.9.1


* [PATCH v2 10/17] mm/slab: align cache size first before determination of OFF_SLAB candidate
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (8 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 09/17] mm/slab: put the freelist at the end of slab page js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 11/17] mm/slab: clean up cache type determination js1304
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Finding a suitable OFF_SLAB candidate depends on the aligned cache size
rather than on the original size.  The same reasoning applies to the
debug pagealloc candidate.  So, this patch moves the alignment fixup up
to the proper position.  From that point on, the size is already
aligned, so we can remove some later alignment fixups.
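
A hypothetical illustration (userspace sketch, not kernel code) of why
the aligned size is the one that matters: a 100-byte object with a
64-byte alignment really occupies 128 bytes per slot, and it is that 128
which should drive the OFF_SLAB and debug pagealloc decisions.

#include <stdio.h>

/* same rounding the kernel's ALIGN() does for power-of-two alignments */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((size_t)(a) - 1))

int main(void)
{
	size_t size = 100, align = 64;

	/* prints: raw size 100 -> aligned size 128 */
	printf("raw size %zu -> aligned size %zu\n", size,
	       (size_t)ALIGN(size, align));
	return 0;
}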

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index b3d91b0..d5dffc8 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2125,6 +2125,17 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 		else
 			size += BYTES_PER_WORD;
 	}
+#endif
+
+	size = ALIGN(size, cachep->align);
+	/*
+	 * We should restrict the number of objects in a slab to implement
+	 * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
+	 */
+	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
+		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
+
+#if DEBUG
 	/*
 	 * To activate debug pagealloc, off-slab management is necessary
 	 * requirement. In early phase of initialization, small sized slab
@@ -2135,8 +2146,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
 		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size() &&
-		ALIGN(size, cachep->align) < PAGE_SIZE) {
-		cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
+		size < PAGE_SIZE) {
+		cachep->obj_offset += PAGE_SIZE - size;
 		size = PAGE_SIZE;
 	}
 #endif
@@ -2148,20 +2159,13 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
 	 */
 	if (size >= OFF_SLAB_MIN_SIZE && !slab_early_init &&
-	    !(flags & SLAB_NOLEAKTRACE))
+	    !(flags & SLAB_NOLEAKTRACE)) {
 		/*
 		 * Size is large, assume best to place the slab management obj
 		 * off-slab (should allow better packing of objs).
 		 */
 		flags |= CFLGS_OFF_SLAB;
-
-	size = ALIGN(size, cachep->align);
-	/*
-	 * We should restrict the number of objects in a slab to implement
-	 * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
-	 */
-	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
-		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
+	}
 
 	left_over = calculate_slab_order(cachep, size, flags);
 
-- 
1.9.1


* [PATCH v2 11/17] mm/slab: clean up cache type determination
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (9 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 10/17] mm/slab: align cache size first before determination of OFF_SLAB candidate js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible js1304
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

The current cache type determination code is open-coded and hard to
understand.  A following patch will introduce one more cache type, which
would make the code even more complex.  So, before that happens, this
patch abstracts the code into helpers.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 105 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 71 insertions(+), 34 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d5dffc8..9b56685 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2023,6 +2023,64 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 	return cachep;
 }
 
+static bool set_off_slab_cache(struct kmem_cache *cachep,
+			size_t size, unsigned long flags)
+{
+	size_t left;
+
+	cachep->num = 0;
+
+	/*
+	 * Determine if the slab management is 'on' or 'off' slab.
+	 * (bootstrapping cannot cope with offslab caches so don't do
+	 * it too early on. Always use on-slab management when
+	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
+	 */
+	if (size < OFF_SLAB_MIN_SIZE)
+		return false;
+
+	if (slab_early_init)
+		return false;
+
+	if (flags & SLAB_NOLEAKTRACE)
+		return false;
+
+	/*
+	 * Size is large, assume best to place the slab management obj
+	 * off-slab (should allow better packing of objs).
+	 */
+	left = calculate_slab_order(cachep, size, flags | CFLGS_OFF_SLAB);
+	if (!cachep->num)
+		return false;
+
+	/*
+	 * If the slab has been placed off-slab, and we have enough space then
+	 * move it on-slab. This is at the expense of any extra colouring.
+	 */
+	if (left >= cachep->num * sizeof(freelist_idx_t))
+		return false;
+
+	cachep->colour = left / cachep->colour_off;
+
+	return true;
+}
+
+static bool set_on_slab_cache(struct kmem_cache *cachep,
+			size_t size, unsigned long flags)
+{
+	size_t left;
+
+	cachep->num = 0;
+
+	left = calculate_slab_order(cachep, size, flags);
+	if (!cachep->num)
+		return false;
+
+	cachep->colour = left / cachep->colour_off;
+
+	return true;
+}
+
 /**
  * __kmem_cache_create - Create a cache.
  * @cachep: cache management descriptor
@@ -2047,7 +2105,6 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 int
 __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 {
-	size_t left_over, freelist_size;
 	size_t ralign = BYTES_PER_WORD;
 	gfp_t gfp;
 	int err;
@@ -2098,6 +2155,10 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * 4) Store it.
 	 */
 	cachep->align = ralign;
+	cachep->colour_off = cache_line_size();
+	/* Offset must be a multiple of the alignment. */
+	if (cachep->colour_off < cachep->align)
+		cachep->colour_off = cachep->align;
 
 	if (slab_is_available())
 		gfp = GFP_KERNEL;
@@ -2152,43 +2213,18 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	}
 #endif
 
-	/*
-	 * Determine if the slab management is 'on' or 'off' slab.
-	 * (bootstrapping cannot cope with offslab caches so don't do
-	 * it too early on. Always use on-slab management when
-	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
-	 */
-	if (size >= OFF_SLAB_MIN_SIZE && !slab_early_init &&
-	    !(flags & SLAB_NOLEAKTRACE)) {
-		/*
-		 * Size is large, assume best to place the slab management obj
-		 * off-slab (should allow better packing of objs).
-		 */
+	if (set_off_slab_cache(cachep, size, flags)) {
 		flags |= CFLGS_OFF_SLAB;
+		goto done;
 	}
 
-	left_over = calculate_slab_order(cachep, size, flags);
-
-	if (!cachep->num)
-		return -E2BIG;
-
-	freelist_size = cachep->num * sizeof(freelist_idx_t);
+	if (set_on_slab_cache(cachep, size, flags))
+		goto done;
 
-	/*
-	 * If the slab has been placed off-slab, and we have enough space then
-	 * move it on-slab. This is at the expense of any extra colouring.
-	 */
-	if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
-		flags &= ~CFLGS_OFF_SLAB;
-		left_over -= freelist_size;
-	}
+	return -E2BIG;
 
-	cachep->colour_off = cache_line_size();
-	/* Offset must be a multiple of the alignment. */
-	if (cachep->colour_off < cachep->align)
-		cachep->colour_off = cachep->align;
-	cachep->colour = left_over / cachep->colour_off;
-	cachep->freelist_size = freelist_size;
+done:
+	cachep->freelist_size = cachep->num * sizeof(freelist_idx_t);
 	cachep->flags = flags;
 	cachep->allocflags = __GFP_COMP;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
@@ -2209,7 +2245,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 #endif
 
 	if (OFF_SLAB(cachep)) {
-		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
+		cachep->freelist_cache =
+			kmalloc_slab(cachep->freelist_size, 0u);
 		/*
 		 * This is a possibility for one of the kmalloc_{dma,}_caches.
 		 * But since we go off slab only for object size greater than
-- 
1.9.1


* [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (10 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 11/17] mm/slab: clean up cache type determination js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26 16:13   ` Christoph Lameter
  2016-02-26  6:01 ` [PATCH v2 13/17] mm/slab: make criteria for off slab determination robust and simple js1304
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Setting up an off-slab cache can fail under some conditions.  Even then,
debug pagealloc increases the cache size to PAGE_SIZE in advance, which is
wasteful because debug pagealloc cannot work for the cache when it isn't
off-slab.  To improve this, check first whether the cache with the
increased size is suitable for off-slab management, and only increase the
cache size when it is, so the possible waste is removed.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9b56685..21aad9d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2206,10 +2206,17 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 */
 	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
 		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
-		size >= 256 && cachep->object_size > cache_line_size() &&
-		size < PAGE_SIZE) {
-		cachep->obj_offset += PAGE_SIZE - size;
-		size = PAGE_SIZE;
+		size >= 256 && cachep->object_size > cache_line_size()) {
+		if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
+			size_t tmp_size = ALIGN(size, PAGE_SIZE);
+
+			if (set_off_slab_cache(cachep, tmp_size, flags)) {
+				flags |= CFLGS_OFF_SLAB;
+				cachep->obj_offset += tmp_size - size;
+				size = tmp_size;
+				goto done;
+			}
+		}
 	}
 #endif
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 13/17] mm/slab: make criteria for off slab determination robust and simple
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (11 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 14/17] mm/slab: factor out slab list fixup code js1304
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

To become an off-slab cache, some constraints must be met to avoid
bootstrapping problems and recursive calls.  These can instead be avoided
by simply checking that the corresponding kmalloc cache is ready and is
not itself off-slab.  This is more robust because a static size check can
be affected by cache size changes or the architecture type, while a
dynamic check isn't.

One check, 'freelist_cache->size > cachep->size / 2', is added to verify
the benefit of choosing off-slab, because there is no longer a size
constraint that ensures enough advantage when selecting off-slab.
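
A rough, stand-alone illustration of this benefit check (user-space C with
made-up sizes, not the kernel code itself; kmalloc_size_for() is a
hypothetical stand-in for the lookup done by kmalloc_slab()):

#include <stdio.h>
#include <stdint.h>

typedef uint8_t freelist_idx_t;	/* assumes the small SLAB index type */

/* smallest power-of-two kmalloc size that could hold the freelist */
static size_t kmalloc_size_for(size_t freelist_size)
{
	size_t s = 8;

	while (s < freelist_size)
		s <<= 1;
	return s;
}

int main(void)
{
	struct { size_t obj_size; unsigned int num; } ex[] = {
		{ 1024, 32 },	/* large objects: off-slab looks worthwhile */
		{   96, 42 },	/* small objects: freelist cache costs too much */
	};

	for (int i = 0; i < 2; i++) {
		size_t freelist_size = ex[i].num * sizeof(freelist_idx_t);
		size_t freelist_cache = kmalloc_size_for(freelist_size);

		printf("obj %4zu x %2u: freelist cache %3zu -> off-slab %s\n",
		       ex[i].obj_size, ex[i].num, freelist_cache,
		       freelist_cache > ex[i].obj_size / 2 ?
				"skipped" : "considered");
	}
	return 0;
}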

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 45 +++++++++++++++++----------------------------
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 21aad9d..ab43d9f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -272,7 +272,6 @@ static void kmem_cache_node_init(struct kmem_cache_node *parent)
 
 #define CFLGS_OFF_SLAB		(0x80000000UL)
 #define	OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
-#define OFF_SLAB_MIN_SIZE (max_t(size_t, PAGE_SIZE >> 5, KMALLOC_MIN_SIZE + 1))
 
 #define BATCHREFILL_LIMIT	16
 /*
@@ -1879,7 +1878,6 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
 static size_t calculate_slab_order(struct kmem_cache *cachep,
 				size_t size, unsigned long flags)
 {
-	unsigned long offslab_limit;
 	size_t left_over = 0;
 	int gfporder;
 
@@ -1896,16 +1894,24 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			break;
 
 		if (flags & CFLGS_OFF_SLAB) {
+			struct kmem_cache *freelist_cache;
+			size_t freelist_size;
+
+			freelist_size = num * sizeof(freelist_idx_t);
+			freelist_cache = kmalloc_slab(freelist_size, 0u);
+			if (!freelist_cache)
+				continue;
+
 			/*
-			 * Max number of objs-per-slab for caches which
-			 * use off-slab slabs. Needed to avoid a possible
-			 * looping condition in cache_grow().
+			 * Needed to avoid possible looping condition
+			 * in cache_grow()
 			 */
-			offslab_limit = size;
-			offslab_limit /= sizeof(freelist_idx_t);
+			if (OFF_SLAB(freelist_cache))
+				continue;
 
- 			if (num > offslab_limit)
-				break;
+			/* check if off slab has enough benefit */
+			if (freelist_cache->size > cachep->size / 2)
+				continue;
 		}
 
 		/* Found something acceptable - save it away */
@@ -2031,17 +2037,9 @@ static bool set_off_slab_cache(struct kmem_cache *cachep,
 	cachep->num = 0;
 
 	/*
-	 * Determine if the slab management is 'on' or 'off' slab.
-	 * (bootstrapping cannot cope with offslab caches so don't do
-	 * it too early on. Always use on-slab management when
-	 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
+	 * Always use on-slab management when SLAB_NOLEAKTRACE
+	 * to avoid recursive calls into kmemleak.
 	 */
-	if (size < OFF_SLAB_MIN_SIZE)
-		return false;
-
-	if (slab_early_init)
-		return false;
-
 	if (flags & SLAB_NOLEAKTRACE)
 		return false;
 
@@ -2205,7 +2203,6 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
 	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
-		!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
 		size >= 256 && cachep->object_size > cache_line_size()) {
 		if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
 			size_t tmp_size = ALIGN(size, PAGE_SIZE);
@@ -2254,14 +2251,6 @@ done:
 	if (OFF_SLAB(cachep)) {
 		cachep->freelist_cache =
 			kmalloc_slab(cachep->freelist_size, 0u);
-		/*
-		 * This is a possibility for one of the kmalloc_{dma,}_caches.
-		 * But since we go off slab only for object size greater than
-		 * OFF_SLAB_MIN_SIZE, and kmalloc_{dma,}_caches get created
-		 * in ascending order,this should not happen at all.
-		 * But leave a BUG_ON for some lucky dude.
-		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
 	}
 
 	err = setup_cpu_cache(cachep, gfp);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 14/17] mm/slab: factor out slab list fixup code
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (12 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 13/17] mm/slab: make criteria for off slab determination robust and simple js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 15/17] mm/slab: factor out debugging initialization in cache_init_objs() js1304
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

The slab list should be fixed up after an object is detached from the
slab, and this happens in two places that do exactly the same thing.  They
will be changed in a following patch, so, to reduce code duplication,
factor them out into a common function.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index ab43d9f..95e5d63 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2723,6 +2723,17 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 #define cache_free_debugcheck(x,objp,z) (objp)
 #endif
 
+static inline void fixup_slab_list(struct kmem_cache *cachep,
+				struct kmem_cache_node *n, struct page *page)
+{
+	/* move slabp to correct slabp list: */
+	list_del(&page->lru);
+	if (page->active == cachep->num)
+		list_add(&page->lru, &n->slabs_full);
+	else
+		list_add(&page->lru, &n->slabs_partial);
+}
+
 static struct page *get_first_slab(struct kmem_cache_node *n)
 {
 	struct page *page;
@@ -2796,12 +2807,7 @@ retry:
 			ac_put_obj(cachep, ac, slab_get_obj(cachep, page));
 		}
 
-		/* move slabp to correct slabp list: */
-		list_del(&page->lru);
-		if (page->active == cachep->num)
-			list_add(&page->lru, &n->slabs_full);
-		else
-			list_add(&page->lru, &n->slabs_partial);
+		fixup_slab_list(cachep, n, page);
 	}
 
 must_grow:
@@ -3067,13 +3073,8 @@ retry:
 
 	obj = slab_get_obj(cachep, page);
 	n->free_objects--;
-	/* move slabp to correct slabp list: */
-	list_del(&page->lru);
 
-	if (page->active == cachep->num)
-		list_add(&page->lru, &n->slabs_full);
-	else
-		list_add(&page->lru, &n->slabs_partial);
+	fixup_slab_list(cachep, n, page);
 
 	spin_unlock(&n->list_lock);
 	goto done;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 15/17] mm/slab: factor out debugging initialization in cache_init_objs()
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (13 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 14/17] mm/slab: factor out slab list fixup code js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26  6:01 ` [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB js1304
  2016-02-26  6:01 ` [PATCH v2 17/17] mm/slab: avoid returning values by reference js1304
  16 siblings, 0 replies; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

cache_init_objs() will be changed in a following patch and its current
form doesn't fit that change well.  So, before doing that, separate out
the debugging initialization.  This causes two loop iterations when
debugging is enabled, but that overhead is so light compared to the debug
feature itself that the effect should not be visible.  This greatly
simplifies the changes to cache_init_objs() in the following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 95e5d63..d3608d1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2460,14 +2460,14 @@ static inline void set_free_obj(struct page *page,
 	((freelist_idx_t *)(page->freelist))[idx] = val;
 }
 
-static void cache_init_objs(struct kmem_cache *cachep,
-			    struct page *page)
+static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 {
+#if DEBUG
 	int i;
 
 	for (i = 0; i < cachep->num; i++) {
 		void *objp = index_to_obj(cachep, page, i);
-#if DEBUG
+
 		if (cachep->flags & SLAB_STORE_USER)
 			*dbg_userword(cachep, objp) = NULL;
 
@@ -2496,10 +2496,22 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			poison_obj(cachep, objp, POISON_FREE);
 			slab_kernel_map(cachep, objp, 0, 0);
 		}
-#else
-		if (cachep->ctor)
-			cachep->ctor(objp);
+	}
 #endif
+}
+
+static void cache_init_objs(struct kmem_cache *cachep,
+			    struct page *page)
+{
+	int i;
+
+	cache_init_objs_debug(cachep, page);
+
+	for (i = 0; i < cachep->num; i++) {
+		/* constructor could break poison info */
+		if (DEBUG == 0 && cachep->ctor)
+			cachep->ctor(index_to_obj(cachep, page, i));
+
 		set_free_obj(page, i, i);
 	}
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (14 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 15/17] mm/slab: factor out debugging initialization in cache_init_objs() js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26 16:21   ` Christoph Lameter
  2016-02-26  6:01 ` [PATCH v2 17/17] mm/slab: avoid returning values by reference js1304
  16 siblings, 1 reply; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

SLAB needs an array to manage the freed objects in a slab.  The array is
only used once some objects have been freed, so a freed object itself can
serve as this array.  This requires an additional branch in a somewhat
critical locked path to check whether it is the first freed object, but
that's all we need.  The benefit is that we save the extra memory and some
of the computational overhead of allocating a management array whenever a
new slab is created.
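
A minimal user-space sketch of that idea (made-up sizes and names, not the
kernel implementation; it ignores the case where the freelist-holding
object itself is handed out again):

#include <stdio.h>
#include <stdint.h>

#define OBJ_SIZE	64
#define NR_OBJS		8

typedef uint8_t freelist_idx_t;

static unsigned char slab[NR_OBJS * OBJ_SIZE];	/* stands in for a slab page */
static freelist_idx_t *freelist;		/* NULL until the first free */
static unsigned int active = NR_OBJS;		/* all objects handed out */

static void obj_free(unsigned int objnr)
{
	/* the first freed object's memory becomes the index array itself */
	if (!freelist)
		freelist = (freelist_idx_t *)&slab[objnr * OBJ_SIZE];

	active--;
	freelist[active] = objnr;
}

static unsigned int obj_alloc(void)
{
	/* hand back the most recently freed object's index */
	return freelist[active++];
}

int main(void)
{
	unsigned int a, b;

	obj_free(3);
	obj_free(5);
	a = obj_alloc();
	b = obj_alloc();
	printf("allocated obj %u, then obj %u\n", a, b);	/* 5, then 3 */
	return 0;
}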

The code change is more complex than the idea itself suggests, in order
to handle the debugging features efficiently.  If you want to see only the
core idea, read the patch with the '#if DEBUG' blocks stripped out.

Although this idea could apply to any cache whose object size is larger
than the management array, it isn't applied to caches that have a
constructor.  If such a cache's object were used as the management array,
the constructor would have to be called on it before that object is
returned to the user.  I guess that overhead would overwhelm the benefit,
so the idea isn't applied to them, at least for now.

In summary, from now on, the slab management type is determined by the
following logic.

1) if the management array is smaller than the object size and there is
   no ctor, the cache becomes OBJFREELIST_SLAB.

2) if the management array is smaller than the leftover, the cache becomes
   NORMAL_SLAB and uses the leftover as the array.

3) if OFF_SLAB saves more memory than way 4), the cache becomes OFF_SLAB.
   It allocates the management array from another cache, so some memory
   is wasted.

4) everything else becomes NORMAL_SLAB.  It uses dedicated memory inside
   the slab as the management array, so it also wastes memory.
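
A toy, user-space restatement of that selection order (simplified
conditions and made-up helper names, not the kernel's real checks):

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t freelist_idx_t;

enum slab_mgmt { OBJFREELIST, NORMAL_LEFTOVER, OFF_SLAB, NORMAL };

static enum slab_mgmt pick_mgmt(size_t obj_size, unsigned int num,
				size_t leftover, bool has_ctor, bool is_rcu,
				bool off_slab_saves_memory)
{
	size_t freelist_size = num * sizeof(freelist_idx_t);

	if (!has_ctor && !is_rcu && freelist_size <= obj_size)
		return OBJFREELIST;	/* 1) a free object holds the array */
	if (freelist_size <= leftover)
		return NORMAL_LEFTOVER;	/* 2) unused tail of the slab */
	if (off_slab_saves_memory)
		return OFF_SLAB;	/* 3) array allocated from another cache */
	return NORMAL;			/* 4) array carved out of the slab */
}

int main(void)
{
	static const char * const names[] = {
		"OBJFREELIST", "NORMAL(leftover)", "OFF_SLAB", "NORMAL",
	};

	/* e.g. 512-byte objects, 8 per page, 32 bytes leftover, no ctor/RCU */
	printf("%s\n", names[pick_mgmt(512, 8, 32, false, false, true)]);
	return 0;
}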

On my system, without CONFIG_DEBUG_SLAB enabled, almost all caches become
OBJFREELIST_SLAB or NORMAL_SLAB (using leftover), which don't waste
memory.  The following shows the number of caches per slab management
type.

TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF

/Before/
126 = 0 + 60 + 25 + 41

/After/
126 = 97 + 12 + 15 + 2

The result shows that the number of caches that don't waste memory
increases from 60 to 109.

I did some benchmarking and it looks like the benefit outweighs the loss.

Kmalloc: Repeatedly allocate then free test

/Before/
[    0.286809] 1. Kmalloc: Repeatedly allocate then free test
[    1.143674] 100000 times kmalloc(32) -> 116 cycles kfree -> 78 cycles
[    1.441726] 100000 times kmalloc(64) -> 121 cycles kfree -> 80 cycles
[    1.815734] 100000 times kmalloc(128) -> 168 cycles kfree -> 85 cycles
[    2.380709] 100000 times kmalloc(256) -> 287 cycles kfree -> 95 cycles
[    3.101153] 100000 times kmalloc(512) -> 370 cycles kfree -> 117 cycles
[    3.942432] 100000 times kmalloc(1024) -> 413 cycles kfree -> 156 cycles
[    5.227396] 100000 times kmalloc(2048) -> 622 cycles kfree -> 248 cycles
[    7.519793] 100000 times kmalloc(4096) -> 1102 cycles kfree -> 452 cycles

/After/
[    1.205313] 100000 times kmalloc(32) -> 117 cycles kfree -> 78 cycles
[    1.510526] 100000 times kmalloc(64) -> 124 cycles kfree -> 81 cycles
[    1.827382] 100000 times kmalloc(128) -> 130 cycles kfree -> 84 cycles
[    2.226073] 100000 times kmalloc(256) -> 177 cycles kfree -> 92 cycles
[    2.814747] 100000 times kmalloc(512) -> 286 cycles kfree -> 112 cycles
[    3.532952] 100000 times kmalloc(1024) -> 344 cycles kfree -> 141 cycles
[    4.608777] 100000 times kmalloc(2048) -> 519 cycles kfree -> 210 cycles
[    6.350105] 100000 times kmalloc(4096) -> 789 cycles kfree -> 391 cycles

v2: fix SLAB_DESTROY_BY_RCU cache type handling

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 86 insertions(+), 8 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d3608d1..85e394f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -270,7 +270,9 @@ static void kmem_cache_node_init(struct kmem_cache_node *parent)
 	MAKE_LIST((cachep), (&(ptr)->slabs_free), slabs_free, nodeid);	\
 	} while (0)
 
+#define CFLGS_OBJFREELIST_SLAB	(0x40000000UL)
 #define CFLGS_OFF_SLAB		(0x80000000UL)
+#define	OBJFREELIST_SLAB(x)	((x)->flags & CFLGS_OBJFREELIST_SLAB)
 #define	OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
 
 #define BATCHREFILL_LIMIT	16
@@ -480,7 +482,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * the slabs are all pages aligned, the objects will be at the
 	 * correct alignment when allocated.
 	 */
-	if (flags & CFLGS_OFF_SLAB) {
+	if (flags & (CFLGS_OBJFREELIST_SLAB | CFLGS_OFF_SLAB)) {
 		*num = slab_size / buffer_size;
 		*left_over = slab_size % buffer_size;
 	} else {
@@ -1801,6 +1803,12 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
 						struct page *page)
 {
 	int i;
+
+	if (OBJFREELIST_SLAB(cachep) && cachep->flags & SLAB_POISON) {
+		poison_obj(cachep, page->freelist - obj_offset(cachep),
+			POISON_FREE);
+	}
+
 	for (i = 0; i < cachep->num; i++) {
 		void *objp = index_to_obj(cachep, page, i);
 
@@ -2029,6 +2037,29 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 	return cachep;
 }
 
+static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
+			size_t size, unsigned long flags)
+{
+	size_t left;
+
+	cachep->num = 0;
+
+	if (cachep->ctor || flags & SLAB_DESTROY_BY_RCU)
+		return false;
+
+	left = calculate_slab_order(cachep, size,
+			flags | CFLGS_OBJFREELIST_SLAB);
+	if (!cachep->num)
+		return false;
+
+	if (cachep->num * sizeof(freelist_idx_t) > cachep->object_size)
+		return false;
+
+	cachep->colour = left / cachep->colour_off;
+
+	return true;
+}
+
 static bool set_off_slab_cache(struct kmem_cache *cachep,
 			size_t size, unsigned long flags)
 {
@@ -2217,6 +2248,11 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	}
 #endif
 
+	if (set_objfreelist_slab_cache(cachep, size, flags)) {
+		flags |= CFLGS_OBJFREELIST_SLAB;
+		goto done;
+	}
+
 	if (set_off_slab_cache(cachep, size, flags)) {
 		flags |= CFLGS_OFF_SLAB;
 		goto done;
@@ -2434,7 +2470,9 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 	page->s_mem = addr + colour_off;
 	page->active = 0;
 
-	if (OFF_SLAB(cachep)) {
+	if (OBJFREELIST_SLAB(cachep))
+		freelist = NULL;
+	else if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
 		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
 					      local_flags, nodeid);
@@ -2507,6 +2545,11 @@ static void cache_init_objs(struct kmem_cache *cachep,
 
 	cache_init_objs_debug(cachep, page);
 
+	if (OBJFREELIST_SLAB(cachep)) {
+		page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
+						obj_offset(cachep);
+	}
+
 	for (i = 0; i < cachep->num; i++) {
 		/* constructor could break poison info */
 		if (DEBUG == 0 && cachep->ctor)
@@ -2558,6 +2601,9 @@ static void slab_put_obj(struct kmem_cache *cachep,
 	}
 #endif
 	page->active--;
+	if (!page->freelist)
+		page->freelist = objp + obj_offset(cachep);
+
 	set_free_obj(page, page->active, objnr);
 }
 
@@ -2632,7 +2678,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	/* Get slab management. */
 	freelist = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
-	if (!freelist)
+	if (OFF_SLAB(cachep) && !freelist)
 		goto opps1;
 
 	slab_map_pages(cachep, page, freelist);
@@ -2735,14 +2781,42 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
 #define cache_free_debugcheck(x,objp,z) (objp)
 #endif
 
+static inline void fixup_objfreelist_debug(struct kmem_cache *cachep,
+						void **list)
+{
+#if DEBUG
+	void *next = *list;
+	void *objp;
+
+	while (next) {
+		objp = next - obj_offset(cachep);
+		next = *(void **)next;
+		poison_obj(cachep, objp, POISON_FREE);
+	}
+#endif
+}
+
 static inline void fixup_slab_list(struct kmem_cache *cachep,
-				struct kmem_cache_node *n, struct page *page)
+				struct kmem_cache_node *n, struct page *page,
+				void **list)
 {
 	/* move slabp to correct slabp list: */
 	list_del(&page->lru);
-	if (page->active == cachep->num)
+	if (page->active == cachep->num) {
 		list_add(&page->lru, &n->slabs_full);
-	else
+		if (OBJFREELIST_SLAB(cachep)) {
+#if DEBUG
+			/* Poisoning will be done without holding the lock */
+			if (cachep->flags & SLAB_POISON) {
+				void **objp = page->freelist;
+
+				*objp = *list;
+				*list = objp;
+			}
+#endif
+			page->freelist = NULL;
+		}
+	} else
 		list_add(&page->lru, &n->slabs_partial);
 }
 
@@ -2768,6 +2842,7 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
 	struct kmem_cache_node *n;
 	struct array_cache *ac;
 	int node;
+	void *list = NULL;
 
 	check_irq_off();
 	node = numa_mem_id();
@@ -2819,13 +2894,14 @@ retry:
 			ac_put_obj(cachep, ac, slab_get_obj(cachep, page));
 		}
 
-		fixup_slab_list(cachep, n, page);
+		fixup_slab_list(cachep, n, page, &list);
 	}
 
 must_grow:
 	n->free_objects -= ac->avail;
 alloc_done:
 	spin_unlock(&n->list_lock);
+	fixup_objfreelist_debug(cachep, &list);
 
 	if (unlikely(!ac->avail)) {
 		int x;
@@ -3062,6 +3138,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	struct kmem_cache_node *n;
 	void *obj;
+	void *list = NULL;
 	int x;
 
 	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
@@ -3086,9 +3163,10 @@ retry:
 	obj = slab_get_obj(cachep, page);
 	n->free_objects--;
 
-	fixup_slab_list(cachep, n, page);
+	fixup_slab_list(cachep, n, page, &list);
 
 	spin_unlock(&n->list_lock);
+	fixup_objfreelist_debug(cachep, &list);
 	goto done;
 
 must_grow:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 17/17] mm/slab: avoid returning values by reference
  2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
                   ` (15 preceding siblings ...)
  2016-02-26  6:01 ` [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB js1304
@ 2016-02-26  6:01 ` js1304
  2016-02-26 16:22   ` Christoph Lameter
  16 siblings, 1 reply; 26+ messages in thread
From: js1304 @ 2016-02-26  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Returning values by reference is bad practice.  Instead, just use the
function's return value.
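
As a generic illustration of the style change (plain user-space C, not
the slab code itself):

#include <stdio.h>

/* before: the primary result comes back through a pointer argument */
static void div_by_ref(int a, int b, int *quot, int *rem)
{
	*quot = a / b;
	*rem = a % b;
}

/*
 * after: the primary result is the return value; only the secondary
 * output still uses a pointer, much like cache_estimate() keeping
 * *left_over
 */
static int div_by_ret(int a, int b, int *rem)
{
	*rem = a % b;
	return a / b;
}

int main(void)
{
	int q, r;

	div_by_ref(17, 5, &q, &r);
	printf("by reference: %d remainder %d\n", q, r);

	q = div_by_ret(17, 5, &r);
	printf("by return:    %d remainder %d\n", q, r);
	return 0;
}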

Suggested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slab.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 85e394f..4f4e647 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -460,9 +460,10 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 /*
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
-static void cache_estimate(unsigned long gfporder, size_t buffer_size,
-		unsigned long flags, size_t *left_over, unsigned int *num)
+static unsigned int cache_estimate(unsigned long gfporder, size_t buffer_size,
+		unsigned long flags, size_t *left_over)
 {
+	unsigned int num;
 	size_t slab_size = PAGE_SIZE << gfporder;
 
 	/*
@@ -483,13 +484,15 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 	 * correct alignment when allocated.
 	 */
 	if (flags & (CFLGS_OBJFREELIST_SLAB | CFLGS_OFF_SLAB)) {
-		*num = slab_size / buffer_size;
+		num = slab_size / buffer_size;
 		*left_over = slab_size % buffer_size;
 	} else {
-		*num = slab_size / (buffer_size + sizeof(freelist_idx_t));
+		num = slab_size / (buffer_size + sizeof(freelist_idx_t));
 		*left_over = slab_size %
 			(buffer_size + sizeof(freelist_idx_t));
 	}
+
+	return num;
 }
 
 #if DEBUG
@@ -1893,7 +1896,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 		unsigned int num;
 		size_t remainder;
 
-		cache_estimate(gfporder, size, flags, &remainder, &num);
+		num = cache_estimate(gfporder, size, flags, &remainder);
 		if (!num)
 			continue;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 03/17] mm/slab: remove the checks for slab implementation bug
  2016-02-26  6:01 ` [PATCH v2 03/17] mm/slab: remove the checks for slab implementation bug js1304
@ 2016-02-26 16:05   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2016-02-26 16:05 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

On Fri, 26 Feb 2016, js1304@gmail.com wrote:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Some of "#if DEBUG" are for reporting slab implementation bug rather than
> user usecase bug.  It's not really needed because slab is stable for a
> quite long time and it makes code too dirty.  This patch remove it.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 04/17] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
  2016-02-26  6:01 ` [PATCH v2 04/17] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled js1304
@ 2016-02-26 16:06   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2016-02-26 16:06 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim


Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 05/17] mm/slab: use more appropriate condition check for debug_pagealloc
  2016-02-26  6:01 ` [PATCH v2 05/17] mm/slab: use more appropriate condition check for debug_pagealloc js1304
@ 2016-02-26 16:08   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2016-02-26 16:08 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim


Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible
  2016-02-26  6:01 ` [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible js1304
@ 2016-02-26 16:13   ` Christoph Lameter
  2016-02-26 17:02     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2016-02-26 16:13 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

On Fri, 26 Feb 2016, js1304@gmail.com wrote:

> Setting up an off-slab cache can fail under some conditions.  Even then,
> debug pagealloc increases the cache size to PAGE_SIZE in advance, which is
> wasteful because debug pagealloc cannot work for the cache when it isn't
> off-slab.  To improve this, check first whether the cache with the
> increased size is suitable for off-slab management, and only increase the
> cache size when it is, so the possible waste is removed.

Maybe add some explanations to the code? You tried to simplify it earlier
and make it understandable. This makes it difficult to understand it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-02-26  6:01 ` [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB js1304
@ 2016-02-26 16:21   ` Christoph Lameter
  2016-02-26 17:06     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2016-02-26 16:21 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

On Fri, 26 Feb 2016, js1304@gmail.com wrote:

> Although this idea could apply to any cache whose object size is larger
> than the management array, it isn't applied to caches that have a
> constructor.  If such a cache's object were used as the management array,
> the constructor would have to be called on it before that object is
> returned to the user.  I guess that overhead would overwhelm the benefit,
> so the idea isn't applied to them, at least for now.

Caches which have a constructor (or are used with SLAB_RCU_FREE) have a
defined content even when they are free. Therefore they cannot be used
for the freelist.

> In summary, from now on, the slab management type is determined by the
> following logic.
>
> 1) if the management array is smaller than the object size and there is
>    no ctor, the cache becomes OBJFREELIST_SLAB.

Also do not do this for RCU slabs.

> 2) if the management array is smaller than the leftover, the cache becomes
>    NORMAL_SLAB and uses the leftover as the array.
>
> 3) if OFF_SLAB saves more memory than way 4), the cache becomes OFF_SLAB.
>    It allocates the management array from another cache, so some memory
>    is wasted.

Wonder how many of these ugly off slabs are left after what you did here.

> TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF
>
> /Before/
> 126 = 0 + 60 + 25 + 41
>
> /After/
> 126 = 97 + 12 + 15 + 2
>
> The result shows that the number of caches that don't waste memory
> increases from 60 to 109.

Great results.

> v2: fix SLAB_DESTROY_BY_RCU cache type handling

Ok how are they handled now? Do not see that dealt with in the patch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 17/17] mm/slab: avoid returning values by reference
  2016-02-26  6:01 ` [PATCH v2 17/17] mm/slab: avoid returning values by reference js1304
@ 2016-02-26 16:22   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2016-02-26 16:22 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

On Fri, 26 Feb 2016, js1304@gmail.com wrote:

> Returning values by reference is bad practice.  Instead, just use the
> function's return value.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible
  2016-02-26 16:13   ` Christoph Lameter
@ 2016-02-26 17:02     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2016-02-26 17:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka,
	Linux Memory Management List, LKML, Joonsoo Kim

2016-02-27 1:13 GMT+09:00 Christoph Lameter <cl@linux.com>:
> On Fri, 26 Feb 2016, js1304@gmail.com wrote:
>
>> Setting up an off-slab cache can fail under some conditions.  Even then,
>> debug pagealloc increases the cache size to PAGE_SIZE in advance, which is
>> wasteful because debug pagealloc cannot work for the cache when it isn't
>> off-slab.  To improve this, check first whether the cache with the
>> increased size is suitable for off-slab management, and only increase the
>> cache size when it is, so the possible waste is removed.
>
> Maybe add some explanations to the code? You tried to simplify it earlier
> and make it understandable. This makes it difficult to understand it.

There is some explanation above the changed code, although it doesn't
appear in the patch context.  And this patch doesn't change any of the
conditions.  What it does is check in advance whether the cache is
suitable for an off-slab freelist.
Before this patch, that was checked only after increasing the size, and if
the cache turned out not to be suitable for off-slab, debug_pagealloc
couldn't be used for it despite the increased size.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB
  2016-02-26 16:21   ` Christoph Lameter
@ 2016-02-26 17:06     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2016-02-26 17:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Pekka Enberg, David Rientjes,
	Jesper Dangaard Brouer, Vlastimil Babka,
	Linux Memory Management List, LKML, Joonsoo Kim

2016-02-27 1:21 GMT+09:00 Christoph Lameter <cl@linux.com>:
> On Fri, 26 Feb 2016, js1304@gmail.com wrote:
>
>> Although this idea could apply to any cache whose object size is larger
>> than the management array, it isn't applied to caches that have a
>> constructor.  If such a cache's object were used as the management array,
>> the constructor would have to be called on it before that object is
>> returned to the user.  I guess that overhead would overwhelm the benefit,
>> so the idea isn't applied to them, at least for now.
>
> Caches which have a constructor (or are used with SLAB_RCU_FREE) have a
> defined content even when they are free. Therefore they cannot be used
> for the freelist.

Yes, I know.  I already handled it.  I've attached the related hunk.

+static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
+                       size_t size, unsigned long flags)
+{
+       size_t left;
+
+       cachep->num = 0;
+
+       if (cachep->ctor || flags & SLAB_DESTROY_BY_RCU)
+               return false;

So, if the cache has a ctor or is an RCU slab, the object freelist will not be used.

>> In summary, from now on, the slab management type is determined by the
>> following logic.
>>
>> 1) if the management array is smaller than the object size and there is
>>    no ctor, the cache becomes OBJFREELIST_SLAB.
>
> Also do not do this for RCU slabs.

Explained above.

>> 2) if the management array is smaller than the leftover, the cache becomes
>>    NORMAL_SLAB and uses the leftover as the array.
>>
>> 3) if OFF_SLAB saves more memory than way 4), the cache becomes OFF_SLAB.
>>    It allocates the management array from another cache, so some memory
>>    is wasted.
>
> Wonder how many of these ugly off slabs are left after what you did here.

See below result.

>> TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF
>>
>> /Before/
>> 126 = 0 + 60 + 25 + 41
>>
>> /After/
>> 126 = 97 + 12 + 15 + 2

97 is the number of type 1) caches, 12 is the number of type 2) caches,
and so on.

>> The result shows that the number of caches that don't waste memory
>> increases from 60 to 109.
>
> Great results.

Thanks. :)

>> v2: fix SLAB_DESTROY_BY_RCU cache type handling
>
> Ok how are they handled now? Do not see that dealt with in the patch.

Explained above.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-02-26 17:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-26  6:01 [PATCH v2 00/17] mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB js1304
2016-02-26  6:01 ` [PATCH v2 01/17] mm/slab: fix stale code comment js1304
2016-02-26  6:01 ` [PATCH v2 02/17] mm/slab: remove useless structure define js1304
2016-02-26  6:01 ` [PATCH v2 03/17] mm/slab: remove the checks for slab implementation bug js1304
2016-02-26 16:05   ` Christoph Lameter
2016-02-26  6:01 ` [PATCH v2 04/17] mm/slab: activate debug_pagealloc in SLAB when it is actually enabled js1304
2016-02-26 16:06   ` Christoph Lameter
2016-02-26  6:01 ` [PATCH v2 05/17] mm/slab: use more appropriate condition check for debug_pagealloc js1304
2016-02-26 16:08   ` Christoph Lameter
2016-02-26  6:01 ` [PATCH v2 06/17] mm/slab: clean up DEBUG_PAGEALLOC processing code js1304
2016-02-26  6:01 ` [PATCH v2 07/17] mm/slab: alternative implementation for DEBUG_SLAB_LEAK js1304
2016-02-26  6:01 ` [PATCH v2 08/17] mm/slab: remove object status buffer " js1304
2016-02-26  6:01 ` [PATCH v2 09/17] mm/slab: put the freelist at the end of slab page js1304
2016-02-26  6:01 ` [PATCH v2 10/17] mm/slab: align cache size first before determination of OFF_SLAB candidate js1304
2016-02-26  6:01 ` [PATCH v2 11/17] mm/slab: clean up cache type determination js1304
2016-02-26  6:01 ` [PATCH v2 12/17] mm/slab: do not change cache size if debug pagealloc isn't possible js1304
2016-02-26 16:13   ` Christoph Lameter
2016-02-26 17:02     ` Joonsoo Kim
2016-02-26  6:01 ` [PATCH v2 13/17] mm/slab: make criteria for off slab determination robust and simple js1304
2016-02-26  6:01 ` [PATCH v2 14/17] mm/slab: factor out slab list fixup code js1304
2016-02-26  6:01 ` [PATCH v2 15/17] mm/slab: factor out debugging initialization in cache_init_objs() js1304
2016-02-26  6:01 ` [PATCH v2 16/17] mm/slab: introduce new slab management type, OBJFREELIST_SLAB js1304
2016-02-26 16:21   ` Christoph Lameter
2016-02-26 17:06     ` Joonsoo Kim
2016-02-26  6:01 ` [PATCH v2 17/17] mm/slab: avoid returning values by reference js1304
2016-02-26 16:22   ` Christoph Lameter
