* [PATCH v3 0/7] kmalloc-reclaimable caches
@ 2018-07-18 13:36 Vlastimil Babka
  2018-07-18 13:36 ` [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches Vlastimil Babka
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka, Laura Abbott,
	Sumit Semwal, Vijayanand Jitta

v3 changes:
- fix missing hunk in patch 5/7
- more verbose cover letter and patch 6/7 commit log
v2 changes:
- shorten cache names to kmalloc-rcl-<SIZE>
- last patch shortens <SIZE> for all kmalloc caches to e.g. "1k", "4M"
- include dma caches in the 2D kmalloc_caches[] array to avoid a branch
- vmstat counter nr_indirectly_reclaimable_bytes renamed to
  nr_kernel_misc_reclaimable, doesn't include kmalloc-rcl-*
- /proc/meminfo counter renamed to KReclaimable, includes kmalloc-rcl*
  and nr_kernel_misc_reclaimable

Hi,

as discussed at LSF/MM [1] here's a patchset that introduces
kmalloc-reclaimable caches (more details in the second patch) and uses them for
SLAB freelists and dcache external names. The latter allows us to repurpose the
NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.

With patch 4/7, dcache external names are allocated from kmalloc-rcl-*
caches, eliminating the need for manual accounting. More importantly, it
also ensures the reclaimable kmalloc allocations are grouped in pages
separate from the regular kmalloc allocations. The need for proper
accounting of dcache external names has shown it's easy for a misbehaving
process to allocate lots of them, causing premature OOMs. Without the
added grouping, it's likely that a similar workload can interleave the
dcache external names allocations with regular kmalloc allocations
(note: I haven't searched myself for an example of such regular kmalloc
allocation, but I would be very surprised if there wasn't some). A
pathological case would be e.g. one 64-byte regular allocation with 63
external dcache names in a page (64x64=4096), which means the page is
not freed even after reclaiming all dcache names, and the process
can thus "steal" the whole page with a single 64-byte allocation.
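
The pathological case above can be sketched in user space. This is only an
illustration of the arithmetic, assuming a 4096-byte page and 64-byte objects;
struct page_sketch, the SKETCH_* constants and the helper names are made up
for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */
#define SKETCH_OBJ_SIZE    64u  /* assumed object size (kmalloc-64) */
#define SKETCH_SLOTS (SKETCH_PAGE_SIZE / SKETCH_OBJ_SIZE)

/* Hypothetical model of one slab page: counts live objects by kind. */
struct page_sketch {
	unsigned int reclaimable;   /* e.g. dcache external names */
	unsigned int unreclaimable; /* regular kmalloc objects */
};

/* A slab page can go back to the page allocator only when it is empty. */
static bool page_sketch_can_free(const struct page_sketch *p)
{
	assert(p->reclaimable + p->unreclaimable <= SKETCH_SLOTS);
	return p->reclaimable + p->unreclaimable == 0;
}

/* Model a shrinker run that frees every reclaimable object in the page. */
static void page_sketch_reclaim_all(struct page_sketch *p)
{
	p->reclaimable = 0;
}

/* Bytes that stay pinned after reclaim: the whole page, or nothing. */
static size_t page_sketch_pinned_bytes(const struct page_sketch *p)
{
	return page_sketch_can_free(p) ? 0 : SKETCH_PAGE_SIZE;
}
```

With 63 reclaimable objects and one regular object in the same page, reclaiming
everything still leaves the full page pinned; with all 64 slots holding
reclaimable objects, the page becomes freeable. That is the grouping effect
the kmalloc-rcl-* caches are meant to provide.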

If other kmalloc users similar to dcache external names are
identified, they can also benefit from the new functionality simply by
adding __GFP_RECLAIMABLE to the kmalloc calls.

Side benefits of the patchset (that could also be merged separately)
include removing the branch for detecting __GFP_DMA kmalloc(), and shortening
kmalloc cache names in /proc/slabinfo output. The latter is potentially
an ABI break in case there are tools parsing the names and expecting the
values to be in bytes.

This is what /proc/slabinfo looks like after booting in virtme:

...
kmalloc-rcl-4M         0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
...
kmalloc-rcl-96         7     32    128   32    1 : tunables  120   60    8 : slabdata      1      1      0
kmalloc-rcl-64        25    128     64   64    1 : tunables  120   60    8 : slabdata      2      2      0
kmalloc-rcl-32         0      0     32  124    1 : tunables  120   60    8 : slabdata      0      0      0
kmalloc-4M             0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
kmalloc-2M             0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
kmalloc-1M             0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
...

/proc/vmstat with renamed nr_indirectly_reclaimable_bytes counter:

...
nr_slab_reclaimable 2817
nr_slab_unreclaimable 1781
...
nr_kernel_misc_reclaimable 0
...

/proc/meminfo with new KReclaimable counter:

...
Shmem:               564 kB
KReclaimable:      11260 kB
Slab:              18368 kB
SReclaimable:      11260 kB
SUnreclaim:         7108 kB
KernelStack:        1248 kB
...

Thanks,
Vlastimil

Vlastimil Babka (7):
  mm, slab: combine kmalloc_caches and kmalloc_dma_caches
  mm, slab/slub: introduce kmalloc-reclaimable caches
  mm, slab: allocate off-slab freelists as reclaimable when appropriate
  dcache: allocate external names from reclaimable kmalloc caches
  mm: rename and change semantics of nr_indirectly_reclaimable_bytes
  mm, proc: add KReclaimable to /proc/meminfo
  mm, slab: shorten kmalloc cache names for large sizes

 Documentation/filesystems/proc.txt          |   4 +
 drivers/base/node.c                         |  19 ++--
 drivers/staging/android/ion/ion_page_pool.c |   8 +-
 fs/dcache.c                                 |  38 ++------
 fs/proc/meminfo.c                           |  16 +--
 include/linux/mmzone.h                      |   2 +-
 include/linux/slab.h                        |  49 +++++++---
 mm/page_alloc.c                             |  19 ++--
 mm/slab.c                                   |  11 ++-
 mm/slab_common.c                            | 102 ++++++++++++--------
 mm/slub.c                                   |  13 +--
 mm/util.c                                   |   3 +-
 mm/vmstat.c                                 |   6 +-
 13 files changed, 161 insertions(+), 129 deletions(-)

-- 
2.18.0



* [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-19  8:10   ` Mel Gorman
  2018-07-30 15:38   ` Christopher Lameter
  2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka

The kmalloc caches currently maintain a separate (optional) array
kmalloc_dma_caches for __GFP_DMA allocations. There are tests for __GFP_DMA in
the allocation hotpaths. We can avoid the branches by combining kmalloc_caches
and kmalloc_dma_caches into a single two-dimensional array where the outer
dimension is cache "type". This will also allow adding kmalloc-reclaimable
caches as a third type.
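
As a rough user-space illustration of the idea (not the kernel code itself),
the flag test can be turned into an outer array index so the lookup needs no
branch. The SKETCH_* names, the flag bit value and the three-entry size table
are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag bit for this sketch; not taken from the kernel headers. */
#define SKETCH_GFP_DMA 0x1u

enum { SKETCH_NORMAL = 0, SKETCH_DMA = 1, SKETCH_TYPES = 2 };
enum { SKETCH_SIZES = 3 };

/* Outer dimension is the cache type, inner dimension is the size index. */
static const char *const sketch_caches[SKETCH_TYPES][SKETCH_SIZES] = {
	[SKETCH_NORMAL] = { "kmalloc-32", "kmalloc-64", "kmalloc-128" },
	[SKETCH_DMA]    = { "dma-kmalloc-32", "dma-kmalloc-64", "dma-kmalloc-128" },
};

/* Turn the flag test into an array index (0 or 1) instead of a branch. */
static unsigned int sketch_type(unsigned int flags)
{
	return !!(flags & SKETCH_GFP_DMA);
}

/* One table lookup serves both the normal and the DMA case. */
static const char *sketch_lookup(unsigned int flags, size_t index)
{
	assert(index < SKETCH_SIZES);
	return sketch_caches[sketch_type(flags)][index];
}
```

Adding a further cache type then only means growing the outer dimension and
extending the function that computes the type.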

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 41 ++++++++++++++++++++++++++++++-----------
 mm/slab.c            |  4 ++--
 mm/slab_common.c     | 30 +++++++++++-------------------
 mm/slub.c            | 13 +++++++------
 4 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 14e3fe4bd6a1..4299c59353a1 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -295,12 +295,28 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
 #define SLAB_OBJ_MIN_SIZE      (KMALLOC_MIN_SIZE < 16 ? \
                                (KMALLOC_MIN_SIZE) : 16)
 
+#define KMALLOC_NORMAL	0
+#ifdef CONFIG_ZONE_DMA
+#define KMALLOC_DMA	1
+#define KMALLOC_TYPES	2
+#else
+#define KMALLOC_TYPES	1
+#endif
+
 #ifndef CONFIG_SLOB
-extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
+
+static __always_inline unsigned int kmalloc_type(gfp_t flags)
+{
+	int is_dma = 0;
+
 #ifdef CONFIG_ZONE_DMA
-extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+	is_dma = !!(flags & __GFP_DMA);
 #endif
 
+	return is_dma;
+}
+
 /*
  * Figure out which kmalloc slab an allocation of a certain size
  * belongs to.
@@ -501,18 +517,20 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 static __always_inline void *kmalloc(size_t size, gfp_t flags)
 {
 	if (__builtin_constant_p(size)) {
+#ifndef CONFIG_SLOB
+		unsigned int index;
+#endif
 		if (size > KMALLOC_MAX_CACHE_SIZE)
 			return kmalloc_large(size, flags);
 #ifndef CONFIG_SLOB
-		if (!(flags & GFP_DMA)) {
-			unsigned int index = kmalloc_index(size);
+		index = kmalloc_index(size);
 
-			if (!index)
-				return ZERO_SIZE_PTR;
+		if (!index)
+			return ZERO_SIZE_PTR;
 
-			return kmem_cache_alloc_trace(kmalloc_caches[index],
-					flags, size);
-		}
+		return kmem_cache_alloc_trace(
+				kmalloc_caches[kmalloc_type(flags)][index],
+				flags, size);
 #endif
 	}
 	return __kmalloc(size, flags);
@@ -542,13 +560,14 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
 #ifndef CONFIG_SLOB
 	if (__builtin_constant_p(size) &&
-		size <= KMALLOC_MAX_CACHE_SIZE && !(flags & GFP_DMA)) {
+		size <= KMALLOC_MAX_CACHE_SIZE) {
 		unsigned int i = kmalloc_index(size);
 
 		if (!i)
 			return ZERO_SIZE_PTR;
 
-		return kmem_cache_alloc_node_trace(kmalloc_caches[i],
+		return kmem_cache_alloc_node_trace(
+				kmalloc_caches[kmalloc_type(flags)][i],
 						flags, node, size);
 	}
 #endif
diff --git a/mm/slab.c b/mm/slab.c
index aa76a70e087e..9515798f37b2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1288,7 +1288,7 @@ void __init kmem_cache_init(void)
 	 * Initialize the caches that provide memory for the  kmem_cache_node
 	 * structures first.  Without this, further allocations will bug.
 	 */
-	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
+	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name,
 				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
 				0, kmalloc_size(INDEX_NODE));
@@ -1304,7 +1304,7 @@ void __init kmem_cache_init(void)
 		for_each_online_node(nid) {
 			init_list(kmem_cache, &init_kmem_cache_node[CACHE_CACHE + nid], nid);
 
-			init_list(kmalloc_caches[INDEX_NODE],
+			init_list(kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE],
 					  &init_kmem_cache_node[SIZE_NODE + nid], nid);
 		}
 	}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2296caf87bfb..4614248ca381 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -973,14 +973,9 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name,
 	return s;
 }
 
-struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
+struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
 EXPORT_SYMBOL(kmalloc_caches);
 
-#ifdef CONFIG_ZONE_DMA
-struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
-EXPORT_SYMBOL(kmalloc_dma_caches);
-#endif
-
 /*
  * Conversion table for small slabs sizes / 8 to the index in the
  * kmalloc array. This is necessary for slabs < 192 since we have non power
@@ -1040,12 +1035,7 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 	} else
 		index = fls(size - 1);
 
-#ifdef CONFIG_ZONE_DMA
-	if (unlikely((flags & GFP_DMA)))
-		return kmalloc_dma_caches[index];
-
-#endif
-	return kmalloc_caches[index];
+	return kmalloc_caches[kmalloc_type(flags)][index];
 }
 
 /*
@@ -1119,7 +1109,8 @@ void __init setup_kmalloc_cache_index_table(void)
 
 static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
 {
-	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
+	kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
+					kmalloc_info[idx].name,
 					kmalloc_info[idx].size, flags, 0,
 					kmalloc_info[idx].size);
 }
@@ -1132,9 +1123,10 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
 void __init create_kmalloc_caches(slab_flags_t flags)
 {
 	int i;
+	int type = KMALLOC_NORMAL;
 
 	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
-		if (!kmalloc_caches[i])
+		if (!kmalloc_caches[type][i])
 			new_kmalloc_cache(i, flags);
 
 		/*
@@ -1142,9 +1134,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 		 * These have to be created immediately after the
 		 * earlier power of two caches
 		 */
-		if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
+		if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[type][1] && i == 6)
 			new_kmalloc_cache(1, flags);
-		if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
+		if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[type][2] && i == 7)
 			new_kmalloc_cache(2, flags);
 	}
 
@@ -1153,7 +1145,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 
 #ifdef CONFIG_ZONE_DMA
 	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
-		struct kmem_cache *s = kmalloc_caches[i];
+		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
 
 		if (s) {
 			unsigned int size = kmalloc_size(i);
@@ -1161,8 +1153,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 				 "dma-kmalloc-%u", size);
 
 			BUG_ON(!n);
-			kmalloc_dma_caches[i] = create_kmalloc_cache(n,
-				size, SLAB_CACHE_DMA | flags, 0, 0);
+			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
+				n, size, SLAB_CACHE_DMA | flags, 0, 0);
 		}
 	}
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 51258eff4178..a7b4657ea8e0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4659,6 +4659,7 @@ static int list_locations(struct kmem_cache *s, char *buf,
 static void __init resiliency_test(void)
 {
 	u8 *p;
+	int type = KMALLOC_NORMAL;
 
 	BUILD_BUG_ON(KMALLOC_MIN_SIZE > 16 || KMALLOC_SHIFT_HIGH < 10);
 
@@ -4671,7 +4672,7 @@ static void __init resiliency_test(void)
 	pr_err("\n1. kmalloc-16: Clobber Redzone/next pointer 0x12->0x%p\n\n",
 	       p + 16);
 
-	validate_slab_cache(kmalloc_caches[4]);
+	validate_slab_cache(kmalloc_caches[type][4]);
 
 	/* Hmmm... The next two are dangerous */
 	p = kzalloc(32, GFP_KERNEL);
@@ -4680,33 +4681,33 @@ static void __init resiliency_test(void)
 	       p);
 	pr_err("If allocated object is overwritten then not detectable\n\n");
 
-	validate_slab_cache(kmalloc_caches[5]);
+	validate_slab_cache(kmalloc_caches[type][5]);
 	p = kzalloc(64, GFP_KERNEL);
 	p += 64 + (get_cycles() & 0xff) * sizeof(void *);
 	*p = 0x56;
 	pr_err("\n3. kmalloc-64: corrupting random byte 0x56->0x%p\n",
 	       p);
 	pr_err("If allocated object is overwritten then not detectable\n\n");
-	validate_slab_cache(kmalloc_caches[6]);
+	validate_slab_cache(kmalloc_caches[type][6]);
 
 	pr_err("\nB. Corruption after free\n");
 	p = kzalloc(128, GFP_KERNEL);
 	kfree(p);
 	*p = 0x78;
 	pr_err("1. kmalloc-128: Clobber first word 0x78->0x%p\n\n", p);
-	validate_slab_cache(kmalloc_caches[7]);
+	validate_slab_cache(kmalloc_caches[type][7]);
 
 	p = kzalloc(256, GFP_KERNEL);
 	kfree(p);
 	p[50] = 0x9a;
 	pr_err("\n2. kmalloc-256: Clobber 50th byte 0x9a->0x%p\n\n", p);
-	validate_slab_cache(kmalloc_caches[8]);
+	validate_slab_cache(kmalloc_caches[type][8]);
 
 	p = kzalloc(512, GFP_KERNEL);
 	kfree(p);
 	p[512] = 0xab;
 	pr_err("\n3. kmalloc-512: Clobber redzone 0xab->0x%p\n\n", p);
-	validate_slab_cache(kmalloc_caches[9]);
+	validate_slab_cache(kmalloc_caches[type][9]);
 }
 #else
 #ifdef CONFIG_SYSFS
-- 
2.18.0



* [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
  2018-07-18 13:36 ` [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-19  8:23   ` Mel Gorman
                     ` (2 more replies)
  2018-07-18 13:36 ` [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka

Kmem caches can be created with a SLAB_RECLAIM_ACCOUNT flag, which indicates
they contain objects which can be reclaimed under memory pressure (typically
through a shrinker). This makes the slab pages accounted as NR_SLAB_RECLAIMABLE
in vmstat, which is also reflected in the MemAvailable meminfo counter and in
overcommit decisions. The slab pages are also allocated with __GFP_RECLAIMABLE,
which is good for anti-fragmentation through grouping pages by mobility.

The generic kmalloc-X caches are created without this flag, but are sometimes
also used for objects that can be reclaimed, which due to varying size cannot
have a dedicated kmem cache with SLAB_RECLAIM_ACCOUNT flag. A prominent example
is dcache external names, which prompted the creation of a new, manually
managed vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES in commit f1782c9bc547
("dcache: account external names as indirectly reclaimable memory").

To better handle this and any other similar cases, this patch introduces
SLAB_RECLAIM_ACCOUNT variants of kmalloc caches, named kmalloc-rcl-X.
They are used whenever the kmalloc() call passes __GFP_RECLAIMABLE among gfp
flags. They are added to the kmalloc_caches array as a new type. Allocations
with both __GFP_DMA and __GFP_RECLAIMABLE will use a dma type cache.
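
The flag-to-type mapping described above can be checked in isolation with a
small user-space sketch. It copies the arithmetic from kmalloc_type() in the
diff below and assumes CONFIG_ZONE_DMA is enabled; the SKETCH_* flag values are
placeholders chosen for this example, not the kernel's definitions.

```c
#include <assert.h>

/* Placeholder flag bits for this sketch; the kernel defines its own values. */
#define SKETCH_GFP_DMA         0x01u
#define SKETCH_GFP_RECLAIMABLE 0x10u

enum sketch_kmalloc_type {
	SKETCH_KMALLOC_NORMAL  = 0,
	SKETCH_KMALLOC_RECLAIM = 1,
	SKETCH_KMALLOC_DMA     = 2,
};

/* Same arithmetic as kmalloc_type() in the patch, DMA support assumed on. */
static unsigned int sketch_kmalloc_type(unsigned int flags)
{
	unsigned int is_dma = !!(flags & SKETCH_GFP_DMA);
	unsigned int is_reclaimable = !!(flags & SKETCH_GFP_RECLAIMABLE);
	unsigned int type = (is_dma * 2) + (is_reclaimable & !is_dma);

	/* DMA wins when both flags are set, so the result is never 3. */
	assert(type <= SKETCH_KMALLOC_DMA);
	return type;
}
```

The four flag combinations map to NORMAL, RECLAIM, DMA and DMA respectively,
without any conditional branch in the computation.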

This change only applies to SLAB and SLUB, not SLOB. This is fine, since SLOB
targets tiny systems and this patch does add some overhead of kmem management
objects.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 16 +++++++++++----
 mm/slab_common.c     | 48 ++++++++++++++++++++++++++++----------------
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4299c59353a1..d89e934e0d8b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -296,11 +296,12 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
                                (KMALLOC_MIN_SIZE) : 16)
 
 #define KMALLOC_NORMAL	0
+#define KMALLOC_RECLAIM	1
 #ifdef CONFIG_ZONE_DMA
-#define KMALLOC_DMA	1
-#define KMALLOC_TYPES	2
+#define KMALLOC_DMA	2
+#define KMALLOC_TYPES	3
 #else
-#define KMALLOC_TYPES	1
+#define KMALLOC_TYPES	2
 #endif
 
 #ifndef CONFIG_SLOB
@@ -309,12 +310,19 @@ extern struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
 static __always_inline unsigned int kmalloc_type(gfp_t flags)
 {
 	int is_dma = 0;
+	int is_reclaimable;
 
 #ifdef CONFIG_ZONE_DMA
 	is_dma = !!(flags & __GFP_DMA);
 #endif
 
-	return is_dma;
+	is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
+
+	/*
+	 * If an allocation is both __GFP_DMA and __GFP_RECLAIMABLE, return
+	 * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
+	 */
+	return (is_dma * 2) + (is_reclaimable & !is_dma);
 }
 
 /*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4614248ca381..614fb7ab8312 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1107,10 +1107,21 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
-static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
+static void __init
+new_kmalloc_cache(int idx, int type, slab_flags_t flags)
 {
-	kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
-					kmalloc_info[idx].name,
+	const char *name;
+
+	if (type == KMALLOC_RECLAIM) {
+		flags |= SLAB_RECLAIM_ACCOUNT;
+		name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
+						kmalloc_info[idx].size);
+		BUG_ON(!name);
+	} else {
+		name = kmalloc_info[idx].name;
+	}
+
+	kmalloc_caches[type][idx] = create_kmalloc_cache(name,
 					kmalloc_info[idx].size, flags, 0,
 					kmalloc_info[idx].size);
 }
@@ -1122,22 +1133,25 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
  */
 void __init create_kmalloc_caches(slab_flags_t flags)
 {
-	int i;
-	int type = KMALLOC_NORMAL;
+	int i, type;
 
-	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
-		if (!kmalloc_caches[type][i])
-			new_kmalloc_cache(i, flags);
+	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
+		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
+			if (!kmalloc_caches[type][i])
+				new_kmalloc_cache(i, type, flags);
 
-		/*
-		 * Caches that are not of the two-to-the-power-of size.
-		 * These have to be created immediately after the
-		 * earlier power of two caches
-		 */
-		if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[type][1] && i == 6)
-			new_kmalloc_cache(1, flags);
-		if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[type][2] && i == 7)
-			new_kmalloc_cache(2, flags);
+			/*
+			 * Caches that are not of the two-to-the-power-of size.
+			 * These have to be created immediately after the
+			 * earlier power of two caches
+			 */
+			if (KMALLOC_MIN_SIZE <= 32 && i == 6 &&
+					!kmalloc_caches[type][1])
+				new_kmalloc_cache(1, type, flags);
+			if (KMALLOC_MIN_SIZE <= 64 && i == 7 &&
+					!kmalloc_caches[type][2])
+				new_kmalloc_cache(2, type, flags);
+		}
 	}
 
 	/* Kmalloc array is now usable */
-- 
2.18.0



* [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
  2018-07-18 13:36 ` [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches Vlastimil Babka
  2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-19  8:35   ` Mel Gorman
  2018-07-30 15:45   ` Christopher Lameter
  2018-07-18 13:36 ` [PATCH v3 4/7] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka

In SLAB, OFF_SLAB caches allocate management structures (currently just the
freelist) from kmalloc caches when placement in a slab page together with
objects would lead to suboptimal memory usage. For SLAB_RECLAIM_ACCOUNT caches,
we can allocate the freelists from the newly introduced reclaimable kmalloc
caches, because shrinking the OFF_SLAB cache will in general result in freeing
of the freelists as well. This should improve accounting and anti-fragmentation
a bit.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9515798f37b2..99d779ba2b92 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2140,8 +2140,13 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
 #endif
 
 	if (OFF_SLAB(cachep)) {
+		/*
+		 * If this cache is reclaimable, allocate also freelists from
+		 * a reclaimable kmalloc cache.
+		 */
 		cachep->freelist_cache =
-			kmalloc_slab(cachep->freelist_size, 0u);
+			kmalloc_slab(cachep->freelist_size,
+				     cachep->allocflags & __GFP_RECLAIMABLE);
 	}
 
 	err = setup_cpu_cache(cachep, gfp);
-- 
2.18.0



* [PATCH v3 4/7] dcache: allocate external names from reclaimable kmalloc caches
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
                   ` (2 preceding siblings ...)
  2018-07-18 13:36 ` [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-19  8:42   ` Mel Gorman
  2018-07-18 13:36 ` [PATCH v3 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka

We can use the newly introduced kmalloc-rcl-X caches to allocate
external names in dcache, which will take care of the proper accounting
automatically, and also improve anti-fragmentation page grouping.

This effectively reverts commit f1782c9bc547 ("dcache: account external names
as indirectly reclaimable memory") and instead passes __GFP_RECLAIMABLE to
kmalloc(). The accounting thus moves from NR_INDIRECTLY_RECLAIMABLE_BYTES to
NR_SLAB_RECLAIMABLE, which is also considered in MemAvailable calculation and
overcommit decisions.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 fs/dcache.c | 38 +++++++++-----------------------------
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 0e8e5de3c48a..518c9ed8db8c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -257,24 +257,10 @@ static void __d_free(struct rcu_head *head)
 	kmem_cache_free(dentry_cache, dentry); 
 }
 
-static void __d_free_external_name(struct rcu_head *head)
-{
-	struct external_name *name = container_of(head, struct external_name,
-						  u.head);
-
-	mod_node_page_state(page_pgdat(virt_to_page(name)),
-			    NR_INDIRECTLY_RECLAIMABLE_BYTES,
-			    -ksize(name));
-
-	kfree(name);
-}
-
 static void __d_free_external(struct rcu_head *head)
 {
 	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
-
-	__d_free_external_name(&external_name(dentry)->u.head);
-
+	kfree(external_name(dentry));
 	kmem_cache_free(dentry_cache, dentry);
 }
 
@@ -305,7 +291,7 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 		struct external_name *p;
 		p = container_of(name->name, struct external_name, name[0]);
 		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			call_rcu(&p->u.head, __d_free_external_name);
+			kfree_rcu(p, u.head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -1608,7 +1594,6 @@ EXPORT_SYMBOL(d_invalidate);
  
 struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 {
-	struct external_name *ext = NULL;
 	struct dentry *dentry;
 	char *dname;
 	int err;
@@ -1629,14 +1614,15 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		dname = dentry->d_iname;
 	} else if (name->len > DNAME_INLINE_LEN-1) {
 		size_t size = offsetof(struct external_name, name[1]);
-
-		ext = kmalloc(size + name->len, GFP_KERNEL_ACCOUNT);
-		if (!ext) {
+		struct external_name *p = kmalloc(size + name->len,
+						  GFP_KERNEL_ACCOUNT |
+						  __GFP_RECLAIMABLE);
+		if (!p) {
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&ext->u.count, 1);
-		dname = ext->name;
+		atomic_set(&p->u.count, 1);
+		dname = p->name;
 	} else  {
 		dname = dentry->d_iname;
 	}	
@@ -1675,12 +1661,6 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		}
 	}
 
-	if (unlikely(ext)) {
-		pg_data_t *pgdat = page_pgdat(virt_to_page(ext));
-		mod_node_page_state(pgdat, NR_INDIRECTLY_RECLAIMABLE_BYTES,
-				    ksize(ext));
-	}
-
 	this_cpu_inc(nr_dentry);
 
 	return dentry;
@@ -2761,7 +2741,7 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
 	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		call_rcu(&old_name->u.head, __d_free_external_name);
+		kfree_rcu(old_name, u.head);
 }
 
 /*
-- 
2.18.0



* [PATCH v3 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
                   ` (3 preceding siblings ...)
  2018-07-18 13:36 ` [PATCH v3 4/7] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-30 15:46   ` Christopher Lameter
  2018-07-18 13:36 ` [PATCH v3 6/7] mm, proc: add KReclaimable to /proc/meminfo Vlastimil Babka
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka, Vijayanand Jitta,
	Laura Abbott, Sumit Semwal

The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
accounting objects that can be reclaimed, but cannot be allocated via a
SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with the
__GFP_RECLAIMABLE flag, and the dcache external names user is converted.

The counter is however still useful for accounting direct page allocations
(i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:

- change granularity to pages to be more like other counters; sub-page
  allocations should be able to use kmalloc
- rename the counter to NR_KERNEL_MISC_RECLAIMABLE
- expose the counter again in vmstat as "nr_kernel_misc_reclaimable"; we can
  again remove the check for not printing "hidden" counters
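
The granularity change and the way the counter feeds the MemAvailable estimate
can be sketched in user space as follows. The formulas follow the
ion_page_pool.c and si_mem_available() hunks of this patch; the sketch_*
helper names and the page shift of 12 are assumptions for illustration only.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12u /* assumed 4 KiB pages */

/* Old accounting unit: bytes held by one pool entry of the given order. */
static unsigned long sketch_pool_bytes(unsigned int order)
{
	assert(SKETCH_PAGE_SHIFT + order < 8 * sizeof(unsigned long));
	return 1UL << (SKETCH_PAGE_SHIFT + order);
}

/* New accounting unit: pages held by one pool entry of the given order. */
static unsigned long sketch_pool_pages(unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return 1UL << order;
}

static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Contribution of reclaimable kernel memory to the available estimate:
 * slab and misc reclaimable pages are summed, then the part assumed to be
 * in use is subtracted, capped at the low watermark. All values in pages.
 */
static unsigned long sketch_reclaimable_available(unsigned long slab_reclaimable,
						  unsigned long misc_reclaimable,
						  unsigned long wmark_low)
{
	unsigned long reclaimable = slab_reclaimable + misc_reclaimable;

	return reclaimable - sketch_min(reclaimable / 2, wmark_low);
}
```

Because the counter is now kept in pages, it can be added to
NR_SLAB_RECLAIMABLE directly, without the PAGE_SHIFT conversion the old
byte-based counter needed.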

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
---
 drivers/staging/android/ion/ion_page_pool.c |  8 ++++----
 include/linux/mmzone.h                      |  2 +-
 mm/page_alloc.c                             | 19 +++++++------------
 mm/util.c                                   |  3 +--
 mm/vmstat.c                                 |  6 +-----
 5 files changed, 14 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index 9bc56eb48d2a..0d2a95957ee8 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -33,8 +33,8 @@ static void ion_page_pool_add(struct ion_page_pool *pool, struct page *page)
 		pool->low_count++;
 	}
 
-	mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
-			    (1 << (PAGE_SHIFT + pool->order)));
+	mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+							1 << pool->order);
 	mutex_unlock(&pool->mutex);
 }
 
@@ -53,8 +53,8 @@ static struct page *ion_page_pool_remove(struct ion_page_pool *pool, bool high)
 	}
 
 	list_del(&page->lru);
-	mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
-			    -(1 << (PAGE_SHIFT + pool->order)));
+	mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+							-(1 << pool->order));
 	return page;
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..c2f6bc4c9e8a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -180,7 +180,7 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
-	NR_INDIRECTLY_RECLAIMABLE_BYTES, /* measured in bytes */
+	NR_KERNEL_MISC_RECLAIMABLE,	/* reclaimable non-slab kernel pages */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d800d61ddb7..91f75bf4404d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4704,6 +4704,7 @@ long si_mem_available(void)
 	unsigned long pagecache;
 	unsigned long wmark_low = 0;
 	unsigned long pages[NR_LRU_LISTS];
+	unsigned long reclaimable;
 	struct zone *zone;
 	int lru;
 
@@ -4729,19 +4730,13 @@ long si_mem_available(void)
 	available += pagecache;
 
 	/*
-	 * Part of the reclaimable slab consists of items that are in use,
-	 * and cannot be freed. Cap this estimate at the low watermark.
+	 * Part of the reclaimable slab and other kernel memory consists of
+	 * items that are in use, and cannot be freed. Cap this estimate at the
+	 * low watermark.
 	 */
-	available += global_node_page_state(NR_SLAB_RECLAIMABLE) -
-		     min(global_node_page_state(NR_SLAB_RECLAIMABLE) / 2,
-			 wmark_low);
-
-	/*
-	 * Part of the kernel memory, which can be released under memory
-	 * pressure.
-	 */
-	available += global_node_page_state(NR_INDIRECTLY_RECLAIMABLE_BYTES) >>
-		PAGE_SHIFT;
+	reclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE) +
+			global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
+	available += reclaimable - min(reclaimable / 2, wmark_low);
 
 	if (available < 0)
 		available = 0;
diff --git a/mm/util.c b/mm/util.c
index 3351659200e6..891f0654e7b5 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -675,8 +675,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		 * Part of the kernel memory, which can be released
 		 * under memory pressure.
 		 */
-		free += global_node_page_state(
-			NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
+		free += global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
 
 		/*
 		 * Leave reserved pages. The pages are not for anonymous pages.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ba0870ecddd..c5e52f94ba5f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1161,7 +1161,7 @@ const char * const vmstat_text[] = {
 	"nr_vmscan_immediate_reclaim",
 	"nr_dirtied",
 	"nr_written",
-	"", /* nr_indirectly_reclaimable */
+	"nr_kernel_misc_reclaimable",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
@@ -1704,10 +1704,6 @@ static int vmstat_show(struct seq_file *m, void *arg)
 	unsigned long *l = arg;
 	unsigned long off = l - (unsigned long *)m->private;
 
-	/* Skip hidden vmstat items. */
-	if (*vmstat_text[off] == '\0')
-		return 0;
-
 	seq_puts(m, vmstat_text[off]);
 	seq_put_decimal_ull(m, " ", *l);
 	seq_putc(m, '\n');
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] mm, proc: add KReclaimable to /proc/meminfo
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
                   ` (4 preceding siblings ...)
  2018-07-18 13:36 ` [PATCH v3 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-18 13:36 ` [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes Vlastimil Babka
  2018-07-19 19:53   ` Roman Gushchin
  7 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka

The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
allocations that can be reclaimed via a shrinker. In /proc/meminfo, we can show
the sum of all reclaimable kernel allocations (including slab) as
"KReclaimable". Add the same counter also to per-node meminfo under /sys.

With this counter, users will have more complete information about
kernel memory usage. Non-slab reclaimable pages (currently just the ION
allocator) will no longer be missing from /proc/meminfo, leaving users to
wonder where part of their memory went. More precisely, they already
appear in MemAvailable, but without the new counter, it's not obvious why
the value in MemAvailable doesn't fully correspond with the sum of the
other counters participating in it.
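
As a rough illustration of the arithmetic described above, here is a
userspace sketch (not kernel code). It assumes 4 kB pages; SKETCH_PAGE_SHIFT
and the function names are made up for this example. The second function
mirrors the capped estimate from the si_mem_available() hunk in patch 5/7.

```c
/* Hypothetical shift for illustration only: assumes 4 kB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Convert a page count to kB, as the meminfo output does conceptually. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}

/* KReclaimable: reclaimable slab pages plus other reclaimable kernel pages. */
static unsigned long kreclaimable_kb(unsigned long slab_reclaimable,
				     unsigned long misc_reclaimable)
{
	return pages_to_kb(slab_reclaimable + misc_reclaimable);
}

/*
 * Pages that the reclaimable kernel memory adds to the MemAvailable
 * estimate: the sum, minus min(sum / 2, low watermark).
 */
static unsigned long available_from_reclaimable(unsigned long slab_reclaimable,
						unsigned long misc_reclaimable,
						unsigned long wmark_low)
{
	unsigned long reclaimable = slab_reclaimable + misc_reclaimable;
	unsigned long half = reclaimable / 2;

	return reclaimable - (half < wmark_low ? half : wmark_low);
}
```

Under the 4 kB assumption, the proc.txt example values in this patch
(SReclaimable 159856 kB, KReclaimable 168048 kB) correspond to 39964 slab
pages plus 2048 other reclaimable pages.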

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt |  4 ++++
 drivers/base/node.c                | 19 ++++++++++++-------
 fs/proc/meminfo.c                  | 16 ++++++++--------
 3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 520f6a84cf50..6a255f960ab5 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -858,6 +858,7 @@ Writeback:           0 kB
 AnonPages:      861800 kB
 Mapped:         280372 kB
 Shmem:             644 kB
+KReclaimable:   168048 kB
 Slab:           284364 kB
 SReclaimable:   159856 kB
 SUnreclaim:     124508 kB
@@ -921,6 +922,9 @@ AnonHugePages: Non-file backed huge pages mapped into userspace page tables
 ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
               with huge pages
 ShmemPmdMapped: Shared memory mapped into userspace with huge pages
+KReclaimable: Kernel allocations that the kernel will attempt to reclaim
+              under memory pressure. Includes SReclaimable (below), and other
+              direct allocations with a shrinker.
         Slab: in-kernel data structures cache
 SReclaimable: Part of Slab, that might be reclaimed, such as caches
   SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
diff --git a/drivers/base/node.c b/drivers/base/node.c
index a5e821d09656..81cef8031eae 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -67,8 +67,11 @@ static ssize_t node_read_meminfo(struct device *dev,
 	int nid = dev->id;
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct sysinfo i;
+	unsigned long sreclaimable, sunreclaimable;
 
 	si_meminfo_node(&i, nid);
+	sreclaimable = node_page_state(pgdat, NR_SLAB_RECLAIMABLE);
+	sunreclaimable = node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE);
 	n = sprintf(buf,
 		       "Node %d MemTotal:       %8lu kB\n"
 		       "Node %d MemFree:        %8lu kB\n"
@@ -118,6 +121,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       "Node %d NFS_Unstable:   %8lu kB\n"
 		       "Node %d Bounce:         %8lu kB\n"
 		       "Node %d WritebackTmp:   %8lu kB\n"
+		       "Node %d KReclaimable:   %8lu kB\n"
 		       "Node %d Slab:           %8lu kB\n"
 		       "Node %d SReclaimable:   %8lu kB\n"
 		       "Node %d SUnreclaim:     %8lu kB\n"
@@ -138,20 +142,21 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
 		       nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
-		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) +
-			      node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
-		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)),
+		       nid, K(sreclaimable +
+			      node_page_state(pgdat, NR_KERNEL_MISC_RECLAIMABLE)),
+		       nid, K(sreclaimable + sunreclaimable),
+		       nid, K(sreclaimable),
+		       nid, K(sunreclaimable)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
+		       ,
 		       nid, K(node_page_state(pgdat, NR_ANON_THPS) *
 				       HPAGE_PMD_NR),
 		       nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
 				       HPAGE_PMD_NR),
 		       nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
-				       HPAGE_PMD_NR));
-#else
-		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)));
+				       HPAGE_PMD_NR)
 #endif
+		       );
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
 }
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 2fb04846ed11..61a18477bc07 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -37,6 +37,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	long cached;
 	long available;
 	unsigned long pages[NR_LRU_LISTS];
+	unsigned long sreclaimable, sunreclaim;
 	int lru;
 
 	si_meminfo(&i);
@@ -52,6 +53,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		pages[lru] = global_node_page_state(NR_LRU_BASE + lru);
 
 	available = si_mem_available();
+	sreclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE);
+	sunreclaim = global_node_page_state(NR_SLAB_UNRECLAIMABLE);
 
 	show_val_kb(m, "MemTotal:       ", i.totalram);
 	show_val_kb(m, "MemFree:        ", i.freeram);
@@ -93,14 +96,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "Mapped:         ",
 		    global_node_page_state(NR_FILE_MAPPED));
 	show_val_kb(m, "Shmem:          ", i.sharedram);
-	show_val_kb(m, "Slab:           ",
-		    global_node_page_state(NR_SLAB_RECLAIMABLE) +
-		    global_node_page_state(NR_SLAB_UNRECLAIMABLE));
-
-	show_val_kb(m, "SReclaimable:   ",
-		    global_node_page_state(NR_SLAB_RECLAIMABLE));
-	show_val_kb(m, "SUnreclaim:     ",
-		    global_node_page_state(NR_SLAB_UNRECLAIMABLE));
+	show_val_kb(m, "KReclaimable:   ", sreclaimable +
+		    global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE));
+	show_val_kb(m, "Slab:           ", sreclaimable + sunreclaim);
+	show_val_kb(m, "SReclaimable:   ", sreclaimable);
+	show_val_kb(m, "SUnreclaim:     ", sunreclaim);
 	seq_printf(m, "KernelStack:    %8lu kB\n",
 		   global_zone_page_state(NR_KERNEL_STACK_KB));
 	show_val_kb(m, "PageTables:     ",
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
                   ` (5 preceding siblings ...)
  2018-07-18 13:36 ` [PATCH v3 6/7] mm, proc: add KReclaimable to /proc/meminfo Vlastimil Babka
@ 2018-07-18 13:36 ` Vlastimil Babka
  2018-07-19  8:46   ` Mel Gorman
  2018-07-30 15:48   ` Christopher Lameter
  2018-07-19 19:53   ` Roman Gushchin
  7 siblings, 2 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linux-api, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vlastimil Babka

Kmalloc cache names can get quite long for large object sizes, when the sizes
are expressed in bytes. Use 'k' and 'M' prefixes to make the names as short
as possible e.g. in /proc/slabinfo. This works, as we mostly use power-of-two
sizes, with exceptions only below 1k.

Example: 'kmalloc-4194304' becomes 'kmalloc-4M'
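
For readers who want to try the naming rule outside the kernel, here is a
userspace sketch of the same idea. It is not the patch's
kmalloc_cache_name(): it writes into a caller-supplied buffer with snprintf()
instead of allocating with kasprintf(), omits the unit character entirely for
sizes that are not a multiple of 1024, and bounds the loop at two steps.
The name sketch_cache_name is hypothetical.

```c
#include <stdio.h>

/*
 * Format "<prefix>-<size>" with the size shortened to a 'k' or 'M'
 * suffix when it is an exact multiple of 1024 or 1024 * 1024.
 * Returns the snprintf() result.
 */
static int sketch_cache_name(char *buf, size_t len, const char *prefix,
			     unsigned int size)
{
	static const char units[] = { 'k', 'M' };
	int idx = 0;

	/* Divide by 1024 at most twice, and only while the size divides evenly. */
	while (idx < 2 && size >= 1024 && size % 1024 == 0) {
		size /= 1024;
		idx++;
	}

	if (idx == 0)
		return snprintf(buf, len, "%s-%u", prefix, size);

	return snprintf(buf, len, "%s-%u%c", prefix, size, units[idx - 1]);
}
```

Sizes such as 96 and 192 keep their plain byte count, which matches the
statement above that the non-power-of-two exceptions are all below 1k.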

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 614fb7ab8312..04d71ead7d12 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1049,15 +1049,15 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
 	{"kmalloc-16",             16},		{"kmalloc-32",             32},
 	{"kmalloc-64",             64},		{"kmalloc-128",           128},
 	{"kmalloc-256",           256},		{"kmalloc-512",           512},
-	{"kmalloc-1024",         1024},		{"kmalloc-2048",         2048},
-	{"kmalloc-4096",         4096},		{"kmalloc-8192",         8192},
-	{"kmalloc-16384",       16384},		{"kmalloc-32768",       32768},
-	{"kmalloc-65536",       65536},		{"kmalloc-131072",     131072},
-	{"kmalloc-262144",     262144},		{"kmalloc-524288",     524288},
-	{"kmalloc-1048576",   1048576},		{"kmalloc-2097152",   2097152},
-	{"kmalloc-4194304",   4194304},		{"kmalloc-8388608",   8388608},
-	{"kmalloc-16777216", 16777216},		{"kmalloc-33554432", 33554432},
-	{"kmalloc-67108864", 67108864}
+	{"kmalloc-1k",           1024},		{"kmalloc-2k",           2048},
+	{"kmalloc-4k",           4096},		{"kmalloc-8k",           8192},
+	{"kmalloc-16k",         16384},		{"kmalloc-32k",         32768},
+	{"kmalloc-64k",         65536},		{"kmalloc-128k",       131072},
+	{"kmalloc-256k",       262144},		{"kmalloc-512k",       524288},
+	{"kmalloc-1M",        1048576},		{"kmalloc-2M",        2097152},
+	{"kmalloc-4M",        4194304},		{"kmalloc-8M",        8388608},
+	{"kmalloc-16M",      16777216},		{"kmalloc-32M",      33554432},
+	{"kmalloc-64M",      67108864}
 };
 
 /*
@@ -1107,6 +1107,21 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+static const char *
+kmalloc_cache_name(const char *prefix, unsigned int size)
+{
+
+	static const char units[3] = "\0kM";
+	int idx = 0;
+
+	while (size >= 1024 && (size % 1024 == 0)) {
+		size /= 1024;
+		idx++;
+	}
+
+	return kasprintf(GFP_NOWAIT, "%s-%u%c", prefix, size, units[idx]);
+}
+
 static void __init
 new_kmalloc_cache(int idx, int type, slab_flags_t flags)
 {
@@ -1114,7 +1129,7 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags)
 
 	if (type == KMALLOC_RECLAIM) {
 		flags |= SLAB_RECLAIM_ACCOUNT;
-		name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
+		name = kmalloc_cache_name("kmalloc-rcl",
 						kmalloc_info[idx].size);
 		BUG_ON(!name);
 	} else {
@@ -1163,8 +1178,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 
 		if (s) {
 			unsigned int size = kmalloc_size(i);
-			char *n = kasprintf(GFP_NOWAIT,
-				 "dma-kmalloc-%u", size);
+			const char *n = kmalloc_cache_name("dma-kmalloc", size);
 
 			BUG_ON(!n);
 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches
  2018-07-18 13:36 ` [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches Vlastimil Babka
@ 2018-07-19  8:10   ` Mel Gorman
  2018-07-20  9:30     ` Vlastimil Babka
  2018-07-30 15:38   ` Christopher Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Mel Gorman @ 2018-07-19  8:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On Wed, Jul 18, 2018 at 03:36:14PM +0200, Vlastimil Babka wrote:
> The kmalloc caches currently maintain a separate (optional) array
> kmalloc_dma_caches for __GFP_DMA allocations. There are tests for __GFP_DMA in
> the allocation hotpaths. We can avoid the branches by combining kmalloc_caches
> and kmalloc_dma_caches into a single two-dimensional array where the outer
> dimension is cache "type". This will also allow adding kmalloc-reclaimable
> caches as a third type.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I'm surprised there are so many kmalloc users that require the DMA zone.
Some of them are certainly bogus, such as in drivers for archs that only
have one zone, and are probably a reflection of the confusing naming. The
audit would be a mess and unrelated to the patch so for this patch;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
@ 2018-07-19  8:23   ` Mel Gorman
  2018-07-20  9:32     ` Vlastimil Babka
  2018-07-19 18:16     ` Roman Gushchin
  2018-07-30 15:41   ` Christopher Lameter
  2 siblings, 1 reply; 28+ messages in thread
From: Mel Gorman @ 2018-07-19  8:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On Wed, Jul 18, 2018 at 03:36:15PM +0200, Vlastimil Babka wrote:
> Kmem caches can be created with a SLAB_RECLAIM_ACCOUNT flag, which indicates
> they contain objects which can be reclaimed under memory pressure (typically
> through a shrinker). This makes the slab pages accounted as NR_SLAB_RECLAIMABLE
> in vmstat, which is also reflected in the MemAvailable meminfo counter and in
> overcommit decisions. The slab pages are also allocated with __GFP_RECLAIMABLE,
> which is good for anti-fragmentation through grouping pages by mobility.
> 
> The generic kmalloc-X caches are created without this flag, but sometimes are
> used also for objects that can be reclaimed, which due to varying size cannot
> have a dedicated kmem cache with SLAB_RECLAIM_ACCOUNT flag. A prominent example
> are dcache external names, which prompted the creation of a new, manually
> managed vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES in commit f1782c9bc547
> ("dcache: account external names as indirectly reclaimable memory").
> 
> To better handle this and any other similar cases, this patch introduces
> SLAB_RECLAIM_ACCOUNT variants of kmalloc caches, named kmalloc-rcl-X.
> They are used whenever the kmalloc() call passes __GFP_RECLAIMABLE among gfp
> flags. They are added to the kmalloc_caches array as a new type. Allocations
> with both __GFP_DMA and __GFP_RECLAIMABLE will use a dma type cache.
> 
> This change only applies to SLAB and SLUB, not SLOB. This is fine, since SLOB
> targets tiny systems and this patch does add some overhead of kmem management
> objects.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> <SNIP>
>
> @@ -309,12 +310,19 @@ extern struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
>  static __always_inline unsigned int kmalloc_type(gfp_t flags)
>  {
>  	int is_dma = 0;
> +	int is_reclaimable;
>  
>  #ifdef CONFIG_ZONE_DMA
>  	is_dma = !!(flags & __GFP_DMA);
>  #endif
>  
> -	return is_dma;
> +	is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
> +
> +	/*
> +	 * If an allocation is botth __GFP_DMA and __GFP_RECLAIMABLE, return
> +	 * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
> +	 */
> +	return (is_dma * 2) + (is_reclaimable & !is_dma);
>  }
>  

s/botth/both/



>  /*
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 4614248ca381..614fb7ab8312 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1107,10 +1107,21 @@ void __init setup_kmalloc_cache_index_table(void)
>  	}
>  }
>  
> -static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
> +static void __init
> +new_kmalloc_cache(int idx, int type, slab_flags_t flags)
>  {
> -	kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
> -					kmalloc_info[idx].name,
> +	const char *name;
> +
> +	if (type == KMALLOC_RECLAIM) {
> +		flags |= SLAB_RECLAIM_ACCOUNT;
> +		name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
> +						kmalloc_info[idx].size);
> +		BUG_ON(!name);
> +	} else {
> +		name = kmalloc_info[idx].name;
> +	}
> +
> +	kmalloc_caches[type][idx] = create_kmalloc_cache(name,
>  					kmalloc_info[idx].size, flags, 0,
>  					kmalloc_info[idx].size);
>  }

I was going to query that BUG_ON but if I'm reading it right, we just
have to be careful in the future that the "normal" kmalloc cache is always
initialised before the reclaimable cache or there will be issues.

> @@ -1122,22 +1133,25 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
>   */
>  void __init create_kmalloc_caches(slab_flags_t flags)
>  {
> -	int i;
> -	int type = KMALLOC_NORMAL;
> +	int i, type;
>  
> -	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
> -		if (!kmalloc_caches[type][i])
> -			new_kmalloc_cache(i, flags);
> +	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
> +		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
> +			if (!kmalloc_caches[type][i])
> +				new_kmalloc_cache(i, type, flags);
>  

I don't see a problem here as such but the values of the KMALLOC_* types
are important both for this function and for kmalloc_type(). It might be
worth adding a warning that these functions be examined when updating the
types but then again, anyone trying and getting it wrong will have a
broken kernel so;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate
  2018-07-18 13:36 ` [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
@ 2018-07-19  8:35   ` Mel Gorman
  2018-07-20  9:37     ` Vlastimil Babka
  2018-07-30 15:45   ` Christopher Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Mel Gorman @ 2018-07-19  8:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On Wed, Jul 18, 2018 at 03:36:16PM +0200, Vlastimil Babka wrote:
> In SLAB, OFF_SLAB caches allocate management structures (currently just the
> freelist) from kmalloc caches when placement in a slab page together with
> objects would lead to suboptimal memory usage. For SLAB_RECLAIM_ACCOUNT caches,
> we can allocate the freelists from the newly introduced reclaimable kmalloc
> caches, because shrinking the OFF_SLAB cache will in general result in freeing
> of the freelists as well. This should improve accounting and anti-fragmentation
> a bit.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I'm not quite convinced by this one. The freelist cache is tied to the
lifetime of the slab and not the objects. A single freelist can be reclaimed
eventually but for caches with many objects per slab, it could take a lot
of shrinking random objects to reclaim one freelist. Functionally the
patch appears to be fine.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/7] dcache: allocate external names from reclaimable kmalloc caches
  2018-07-18 13:36 ` [PATCH v3 4/7] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
@ 2018-07-19  8:42   ` Mel Gorman
  0 siblings, 0 replies; 28+ messages in thread
From: Mel Gorman @ 2018-07-19  8:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On Wed, Jul 18, 2018 at 03:36:17PM +0200, Vlastimil Babka wrote:
> We can use the newly introduced kmalloc-reclaimable-X caches, to allocate
> external names in dcache, which will take care of the proper accounting
> automatically, and also improve anti-fragmentation page grouping.
> 
> This effectively reverts commit f1782c9bc547 ("dcache: account external names
> as indirectly reclaimable memory") and instead passes __GFP_RECLAIMABLE to
> kmalloc(). The accounting thus moves from NR_INDIRECTLY_RECLAIMABLE_BYTES to
> NR_SLAB_RECLAIMABLE, which is also considered in MemAvailable calculation and
> overcommit decisions.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes
  2018-07-18 13:36 ` [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes Vlastimil Babka
@ 2018-07-19  8:46   ` Mel Gorman
  2018-07-30 15:48   ` Christopher Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Mel Gorman @ 2018-07-19  8:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On Wed, Jul 18, 2018 at 03:36:20PM +0200, Vlastimil Babka wrote:
> Kmalloc cache names can get quite long for large object sizes, when the sizes
> are expressed in bytes. Use 'k' and 'M' prefixes to make the names as short
> as possible e.g. in /proc/slabinfo. This works, as we mostly use power-of-two
> sizes, with exceptions only below 1k.
> 
> Example: 'kmalloc-4194304' becomes 'kmalloc-4M'
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

There is a slight chance this will break any external tooling that
calculates fragmentation stats for slab/slub if they are particularly
stupid parsers but other than that;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
@ 2018-07-19 18:16     ` Roman Gushchin
  2018-07-19 18:16     ` Roman Gushchin
  2018-07-30 15:41   ` Christopher Lameter
  2 siblings, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2018-07-19 18:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On Wed, Jul 18, 2018 at 03:36:15PM +0200, Vlastimil Babka wrote:
> Kmem caches can be created with a SLAB_RECLAIM_ACCOUNT flag, which indicates
> they contain objects which can be reclaimed under memory pressure (typically
> through a shrinker). This makes the slab pages accounted as NR_SLAB_RECLAIMABLE
> in vmstat, which is also reflected in the MemAvailable meminfo counter and in
> overcommit decisions. The slab pages are also allocated with __GFP_RECLAIMABLE,
> which is good for anti-fragmentation through grouping pages by mobility.
> 
> The generic kmalloc-X caches are created without this flag, but sometimes are
> used also for objects that can be reclaimed, which due to varying size cannot
> have a dedicated kmem cache with SLAB_RECLAIM_ACCOUNT flag. A prominent example
> are dcache external names, which prompted the creation of a new, manually
> managed vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES in commit f1782c9bc547
> ("dcache: account external names as indirectly reclaimable memory").
> 
> To better handle this and any other similar cases, this patch introduces
> SLAB_RECLAIM_ACCOUNT variants of kmalloc caches, named kmalloc-rcl-X.
> They are used whenever the kmalloc() call passes __GFP_RECLAIMABLE among gfp
> flags. They are added to the kmalloc_caches array as a new type. Allocations
> with both __GFP_DMA and __GFP_RECLAIMABLE will use a dma type cache.
> 
> This change only applies to SLAB and SLUB, not SLOB. This is fine, since SLOB
> targets tiny systems and this patch does add some overhead of kmem management
> objects.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/slab.h | 16 +++++++++++----
>  mm/slab_common.c     | 48 ++++++++++++++++++++++++++++----------------
>  2 files changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 4299c59353a1..d89e934e0d8b 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -296,11 +296,12 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
>                                 (KMALLOC_MIN_SIZE) : 16)
>  
>  #define KMALLOC_NORMAL	0
> +#define KMALLOC_RECLAIM	1
>  #ifdef CONFIG_ZONE_DMA
> -#define KMALLOC_DMA	1
> -#define KMALLOC_TYPES	2
> +#define KMALLOC_DMA	2
> +#define KMALLOC_TYPES	3
>  #else
> -#define KMALLOC_TYPES	1
> +#define KMALLOC_TYPES	2
>  #endif
>  
>  #ifndef CONFIG_SLOB
> @@ -309,12 +310,19 @@ extern struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
>  static __always_inline unsigned int kmalloc_type(gfp_t flags)
>  {
>  	int is_dma = 0;
> +	int is_reclaimable;
>  
>  #ifdef CONFIG_ZONE_DMA
>  	is_dma = !!(flags & __GFP_DMA);
>  #endif
>  
> -	return is_dma;
> +	is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
> +
> +	/*
> +	 * If an allocation is botth __GFP_DMA and __GFP_RECLAIMABLE, return
                                 ^^
			       typo
> +	 * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
> +	 */
> +	return (is_dma * 2) + (is_reclaimable & !is_dma);

Maybe
is_dma * KMALLOC_DMA + (is_reclaimable && !is_dma) * KMALLOC_RECLAIM
looks better?
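
To check that the two forms agree, here is a small userspace sketch. It
assumes CONFIG_ZONE_DMA is enabled (so the type values are 0, 1, 2 as in the
quoted hunk). The SKETCH_GFP_* bit values are hypothetical stand-ins, not the
kernel's real gfp bits, and both function names are made up for illustration.

```c
/* Hypothetical flag bits for illustration; not the kernel's gfp values. */
#define SKETCH_GFP_DMA		0x01u
#define SKETCH_GFP_RECLAIMABLE	0x10u

/* Type values as defined in the quoted slab.h hunk with CONFIG_ZONE_DMA. */
#define SKETCH_KMALLOC_NORMAL	0
#define SKETCH_KMALLOC_RECLAIM	1
#define SKETCH_KMALLOC_DMA	2

/* The expression as posted in the patch. */
static unsigned int type_as_posted(unsigned int flags)
{
	int is_dma = !!(flags & SKETCH_GFP_DMA);
	int is_reclaimable = !!(flags & SKETCH_GFP_RECLAIMABLE);

	return (is_dma * 2) + (is_reclaimable & !is_dma);
}

/* The expression with named constants, as suggested in this reply. */
static unsigned int type_as_suggested(unsigned int flags)
{
	int is_dma = !!(flags & SKETCH_GFP_DMA);
	int is_reclaimable = !!(flags & SKETCH_GFP_RECLAIMABLE);

	return is_dma * SKETCH_KMALLOC_DMA +
	       (is_reclaimable && !is_dma) * SKETCH_KMALLOC_RECLAIM;
}
```

Both return the same type for all four flag combinations; in particular
DMA together with RECLAIMABLE maps to the DMA type in either form.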

>  }
>  
>  /*
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 4614248ca381..614fb7ab8312 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1107,10 +1107,21 @@ void __init setup_kmalloc_cache_index_table(void)
>  	}
>  }
>  
> -static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
> +static void __init
> +new_kmalloc_cache(int idx, int type, slab_flags_t flags)
>  {
> -	kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
> -					kmalloc_info[idx].name,
> +	const char *name;
> +
> +	if (type == KMALLOC_RECLAIM) {
> +		flags |= SLAB_RECLAIM_ACCOUNT;
> +		name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
> +						kmalloc_info[idx].size);
> +		BUG_ON(!name);

I'd replace this with WARN_ON() and falling back to kmalloc_info[idx].name.

> +	} else {
> +		name = kmalloc_info[idx].name;
> +	}
> +
> +	kmalloc_caches[type][idx] = create_kmalloc_cache(name,
>  					kmalloc_info[idx].size, flags, 0,
>  					kmalloc_info[idx].size);
>  }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/7] kmalloc-reclaimable caches
  2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
@ 2018-07-19 19:53   ` Roman Gushchin
  2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2018-07-19 19:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Laura Abbott, Sumit Semwal,
	Vijayanand Jitta

On Wed, Jul 18, 2018 at 03:36:13PM +0200, Vlastimil Babka wrote:
> v3 changes:
> - fix missing hunk in patch 5/7
> - more verbose cover letter and patch 6/7 commit log
> v2 changes:
> - shorten cache names to kmalloc-rcl-<SIZE>
> - last patch shortens <SIZE> for all kmalloc caches to e.g. "1k", "4M"
> - include dma caches to the 2D kmalloc_caches[] array to avoid a branch
> - vmstat counter nr_indirectly_reclaimable_bytes renamed to
>   nr_kernel_misc_reclaimable, doesn't include kmalloc-rcl-*
> - /proc/meminfo counter renamed to KReclaimable, includes kmalloc-rcl*
>   and nr_kernel_misc_reclaimable
> 
> Hi,
> 
> as discussed at LSF/MM [1] here's a patchset that introduces
> kmalloc-reclaimable caches (more details in the second patch) and uses them for
> SLAB freelists and dcache external names. The latter allows us to repurpose the
> NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.
> 
> With patch 4/7, dcache external names are allocated from kmalloc-rcl-*
> caches, eliminating the need for manual accounting. More importantly, it
> also ensures the reclaimable kmalloc allocations are grouped in pages
> separate from the regular kmalloc allocations. The need for proper
> accounting of dcache external names has shown it's easy for misbehaving
> process to allocate lots of them, causing premature OOMs. Without the
> added grouping, it's likely that a similar workload can interleave the
> dcache external names allocations with regular kmalloc allocations
> (note: I haven't searched myself for an example of such regular kmalloc
> allocation, but I would be very surprised if there wasn't some). A
> pathological case would be e.g. one 64byte regular allocations with 63
> external dcache names in a page (64x64=4096), which means the page is
> not freed even after reclaiming after all dcache names, and the process
> can thus "steal" the whole page with single 64byte allocation.
> 
> If there other kmalloc users similar to dcache external names become
> identified, they can also benefit from the new functionality simply by
> adding __GFP_RECLAIMABLE to the kmalloc calls.
> 
> Side benefits of the patchset (that could be also merged separately)
> include removed branch for detecting __GFP_DMA kmalloc(), and shortening
> kmalloc cache names in /proc/slabinfo output. The latter is potentially
> an ABI break in case there are tools parsing the names and expecting the
> values to be in bytes.
> 
> This is how /proc/slabinfo looks like after booting in virtme:
> 
> ...
> kmalloc-rcl-4M         0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
> ...
> kmalloc-rcl-96         7     32    128   32    1 : tunables  120   60    8 : slabdata      1      1      0
> kmalloc-rcl-64        25    128     64   64    1 : tunables  120   60    8 : slabdata      2      2      0
> kmalloc-rcl-32         0      0     32  124    1 : tunables  120   60    8 : slabdata      0      0      0
> kmalloc-4M             0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
> kmalloc-2M             0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
> kmalloc-1M             0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
> ...
> 
> /proc/vmstat with renamed nr_indirectly_reclaimable_bytes counter:
> 
> ...
> nr_slab_reclaimable 2817
> nr_slab_unreclaimable 1781
> ...
> nr_kernel_misc_reclaimable 0
> ...
> 
> /proc/meminfo with new KReclaimable counter:
> 
> ...
> Shmem:               564 kB
> KReclaimable:      11260 kB
> Slab:              18368 kB
> SReclaimable:      11260 kB
> SUnreclaim:         7108 kB
> KernelStack:        1248 kB
> ...
> 
> Thanks,
> Vlastimil

Hi, Vlastimil!

Overall the patchset looks solid to me.
Please, feel free to add
Acked-by: Roman Gushchin <guro@fb.com>

Two small nits:
1) The last patch is unrelated to the main idea,
and can potentially cause ABI breakage.
I'd separate it from the rest of the patchset.

2) It's actually re-opening the security issue for SLOB
users. Is the memory overhead really big enough to
justify that?

Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches
  2018-07-19  8:10   ` Mel Gorman
@ 2018-07-20  9:30     ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-20  9:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox, Luis R. Rodriguez

On 07/19/2018 10:10 AM, Mel Gorman wrote:
> On Wed, Jul 18, 2018 at 03:36:14PM +0200, Vlastimil Babka wrote:
>> The kmalloc caches currently mainain separate (optional) array
>> kmalloc_dma_caches for __GFP_DMA allocations. There are tests for __GFP_DMA in
>> the allocation hotpaths. We can avoid the branches by combining kmalloc_caches
>> and kmalloc_dma_caches into a single two-dimensional array where the outer
>> dimension is cache "type". This will also allow to add kmalloc-reclaimable
>> caches as a third type.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> I'm surprised there are so many kmalloc users that require the DMA zone.
> Some of them are certainly bogus such as in drivers for archs that only
> have one zone and is probably a reflection of the confusing naming. The
> audit would be a mess and unrelated to the patch so for this patch;

Yeah, there was a session about that at LSF/MM and Luis was working on
it. One of the motivations was to get rid of the branch, so that's
sidestepped by this patch. I would still like to not have slabinfo full
of empty dma-kmalloc caches though :)

> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-07-19  8:23   ` Mel Gorman
@ 2018-07-20  9:32     ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-20  9:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On 07/19/2018 10:23 AM, Mel Gorman wrote:
>>  /*
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 4614248ca381..614fb7ab8312 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1107,10 +1107,21 @@ void __init setup_kmalloc_cache_index_table(void)
>>  	}
>>  }
>>  
>> -static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
>> +static void __init
>> +new_kmalloc_cache(int idx, int type, slab_flags_t flags)
>>  {
>> -	kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
>> -					kmalloc_info[idx].name,
>> +	const char *name;
>> +
>> +	if (type == KMALLOC_RECLAIM) {
>> +		flags |= SLAB_RECLAIM_ACCOUNT;
>> +		name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
>> +						kmalloc_info[idx].size);
>> +		BUG_ON(!name);
>> +	} else {
>> +		name = kmalloc_info[idx].name;
>> +	}
>> +
>> +	kmalloc_caches[type][idx] = create_kmalloc_cache(name,
>>  					kmalloc_info[idx].size, flags, 0,
>>  					kmalloc_info[idx].size);
>>  }
> 
> I was going to query that BUG_ON but if I'm reading it right, we just
> have to be careful in the future that the "normal" kmalloc cache is always
> initialised before the reclaimable cache or there will be issues.

Yeah, I was just copying how the dma-kmalloc code does it.

>> @@ -1122,22 +1133,25 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
>>   */
>>  void __init create_kmalloc_caches(slab_flags_t flags)
>>  {
>> -	int i;
>> -	int type = KMALLOC_NORMAL;
>> +	int i, type;
>>  
>> -	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
>> -		if (!kmalloc_caches[type][i])
>> -			new_kmalloc_cache(i, flags);
>> +	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
>> +		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
>> +			if (!kmalloc_caches[type][i])
>> +				new_kmalloc_cache(i, type, flags);
>>  
> 
> I don't see a problem here as such but the values of the KMALLOC_* types
> is important both for this function and the kmalloc_type(). It might be
> worth adding a warning that these functions be examined if updating the
> types but then again, anyone trying and getting it wrong will have a
> broken kernel so;

OK
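
For illustration, the dependency Mel describes could be made explicit with
compile-time checks. The following is only a userspace sketch, not part of
the posted patch: the demo_* helper is hypothetical, and in-kernel code
would more likely use BUILD_BUG_ON() than C11 _Static_assert.

```c
#define KMALLOC_NORMAL   0
#define KMALLOC_RECLAIM  1
#define KMALLOC_DMA      2

/* create_kmalloc_caches() iterates type from NORMAL up to RECLAIM. */
_Static_assert(KMALLOC_NORMAL == 0,
	       "loop in create_kmalloc_caches() starts at type 0");
_Static_assert(KMALLOC_RECLAIM == KMALLOC_NORMAL + 1,
	       "NORMAL and RECLAIM must be adjacent for the type loop");
/* kmalloc_type() in the posted patch computes is_dma * 2 for DMA. */
_Static_assert(KMALLOC_DMA == 2,
	       "kmalloc_type() hard-codes 2 for the DMA type");

/* Hypothetical helper: counts how many types the loop above would visit. */
static inline int demo_count_looped_types(void)
{
	int type, n = 0;

	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++)
		n++;
	return n;
}
```

If somebody renumbers the types, the build then fails at the assertion
instead of producing a kernel that picks the wrong cache.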

> Acked-by: Mel Gorman <mgorman@techsingularity.net>

Thanks!


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-07-19 18:16     ` Roman Gushchin
  (?)
@ 2018-07-20  9:35     ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-20  9:35 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On 07/19/2018 08:16 PM, Roman Gushchin wrote:
>>  	is_dma = !!(flags & __GFP_DMA);
>>  #endif
>>  
>> -	return is_dma;
>> +	is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
>> +
>> +	/*
>> +	 * If an allocation is botth __GFP_DMA and __GFP_RECLAIMABLE, return
>                                  ^^
> 			       typo
>> +	 * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
>> +	 */
>> +	return (is_dma * 2) + (is_reclaimable & !is_dma);
> 
> Maybe
> is_dma * KMALLOC_DMA + (is_reclaimable && !is_dma) * KMALLOC_RECLAIM
> looks better?

I think I meant to do that but forgot, thanks.
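
A rough userspace sketch of what the suggested expression computes is shown
here. The DEMO_GFP_* bit values and the demo_kmalloc_type() name are made
up for illustration and are not the kernel's definitions; only the
KMALLOC_* values are taken from the patch.

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only; not the kernel's values. */
#define DEMO_GFP_DMA          0x01u
#define DEMO_GFP_RECLAIMABLE  0x10u

#define KMALLOC_NORMAL   0
#define KMALLOC_RECLAIM  1
#define KMALLOC_DMA      2

/*
 * Select the cache type without an explicit if/else. When both flags are
 * set, DMA wins and the reclaimable flag is effectively ignored.
 */
static inline unsigned int demo_kmalloc_type(unsigned int flags)
{
	bool is_dma = flags & DEMO_GFP_DMA;
	bool is_reclaimable = flags & DEMO_GFP_RECLAIMABLE;

	return is_dma * KMALLOC_DMA +
	       (is_reclaimable && !is_dma) * KMALLOC_RECLAIM;
}
```

Using the named constants keeps the function correct if the numeric values
of the types are ever changed, which the literal 2 would not.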

>>  }
>>  
>>  /*
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 4614248ca381..614fb7ab8312 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1107,10 +1107,21 @@ void __init setup_kmalloc_cache_index_table(void)
>>  	}
>>  }
>>  
>> -static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
>> +static void __init
>> +new_kmalloc_cache(int idx, int type, slab_flags_t flags)
>>  {
>> -	kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
>> -					kmalloc_info[idx].name,
>> +	const char *name;
>> +
>> +	if (type == KMALLOC_RECLAIM) {
>> +		flags |= SLAB_RECLAIM_ACCOUNT;
>> +		name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
>> +						kmalloc_info[idx].size);
>> +		BUG_ON(!name);
> 
> I'd replace this with WARN_ON() and falling back to kmalloc_info[idx].name.

It's basically a copy/paste of the dma-kmalloc code. If that triggers,
it means somebody changed the code and introduced a wrong order (as
Mel said). A system that genuinely has no memory for that kasprintf()
at this point would not get very far anyway...

>> +	} else {
>> +		name = kmalloc_info[idx].name;
>> +	}
>> +
>> +	kmalloc_caches[type][idx] = create_kmalloc_cache(name,
>>  					kmalloc_info[idx].size, flags, 0,
>>  					kmalloc_info[idx].size);
>>  }


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate
  2018-07-19  8:35   ` Mel Gorman
@ 2018-07-20  9:37     ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-20  9:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Matthew Wilcox

On 07/19/2018 10:35 AM, Mel Gorman wrote:
> On Wed, Jul 18, 2018 at 03:36:16PM +0200, Vlastimil Babka wrote:
>> In SLAB, OFF_SLAB caches allocate management structures (currently just the
>> freelist) from kmalloc caches when placement in a slab page together with
>> objects would lead to suboptimal memory usage. For SLAB_RECLAIM_ACCOUNT caches,
>> we can allocate the freelists from the newly introduced reclaimable kmalloc
>> caches, because shrinking the OFF_SLAB cache will in general result to freeing
>> of the freelists as well. This should improve accounting and anti-fragmentation
>> a bit.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> I'm not quite convinced by this one. The freelist cache is tied to the
> lifetime of the slab and not the objects. A single freelist can be reclaimed
> eventually but for caches with many objects per slab, it could take a lot
> of shrinking random objects to reclaim one freelist. Functionally the
> patch appears to be fine.

Hm, you're right that the reclaimability of the freelist is maybe too
detached, and could do more harm than good for the reclaimable caches. I
will probably drop it unless I can measure that it's an improvement. Thanks.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/7] kmalloc-reclaimable caches
  2018-07-19 19:53   ` Roman Gushchin
  (?)
@ 2018-07-20  9:45   ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-20  9:45 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Michal Hocko,
	Johannes Weiner, Christoph Lameter, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Laura Abbott, Sumit Semwal,
	Vijayanand Jitta

On 07/19/2018 09:53 PM, Roman Gushchin wrote:
>> Vlastimil
> Overall the patchset looks solid to me.
> Please, feel free to add
> Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> Two small nits:
> 1) The last patch is unrelated to the main idea,
> and can potentially cause ABI breakage.

Yes, that's why it's last.

> I'd separate it from the rest of the patchset.

It's not independent though, because there would be conflicts. It has to
be decided whether it goes before or after the rest. Putting it last in the
series makes the order clear and makes it possible to revert it in case
it does break any users, without disrupting the rest of the series.

> 2) It's actually re-opening the security issue for SLOB
> users. Is the memory overhead really big enough to
> justify that?

I assume that anyone choosing SLOB has a tiny embedded device which runs
only pre-flashed code, so that's less of an issue. If somebody can
trigger the issue remotely, there are likely also other ways to exhaust
the limited memory there?

> Thanks!


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches
  2018-07-18 13:36 ` [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches Vlastimil Babka
  2018-07-19  8:10   ` Mel Gorman
@ 2018-07-30 15:38   ` Christopher Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Christopher Lameter @ 2018-07-30 15:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On Wed, 18 Jul 2018, Vlastimil Babka wrote:

> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -295,12 +295,28 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
>  #define SLAB_OBJ_MIN_SIZE      (KMALLOC_MIN_SIZE < 16 ? \
>                                 (KMALLOC_MIN_SIZE) : 16)
>
> +#define KMALLOC_NORMAL	0
> +#ifdef CONFIG_ZONE_DMA
> +#define KMALLOC_DMA	1
> +#define KMALLOC_TYPES	2
> +#else
> +#define KMALLOC_TYPES	1
> +#endif

An enum would be better here, I think.

But the patch is ok

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
  2018-07-19  8:23   ` Mel Gorman
  2018-07-19 18:16     ` Roman Gushchin
@ 2018-07-30 15:41   ` Christopher Lameter
  2 siblings, 0 replies; 28+ messages in thread
From: Christopher Lameter @ 2018-07-30 15:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On Wed, 18 Jul 2018, Vlastimil Babka wrote:

> index 4299c59353a1..d89e934e0d8b 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -296,11 +296,12 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
>                                 (KMALLOC_MIN_SIZE) : 16)
>
>  #define KMALLOC_NORMAL	0
> +#define KMALLOC_RECLAIM	1
>  #ifdef CONFIG_ZONE_DMA
> -#define KMALLOC_DMA	1
> -#define KMALLOC_TYPES	2
> +#define KMALLOC_DMA	2
> +#define KMALLOC_TYPES	3
>  #else
> -#define KMALLOC_TYPES	1
> +#define KMALLOC_TYPES	2
>  #endif

I like enums....
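
One possible shape of such an enum is sketched below, purely as an
illustration. All DEMO_/demo_ names, the placeholder struct and the
DEMO_SHIFT_HIGH value are assumptions made for this example; none of it is
taken from the posted series.

```c
#include <stddef.h>

/* Hypothetical enum form of the KMALLOC_* defines quoted above. */
enum demo_kmalloc_cache_type {
	DEMO_KMALLOC_NORMAL = 0,
	DEMO_KMALLOC_RECLAIM,
#ifdef CONFIG_ZONE_DMA
	DEMO_KMALLOC_DMA,
#endif
	DEMO_KMALLOC_TYPES	/* number of types, sizes the outer dimension */
};

/* Placeholder element type; the kernel array holds struct kmem_cache *. */
struct demo_cache;

#define DEMO_SHIFT_HIGH 4	/* arbitrary small value for the sketch */

static struct demo_cache *
demo_kmalloc_caches[DEMO_KMALLOC_TYPES][DEMO_SHIFT_HIGH + 1];

/* Number of rows in the 2D array, derived from the array itself. */
static inline size_t demo_num_types(void)
{
	return sizeof(demo_kmalloc_caches) / sizeof(demo_kmalloc_caches[0]);
}
```

With the last enumerator acting as the count, the #ifdef only has to guard
one line and the number of types can no longer drift out of sync.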

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate
  2018-07-18 13:36 ` [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
  2018-07-19  8:35   ` Mel Gorman
@ 2018-07-30 15:45   ` Christopher Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Christopher Lameter @ 2018-07-30 15:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On Wed, 18 Jul 2018, Vlastimil Babka wrote:

> In SLAB, OFF_SLAB caches allocate management structures (currently just the
> freelist) from kmalloc caches when placement in a slab page together with
> objects would lead to suboptimal memory usage. For SLAB_RECLAIM_ACCOUNT caches,
> we can allocate the freelists from the newly introduced reclaimable kmalloc
> caches, because shrinking the OFF_SLAB cache will in general result to freeing
> of the freelists as well. This should improve accounting and anti-fragmentation
> a bit.

Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes
  2018-07-18 13:36 ` [PATCH v3 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
@ 2018-07-30 15:46   ` Christopher Lameter
  0 siblings, 0 replies; 28+ messages in thread
From: Christopher Lameter @ 2018-07-30 15:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox, Vijayanand Jitta, Laura Abbott,
	Sumit Semwal


Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes
  2018-07-18 13:36 ` [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes Vlastimil Babka
  2018-07-19  8:46   ` Mel Gorman
@ 2018-07-30 15:48   ` Christopher Lameter
  2018-07-31  8:55     ` Vlastimil Babka
  1 sibling, 1 reply; 28+ messages in thread
From: Christopher Lameter @ 2018-07-30 15:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On Wed, 18 Jul 2018, Vlastimil Babka wrote:

> +static const char *
> +kmalloc_cache_name(const char *prefix, unsigned int size)
> +{
> +
> +	static const char units[3] = "\0kM";
> +	int idx = 0;
> +
> +	while (size >= 1024 && (size % 1024 == 0)) {
> +		size /= 1024;
> +		idx++;
> +	}
> +
> +	return kasprintf(GFP_NOWAIT, "%s-%u%c", prefix, size, units[idx]);
> +}

This is likely to occur elsewhere in the kernel. Maybe generalize it a
bit?

Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes
  2018-07-30 15:48   ` Christopher Lameter
@ 2018-07-31  8:55     ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2018-07-31  8:55 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-api, Roman Gushchin,
	Michal Hocko, Johannes Weiner, David Rientjes, Joonsoo Kim,
	Mel Gorman, Matthew Wilcox

On 07/30/2018 05:48 PM, Christopher Lameter wrote:
> On Wed, 18 Jul 2018, Vlastimil Babka wrote:
> 
>> +static const char *
>> +kmalloc_cache_name(const char *prefix, unsigned int size)
>> +{
>> +
>> +	static const char units[3] = "\0kM";
>> +	int idx = 0;
>> +
>> +	while (size >= 1024 && (size % 1024 == 0)) {
>> +		size /= 1024;
>> +		idx++;
>> +	}
>> +
>> +	return kasprintf(GFP_NOWAIT, "%s-%u%c", prefix, size, units[idx]);
>> +}
> 
> This is likely to occur elsewhere in the kernel. Maybe generalize it a
> bit?

I'll try later on top, as that's generic printf code then.
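
As a rough idea of what a more general helper might look like, here is a
userspace sketch. It uses snprintf() into a caller-supplied buffer instead
of kasprintf(); the demo_size_name() name, its signature and the added 'G'
unit are assumptions for illustration, not an existing kernel interface.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format "<prefix>-<size><unit>" into buf, dividing by 1024 for as long as
 * the size is an exact multiple and a larger unit is available.
 * Returns whatever snprintf() returns.
 */
static inline int demo_size_name(char *buf, size_t len, const char *prefix,
				 unsigned int size)
{
	static const char units[] = { '\0', 'k', 'M', 'G' };
	size_t idx = 0;

	/* The idx bound keeps the lookup inside units[] for any input. */
	while (size >= 1024 && size % 1024 == 0 && idx + 1 < sizeof(units)) {
		size /= 1024;
		idx++;
	}

	if (units[idx])
		return snprintf(buf, len, "%s-%u%c", prefix, size, units[idx]);
	/* No unit: print the plain byte count without a trailing character. */
	return snprintf(buf, len, "%s-%u", prefix, size);
}
```

Called with the prefix "kmalloc-rcl" and a size of 4194304, this sketch
produces "kmalloc-rcl-4M"; sizes that are not a multiple of 1024, such as
96, are printed unchanged.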

> Acked-by: Christoph Lameter <cl@linux.com>

Thanks for all acks.



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-07-31  8:55 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-18 13:36 [PATCH v3 0/7] kmalloc-reclaimable caches Vlastimil Babka
2018-07-18 13:36 ` [PATCH v3 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches Vlastimil Babka
2018-07-19  8:10   ` Mel Gorman
2018-07-20  9:30     ` Vlastimil Babka
2018-07-30 15:38   ` Christopher Lameter
2018-07-18 13:36 ` [PATCH v3 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches Vlastimil Babka
2018-07-19  8:23   ` Mel Gorman
2018-07-20  9:32     ` Vlastimil Babka
2018-07-19 18:16   ` Roman Gushchin
2018-07-19 18:16     ` Roman Gushchin
2018-07-20  9:35     ` Vlastimil Babka
2018-07-30 15:41   ` Christopher Lameter
2018-07-18 13:36 ` [PATCH v3 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
2018-07-19  8:35   ` Mel Gorman
2018-07-20  9:37     ` Vlastimil Babka
2018-07-30 15:45   ` Christopher Lameter
2018-07-18 13:36 ` [PATCH v3 4/7] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
2018-07-19  8:42   ` Mel Gorman
2018-07-18 13:36 ` [PATCH v3 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
2018-07-30 15:46   ` Christopher Lameter
2018-07-18 13:36 ` [PATCH v3 6/7] mm, proc: add KReclaimable to /proc/meminfo Vlastimil Babka
2018-07-18 13:36 ` [PATCH v3 7/7] mm, slab: shorten kmalloc cache names for large sizes Vlastimil Babka
2018-07-19  8:46   ` Mel Gorman
2018-07-30 15:48   ` Christopher Lameter
2018-07-31  8:55     ` Vlastimil Babka
2018-07-19 19:53 ` [PATCH v3 0/7] kmalloc-reclaimable caches Roman Gushchin
2018-07-19 19:53   ` Roman Gushchin
2018-07-20  9:45   ` Vlastimil Babka
