* [RFC PATCH 0/5] kmalloc-reclaimable caches
@ 2018-05-24 11:00 Vlastimil Babka
  2018-05-24 11:00 ` [RFC PATCH 1/5] mm, slab/slub: introduce " Vlastimil Babka
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 11:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel,
	linux-api, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta, Vlastimil Babka

Hi,

as discussed at LSF/MM [1], here's an RFC patchset that introduces
kmalloc-reclaimable caches (more details in the first patch) and uses them
for SLAB freelists and dcache external names. The latter allows us to
repurpose the NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.
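
A user of the new caches simply passes __GFP_RECLAIMABLE among the gfp flags.
A minimal sketch of a caller (hypothetical, not one from this series):

	/* the object will be freed by a shrinker later, so group its
	 * backing pages with other reclaimable memory */
	elem = kmalloc(size, GFP_KERNEL | __GFP_RECLAIMABLE);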

This is what /proc/slabinfo looks like after booting in virtme:

...
kmalloc-reclaimable-4194304      0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
...
kmalloc-reclaimable-96     17     64    128   32    1 : tunables  120   60    8 : slabdata      2      2      0
kmalloc-reclaimable-64     50    128     64   64    1 : tunables  120   60    8 : slabdata      2      2      6
kmalloc-reclaimable-32      0      0     32  124    1 : tunables  120   60    8 : slabdata      0      0      0
kmalloc-4194304        0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
...
kmalloc-64          2888   2944     64   64    1 : tunables  120   60    8 : slabdata     46     46    454
kmalloc-32          4325   4712     32  124    1 : tunables  120   60    8 : slabdata     38     38    563
kmalloc-128         1178   1216    128   32    1 : tunables  120   60    8 : slabdata     38     38    114
...

/proc/vmstat with new/renamed nr_reclaimable counter (patch 4):

...
nr_slab_reclaimable 2817
nr_slab_unreclaimable 1781
...
nr_reclaimable 2817
...

/proc/meminfo with exposed nr_reclaimable counter (patch 5):

...
AnonPages:          8624 kB
Mapped:             3340 kB
Shmem:               564 kB
Reclaimable:       11272 kB
Slab:              18368 kB
SReclaimable:      11272 kB
SUnreclaim:         7096 kB
KernelStack:        1168 kB
PageTables:          448 kB
...

Now for the issues a.k.a. why RFC:

- I haven't found any other obvious users for reclaimable kmalloc (yet)
- the name of caches kmalloc-reclaimable-X is rather long
- the vmstat/meminfo counter name is rather general and might suggest it also
  includes reclaimable page caches, which it doesn't

Suggestions welcome for all three points. For the last one, we might also keep
the counter separate from nr_slab_reclaimable, not a superset. I did a superset
as IIRC somebody suggested that in the older threads or at LSF.

Thanks,
Vlastimil


[1] https://lwn.net/Articles/753154/

Vlastimil Babka (5):
  mm, slab/slub: introduce kmalloc-reclaimable caches
  mm, slab: allocate off-slab freelists as reclaimable when appropriate
  dcache: allocate external names from reclaimable kmalloc caches
  mm: rename and change semantics of nr_indirectly_reclaimable_bytes
  mm, proc: add NR_RECLAIMABLE to /proc/meminfo

 drivers/base/node.c                         |  2 +
 drivers/staging/android/ion/ion_page_pool.c |  4 +-
 fs/dcache.c                                 | 40 ++++-----------
 fs/proc/meminfo.c                           |  3 +-
 include/linux/mmzone.h                      |  2 +-
 include/linux/slab.h                        | 17 +++++--
 mm/page_alloc.c                             | 15 ++----
 mm/slab.c                                   | 23 ++++++---
 mm/slab_common.c                            | 56 ++++++++++++++-------
 mm/slub.c                                   | 12 ++---
 mm/util.c                                   | 16 ++----
 mm/vmstat.c                                 |  6 +--
 12 files changed, 99 insertions(+), 97 deletions(-)

-- 
2.17.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 1/5] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
@ 2018-05-24 11:00 ` Vlastimil Babka
  2018-05-25 15:51   ` Christopher Lameter
  2018-05-24 11:00 ` [RFC PATCH 2/5] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 11:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel,
	linux-api, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta, Vlastimil Babka

Kmem caches can be created with a SLAB_RECLAIM_ACCOUNT flag, which indicates
they contain objects which can be reclaimed under memory pressure (typically
through a shrinker). This makes the slab pages accounted as NR_SLAB_RECLAIMABLE
in vmstat, which is also reflected in the MemAvailable meminfo counter and in
overcommit decisions. The slab pages are also allocated with __GFP_RECLAIMABLE,
which is good for anti-fragmentation through grouping pages by mobility.
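
For a fixed-size object type, this is as simple as (a sketch; the cache and
struct names are made up):

	/* dedicated cache for objects reclaimable through our shrinker */
	cache = kmem_cache_create("foo_cache", sizeof(struct foo), 0,
				  SLAB_RECLAIM_ACCOUNT, NULL);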

The generic kmalloc-X caches are created without this flag, but are sometimes
also used for objects that can be reclaimed, which due to their varying size
cannot have a dedicated kmem cache with the SLAB_RECLAIM_ACCOUNT flag. A
prominent example is dcache external names, which prompted the creation of a
new, manually managed vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES in commit
f1782c9bc547 ("dcache: account external names as indirectly reclaimable
memory").

To better handle this and any other similar cases, this patch introduces
SLAB_RECLAIM_ACCOUNT variants of kmalloc caches, named kmalloc-reclaimable-X.
They are used whenever the kmalloc() call passes __GFP_RECLAIMABLE among gfp
flags. The kmalloc_caches[size_idx] array is extended to a two-dimensional
array kmalloc_caches[reclaimable][size_idx] to avoid an extra branch testing
for the flag.
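
For illustration, a kmalloc() call with the flag then resolves to the new
caches roughly like this (a sketch of the resulting lookup, using the names
from the patch below):

	unsigned int index = kmalloc_index(size);
	unsigned int recl = kmalloc_reclaimable(flags);	/* 0 or 1 */

	/* recl == 1 selects the kmalloc-reclaimable-X row */
	struct kmem_cache *s = kmalloc_caches[recl][index];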

This change only applies to SLAB and SLUB, not SLOB. This is fine, since SLOB
targets tiny systems, and this patch does add some overhead in the form of
extra kmem cache management objects.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 17 ++++++++++----
 mm/slab.c            |  4 ++--
 mm/slab_common.c     | 56 ++++++++++++++++++++++++++++++--------------
 mm/slub.c            | 12 +++++-----
 4 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9ebe659bd4a5..5bff0571b360 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -296,11 +296,16 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
                                (KMALLOC_MIN_SIZE) : 16)
 
 #ifndef CONFIG_SLOB
-extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_caches[2][KMALLOC_SHIFT_HIGH + 1];
 #ifdef CONFIG_ZONE_DMA
 extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
 #endif
 
+static __always_inline unsigned int kmalloc_reclaimable(gfp_t flags)
+{
+	return !!(flags & __GFP_RECLAIMABLE);
+}
+
 /*
  * Figure out which kmalloc slab an allocation of a certain size
  * belongs to.
@@ -536,12 +541,13 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
 #ifndef CONFIG_SLOB
 		if (!(flags & GFP_DMA)) {
 			unsigned int index = kmalloc_index(size);
+			unsigned int recl = kmalloc_reclaimable(flags);
 
 			if (!index)
 				return ZERO_SIZE_PTR;
 
-			return kmem_cache_alloc_trace(kmalloc_caches[index],
-					flags, size);
+			return kmem_cache_alloc_trace(
+				kmalloc_caches[recl][index], flags, size);
 		}
 #endif
 	}
@@ -588,12 +594,13 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 	if (__builtin_constant_p(size) &&
 		size <= KMALLOC_MAX_CACHE_SIZE && !(flags & GFP_DMA)) {
 		unsigned int i = kmalloc_index(size);
+		unsigned int recl = kmalloc_reclaimable(flags);
 
 		if (!i)
 			return ZERO_SIZE_PTR;
 
-		return kmem_cache_alloc_node_trace(kmalloc_caches[i],
-						flags, node, size);
+		return kmem_cache_alloc_node_trace(
+			kmalloc_caches[recl][i], flags, node, size);
 	}
 #endif
 	return __kmalloc_node(size, flags, node);
diff --git a/mm/slab.c b/mm/slab.c
index c1fe8099b3cd..8d7e1f06127b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1290,7 +1290,7 @@ void __init kmem_cache_init(void)
 	 * Initialize the caches that provide memory for the  kmem_cache_node
 	 * structures first.  Without this, further allocations will bug.
 	 */
-	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
+	kmalloc_caches[0][INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name,
 				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
 				0, kmalloc_size(INDEX_NODE));
@@ -1306,7 +1306,7 @@ void __init kmem_cache_init(void)
 		for_each_online_node(nid) {
 			init_list(kmem_cache, &init_kmem_cache_node[CACHE_CACHE + nid], nid);
 
-			init_list(kmalloc_caches[INDEX_NODE],
+			init_list(kmalloc_caches[0][INDEX_NODE],
 					  &init_kmem_cache_node[SIZE_NODE + nid], nid);
 		}
 	}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index b0dd9db1eb2f..d9a66095de0d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -938,7 +938,11 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name,
 	return s;
 }
 
-struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
+/*
+ * kmalloc_caches[0][] - kmalloc caches for non-reclaimable allocations
+ * kmalloc_caches[1][] - kmalloc caches for __GFP_RECLAIMABLE allocations
+ */
+struct kmem_cache *kmalloc_caches[2][KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
 EXPORT_SYMBOL(kmalloc_caches);
 
 #ifdef CONFIG_ZONE_DMA
@@ -1010,7 +1014,7 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 		return kmalloc_dma_caches[index];
 
 #endif
-	return kmalloc_caches[index];
+	return kmalloc_caches[kmalloc_reclaimable(flags)][index];
 }
 
 /*
@@ -1082,9 +1086,21 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
-static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
+static void __init
+new_kmalloc_cache(int idx, int reclaimable, slab_flags_t flags)
 {
-	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
+	const char *name;
+
+	if (reclaimable) {
+		flags |= SLAB_RECLAIM_ACCOUNT;
+		name = kasprintf(GFP_NOWAIT, "kmalloc-reclaimable-%u",
+						kmalloc_info[idx].size);
+		BUG_ON(!name);
+	} else {
+		name = kmalloc_info[idx].name;
+	}
+
+	kmalloc_caches[reclaimable][idx] = create_kmalloc_cache(name,
 					kmalloc_info[idx].size, flags, 0,
 					kmalloc_info[idx].size);
 }
@@ -1096,21 +1112,25 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
  */
 void __init create_kmalloc_caches(slab_flags_t flags)
 {
-	int i;
+	int i, reclaimable;
 
-	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
-		if (!kmalloc_caches[i])
-			new_kmalloc_cache(i, flags);
+	for (reclaimable = 0; reclaimable <= 1; reclaimable++) {
+		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
+			if (!kmalloc_caches[reclaimable][i])
+				new_kmalloc_cache(i, reclaimable, flags);
 
-		/*
-		 * Caches that are not of the two-to-the-power-of size.
-		 * These have to be created immediately after the
-		 * earlier power of two caches
-		 */
-		if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
-			new_kmalloc_cache(1, flags);
-		if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
-			new_kmalloc_cache(2, flags);
+			/*
+			 * Caches that are not of the two-to-the-power-of size.
+			 * These have to be created immediately after the
+			 * earlier power of two caches
+			 */
+			if (KMALLOC_MIN_SIZE <= 32 && i == 6 &&
+						!kmalloc_caches[reclaimable][1])
+				new_kmalloc_cache(1, reclaimable, flags);
+			if (KMALLOC_MIN_SIZE <= 64 && i == 7 &&
+						!kmalloc_caches[reclaimable][2])
+				new_kmalloc_cache(2, reclaimable, flags);
+		}
 	}
 
 	/* Kmalloc array is now usable */
@@ -1118,7 +1138,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 
 #ifdef CONFIG_ZONE_DMA
 	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
-		struct kmem_cache *s = kmalloc_caches[i];
+		struct kmem_cache *s = kmalloc_caches[0][i];
 
 		if (s) {
 			unsigned int size = kmalloc_size(i);
diff --git a/mm/slub.c b/mm/slub.c
index 48f75872c356..c7d7b83f20c2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4707,7 +4707,7 @@ static void __init resiliency_test(void)
 	pr_err("\n1. kmalloc-16: Clobber Redzone/next pointer 0x12->0x%p\n\n",
 	       p + 16);
 
-	validate_slab_cache(kmalloc_caches[4]);
+	validate_slab_cache(kmalloc_caches[0][4]);
 
 	/* Hmmm... The next two are dangerous */
 	p = kzalloc(32, GFP_KERNEL);
@@ -4716,33 +4716,33 @@ static void __init resiliency_test(void)
 	       p);
 	pr_err("If allocated object is overwritten then not detectable\n\n");
 
-	validate_slab_cache(kmalloc_caches[5]);
+	validate_slab_cache(kmalloc_caches[0][5]);
 	p = kzalloc(64, GFP_KERNEL);
 	p += 64 + (get_cycles() & 0xff) * sizeof(void *);
 	*p = 0x56;
 	pr_err("\n3. kmalloc-64: corrupting random byte 0x56->0x%p\n",
 	       p);
 	pr_err("If allocated object is overwritten then not detectable\n\n");
-	validate_slab_cache(kmalloc_caches[6]);
+	validate_slab_cache(kmalloc_caches[0][6]);
 
 	pr_err("\nB. Corruption after free\n");
 	p = kzalloc(128, GFP_KERNEL);
 	kfree(p);
 	*p = 0x78;
 	pr_err("1. kmalloc-128: Clobber first word 0x78->0x%p\n\n", p);
-	validate_slab_cache(kmalloc_caches[7]);
+	validate_slab_cache(kmalloc_caches[0][7]);
 
 	p = kzalloc(256, GFP_KERNEL);
 	kfree(p);
 	p[50] = 0x9a;
 	pr_err("\n2. kmalloc-256: Clobber 50th byte 0x9a->0x%p\n\n", p);
-	validate_slab_cache(kmalloc_caches[8]);
+	validate_slab_cache(kmalloc_caches[0][8]);
 
 	p = kzalloc(512, GFP_KERNEL);
 	kfree(p);
 	p[512] = 0xab;
 	pr_err("\n3. kmalloc-512: Clobber redzone 0xab->0x%p\n\n", p);
-	validate_slab_cache(kmalloc_caches[9]);
+	validate_slab_cache(kmalloc_caches[0][9]);
 }
 #else
 #ifdef CONFIG_SYSFS
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 2/5] mm, slab: allocate off-slab freelists as reclaimable when appropriate
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
  2018-05-24 11:00 ` [RFC PATCH 1/5] mm, slab/slub: introduce " Vlastimil Babka
@ 2018-05-24 11:00 ` Vlastimil Babka
  2018-05-24 11:00 ` [RFC PATCH 3/5] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 11:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel,
	linux-api, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta, Vlastimil Babka

In SLAB, OFF_SLAB caches allocate management structures (currently just the
freelist) from kmalloc caches when placement in a slab page together with
objects would lead to suboptimal memory usage. For SLAB_RECLAIM_ACCOUNT caches,
we can allocate the freelists from the newly introduced reclaimable kmalloc
caches, because shrinking the OFF_SLAB cache will in general result in freeing
of the freelists as well. This should improve accounting and anti-fragmentation
a bit.
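
As a concrete (hypothetical) example: a SLAB_RECLAIM_ACCOUNT cache with an
off-slab freelist_size of 128 bytes would now get its freelists from
kmalloc-reclaimable-128 instead of kmalloc-128:

	/* kmalloc_slab() maps size 128 to index 7; the flag picks row 1 */
	cachep->freelist_cache = kmalloc_slab(128, __GFP_RECLAIMABLE);
	/* -> kmalloc_caches[1][7], i.e. kmalloc-reclaimable-128 */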

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 8d7e1f06127b..4dd7d73a1972 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2142,8 +2142,13 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
 #endif
 
 	if (OFF_SLAB(cachep)) {
+		/*
+		 * If this cache is reclaimable, also allocate its freelists
+		 * from a reclaimable kmalloc cache.
+		 */
 		cachep->freelist_cache =
-			kmalloc_slab(cachep->freelist_size, 0u);
+			kmalloc_slab(cachep->freelist_size,
+				     cachep->allocflags & __GFP_RECLAIMABLE);
 	}
 
 	err = setup_cpu_cache(cachep, gfp);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 3/5] dcache: allocate external names from reclaimable kmalloc caches
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
  2018-05-24 11:00 ` [RFC PATCH 1/5] mm, slab/slub: introduce " Vlastimil Babka
  2018-05-24 11:00 ` [RFC PATCH 2/5] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
@ 2018-05-24 11:00 ` Vlastimil Babka
  2018-05-24 11:00 ` [RFC PATCH 4/5] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 11:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel,
	linux-api, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta, Vlastimil Babka

We can use the newly introduced kmalloc-reclaimable-X caches to allocate
external names in dcache. This takes care of the proper accounting
automatically and also improves anti-fragmentation page grouping.

This effectively reverts commit f1782c9bc547 ("dcache: account external names
as indirectly reclaimable memory") and instead passes __GFP_RECLAIMABLE to
kmalloc(). The accounting thus moves from NR_INDIRECTLY_RECLAIMABLE_BYTES to
NR_SLAB_RECLAIMABLE, which is also considered in MemAvailable calculation and
overcommit decisions.

This reverts commit f1782c9bc547754f4bd3043fe8cfda53db85f13f.
---
 fs/dcache.c | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c30a8ae46096..3346034d4520 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -257,25 +257,11 @@ static void __d_free(struct rcu_head *head)
 	kmem_cache_free(dentry_cache, dentry); 
 }
 
-static void __d_free_external_name(struct rcu_head *head)
-{
-	struct external_name *name = container_of(head, struct external_name,
-						  u.head);
-
-	mod_node_page_state(page_pgdat(virt_to_page(name)),
-			    NR_INDIRECTLY_RECLAIMABLE_BYTES,
-			    -ksize(name));
-
-	kfree(name);
-}
-
 static void __d_free_external(struct rcu_head *head)
 {
 	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
-
-	__d_free_external_name(&external_name(dentry)->u.head);
-
-	kmem_cache_free(dentry_cache, dentry);
+	kfree(external_name(dentry));
+	kmem_cache_free(dentry_cache, dentry); 
 }
 
 static inline int dname_external(const struct dentry *dentry)
@@ -306,7 +292,7 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 		struct external_name *p;
 		p = container_of(name->name, struct external_name, name[0]);
 		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			call_rcu(&p->u.head, __d_free_external_name);
+			kfree_rcu(p, u.head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -1609,7 +1595,6 @@ EXPORT_SYMBOL(d_invalidate);
  
 struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 {
-	struct external_name *ext = NULL;
 	struct dentry *dentry;
 	char *dname;
 	int err;
@@ -1630,14 +1615,15 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		dname = dentry->d_iname;
 	} else if (name->len > DNAME_INLINE_LEN-1) {
 		size_t size = offsetof(struct external_name, name[1]);
-
-		ext = kmalloc(size + name->len, GFP_KERNEL_ACCOUNT);
-		if (!ext) {
+		struct external_name *p = kmalloc(size + name->len,
+						  GFP_KERNEL_ACCOUNT |
+						  __GFP_RECLAIMABLE);
+		if (!p) {
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&ext->u.count, 1);
-		dname = ext->name;
+		atomic_set(&p->u.count, 1);
+		dname = p->name;
 	} else  {
 		dname = dentry->d_iname;
 	}	
@@ -1676,12 +1662,6 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		}
 	}
 
-	if (unlikely(ext)) {
-		pg_data_t *pgdat = page_pgdat(virt_to_page(ext));
-		mod_node_page_state(pgdat, NR_INDIRECTLY_RECLAIMABLE_BYTES,
-				    ksize(ext));
-	}
-
 	this_cpu_inc(nr_dentry);
 
 	return dentry;
@@ -2762,7 +2742,7 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
 	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		call_rcu(&old_name->u.head, __d_free_external_name);
+		kfree_rcu(old_name, u.head);
 }
 
 /*
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 4/5] mm: rename and change semantics of nr_indirectly_reclaimable_bytes
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
                   ` (2 preceding siblings ...)
  2018-05-24 11:00 ` [RFC PATCH 3/5] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
@ 2018-05-24 11:00 ` Vlastimil Babka
  2018-05-25 15:59   ` Christopher Lameter
  2018-05-24 11:00 ` [RFC PATCH 5/5] mm, proc: add NR_RECLAIMABLE to /proc/meminfo Vlastimil Babka
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 11:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel,
	linux-api, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta, Vlastimil Babka

The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
accounting objects that can be reclaimed, but cannot be allocated via a
SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
__GFP_RECLAIMABLE flag, and the dcache external names user is converted.

The counter is however still useful for accounting direct page allocations
(i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:

- change the granularity to pages to match the other counters (see the example
  below); sub-page allocations should be able to use kmalloc
- rename the counter to NR_RECLAIMABLE
- expose the counter again in vmstat as "nr_reclaimable"; we can thus remove
  the check for not printing "hidden" counters
- make the counter also include SLAB_RECLAIM_ACCOUNT pages, so it covers all
  shrinker-based (i.e. not page cache) reclaimable pages
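
As an example of the granularity change: assuming 4K pages (PAGE_SHIFT == 12),
an order-2 ION pool page was previously accounted as
1 << (PAGE_SHIFT + 2) = 16384 bytes, and is now accounted as 1 << 2 = 4 pages.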

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 drivers/staging/android/ion/ion_page_pool.c |  4 ++--
 include/linux/mmzone.h                      |  2 +-
 mm/page_alloc.c                             | 15 ++++-----------
 mm/slab.c                                   | 12 ++++++++----
 mm/util.c                                   | 16 +++++-----------
 mm/vmstat.c                                 |  6 +-----
 6 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index 9bc56eb48d2a..11e6e694f425 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -33,8 +33,8 @@ static void ion_page_pool_add(struct ion_page_pool *pool, struct page *page)
 		pool->low_count++;
 	}
 
-	mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
-			    (1 << (PAGE_SHIFT + pool->order)));
+	mod_node_page_state(page_pgdat(page), NR_RECLAIMABLE,
+							1 << pool->order);
 	mutex_unlock(&pool->mutex);
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..4343948f33e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -180,7 +180,7 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
-	NR_INDIRECTLY_RECLAIMABLE_BYTES, /* measured in bytes */
+	NR_RECLAIMABLE,         /* all reclaimable pages, including slab */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 249546393bd6..6f22fec0df54 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4708,6 +4708,7 @@ long si_mem_available(void)
 	unsigned long pagecache;
 	unsigned long wmark_low = 0;
 	unsigned long pages[NR_LRU_LISTS];
+	unsigned long reclaimable;
 	struct zone *zone;
 	int lru;
 
@@ -4733,19 +4734,11 @@ long si_mem_available(void)
 	available += pagecache;
 
 	/*
-	 * Part of the reclaimable slab consists of items that are in use,
+	 * Part of the reclaimable pages consists of items that are in use,
 	 * and cannot be freed. Cap this estimate at the low watermark.
 	 */
-	available += global_node_page_state(NR_SLAB_RECLAIMABLE) -
-		     min(global_node_page_state(NR_SLAB_RECLAIMABLE) / 2,
-			 wmark_low);
-
-	/*
-	 * Part of the kernel memory, which can be released under memory
-	 * pressure.
-	 */
-	available += global_node_page_state(NR_INDIRECTLY_RECLAIMABLE_BYTES) >>
-		PAGE_SHIFT;
+	reclaimable = global_node_page_state(NR_RECLAIMABLE);
+	available += reclaimable - min(reclaimable / 2, wmark_low);
 
 	if (available < 0)
 		available = 0;
diff --git a/mm/slab.c b/mm/slab.c
index 4dd7d73a1972..a2a8c0802253 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1420,10 +1420,12 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	}
 
 	nr_pages = (1 << cachep->gfporder);
-	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
+	if (cachep->flags & SLAB_RECLAIM_ACCOUNT) {
 		mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, nr_pages);
-	else
+		mod_node_page_state(page_pgdat(page), NR_RECLAIMABLE, nr_pages);
+	} else {
 		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE, nr_pages);
+	}
 
 	__SetPageSlab(page);
 	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
@@ -1441,10 +1443,12 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	int order = cachep->gfporder;
 	unsigned long nr_freed = (1 << order);
 
-	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
+	if (cachep->flags & SLAB_RECLAIM_ACCOUNT) {
 		mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, -nr_freed);
-	else
+		mod_node_page_state(page_pgdat(page), NR_RECLAIMABLE, -nr_freed);
+	} else {
 		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE, -nr_freed);
+	}
 
 	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
diff --git a/mm/util.c b/mm/util.c
index 98180a994895..3ffd92a9778a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -662,19 +662,13 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		free += get_nr_swap_pages();
 
 		/*
-		 * Any slabs which are created with the
+		 * Pages accounted as reclaimable.
+		 * Among them, slabs created with the
 		 * SLAB_RECLAIM_ACCOUNT flag claim to have contents
-		 * which are reclaimable, under pressure.  The dentry
-		 * cache and most inode caches should fall into this
+		 * which are reclaimable, under pressure. The dentry
+		 * cache and most inode caches should fall into this.
 		 */
-		free += global_node_page_state(NR_SLAB_RECLAIMABLE);
-
-		/*
-		 * Part of the kernel memory, which can be released
-		 * under memory pressure.
-		 */
-		free += global_node_page_state(
-			NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
+		free += global_node_page_state(NR_RECLAIMABLE);
 
 		/*
 		 * Leave reserved pages. The pages are not for anonymous pages.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 75eda9c2b260..21d571da9d5a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1161,7 +1161,7 @@ const char * const vmstat_text[] = {
 	"nr_vmscan_immediate_reclaim",
 	"nr_dirtied",
 	"nr_written",
-	"", /* nr_indirectly_reclaimable */
+	"nr_reclaimable",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
@@ -1704,10 +1704,6 @@ static int vmstat_show(struct seq_file *m, void *arg)
 	unsigned long *l = arg;
 	unsigned long off = l - (unsigned long *)m->private;
 
-	/* Skip hidden vmstat items. */
-	if (*vmstat_text[off] == '\0')
-		return 0;
-
 	seq_puts(m, vmstat_text[off]);
 	seq_put_decimal_ull(m, " ", *l);
 	seq_putc(m, '\n');
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 5/5] mm, proc: add NR_RECLAIMABLE to /proc/meminfo
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
                   ` (3 preceding siblings ...)
  2018-05-24 11:00 ` [RFC PATCH 4/5] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
@ 2018-05-24 11:00 ` Vlastimil Babka
  2018-05-24 11:43 ` [RFC PATCH 0/5] kmalloc-reclaimable caches Matthew Wilcox
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 11:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel,
	linux-api, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta, Vlastimil Babka

The vmstat NR_RECLAIMABLE counter is a superset of NR_SLAB_RECLAIMABLE, also
covering non-slab allocations that can be reclaimed via a shrinker. Make it
visible also in /proc/meminfo and in the per-node meminfo in sysfs.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 drivers/base/node.c | 2 ++
 fs/proc/meminfo.c   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index a5e821d09656..b35e3627eab7 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -118,6 +118,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       "Node %d NFS_Unstable:   %8lu kB\n"
 		       "Node %d Bounce:         %8lu kB\n"
 		       "Node %d WritebackTmp:   %8lu kB\n"
+		       "Node %d Reclaimable:    %8lu kB\n"
 		       "Node %d Slab:           %8lu kB\n"
 		       "Node %d SReclaimable:   %8lu kB\n"
 		       "Node %d SUnreclaim:     %8lu kB\n"
@@ -138,6 +139,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
 		       nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
+		       nid, K(node_page_state(pgdat, NR_RECLAIMABLE)),
 		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) +
 			      node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
 		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 2fb04846ed11..6ca0158a9ebd 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -93,10 +93,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "Mapped:         ",
 		    global_node_page_state(NR_FILE_MAPPED));
 	show_val_kb(m, "Shmem:          ", i.sharedram);
+	show_val_kb(m, "Reclaimable:    ",
+		    global_node_page_state(NR_RECLAIMABLE));
 	show_val_kb(m, "Slab:           ",
 		    global_node_page_state(NR_SLAB_RECLAIMABLE) +
 		    global_node_page_state(NR_SLAB_UNRECLAIMABLE));
-
 	show_val_kb(m, "SReclaimable:   ",
 		    global_node_page_state(NR_SLAB_RECLAIMABLE));
 	show_val_kb(m, "SUnreclaim:     ",
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
                   ` (4 preceding siblings ...)
  2018-05-24 11:00 ` [RFC PATCH 5/5] mm, proc: add NR_RECLAIMABLE to /proc/meminfo Vlastimil Babka
@ 2018-05-24 11:43 ` Matthew Wilcox
  2018-05-24 16:18   ` Randy Dunlap
  2018-05-24 12:13   ` Roman Gushchin
  2018-05-24 15:32 ` Johannes Weiner
  7 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2018-05-24 11:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, Johannes Weiner,
	linux-kernel, linux-api, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Mel Gorman, Vijayanand Jitta

On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
> Now for the issues a.k.a. why RFC:
> 
> - I haven't found any other obvious users for reclaimable kmalloc (yet)

Is that a problem?  This sounds like it's enough to solve Facebook's
problem.

> - the name of caches kmalloc-reclaimable-X is rather long

Yes; Christoph and I were talking about restricting slab names to 16 bytes
just to make /proc/slabinfo easier to read.  How about

kmalloc-rec-128k
1234567890123456

Just makes it ;-)

Of course, somebody needs to do the work to use k/M instead of 4194304.
We also need to bikeshed about when to switch; should it be:

kmalloc-rec-512
kmalloc-rec-1024
kmalloc-rec-2048
kmalloc-rec-4096
kmalloc-rec-8192
kmalloc-rec-16k

or should it be

kmalloc-rec-512
kmalloc-rec-1k
kmalloc-rec-2k
kmalloc-rec-4k
kmalloc-rec-8k
kmalloc-rec-16k

I slightly favour the latter as it'll be easier to implement.  Something like

	static const char suffixes[3] = ' kM';
	int idx = 0;

	while (size > 1024) {
		size /= 1024;
		idx++;
	}

	sprintf("%d%c", size, suffices[idx]);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
@ 2018-05-24 12:13   ` Roman Gushchin
  2018-05-24 11:00 ` [RFC PATCH 2/5] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Roman Gushchin @ 2018-05-24 12:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Michal Hocko, Johannes Weiner, linux-kernel, linux-api,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Mel Gorman, Vijayanand Jitta

On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
> Hi,
> 
> as discussed at LSF/MM [1], here's an RFC patchset that introduces
> kmalloc-reclaimable caches (more details in the first patch) and uses them
> for SLAB freelists and dcache external names. The latter allows us to
> repurpose the NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.
> 
> This is what /proc/slabinfo looks like after booting in virtme:
> 
> ...
> kmalloc-reclaimable-4194304      0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
> ...
> kmalloc-reclaimable-96     17     64    128   32    1 : tunables  120   60    8 : slabdata      2      2      0
> kmalloc-reclaimable-64     50    128     64   64    1 : tunables  120   60    8 : slabdata      2      2      6
> kmalloc-reclaimable-32      0      0     32  124    1 : tunables  120   60    8 : slabdata      0      0      0
> kmalloc-4194304        0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
> ...
> kmalloc-64          2888   2944     64   64    1 : tunables  120   60    8 : slabdata     46     46    454
> kmalloc-32          4325   4712     32  124    1 : tunables  120   60    8 : slabdata     38     38    563
> kmalloc-128         1178   1216    128   32    1 : tunables  120   60    8 : slabdata     38     38    114
> ...
> 
> /proc/vmstat with new/renamed nr_reclaimable counter (patch 4):
> 
> ...
> nr_slab_reclaimable 2817
> nr_slab_unreclaimable 1781
> ...
> nr_reclaimable 2817
> ...
> 
> /proc/meminfo with exposed nr_reclaimable counter (patch 5):
> 
> ...
> AnonPages:          8624 kB
> Mapped:             3340 kB
> Shmem:               564 kB
> Reclaimable:       11272 kB
> Slab:              18368 kB
> SReclaimable:      11272 kB
> SUnreclaim:         7096 kB
> KernelStack:        1168 kB
> PageTables:          448 kB
> ...
> 
> Now for the issues a.k.a. why RFC:
> 
> - I haven't found any other obvious users for reclaimable kmalloc (yet)

As I remember, the ION memory allocator was discussed in relation to this theme:
https://lkml.org/lkml/2018/4/24/1288

> I did a superset as IIRC somebody suggested that in the older threads or at LSF.

This looks nice to me!

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
                   ` (6 preceding siblings ...)
  2018-05-24 12:13   ` Roman Gushchin
@ 2018-05-24 15:32 ` Johannes Weiner
  2018-05-28  8:15   ` Vlastimil Babka
  7 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2018-05-24 15:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, linux-kernel, linux-api,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Mel Gorman, Vijayanand Jitta

On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
> - the vmstat/meminfo counter name is rather general and might suggest it also
>   includes reclaimable page caches, which it doesn't
>
> Suggestions welcome for all three points. For the last one, we might also keep
> the counter separate from nr_slab_reclaimable, not a superset. I did a superset
> as IIRC somebody suggested that in the older threads or at LSF.

Yeah, the "reclaimable" name is too generic. How about KReclaimable?

The counter being a superset sounds good to me. We use this info for
both load balancing and manual debugging. For load balancing code it's
nice not having to worry about finding all the counters that hold
reclaimable memory depending on kernel version; it's always simply
user cache + user anon + kernel reclaimable. And for debugging, we can
always add more specific subset counters later on if we need them.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 12:13   ` Roman Gushchin
  (?)
@ 2018-05-24 15:52   ` Vlastimil Babka
  2018-05-24 17:35     ` Laura Abbott
  -1 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-24 15:52 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Michal Hocko, Johannes Weiner, linux-kernel, linux-api,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Mel Gorman, Vijayanand Jitta, Laura Abbott

On 05/24/2018 02:13 PM, Roman Gushchin wrote:
> On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
>> Hi,
>>
>> - I haven't found any other obvious users for reclaimable kmalloc (yet)
> 
> As I remember, the ION memory allocator was discussed in relation to this theme:
> https://lkml.org/lkml/2018/4/24/1288

+CC Laura

Yeah ION added the NR_INDIRECTLY_RECLAIMABLE_BYTES handling, which is
adjusted to page granularity in patch 4. I'm not sure if it should use
kmalloc as it seems to be allocating order-X pages, where kmalloc/slab
just means extra overhead. But maybe if it doesn't allocate/free too
frequently, it could work?

>> I did a superset as IIRC somebody suggested that in the older threads or at LSF.
> 
> This looks nice to me!
> 
> Thanks!
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 11:43 ` [RFC PATCH 0/5] kmalloc-reclaimable caches Matthew Wilcox
@ 2018-05-24 16:18   ` Randy Dunlap
  2018-05-24 18:40     ` Randy Dunlap
  0 siblings, 1 reply; 20+ messages in thread
From: Randy Dunlap @ 2018-05-24 16:18 UTC (permalink / raw)
  To: Matthew Wilcox, Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, Johannes Weiner,
	linux-kernel, linux-api, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Mel Gorman, Vijayanand Jitta

On 05/24/2018 04:43 AM, Matthew Wilcox wrote:
> On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
>> Now for the issues a.k.a. why RFC:
>>
>> - I haven't found any other obvious users for reclaimable kmalloc (yet)
> 
> Is that a problem?  This sounds like it's enough to solve Facebook's
> problem.
> 
>> - the name of caches kmalloc-reclaimable-X is rather long
> 
> Yes; Christoph and I were talking about restricting slab names to 16 bytes
> just to make /proc/slabinfo easier to read.  How about
> 
> kmalloc-rec-128k
> 1234567890123456
> 
> Just makes it ;-)
> 
> Of course, somebody needs to do the work to use k/M instead of 4194304.
> We also need to bikeshed about when to switch; should it be:
> 
> kmalloc-rec-512
> kmalloc-rec-1024
> kmalloc-rec-2048
> kmalloc-rec-4096
> kmalloc-rec-8192
> kmalloc-rec-16k
> 
> or should it be
> 
> kmalloc-rec-512
> kmalloc-rec-1k
> kmalloc-rec-2k
> kmalloc-rec-4k
> kmalloc-rec-8k
> kmalloc-rec-16k
> 
> I slightly favour the latter as it'll be easier to implement.  Something like

Yes, agree, start using the suffix early.

> 
> 	static const char suffixes[3] = ' kM';
> 	int idx = 0;
> 
> 	while (size > 1024) {
> 		size /= 1024;
> 		idx++;
> 	}
> 
> 	sprintf("%d%c", size, suffices[idx]);

	                      suffixes
> 
> --


-- 
~Randy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 15:52   ` Vlastimil Babka
@ 2018-05-24 17:35     ` Laura Abbott
  0 siblings, 0 replies; 20+ messages in thread
From: Laura Abbott @ 2018-05-24 17:35 UTC (permalink / raw)
  To: Vlastimil Babka, Roman Gushchin
  Cc: linux-mm, Michal Hocko, Johannes Weiner, linux-kernel, linux-api,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Mel Gorman, Vijayanand Jitta

On 05/24/2018 08:52 AM, Vlastimil Babka wrote:
> On 05/24/2018 02:13 PM, Roman Gushchin wrote:
>> On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
>>> Hi,
>>>
>>> - I haven't found any other obvious users for reclaimable kmalloc (yet)
>>
>> As I remember, the ION memory allocator was discussed in relation to this theme:
>> https://lkml.org/lkml/2018/4/24/1288
> 
> +CC Laura
> 
> Yeah ION added the NR_INDIRECTLY_RECLAIMABLE_BYTES handling, which is
> adjusted to page granularity in patch 4. I'm not sure if it should use
> kmalloc as it seems to be allocating order-X pages, where kmalloc/slab
> just means extra overhead. But maybe if it doesn't allocate/free too
> frequently, it could work?
> 

The page pool allocation is supposed to be a slow path, but it's
one where I'd rather not have too much overhead. It also just looks really odd
to be allocating higher-order pages via kmalloc imho.
  
>>> I did a superset as IIRC somebody suggested that in the older threads or at LSF.
>>
>> This looks nice to me!
>>
>> Thanks!
>>
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 16:18   ` Randy Dunlap
@ 2018-05-24 18:40     ` Randy Dunlap
  2018-05-24 18:48       ` Matthew Wilcox
  0 siblings, 1 reply; 20+ messages in thread
From: Randy Dunlap @ 2018-05-24 18:40 UTC (permalink / raw)
  To: Matthew Wilcox, Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, Johannes Weiner,
	linux-kernel, linux-api, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Mel Gorman, Vijayanand Jitta

On 05/24/2018 09:18 AM, Randy Dunlap wrote:
> On 05/24/2018 04:43 AM, Matthew Wilcox wrote:
>> On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
>>> Now for the issues a.k.a. why RFC:
>>>
>>> - I haven't found any other obvious users for reclaimable kmalloc (yet)
>>
>> Is that a problem?  This sounds like it's enough to solve Facebook's
>> problem.
>>
>>> - the name of caches kmalloc-reclaimable-X is rather long
>>
>> Yes; Christoph and I were talking about restricting slab names to 16 bytes
>> just to make /proc/slabinfo easier to read.  How about
>>
>> kmalloc-rec-128k
>> 1234567890123456
>>
>> Just makes it ;-)
>>
>> Of course, somebody needs to do the work to use k/M instead of 4194304.
>> We also need to bikeshed about when to switch; should it be:
>>
>> kmalloc-rec-512
>> kmalloc-rec-1024
>> kmalloc-rec-2048
>> kmalloc-rec-4096
>> kmalloc-rec-8192
>> kmalloc-rec-16k
>>
>> or should it be
>>
>> kmalloc-rec-512
>> kmalloc-rec-1k
>> kmalloc-rec-2k
>> kmalloc-rec-4k
>> kmalloc-rec-8k
>> kmalloc-rec-16k
>>
>> I slightly favour the latter as it'll be easier to implement.  Something like
> 
> Yes, agree, start using the suffix early.
> 
>>
>> 	static const char suffixes[3] = ' kM';
>> 	int idx = 0;
>>
>> 	while (size > 1024) {

I would use   (size >= 1024)
so that 1M is printed instead of 1024K.

>> 		size /= 1024;
>> 		idx++;
>> 	}
>>
>> 	sprintf("%d%c", size, suffices[idx]);
> 
> 	                      suffixes
>>
>> --
> 
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 18:40     ` Randy Dunlap
@ 2018-05-24 18:48       ` Matthew Wilcox
  0 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2018-05-24 18:48 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Vlastimil Babka, linux-mm, Roman Gushchin, Michal Hocko,
	Johannes Weiner, linux-kernel, linux-api, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Mel Gorman,
	Vijayanand Jitta

On Thu, May 24, 2018 at 11:40:59AM -0700, Randy Dunlap wrote:
> >> 	while (size > 1024) {
> 
> I would use   (size >= 1024)
> so that 1M is printed instead of 1024K.

Yes; that's what I meant to type.  Thanks!
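
With the two fixes above folded in (and the string literal and output buffer
tidied up), a compilable version of the sketch could look like this (the
helper name is made up for illustration):

	#include <linux/kernel.h>

	/* format a kmalloc cache size as e.g. "512", "1k" or "4M" */
	static void kmalloc_size_name(char *buf, size_t buflen, unsigned int size)
	{
		static const char suffixes[] = " kM";
		int idx = 0;

		while (size >= 1024 && idx < 2) {
			size /= 1024;
			idx++;
		}

		if (idx)
			snprintf(buf, buflen, "%u%c", size, suffixes[idx]);
		else
			snprintf(buf, buflen, "%u", size);
	}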

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/5] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-05-24 11:00 ` [RFC PATCH 1/5] mm, slab/slub: introduce " Vlastimil Babka
@ 2018-05-25 15:51   ` Christopher Lameter
  2018-05-28  8:03     ` Vlastimil Babka
  0 siblings, 1 reply; 20+ messages in thread
From: Christopher Lameter @ 2018-05-25 15:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, Johannes Weiner,
	linux-kernel, linux-api, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta

On Thu, 24 May 2018, Vlastimil Babka wrote:

> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9ebe659bd4a5..5bff0571b360 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -296,11 +296,16 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
>                                 (KMALLOC_MIN_SIZE) : 16)
>
>  #ifndef CONFIG_SLOB
> -extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
> +extern struct kmem_cache *kmalloc_caches[2][KMALLOC_SHIFT_HIGH + 1];
>  #ifdef CONFIG_ZONE_DMA
>  extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
>  #endif

In the existing code we used a different array name for the DMA caches.
This is a similar situation.

I would suggest using

kmalloc_reclaimable_caches[]

or make it consistent by folding the DMA caches into the array too (but
then note the issues below).

> @@ -536,12 +541,13 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
>  #ifndef CONFIG_SLOB
>  		if (!(flags & GFP_DMA)) {
>  			unsigned int index = kmalloc_index(size);
> +			unsigned int recl = kmalloc_reclaimable(flags);

This is a hotpath reserved for regular allocations. The reclaimable slabs
need to be handled like the DMA slabs.  So check for GFP_DMA plus the
reclaimable flags.

> @@ -588,12 +594,13 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
>  	if (__builtin_constant_p(size) &&
>  		size <= KMALLOC_MAX_CACHE_SIZE && !(flags & GFP_DMA)) {
>  		unsigned int i = kmalloc_index(size);
> +		unsigned int recl = kmalloc_reclaimable(flags);
>


Same situation here and additional times below.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/5] mm: rename and change semantics of nr_indirectly_reclaimable_bytes
  2018-05-24 11:00 ` [RFC PATCH 4/5] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
@ 2018-05-25 15:59   ` Christopher Lameter
  0 siblings, 0 replies; 20+ messages in thread
From: Christopher Lameter @ 2018-05-25 15:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, Johannes Weiner,
	linux-kernel, linux-api, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta

On Thu, 24 May 2018, Vlastimil Babka wrote:

> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 32699b2dc52a..4343948f33e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -180,7 +180,7 @@ enum node_stat_item {
>  	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
>  	NR_DIRTIED,		/* page dirtyings since bootup */
>  	NR_WRITTEN,		/* page writings since bootup */
> -	NR_INDIRECTLY_RECLAIMABLE_BYTES, /* measured in bytes */
> +	NR_RECLAIMABLE,         /* all reclaimable pages, including slab */
>  	NR_VM_NODE_STAT_ITEMS

We already have NR_SLAB_RECLAIMABLE, and NR_RECLAIMABLE now counts what
NR_SLAB_RECLAIMABLE counts plus something else. This means updating
two counters in parallel.

Could we keep the existing counter and just account
for those non-slab things you mentioned? That avoids counting twice and may
provide unique insights into those non-slab reclaimable objects. I'd like
to see this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/5] mm, slab/slub: introduce kmalloc-reclaimable caches
  2018-05-25 15:51   ` Christopher Lameter
@ 2018-05-28  8:03     ` Vlastimil Babka
  0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-28  8:03 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: linux-mm, Roman Gushchin, Michal Hocko, Johannes Weiner,
	linux-kernel, linux-api, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Mel Gorman, Vijayanand Jitta

On 05/25/2018 05:51 PM, Christopher Lameter wrote:
> On Thu, 24 May 2018, Vlastimil Babka wrote:
> 
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 9ebe659bd4a5..5bff0571b360 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -296,11 +296,16 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
>>                                 (KMALLOC_MIN_SIZE) : 16)
>>
>>  #ifndef CONFIG_SLOB
>> -extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
>> +extern struct kmem_cache *kmalloc_caches[2][KMALLOC_SHIFT_HIGH + 1];
>>  #ifdef CONFIG_ZONE_DMA
>>  extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
>>  #endif
> 
> In the existing code we used a different array name for the DMA caches.
> This is a similar situation.
> 
> I would suggest using
> 
> kmalloc_reclaimable_caches[]
> 
> or make it consistent by folding the DMA caches into the array too (but
> then note the issues below).
> 
>> @@ -536,12 +541,13 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
>>  #ifndef CONFIG_SLOB
>>  		if (!(flags & GFP_DMA)) {
>>  			unsigned int index = kmalloc_index(size);
>> +			unsigned int recl = kmalloc_reclaimable(flags);
> 
> This is a hotpath reserved for regular allocations. The reclaimable slabs
> need to be handled like the DMA slabs.  So check for GFP_DMA plus the
> reclaimable flags.

Yeah, I thought that by handling the reclaimable caches via array index
manipulation and not a branch, there would be no noticeable overhead. And
GFP_DMA
should go away eventually. I will see if I can convert GFP_DMA to
another index, and completely remove the branch quoted above.

>> @@ -588,12 +594,13 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
>>  	if (__builtin_constant_p(size) &&
>>  		size <= KMALLOC_MAX_CACHE_SIZE && !(flags & GFP_DMA)) {
>>  		unsigned int i = kmalloc_index(size);
>> +		unsigned int recl = kmalloc_reclaimable(flags);
>>
> 
> 
> Same situation here and additional times below.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-24 15:32 ` Johannes Weiner
@ 2018-05-28  8:15   ` Vlastimil Babka
  2018-05-29 17:58     ` Johannes Weiner
  0 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2018-05-28  8:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Roman Gushchin, Michal Hocko, linux-kernel, linux-api,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Mel Gorman, Vijayanand Jitta

On 05/24/2018 05:32 PM, Johannes Weiner wrote:
> On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
>> - the vmstat/meminfo counter name is rather general and might suggest it also
>>   includes reclaimable page caches, which it doesn't
>>
>> Suggestions welcome for all three points. For the last one, we might also keep
>> the counter separate from nr_slab_reclaimable, not a superset. I did a superset
>> as IIRC somebody suggested that in the older threads or at LSF.
> 
> Yeah, the "reclaimable" name is too generic. How about KReclaimable?
> 
> The counter being a superset sounds good to me. We use this info for
> both load balancing and manual debugging. For load balancing code it's
> nice not having to worry about finding all the counters that hold
> reclaimable memory depending on kernel version; it's always simply
> user cache + user anon + kernel reclaimable. And for debugging, we can
> always add more specific subset counters later on if we need them.

Hm, Christoph in his reply to patch 4/5 expressed a different opinion.
It's true that updating two counters has extra overhead, especially if
there are two separate critical sections:

mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, nr_pages);
mod_node_page_state(page_pgdat(page), NR_RECLAIMABLE, nr_pages);

The first disables irq for CONFIG_MEMCG or defers to
mod_node_page_state() otherwise.
mod_node_page_state() is different depending on CONFIG_SMP and
CONFIG_HAVE_CMPXCHG_LOCAL.

I don't see an easy way to make this optimal. A separate counter would
indeed be simpler. /proc/vmstat would then print separate counters, but we
could have both the separate and the summary counter in /proc/meminfo. Would
that be enough?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/5] kmalloc-reclaimable caches
  2018-05-28  8:15   ` Vlastimil Babka
@ 2018-05-29 17:58     ` Johannes Weiner
  0 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2018-05-29 17:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Roman Gushchin, Michal Hocko, linux-kernel, linux-api,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Mel Gorman, Vijayanand Jitta

On Mon, May 28, 2018 at 10:15:46AM +0200, Vlastimil Babka wrote:
> On 05/24/2018 05:32 PM, Johannes Weiner wrote:
> > On Thu, May 24, 2018 at 01:00:06PM +0200, Vlastimil Babka wrote:
> >> - the vmstat/meminfo counter name is rather general and might suggest it also
> >>   includes reclaimable page caches, which it doesn't
> >>
> >> Suggestions welcome for all three points. For the last one, we might also keep
> >> the counter separate from nr_slab_reclaimable, not a superset. I did a superset
> >> as IIRC somebody suggested that in the older threads or at LSF.
> > 
> > Yeah, the "reclaimable" name is too generic. How about KReclaimable?
> > 
> > The counter being a superset sounds good to me. We use this info for
> > both load balancing and manual debugging. For load balancing code it's
> > nice not having to worry about finding all the counters that hold
> > reclaimable memory depending on kernel version; it's always simply
> > user cache + user anon + kernel reclaimable. And for debugging, we can
> > always add more specific subset counters later on if we need them.
> 
> Hm, Christoph in his reply to patch 4/5 expressed a different opinion.
> It's true that updating two counters has extra overhead, especially if
> there are two separate critical sections:
> 
> mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, nr_pages);
> mod_node_page_state(page_pgdat(page), NR_RECLAIMABLE, nr_pages);
> 
> The first disables irq for CONFIG_MEMCG or defers to
> mod_node_page_state() otherwise.
> mod_node_page_state() is different depending on CONFIG_SMP and
> CONFIG_HAVE_CMPXCHG_LOCAL.
> 
> I don't see an easy way to make this optimal. A separate counter would
> indeed be simpler. /proc/vmstat would then print separate counters, but we
> could have both the separate and the summary counter in /proc/meminfo. Would
> that be enough?

Yeah, that works just as well.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-05-29 17:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 11:00 [RFC PATCH 0/5] kmalloc-reclaimable caches Vlastimil Babka
2018-05-24 11:00 ` [RFC PATCH 1/5] mm, slab/slub: introduce " Vlastimil Babka
2018-05-25 15:51   ` Christopher Lameter
2018-05-28  8:03     ` Vlastimil Babka
2018-05-24 11:00 ` [RFC PATCH 2/5] mm, slab: allocate off-slab freelists as reclaimable when appropriate Vlastimil Babka
2018-05-24 11:00 ` [RFC PATCH 3/5] dcache: allocate external names from reclaimable kmalloc caches Vlastimil Babka
2018-05-24 11:00 ` [RFC PATCH 4/5] mm: rename and change semantics of nr_indirectly_reclaimable_bytes Vlastimil Babka
2018-05-25 15:59   ` Christopher Lameter
2018-05-24 11:00 ` [RFC PATCH 5/5] mm, proc: add NR_RECLAIMABLE to /proc/meminfo Vlastimil Babka
2018-05-24 11:43 ` [RFC PATCH 0/5] kmalloc-reclaimable caches Matthew Wilcox
2018-05-24 16:18   ` Randy Dunlap
2018-05-24 18:40     ` Randy Dunlap
2018-05-24 18:48       ` Matthew Wilcox
2018-05-24 12:13 ` Roman Gushchin
2018-05-24 15:52   ` Vlastimil Babka
2018-05-24 17:35     ` Laura Abbott
2018-05-24 15:32 ` Johannes Weiner
2018-05-28  8:15   ` Vlastimil Babka
2018-05-29 17:58     ` Johannes Weiner
