All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] mm: introduce shrinker debugfs interface
@ 2022-04-22 20:26 Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers Roman Gushchin
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

There are 50+ different shrinkers in the kernel, many with their own bells and
whistles. Under the memory pressure the kernel applies some pressure on each of
them in the order of which they were created/registered in the system. Some
of them can contain only few objects, some can be quite large. Some can be
effective at reclaiming memory, some not.

The only existing debugging mechanism is a couple of tracepoints in
do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
covering everything though: shrinkers which report 0 objects will never show up,
there is no support for memcg-aware shrinkers. Shrinkers are identified by their
scan function, which is not always enough (e.g. hard to guess which super
block's shrinker it is having only "super_cache_scan"). They are a passive
mechanism: there is no way to call into counting and scanning of an individual
shrinker and profile it.

To provide a better visibility and debug options for memory shrinkers
this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
similar to /sys/kernel/slab.

For each shrinker registered in the system a directory is created. The directory
contains "count" and "scan" files, which allow to trigger count_objects()
and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
and scan_memcg_node are additionally provided. They allow to get per-memcg
and/or per-node object count and shrink only a specific memcg/node.

To make debugging more pleasant, the patchset also names all shrinkers,
so that debugfs entries can have more meaningful names.

Usage examples:

1) List registered shrinkers:
  $ cd /sys/kernel/debug/shrinker/
  $ ls
    dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
    kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
    sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
    sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
    sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
    sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
    sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34

2) Get information about a specific shrinker:
  $ cd sb-btrfs-24/
  $ ls
    count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node

3) Count objects on the system/root cgroup level
  $ cat count
    212

4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
  $ cat count_node
    209 3

5) Count objects for each memcg (output format: cgroup inode, count)
  $ cat count_memcg
    1 212
    20 96
    53 817
    2297 2
    218 13
    581 30
    911 124
    <CUT>

6) Same but with a per-node output
  $ cat count_memcg_node
    1 209 3
    20 96 0
    53 810 7
    2297 2 0
    218 13 0
    581 30 0
    911 124 0
    <CUT>

7) Scan system/root shrinker
  $ cat count
    212
  $ echo 100 > scan
  $ cat scan
    97
  $ cat count
    115

8) Scan individual memcg
  $ echo "1868 500" > scan_memcg
  $ cat scan_memcg
    193

9) Scan individual node
  $ echo "1 200" > scan_node
  $ cat scan_node
    2

10) Scan individual memcg and node
  $ echo "1868 0 500" > scan_memcg_node
  $ cat scan_memcg_node
    435


v1:
  1) switched to debugfs, suggested by Mike, Andrew, Greg and others
  2) switched to seq_file API for output, no PAGE_SIZE limit anymore, by Andrew
  3) switched to down_read_killable(), suggested by Hillf
  4) dropped stateful filtering and "freed" returning, by Kent
  5) added docs, by Andrew

rfc:
  https://lwn.net/Articles/891542/


Roman Gushchin (7):
  mm: introduce debugfs interface for kernel memory shrinkers
  mm: memcontrol: introduce mem_cgroup_ino() and
    mem_cgroup_get_from_ino()
  mm: introduce memcg interfaces for shrinker debugfs
  mm: introduce numa interfaces for shrinker debugfs
  mm: provide shrinkers with names
  docs: document shrinker debugfs
  tools: add memcg_shrinker.py

 Documentation/admin-guide/mm/index.rst        |   1 +
 .../admin-guide/mm/shrinker_debugfs.rst       |  90 +++
 arch/x86/kvm/mmu/mmu.c                        |   2 +-
 drivers/android/binder_alloc.c                |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   3 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c        |   2 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |   2 +-
 drivers/md/bcache/btree.c                     |   2 +-
 drivers/md/dm-bufio.c                         |   2 +-
 drivers/md/dm-zoned-metadata.c                |   2 +-
 drivers/md/raid5.c                            |   2 +-
 drivers/misc/vmw_balloon.c                    |   2 +-
 drivers/virtio/virtio_balloon.c               |   2 +-
 drivers/xen/xenbus/xenbus_probe_backend.c     |   2 +-
 fs/erofs/utils.c                              |   2 +-
 fs/ext4/extents_status.c                      |   3 +-
 fs/f2fs/super.c                               |   2 +-
 fs/gfs2/glock.c                               |   2 +-
 fs/gfs2/main.c                                |   2 +-
 fs/jbd2/journal.c                             |   2 +-
 fs/mbcache.c                                  |   2 +-
 fs/nfs/nfs42xattr.c                           |   7 +-
 fs/nfs/super.c                                |   2 +-
 fs/nfsd/filecache.c                           |   2 +-
 fs/nfsd/nfscache.c                            |   2 +-
 fs/quota/dquot.c                              |   2 +-
 fs/super.c                                    |   2 +-
 fs/ubifs/super.c                              |   2 +-
 fs/xfs/xfs_buf.c                              |   2 +-
 fs/xfs/xfs_icache.c                           |   2 +-
 fs/xfs/xfs_qm.c                               |   2 +-
 include/linux/memcontrol.h                    |   9 +
 include/linux/shrinker.h                      |  24 +-
 kernel/rcu/tree.c                             |   2 +-
 lib/Kconfig.debug                             |   9 +
 mm/Makefile                                   |   1 +
 mm/huge_memory.c                              |   4 +-
 mm/memcontrol.c                               |  23 +
 mm/shrinker_debug.c                           | 511 ++++++++++++++++++
 mm/vmscan.c                                   |  66 ++-
 mm/workingset.c                               |   2 +-
 mm/zsmalloc.c                                 |   2 +-
 net/sunrpc/auth.c                             |   2 +-
 tools/cgroup/memcg_shrinker.py                |  70 +++
 45 files changed, 836 insertions(+), 47 deletions(-)
 create mode 100644 Documentation/admin-guide/mm/shrinker_debugfs.rst
 create mode 100644 mm/shrinker_debug.c
 create mode 100755 tools/cgroup/memcg_shrinker.py

-- 
2.35.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-23  7:51     ` Christophe JAILLET
  2022-04-22 20:26 ` [PATCH v2 2/7] mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino() Roman Gushchin
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

This commit introduces the /sys/kernel/debug/shrinker debugfs
interface which provides an ability to observe the state and interact
with individual kernel memory shrinkers.

Because the feature is oriented on kernel developers and adds some
memory overhead (which shouldn't be large unless there is a huge
amount of registered shrinkers), it's guarded by a config option
(disabled by default).

This commit introduces "count" and "scan" interfaces for each
shrinker registered in the system.

Basic usage:
  1) Get the number of objects
    $ cat count

  2) Try to reclaim 500 objects
    $ echo "500" > scan

Following commits in the series will add memcg- and numa-specific
features.

This commit gives debugfs entries simple numeric names, which are not
very convenient. The following commit in the series will provide
shrinkers with more meaningful names.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/shrinker.h |  19 +++-
 lib/Kconfig.debug        |   9 ++
 mm/Makefile              |   1 +
 mm/shrinker_debug.c      | 214 +++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c              |   6 +-
 5 files changed, 246 insertions(+), 3 deletions(-)
 create mode 100644 mm/shrinker_debug.c

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 76fbf92b04d9..17985a890887 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -72,6 +72,10 @@ struct shrinker {
 #ifdef CONFIG_MEMCG
 	/* ID in shrinker_idr */
 	int id;
+#endif
+#ifdef CONFIG_SHRINKER_DEBUG
+	int debugfs_id;
+	struct dentry *debugfs_entry;
 #endif
 	/* objs pending delete, per node */
 	atomic_long_t *nr_deferred;
@@ -94,4 +98,17 @@ extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
-#endif
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int shrinker_debugfs_add(struct shrinker *shrinker);
+void shrinker_debugfs_remove(struct shrinker *shrinker);
+#else /* CONFIG_SHRINKER_DEBUG */
+static inline int shrinker_debugfs_add(struct shrinker *shrinker)
+{
+	return 0;
+}
+static inline void shrinker_debugfs_remove(struct shrinker *shrinker)
+{
+}
+#endif /* CONFIG_SHRINKER_DEBUG */
+#endif /* _LINUX_SHRINKER_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6bf9cceb7d20..51910b291b81 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -733,6 +733,15 @@ config SLUB_STATS
 	  out which slabs are relevant to a particular load.
 	  Try running: slabinfo -DA
 
+config SHRINKER_DEBUG
+	default n
+	bool "Enable shrinker debugging support"
+	depends on DEBUG_FS
+	help
+	  Say Y to enable the shrinker debugfs interface which provides
+	  visibility into the kernel memory shrinkers subsystem.
+	  Disable it to avoid an extra memory footprint.
+
 config HAVE_DEBUG_KMEMLEAK
 	bool
 
diff --git a/mm/Makefile b/mm/Makefile
index 6f9ffa968a1a..9a564f836403 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -133,3 +133,4 @@ obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
 obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
+obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
new file mode 100644
index 000000000000..4df7382a0737
--- /dev/null
+++ b/mm/shrinker_debug.c
@@ -0,0 +1,214 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/idr.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/shrinker.h>
+
+/* defined in vmscan.c */
+extern struct rw_semaphore shrinker_rwsem;
+extern struct list_head shrinker_list;
+
+static DEFINE_IDA(shrinker_debugfs_ida);
+static struct dentry *shrinker_debugfs_root;
+
+static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
+{
+	struct shrinker *shrinker = (struct shrinker *)m->private;
+	unsigned long nr, total = 0;
+	int ret, nid;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret)
+		return ret;
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+		};
+
+		nr = shrinker->count_objects(shrinker, &sc);
+		if (nr == SHRINK_EMPTY)
+			nr = 0;
+		total += nr;
+
+		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+			break;
+
+		cond_resched();
+	}
+	up_read(&shrinker_rwsem);
+
+	seq_printf(m, "%lu\n", total);
+
+	return ret;
+}
+DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
+
+static ssize_t shrinker_debugfs_scan_write(struct file *file,
+					   const char __user *buf,
+					   size_t size, loff_t *pos)
+{
+	struct shrinker *shrinker = (struct shrinker *)file->private_data;
+	unsigned long nr, total = 0, nr_to_scan;
+	unsigned long *count_per_node = NULL;
+	int nid;
+	char kbuf[24];
+	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
+	ssize_t ret;
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+
+	if (kstrtoul(kbuf, 10, &nr_to_scan))
+		return -EINVAL;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret)
+		return ret;
+
+	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		/*
+		 * If the shrinker is numa aware, distribute nr_to_scan
+		 * proportionally.
+		 */
+		count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long),
+					 GFP_KERNEL);
+		if (!count_per_node) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		for_each_node(nid) {
+			struct shrink_control sc = {
+				.gfp_mask = GFP_KERNEL,
+				.nid = nid,
+			};
+
+			nr = shrinker->count_objects(shrinker, &sc);
+			if (nr == SHRINK_EMPTY)
+				nr = 0;
+			count_per_node[nid] = nr;
+			total += nr;
+
+			cond_resched();
+		}
+	}
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+		};
+
+		if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+			sc.nr_to_scan = nr_to_scan * count_per_node[nid] /
+				(total ? total : 1);
+			sc.nr_scanned = sc.nr_to_scan;
+		} else {
+			sc.nr_to_scan = nr_to_scan;
+			sc.nr_scanned = sc.nr_to_scan;
+		}
+
+		nr = shrinker->scan_objects(shrinker, &sc);
+		if (nr == SHRINK_STOP)
+			break;
+
+		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+			break;
+
+		cond_resched();
+
+	}
+	ret = size;
+out:
+	up_read(&shrinker_rwsem);
+	kfree(count_per_node);
+	return ret;
+}
+
+static int shrinker_debugfs_scan_open(struct inode *inode, struct file *file)
+{
+	file->private_data = inode->i_private;
+	return nonseekable_open(inode, file);
+}
+
+static const struct file_operations shrinker_debugfs_scan_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = shrinker_debugfs_scan_open,
+	.write	 = shrinker_debugfs_scan_write,
+};
+
+int shrinker_debugfs_add(struct shrinker *shrinker)
+{
+	struct dentry *entry;
+	char buf[256];
+	int id;
+
+	lockdep_assert_held(&shrinker_rwsem);
+
+	/* debugfs isn't initialized yet, add debugfs entries later. */
+	if (!shrinker_debugfs_root)
+		return 0;
+
+	id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
+	if (id < 0)
+		return id;
+	shrinker->debugfs_id = id;
+
+	snprintf(buf, sizeof(buf), "%d", id);
+
+	/* create debugfs entry */
+	entry = debugfs_create_dir(buf, shrinker_debugfs_root);
+	if (IS_ERR(entry)) {
+		ida_free(&shrinker_debugfs_ida, id);
+		return PTR_ERR(entry);
+	}
+	shrinker->debugfs_entry = entry;
+
+	/* create generic interfaces */
+	debugfs_create_file("count", 0220, entry, shrinker,
+			    &shrinker_debugfs_count_fops);
+	debugfs_create_file("scan", 0440, entry, shrinker,
+			    &shrinker_debugfs_scan_fops);
+
+	return 0;
+}
+
+void shrinker_debugfs_remove(struct shrinker *shrinker)
+{
+	lockdep_assert_held(&shrinker_rwsem);
+
+	if (!shrinker->debugfs_entry)
+		return;
+
+	debugfs_remove_recursive(shrinker->debugfs_entry);
+	ida_free(&shrinker_debugfs_ida, shrinker->debugfs_id);
+}
+
+static int __init shrinker_debugfs_init(void)
+{
+	struct shrinker *shrinker;
+	int ret;
+
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	shrinker_debugfs_root = debugfs_create_dir("shrinker", NULL);
+	if (!shrinker_debugfs_root)
+		return -ENOMEM;
+
+	/* Create debugfs entries for shrinkers registered at boot */
+	ret = down_write_killable(&shrinker_rwsem);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(shrinker, &shrinker_list, list)
+		if (!shrinker->debugfs_entry)
+			ret = shrinker_debugfs_add(shrinker);
+	up_write(&shrinker_rwsem);
+
+	return ret;
+}
+late_initcall(shrinker_debugfs_init);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 99a572f01cb4..121a54a1602b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -190,8 +190,8 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
-static LIST_HEAD(shrinker_list);
-static DECLARE_RWSEM(shrinker_rwsem);
+LIST_HEAD(shrinker_list);
+DECLARE_RWSEM(shrinker_rwsem);
 
 #ifdef CONFIG_MEMCG
 static int shrinker_nr_max;
@@ -655,6 +655,7 @@ void register_shrinker_prepared(struct shrinker *shrinker)
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
 	shrinker->flags |= SHRINKER_REGISTERED;
+	WARN_ON_ONCE(shrinker_debugfs_add(shrinker));
 	up_write(&shrinker_rwsem);
 }
 
@@ -682,6 +683,7 @@ void unregister_shrinker(struct shrinker *shrinker)
 	shrinker->flags &= ~SHRINKER_REGISTERED;
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
 		unregister_memcg_shrinker(shrinker);
+	shrinker_debugfs_remove(shrinker);
 	up_write(&shrinker_rwsem);
 
 	kfree(shrinker->nr_deferred);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/7] mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino()
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs Roman Gushchin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

Shrinker debugfs requires a way to represent memory cgroups without
using full paths, both for displaying information and getting input
from a user.

Cgroup inode number is a perfect way, already used by e.g. bpf.

This commit adds a couple of helper functions which will be used to
represent and interact with memcg-aware shrinkers.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h |  9 +++++++++
 mm/memcontrol.c            | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 89b14729d59f..27a91dd210c9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -831,6 +831,15 @@ static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 }
 struct mem_cgroup *mem_cgroup_from_id(unsigned short id);
 
+#ifdef CONFIG_SHRINKER_DEBUG
+static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg)
+{
+	return cgroup_ino(memcg->css.cgroup);
+}
+
+struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino);
+#endif
+
 static inline struct mem_cgroup *mem_cgroup_from_seq(struct seq_file *m)
 {
 	return mem_cgroup_from_css(seq_css(m));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 08cbe23a8b94..a508979177a2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5022,6 +5022,29 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 	return idr_find(&mem_cgroup_idr, id);
 }
 
+#ifdef CONFIG_SHRINKER_DEBUG
+struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino)
+{
+	struct cgroup *cgrp;
+	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
+
+	cgrp = cgroup_get_from_id(ino);
+	if (!cgrp)
+		return ERR_PTR(-ENOENT);
+
+	css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys);
+	if (css)
+		memcg = container_of(css, struct mem_cgroup, css);
+	else
+		memcg = ERR_PTR(-ENOENT);
+
+	cgroup_put(cgrp);
+
+	return memcg;
+}
+#endif
+
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 2/7] mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino() Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-23  0:35   ` Hillf Danton
  2022-04-22 20:26 ` [PATCH v2 4/7] mm: introduce numa " Roman Gushchin
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

This commit introduces "count_memcg" and "scan_memcg" interfaces
for memcg-aware shrinkers.

Count_memcg using the following format:
<cgroup inode number1> <count2>
<cgroup inode number2> <count2>
...

Memory cgroups with 0 associated objects are skipped.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 mm/shrinker_debug.c | 186 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 139 insertions(+), 47 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 4df7382a0737..002d44d6ad56 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/idr.h>
+#include <linux/slab.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 #include <linux/shrinker.h>
+#include <linux/memcontrol.h>
 
 /* defined in vmscan.c */
 extern struct rw_semaphore shrinker_rwsem;
@@ -11,25 +13,25 @@ extern struct list_head shrinker_list;
 static DEFINE_IDA(shrinker_debugfs_ida);
 static struct dentry *shrinker_debugfs_root;
 
-static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
+static unsigned long shrinker_count_objects(struct shrinker *shrinker,
+					    struct mem_cgroup *memcg,
+					    unsigned long *count_per_node)
 {
-	struct shrinker *shrinker = (struct shrinker *)m->private;
 	unsigned long nr, total = 0;
-	int ret, nid;
-
-	ret = down_read_killable(&shrinker_rwsem);
-	if (ret)
-		return ret;
+	int nid;
 
 	for_each_node(nid) {
 		struct shrink_control sc = {
 			.gfp_mask = GFP_KERNEL,
 			.nid = nid,
+			.memcg = memcg,
 		};
 
 		nr = shrinker->count_objects(shrinker, &sc);
 		if (nr == SHRINK_EMPTY)
 			nr = 0;
+		if (count_per_node)
+			count_per_node[nid] = nr;
 		total += nr;
 
 		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
@@ -37,32 +39,17 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
 
 		cond_resched();
 	}
-	up_read(&shrinker_rwsem);
-
-	seq_printf(m, "%lu\n", total);
 
-	return ret;
+	return total;
 }
-DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
 
-static ssize_t shrinker_debugfs_scan_write(struct file *file,
-					   const char __user *buf,
-					   size_t size, loff_t *pos)
+static int shrinker_scan_objects(struct shrinker *shrinker,
+				 struct mem_cgroup *memcg,
+				 unsigned long nr_to_scan)
 {
-	struct shrinker *shrinker = (struct shrinker *)file->private_data;
-	unsigned long nr, total = 0, nr_to_scan;
-	unsigned long *count_per_node = NULL;
-	int nid;
-	char kbuf[24];
-	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
-	ssize_t ret;
-
-	if (copy_from_user(kbuf, buf, read_len))
-		return -EFAULT;
-	kbuf[read_len] = '\0';
-
-	if (kstrtoul(kbuf, 10, &nr_to_scan))
-		return -EINVAL;
+	unsigned long *count_per_node;
+	unsigned long total, nr;
+	int ret, nid;
 
 	ret = down_read_killable(&shrinker_rwsem);
 	if (ret)
@@ -80,20 +67,7 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
 			goto out;
 		}
 
-		for_each_node(nid) {
-			struct shrink_control sc = {
-				.gfp_mask = GFP_KERNEL,
-				.nid = nid,
-			};
-
-			nr = shrinker->count_objects(shrinker, &sc);
-			if (nr == SHRINK_EMPTY)
-				nr = 0;
-			count_per_node[nid] = nr;
-			total += nr;
-
-			cond_resched();
-		}
+		total = shrinker_count_objects(shrinker, memcg, count_per_node);
 	}
 
 	for_each_node(nid) {
@@ -102,13 +76,13 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
 			.nid = nid,
 		};
 
-		if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		if (count_per_node) {
 			sc.nr_to_scan = nr_to_scan * count_per_node[nid] /
 				(total ? total : 1);
 			sc.nr_scanned = sc.nr_to_scan;
 		} else {
 			sc.nr_to_scan = nr_to_scan;
-			sc.nr_scanned = sc.nr_to_scan;
+			sc.nr_scanned = nr_to_scan;
 		}
 
 		nr = shrinker->scan_objects(shrinker, &sc);
@@ -119,15 +93,51 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
 			break;
 
 		cond_resched();
-
 	}
-	ret = size;
 out:
 	up_read(&shrinker_rwsem);
 	kfree(count_per_node);
 	return ret;
 }
 
+static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
+{
+	struct shrinker *shrinker = (struct shrinker *)m->private;
+	int ret;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (!ret) {
+		unsigned long total = shrinker_count_objects(shrinker, NULL, NULL);
+
+		up_read(&shrinker_rwsem);
+		seq_printf(m, "%lu\n", total);
+	}
+	return ret;
+}
+DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
+
+static ssize_t shrinker_debugfs_scan_write(struct file *file,
+					   const char __user *buf,
+					   size_t size, loff_t *pos)
+{
+	struct shrinker *shrinker = (struct shrinker *)file->private_data;
+	unsigned long nr_to_scan;
+	char kbuf[24];
+	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
+	ssize_t ret;
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+
+	if (kstrtoul(kbuf, 10, &nr_to_scan))
+		return -EINVAL;
+
+	ret = shrinker_scan_objects(shrinker, NULL, nr_to_scan);
+
+	return ret ? ret : size;
+}
+
 static int shrinker_debugfs_scan_open(struct inode *inode, struct file *file)
 {
 	file->private_data = inode->i_private;
@@ -140,6 +150,78 @@ static const struct file_operations shrinker_debugfs_scan_fops = {
 	.write	 = shrinker_debugfs_scan_write,
 };
 
+#ifdef CONFIG_MEMCG
+static int shrinker_debugfs_count_memcg_show(struct seq_file *m, void *v)
+{
+	struct shrinker *shrinker = (struct shrinker *)m->private;
+	struct mem_cgroup *memcg;
+	unsigned long total;
+	int ret;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret)
+		return ret;
+	rcu_read_lock();
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		if (!mem_cgroup_online(memcg))
+			continue;
+
+		total = shrinker_count_objects(shrinker, memcg, NULL);
+		if (!total)
+			continue;
+
+		seq_printf(m, "%lu %lu\n", mem_cgroup_ino(memcg), total);
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+
+	rcu_read_unlock();
+	up_read(&shrinker_rwsem);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count_memcg);
+
+static ssize_t shrinker_debugfs_scan_memcg_write(struct file *file,
+						 const char __user *buf,
+						 size_t size, loff_t *pos)
+{
+	struct shrinker *shrinker = (struct shrinker *)file->private_data;
+	unsigned long nr_to_scan, ino;
+	struct mem_cgroup *memcg;
+	char kbuf[48];
+	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
+	ssize_t ret;
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+
+	if (sscanf(kbuf, "%lu %lu", &ino, &nr_to_scan) < 2)
+		return -EINVAL;
+
+	memcg = mem_cgroup_get_from_ino(ino);
+	if (!memcg || IS_ERR(memcg))
+		return -ENOENT;
+
+	if (!mem_cgroup_online(memcg)) {
+		mem_cgroup_put(memcg);
+		return -ENOENT;
+	}
+
+	ret = shrinker_scan_objects(shrinker, memcg, nr_to_scan);
+	mem_cgroup_put(memcg);
+
+	return ret ? ret : size;
+}
+
+static const struct file_operations shrinker_debugfs_scan_memcg_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = shrinker_debugfs_scan_open,
+	.write	 = shrinker_debugfs_scan_memcg_write,
+};
+#endif
+
 int shrinker_debugfs_add(struct shrinker *shrinker)
 {
 	struct dentry *entry;
@@ -173,6 +255,16 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
 	debugfs_create_file("scan", 0440, entry, shrinker,
 			    &shrinker_debugfs_scan_fops);
 
+#ifdef CONFIG_MEMCG
+	/* create memcg interfaces */
+	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+		debugfs_create_file("count_memcg", 0220, entry, shrinker,
+				    &shrinker_debugfs_count_memcg_fops);
+		debugfs_create_file("scan_memcg", 0440, entry, shrinker,
+				    &shrinker_debugfs_scan_memcg_fops);
+	}
+#endif
+
 	return 0;
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/7] mm: introduce numa interfaces for shrinker debugfs
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
                   ` (2 preceding siblings ...)
  2022-04-22 20:26 ` [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 5/7] mm: provide shrinkers with names Roman Gushchin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

This commit introduces "count_node", "scan_node", "count_memcg_node"
and "scan_memcg_node" interfaces for numa-aware and numa- and
memcg-aware shrinkers.

Usage examples:
1) Get per-node and per-memcg per-node counts:
  $ cat count_node
    209 3
  $ cat count_memcg_node
    1 209 3
    20 96 0
    53 810 7
    2297 2 0
    218 13 0
    581 30 0
    911 124 0
    <CUT>

2) Scan individual node:
  $ echo "1 200" > scan_node
  $ cat scan_node
    2

3) Scan individual memcg and node:
  $ echo "1868 0 500" > scan_memcg_node
  $ cat scan_memcg_node
    435

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 mm/shrinker_debug.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 002d44d6ad56..81350b64bf01 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -222,6 +222,185 @@ static const struct file_operations shrinker_debugfs_scan_memcg_fops = {
 };
 #endif
 
+#ifdef CONFIG_NUMA
+static int shrinker_debugfs_count_node_show(struct seq_file *m, void *v)
+{
+	struct shrinker *shrinker = (struct shrinker *)m->private;
+	unsigned long nr;
+	int ret, nid;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret)
+		return ret;
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+		};
+
+		nr = shrinker->count_objects(shrinker, &sc);
+		if (nr == SHRINK_EMPTY)
+			nr = 0;
+
+		seq_printf(m, "%s%lu", nid ? " " : "", nr);
+		cond_resched();
+	}
+	up_read(&shrinker_rwsem);
+	seq_puts(m, "\n");
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count_node);
+
+static ssize_t shrinker_debugfs_scan_node_write(struct file *file,
+						const char __user *buf,
+						size_t size, loff_t *pos)
+{
+	struct shrinker *shrinker = (struct shrinker *)file->private_data;
+	unsigned long nr_to_scan = 0;
+	int nid;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_KERNEL,
+	};
+	char kbuf[48];
+	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
+	ssize_t ret;
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+
+	if (sscanf(kbuf, "%d %lu", &nid, &nr_to_scan) < 2)
+		return -EINVAL;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return -EINVAL;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret)
+		return ret;
+
+	sc.nid = nid;
+	sc.nr_to_scan = nr_to_scan;
+	sc.nr_scanned = nr_to_scan;
+
+	shrinker->scan_objects(shrinker, &sc);
+
+	up_read(&shrinker_rwsem);
+
+	return ret ? ret : size;
+}
+
+static const struct file_operations shrinker_debugfs_scan_node_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = shrinker_debugfs_scan_open,
+	.write	 = shrinker_debugfs_scan_node_write,
+};
+
+#ifdef CONFIG_MEMCG
+static int shrinker_debugfs_count_memcg_node_show(struct seq_file *m, void *v)
+{
+	struct shrinker *shrinker = (struct shrinker *)m->private;
+	unsigned long *count_per_node = NULL;
+	struct mem_cgroup *memcg;
+	unsigned long total;
+	int ret, nid;
+
+	count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long), GFP_KERNEL);
+	if (!count_per_node)
+		return -ENOMEM;
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret) {
+		kfree(count_per_node);
+		return ret;
+	}
+	rcu_read_lock();
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		if (!mem_cgroup_online(memcg))
+			continue;
+
+		total = shrinker_count_objects(shrinker, memcg, count_per_node);
+		if (!total)
+			continue;
+
+		seq_printf(m, "%lu", mem_cgroup_ino(memcg));
+		for_each_node(nid)
+			seq_printf(m, " %lu", count_per_node[nid]);
+		seq_puts(m, "\n");
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+
+	rcu_read_unlock();
+	up_read(&shrinker_rwsem);
+
+	kfree(count_per_node);
+	return ret;
+}
+DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count_memcg_node);
+
+static ssize_t shrinker_debugfs_scan_memcg_node_write(struct file *file,
+						      const char __user *buf,
+						      size_t size, loff_t *pos)
+{
+	struct shrinker *shrinker = (struct shrinker *)file->private_data;
+	unsigned long nr_to_scan = 0, ino;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_KERNEL,
+	};
+	struct mem_cgroup *memcg;
+	int nid;
+	char kbuf[72];
+	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
+	ssize_t ret;
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+
+	if (sscanf(kbuf, "%lu %d %lu", &ino, &nid, &nr_to_scan) < 2)
+		return -EINVAL;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return -EINVAL;
+
+	memcg = mem_cgroup_get_from_ino(ino);
+	if (!memcg || IS_ERR(memcg))
+		return -ENOENT;
+
+	if (!mem_cgroup_online(memcg)) {
+		mem_cgroup_put(memcg);
+		return -ENOENT;
+	}
+
+	ret = down_read_killable(&shrinker_rwsem);
+	if (ret) {
+		mem_cgroup_put(memcg);
+		return ret;
+	}
+
+	sc.nid = nid;
+	sc.memcg = memcg;
+	sc.nr_to_scan = nr_to_scan;
+	sc.nr_scanned = nr_to_scan;
+
+	shrinker->scan_objects(shrinker, &sc);
+
+	up_read(&shrinker_rwsem);
+	mem_cgroup_put(memcg);
+
+	return ret ? ret : size;
+}
+
+static const struct file_operations shrinker_debugfs_scan_memcg_node_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = shrinker_debugfs_scan_open,
+	.write	 = shrinker_debugfs_scan_memcg_node_write,
+};
+#endif /* CONFIG_MEMCG */
+#endif /* CONFIG_NUMA */
+
 int shrinker_debugfs_add(struct shrinker *shrinker)
 {
 	struct dentry *entry;
@@ -265,6 +444,27 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
 	}
 #endif
 
+#ifdef CONFIG_NUMA
+	/* create numa and memcg/numa interfaces */
+	if ((shrinker->flags & SHRINKER_NUMA_AWARE) && nr_node_ids > 1) {
+		debugfs_create_file("count_node", 0220, entry, shrinker,
+				    &shrinker_debugfs_count_node_fops);
+		debugfs_create_file("scan_node", 0440, entry, shrinker,
+				    &shrinker_debugfs_scan_node_fops);
+
+#ifdef CONFIG_MEMCG
+		if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+			debugfs_create_file("count_memcg_node", 0220, entry,
+					    shrinker,
+					    &shrinker_debugfs_count_memcg_node_fops);
+			debugfs_create_file("scan_memcg_node", 0440, entry,
+					    shrinker,
+					    &shrinker_debugfs_scan_memcg_node_fops);
+		}
+#endif /* CONFIG_MEMCG */
+	}
+#endif /* CONFIG_NUMA */
+
 	return 0;
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/7] mm: provide shrinkers with names
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
                   ` (3 preceding siblings ...)
  2022-04-22 20:26 ` [PATCH v2 4/7] mm: introduce numa " Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-23  7:46       ` Christophe JAILLET
  2022-04-22 20:26 ` [PATCH v2 6/7] docs: document shrinker debugfs Roman Gushchin
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least
an idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and
arguments to master a name. If CONFIG_SHRINKER_DEBUG is on,
the name is saved until the corresponding debugfs object is created,
otherwise it's simple ignored.

After this change the shrinker debugfs directory looks like:
  $ cd /sys/kernel/debug/shrinker/
  $ ls
    dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
    kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
    sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
    sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
    sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
    sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
    sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 arch/x86/kvm/mmu/mmu.c                        |  2 +-
 drivers/android/binder_alloc.c                |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  3 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c        |  2 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |  2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |  2 +-
 drivers/md/bcache/btree.c                     |  2 +-
 drivers/md/dm-bufio.c                         |  2 +-
 drivers/md/dm-zoned-metadata.c                |  2 +-
 drivers/md/raid5.c                            |  2 +-
 drivers/misc/vmw_balloon.c                    |  2 +-
 drivers/virtio/virtio_balloon.c               |  2 +-
 drivers/xen/xenbus/xenbus_probe_backend.c     |  2 +-
 fs/erofs/utils.c                              |  2 +-
 fs/ext4/extents_status.c                      |  3 +-
 fs/f2fs/super.c                               |  2 +-
 fs/gfs2/glock.c                               |  2 +-
 fs/gfs2/main.c                                |  2 +-
 fs/jbd2/journal.c                             |  2 +-
 fs/mbcache.c                                  |  2 +-
 fs/nfs/nfs42xattr.c                           |  7 ++-
 fs/nfs/super.c                                |  2 +-
 fs/nfsd/filecache.c                           |  2 +-
 fs/nfsd/nfscache.c                            |  2 +-
 fs/quota/dquot.c                              |  2 +-
 fs/super.c                                    |  2 +-
 fs/ubifs/super.c                              |  2 +-
 fs/xfs/xfs_buf.c                              |  2 +-
 fs/xfs/xfs_icache.c                           |  2 +-
 fs/xfs/xfs_qm.c                               |  2 +-
 include/linux/shrinker.h                      |  5 +-
 kernel/rcu/tree.c                             |  2 +-
 mm/huge_memory.c                              |  4 +-
 mm/shrinker_debug.c                           |  7 ++-
 mm/vmscan.c                                   | 60 ++++++++++++++++++-
 mm/workingset.c                               |  2 +-
 mm/zsmalloc.c                                 |  2 +-
 net/sunrpc/auth.c                             |  2 +-
 38 files changed, 106 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c623019929a7..8cfabdd63406 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6283,7 +6283,7 @@ int kvm_mmu_vendor_module_init(void)
 	if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
 		goto out;
 
-	ret = register_shrinker(&mmu_shrinker);
+	ret = register_shrinker(&mmu_shrinker, "mmu");
 	if (ret)
 		goto out;
 
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 2ac1008a5f39..951343c41ba8 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1084,7 +1084,7 @@ int binder_alloc_shrinker_init(void)
 	int ret = list_lru_init(&binder_alloc_lru);
 
 	if (ret == 0) {
-		ret = register_shrinker(&binder_shrinker);
+		ret = register_shrinker(&binder_shrinker, "binder");
 		if (ret)
 			list_lru_destroy(&binder_alloc_lru);
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 6a6ff98a8746..85524ef92ea4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -426,7 +426,8 @@ void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
 	i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
 	i915->mm.shrinker.seeks = DEFAULT_SEEKS;
 	i915->mm.shrinker.batch = 4096;
-	drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker));
+	drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
+						  "drm_i915_gem"));
 
 	i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
 	drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 086dacf2f26a..2d3cf4f13dfd 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -221,7 +221,7 @@ void msm_gem_shrinker_init(struct drm_device *dev)
 	priv->shrinker.count_objects = msm_gem_shrinker_count;
 	priv->shrinker.scan_objects = msm_gem_shrinker_scan;
 	priv->shrinker.seeks = DEFAULT_SEEKS;
-	WARN_ON(register_shrinker(&priv->shrinker));
+	WARN_ON(register_shrinker(&priv->shrinker, "drm_msm_gem"));
 
 	priv->vmap_notifier.notifier_call = msm_gem_shrinker_vmap;
 	WARN_ON(register_vmap_purge_notifier(&priv->vmap_notifier));
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index 77e7cb6d1ae3..0d028266ee9e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -103,7 +103,7 @@ void panfrost_gem_shrinker_init(struct drm_device *dev)
 	pfdev->shrinker.count_objects = panfrost_gem_shrinker_count;
 	pfdev->shrinker.scan_objects = panfrost_gem_shrinker_scan;
 	pfdev->shrinker.seeks = DEFAULT_SEEKS;
-	WARN_ON(register_shrinker(&pfdev->shrinker));
+	WARN_ON(register_shrinker(&pfdev->shrinker, "drm_panfrost"));
 }
 
 /**
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1bba0a0ed3f9..b8b41d242197 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -722,7 +722,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	mm_shrinker.count_objects = ttm_pool_shrinker_count;
 	mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
 	mm_shrinker.seeks = 1;
-	return register_shrinker(&mm_shrinker);
+	return register_shrinker(&mm_shrinker, "drm_ttm_pool");
 }
 
 /**
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ad9f16689419..c1f734ab86b3 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -812,7 +812,7 @@ int bch_btree_cache_alloc(struct cache_set *c)
 	c->shrink.seeks = 4;
 	c->shrink.batch = c->btree_pages * 2;
 
-	if (register_shrinker(&c->shrink))
+	if (register_shrinker(&c->shrink, "btree"))
 		pr_warn("bcache: %s: could not register shrinker\n",
 				__func__);
 
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 5ffa1dcf84cf..bcc95898c341 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1806,7 +1806,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
 	c->shrinker.scan_objects = dm_bufio_shrink_scan;
 	c->shrinker.seeks = 1;
 	c->shrinker.batch = 0;
-	r = register_shrinker(&c->shrinker);
+	r = register_shrinker(&c->shrinker, "dm_bufio");
 	if (r)
 		goto bad;
 
diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index d1ea66114d14..05f2fd12066b 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -2944,7 +2944,7 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	zmd->mblk_shrinker.seeks = DEFAULT_SEEKS;
 
 	/* Metadata cache shrinker */
-	ret = register_shrinker(&zmd->mblk_shrinker);
+	ret = register_shrinker(&zmd->mblk_shrinker, "md_meta");
 	if (ret) {
 		dmz_zmd_err(zmd, "Register metadata cache shrinker failed");
 		goto err;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 59f91e392a2a..1b326b93155c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7383,7 +7383,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	conf->shrinker.count_objects = raid5_cache_count;
 	conf->shrinker.batch = 128;
 	conf->shrinker.flags = 0;
-	if (register_shrinker(&conf->shrinker)) {
+	if (register_shrinker(&conf->shrinker, "md")) {
 		pr_warn("md/raid:%s: couldn't register shrinker.\n",
 			mdname(mddev));
 		goto abort;
diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index f1d8ba6d4857..6c9ddf1187dd 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -1587,7 +1587,7 @@ static int vmballoon_register_shrinker(struct vmballoon *b)
 	b->shrinker.count_objects = vmballoon_shrinker_count;
 	b->shrinker.seeks = DEFAULT_SEEKS;
 
-	r = register_shrinker(&b->shrinker);
+	r = register_shrinker(&b->shrinker, "vmw_balloon");
 
 	if (r == 0)
 		b->shrinker_registered = true;
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f4c34a2a6b8e..093e06e19d0e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -875,7 +875,7 @@ static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
 	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
 	vb->shrinker.seeks = DEFAULT_SEEKS;
 
-	return register_shrinker(&vb->shrinker);
+	return register_shrinker(&vb->shrinker, "virtio_valloon");
 }
 
 static int virtballoon_probe(struct virtio_device *vdev)
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index 5abded97e1a7..a6c5e344017d 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -305,7 +305,7 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
-	if (register_shrinker(&backend_memory_shrinker))
+	if (register_shrinker(&backend_memory_shrinker, "xen_backend"))
 		pr_warn("shrinker registration failed\n");
 
 	return 0;
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index ec9a1d780dc1..67eb64fadd4f 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -282,7 +282,7 @@ static struct shrinker erofs_shrinker_info = {
 
 int __init erofs_init_shrinker(void)
 {
-	return register_shrinker(&erofs_shrinker_info);
+	return register_shrinker(&erofs_shrinker_info, "erofs");
 }
 
 void erofs_exit_shrinker(void)
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 9a3a8996aacf..a7aa79d580e5 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -1650,11 +1650,10 @@ int ext4_es_register_shrinker(struct ext4_sb_info *sbi)
 	err = percpu_counter_init(&sbi->s_es_stats.es_stats_shk_cnt, 0, GFP_KERNEL);
 	if (err)
 		goto err3;
-
 	sbi->s_es_shrinker.scan_objects = ext4_es_scan;
 	sbi->s_es_shrinker.count_objects = ext4_es_count;
 	sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
-	err = register_shrinker(&sbi->s_es_shrinker);
+	err = register_shrinker(&sbi->s_es_shrinker, "ext4_es");
 	if (err)
 		goto err4;
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 4368f90571bd..2fc40a1635f3 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4579,7 +4579,7 @@ static int __init init_f2fs_fs(void)
 	err = f2fs_init_sysfs();
 	if (err)
 		goto free_garbage_collection_cache;
-	err = register_shrinker(&f2fs_shrinker_info);
+	err = register_shrinker(&f2fs_shrinker_info, "f2fs");
 	if (err)
 		goto free_sysfs;
 	err = register_filesystem(&f2fs_fs_type);
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 26169cedcefc..791c23d9f7e7 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2549,7 +2549,7 @@ int __init gfs2_glock_init(void)
 		return -ENOMEM;
 	}
 
-	ret = register_shrinker(&glock_shrinker);
+	ret = register_shrinker(&glock_shrinker, "gfs2_glock");
 	if (ret) {
 		destroy_workqueue(gfs2_delete_workqueue);
 		destroy_workqueue(glock_workqueue);
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 28d0eb23e18e..dde981b78488 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -150,7 +150,7 @@ static int __init init_gfs2_fs(void)
 	if (!gfs2_trans_cachep)
 		goto fail_cachep8;
 
-	error = register_shrinker(&gfs2_qd_shrinker);
+	error = register_shrinker(&gfs2_qd_shrinker, "gfs2_qd");
 	if (error)
 		goto fail_shrinker;
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c0cbeeaec2d1..e7786445ecc1 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1418,7 +1418,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
 	if (percpu_counter_init(&journal->j_checkpoint_jh_count, 0, GFP_KERNEL))
 		goto err_cleanup;
 
-	if (register_shrinker(&journal->j_shrinker)) {
+	if (register_shrinker(&journal->j_shrinker, "jbd2_journal")) {
 		percpu_counter_destroy(&journal->j_checkpoint_jh_count);
 		goto err_cleanup;
 	}
diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..379dc5b0b6ad 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -367,7 +367,7 @@ struct mb_cache *mb_cache_create(int bucket_bits)
 	cache->c_shrink.count_objects = mb_cache_count;
 	cache->c_shrink.scan_objects = mb_cache_scan;
 	cache->c_shrink.seeks = DEFAULT_SEEKS;
-	if (register_shrinker(&cache->c_shrink)) {
+	if (register_shrinker(&cache->c_shrink, "mb_cache")) {
 		kfree(cache->c_hash);
 		kfree(cache);
 		goto err_out;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index e7b34f7e0614..147b8a2f2dc6 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -1017,15 +1017,16 @@ int __init nfs4_xattr_cache_init(void)
 	if (ret)
 		goto out2;
 
-	ret = register_shrinker(&nfs4_xattr_cache_shrinker);
+	ret = register_shrinker(&nfs4_xattr_cache_shrinker, "nfs_xattr_cache");
 	if (ret)
 		goto out1;
 
-	ret = register_shrinker(&nfs4_xattr_entry_shrinker);
+	ret = register_shrinker(&nfs4_xattr_entry_shrinker, "nfs_xattr_entry");
 	if (ret)
 		goto out;
 
-	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
+	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker,
+				"nfs_xattr_large_entry");
 	if (!ret)
 		return 0;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 6ab5eeb000dc..c7a2aef911f1 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -149,7 +149,7 @@ int __init register_nfs_fs(void)
 	ret = nfs_register_sysctl();
 	if (ret < 0)
 		goto error_2;
-	ret = register_shrinker(&acl_shrinker);
+	ret = register_shrinker(&acl_shrinker, "nfs_acl");
 	if (ret < 0)
 		goto error_3;
 #ifdef CONFIG_NFS_V4_2
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2c1b027774d4..9c2879a3c3c0 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -666,7 +666,7 @@ nfsd_file_cache_init(void)
 		goto out_err;
 	}
 
-	ret = register_shrinker(&nfsd_file_shrinker);
+	ret = register_shrinker(&nfsd_file_shrinker, "nfsd_filecache");
 	if (ret) {
 		pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
 		goto out_lru;
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 0b3f12aa37ff..f1cfb06d0be5 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -176,7 +176,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
 	nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
 	nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
 	nn->nfsd_reply_cache_shrinker.seeks = 1;
-	status = register_shrinker(&nn->nfsd_reply_cache_shrinker);
+	status = register_shrinker(&nn->nfsd_reply_cache_shrinker, "nfsd_reply");
 	if (status)
 		goto out_stats_destroy;
 
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index a74aef99bd3d..854d2b1d0914 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2985,7 +2985,7 @@ static int __init dquot_init(void)
 	pr_info("VFS: Dquot-cache hash table entries: %ld (order %ld,"
 		" %ld bytes)\n", nr_hash, order, (PAGE_SIZE << order));
 
-	if (register_shrinker(&dqcache_shrinker))
+	if (register_shrinker(&dqcache_shrinker, "dqcache"))
 		panic("Cannot register dquot shrinker");
 
 	return 0;
diff --git a/fs/super.c b/fs/super.c
index 60f57c7bc0a6..5776a3dbaf10 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -265,7 +265,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
-	if (prealloc_shrinker(&s->s_shrink))
+	if (prealloc_shrinker(&s->s_shrink, "sb-%s", type->name))
 		goto fail;
 	if (list_lru_init_memcg(&s->s_dentry_lru, &s->s_shrink))
 		goto fail;
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index bad67455215f..a3663d201f64 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2430,7 +2430,7 @@ static int __init ubifs_init(void)
 	if (!ubifs_inode_slab)
 		return -ENOMEM;
 
-	err = register_shrinker(&ubifs_shrinker_info);
+	err = register_shrinker(&ubifs_shrinker_info, "ubifs");
 	if (err)
 		goto out_slab;
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e1afb9e503e1..5645e92df0c9 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1986,7 +1986,7 @@ xfs_alloc_buftarg(
 	btp->bt_shrinker.scan_objects = xfs_buftarg_shrink_scan;
 	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
 	btp->bt_shrinker.flags = SHRINKER_NUMA_AWARE;
-	if (register_shrinker(&btp->bt_shrinker))
+	if (register_shrinker(&btp->bt_shrinker, "xfs_buf"))
 		goto error_pcpu;
 	return btp;
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bffd6eb0b298..d0c4e74ff763 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2198,5 +2198,5 @@ xfs_inodegc_register_shrinker(
 	shrink->flags = SHRINKER_NONSLAB;
 	shrink->batch = XFS_INODEGC_SHRINKER_BATCH;
 
-	return register_shrinker(shrink);
+	return register_shrinker(shrink, "xfs_inodegc");
 }
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index f165d1a3de1d..93ded9e81f49 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -686,7 +686,7 @@ xfs_qm_init_quotainfo(
 	qinf->qi_shrinker.seeks = DEFAULT_SEEKS;
 	qinf->qi_shrinker.flags = SHRINKER_NUMA_AWARE;
 
-	error = register_shrinker(&qinf->qi_shrinker);
+	error = register_shrinker(&qinf->qi_shrinker, "xfs_qm");
 	if (error)
 		goto out_free_inos;
 
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 17985a890887..8dffe56d6ad1 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -75,6 +75,7 @@ struct shrinker {
 #endif
 #ifdef CONFIG_SHRINKER_DEBUG
 	int debugfs_id;
+	char *name;
 	struct dentry *debugfs_entry;
 #endif
 	/* objs pending delete, per node */
@@ -92,9 +93,9 @@ struct shrinker {
  */
 #define SHRINKER_NONSLAB	(1 << 3)
 
-extern int prealloc_shrinker(struct shrinker *shrinker);
+extern int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...);
 extern void register_shrinker_prepared(struct shrinker *shrinker);
-extern int register_shrinker(struct shrinker *shrinker);
+extern int register_shrinker(struct shrinker *shrinker, const char *fmt, ...);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5c587e00811c..b4c66916bea9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4978,7 +4978,7 @@ static void __init kfree_rcu_batch_init(void)
 		INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
 		krcp->initialized = true;
 	}
-	if (register_shrinker(&kfree_rcu_shrinker))
+	if (register_shrinker(&kfree_rcu_shrinker, "kfree_rcu"))
 		pr_err("Failed to register kfree_rcu() shrinker!\n");
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 541c1eba072f..76086984347f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -423,10 +423,10 @@ static int __init hugepage_init(void)
 	if (err)
 		goto err_slab;
 
-	err = register_shrinker(&huge_zero_page_shrinker);
+	err = register_shrinker(&huge_zero_page_shrinker, "thp_zero");
 	if (err)
 		goto err_hzp_shrinker;
-	err = register_shrinker(&deferred_split_shrinker);
+	err = register_shrinker(&deferred_split_shrinker, "thp_deferred_split");
 	if (err)
 		goto err_split_shrinker;
 
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 81350b64bf01..e2f7ef5fecfc 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -418,7 +418,7 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
 		return id;
 	shrinker->debugfs_id = id;
 
-	snprintf(buf, sizeof(buf), "%d", id);
+	snprintf(buf, sizeof(buf), "%s-%d", shrinker->name, id);
 
 	/* create debugfs entry */
 	entry = debugfs_create_dir(buf, shrinker_debugfs_root);
@@ -465,6 +465,10 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
 	}
 #endif /* CONFIG_NUMA */
 
+	/* shrinker->name is not needed anymore, free it */
+	kfree(shrinker->name);
+	shrinker->name = NULL;
+
 	return 0;
 }
 
@@ -477,6 +481,7 @@ void shrinker_debugfs_remove(struct shrinker *shrinker)
 
 	debugfs_remove_recursive(shrinker->debugfs_entry);
 	ida_free(&shrinker_debugfs_ida, shrinker->debugfs_id);
+	WARN_ON_ONCE(shrinker->name);
 }
 
 static int __init shrinker_debugfs_init(void)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 121a54a1602b..6025cfda4932 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -613,7 +613,7 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
 /*
  * Add a shrinker callback to be called from the vm.
  */
-int prealloc_shrinker(struct shrinker *shrinker)
+static int __prealloc_shrinker(struct shrinker *shrinker)
 {
 	unsigned int size;
 	int err;
@@ -637,6 +637,34 @@ int prealloc_shrinker(struct shrinker *shrinker)
 	return 0;
 }
 
+#ifdef CONFIG_SHRINKER_DEBUG
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	int err;
+	char buf[64];
+	va_list ap;
+
+	va_start(ap, fmt);
+	vscnprintf(buf, sizeof(buf), fmt, ap);
+	va_end(ap);
+
+	shrinker->name = kstrdup(buf, GFP_KERNEL);
+	if (!shrinker->name)
+		return -ENOMEM;
+
+	err = __prealloc_shrinker(shrinker);
+	if (err)
+		kfree(shrinker->name);
+
+	return err;
+}
+#else
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	return __prealloc_shrinker(shrinker);
+}
+#endif
+
 void free_prealloced_shrinker(struct shrinker *shrinker)
 {
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
@@ -648,6 +676,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
 
 	kfree(shrinker->nr_deferred);
 	shrinker->nr_deferred = NULL;
+#ifdef CONFIG_SHRINKER_DEBUG
+	kfree(shrinker->name);
+#endif
 }
 
 void register_shrinker_prepared(struct shrinker *shrinker)
@@ -659,15 +690,38 @@ void register_shrinker_prepared(struct shrinker *shrinker)
 	up_write(&shrinker_rwsem);
 }
 
-int register_shrinker(struct shrinker *shrinker)
+static int __register_shrinker(struct shrinker *shrinker)
 {
-	int err = prealloc_shrinker(shrinker);
+	int err = __prealloc_shrinker(shrinker);
 
 	if (err)
 		return err;
 	register_shrinker_prepared(shrinker);
 	return 0;
 }
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	char buf[64];
+	va_list ap;
+
+	va_start(ap, fmt);
+	vscnprintf(buf, sizeof(buf), fmt, ap);
+	va_end(ap);
+
+	shrinker->name = kstrdup(buf, GFP_KERNEL);
+	if (!shrinker->name)
+		return -ENOMEM;
+
+	return __register_shrinker(shrinker);
+}
+#else
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	return __register_shrinker(shrinker);
+}
+#endif
 EXPORT_SYMBOL(register_shrinker);
 
 /*
diff --git a/mm/workingset.c b/mm/workingset.c
index 592569a8974c..840986179cf3 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -625,7 +625,7 @@ static int __init workingset_init(void)
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
 	       timestamp_bits, max_order, bucket_order);
 
-	ret = prealloc_shrinker(&workingset_shadow_shrinker);
+	ret = prealloc_shrinker(&workingset_shadow_shrinker, "shadow");
 	if (ret)
 		goto err;
 	ret = __list_lru_init(&shadow_nodes, true, &shadow_nodes_key,
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9152fbde33b5..a19de176f604 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2188,7 +2188,7 @@ static int zs_register_shrinker(struct zs_pool *pool)
 	pool->shrinker.batch = 0;
 	pool->shrinker.seeks = DEFAULT_SEEKS;
 
-	return register_shrinker(&pool->shrinker);
+	return register_shrinker(&pool->shrinker, "zspool");
 }
 
 /**
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 682fcd24bf43..a29742a9c3f1 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -874,7 +874,7 @@ int __init rpcauth_init_module(void)
 	err = rpc_init_authunix();
 	if (err < 0)
 		goto out1;
-	err = register_shrinker(&rpc_cred_shrinker);
+	err = register_shrinker(&rpc_cred_shrinker, "rpc_cred");
 	if (err < 0)
 		goto out2;
 	return 0;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 6/7] docs: document shrinker debugfs
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
                   ` (4 preceding siblings ...)
  2022-04-22 20:26 ` [PATCH v2 5/7] mm: provide shrinkers with names Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-22 20:26 ` [PATCH v2 7/7] tools: add memcg_shrinker.py Roman Gushchin
  2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
  7 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

Add a document describing the shrinker debugfs interface.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 Documentation/admin-guide/mm/index.rst        |  1 +
 .../admin-guide/mm/shrinker_debugfs.rst       | 90 +++++++++++++++++++
 2 files changed, 91 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/shrinker_debugfs.rst

diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index c21b5823f126..1bd11118dfb1 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -36,6 +36,7 @@ the Linux memory management.
    numa_memory_policy
    numaperf
    pagemap
+   shrinker_debugfs
    soft-dirty
    swap_numa
    transhuge
diff --git a/Documentation/admin-guide/mm/shrinker_debugfs.rst b/Documentation/admin-guide/mm/shrinker_debugfs.rst
new file mode 100644
index 000000000000..c2f3da534b70
--- /dev/null
+++ b/Documentation/admin-guide/mm/shrinker_debugfs.rst
@@ -0,0 +1,90 @@
+==========================
+Shrinker Debugfs Interface
+==========================
+
+Shrinker debugfs interface provides a visibility into the kernel memory
+shrinkers subsystem and allows to get statistics and interact with
+individual shrinkers.
+
+For each shrinker registered in the system a directory in <debugfs>/shrinker/
+is created. The directory is named like "kfree_rcu-0". Each name is composed
+from the shrinker's name and an unique id.
+
+Each shrinker directory contains "count" and "scan" files, which allow
+to trigger count_objects() and scan_objects() callbacks. For memcg-aware
+and numa-aware shrinkers count_memcg, scan_memcg, count_node, scan_node,
+count_memcg_node and scan_memcg_node are additionally provided. They allow
+to get per-memcg and/or per-node object count and shrink only a specific
+memcg/node.
+
+Usage examples:
+
+ 1. List registered shrinkers::
+      $ cd /sys/kernel/debug/shrinker/
+      $ ls
+      dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
+      kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
+      sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
+      sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
+      sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
+      sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
+      sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
+
+ 2. Get information about a specific shrinker::
+      $ cd sb-btrfs-24/
+      $ ls
+      count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
+
+ 3. Count objects on the system/root cgroup level::
+      $ cat count
+      212
+
+ 4. Count objects on the system/root cgroup level per numa node (on a 2-node machine)::
+      $ cat count_node
+      209 3
+
+ 5. Count objects for each memcg (output format: cgroup inode, count)::
+      $ cat count_memcg
+      1 212
+      20 96
+      53 817
+      2297 2
+      218 13
+      581 30
+      911 124
+      ...
+
+ 6. Same but with a per-node output::
+      $ cat count_memcg_node
+      1 209 3
+      20 96 0
+      53 810 7
+      2297 2 0
+      218 13 0
+      581 30 0
+      911 124 0
+      ...
+
+ 7. Scan system/root shrinker::
+      $ cat count
+      212
+      $ echo 100 > scan
+      $ cat scan
+      97
+      $ cat count
+      115
+
+ 8. Scan individual memcg::
+      $ echo "1868 500" > scan_memcg
+      $ cat scan_memcg
+      193
+
+ 9. Scan individual node::
+      $ echo "1 200" > scan_node
+      $ cat scan_node
+      2
+
+ 10. Scan individual memcg and node::
+     $ echo "1868 0 500" > scan_memcg_node
+     $ cat scan_memcg_node
+     435
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 7/7] tools: add memcg_shrinker.py
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
                   ` (5 preceding siblings ...)
  2022-04-22 20:26 ` [PATCH v2 6/7] docs: document shrinker debugfs Roman Gushchin
@ 2022-04-22 20:26 ` Roman Gushchin
  2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
  7 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-22 20:26 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton, Roman Gushchin

Add a simple tool which prints a sorted list of shrinker lists
in the following format: (number of objects, shrinker name, cgroup).

Example:
  $ ./memcg_shrinker.py | head -n 10
    2092     sb-sysfs-26          /sys/fs/cgroup/system.slice
    1809     sb-sysfs-26          /sys/fs/cgroup/system.slice/systemd-udevd.service
    1350     sb-btrfs-24          /sys/fs/cgroup/system.slice
    1016     sb-btrfs-24          /sys/fs/cgroup/system.slice/system-dbus\x2d:1.3\x2dorg.fedoraproject.Setroubleshootd.slice
    861      sb-btrfs-24          /sys/fs/cgroup/system.slice/system-dbus\x2d:1.3\x2dorg.fedoraproject.SetroubleshootPrivileged.slice
    672      sb-btrfs-24          /sys/fs/cgroup/system.slice/firewalld.service
    655      sb-cgroup2-30        /sys/fs/cgroup/init.scope
    275      sb-sysfs-26          /
    232      sb-btrfs-24          /
    221      sb-proc-25           /sys/fs/cgroup/system.slice/systemd-journald.service

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 tools/cgroup/memcg_shrinker.py | 70 ++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100755 tools/cgroup/memcg_shrinker.py

diff --git a/tools/cgroup/memcg_shrinker.py b/tools/cgroup/memcg_shrinker.py
new file mode 100755
index 000000000000..9c32bf0247b2
--- /dev/null
+++ b/tools/cgroup/memcg_shrinker.py
@@ -0,0 +1,70 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2022 Roman Gushchin <roman.gushchin@linux.dev>
+# Copyright (C) 2022 Meta
+
+import os
+import argparse
+import sys
+
+
+def scan_cgroups(cgroup_root):
+    cgroups = {}
+
+    for root, subdirs, _ in os.walk(cgroup_root):
+        for cgroup in subdirs:
+            path = os.path.join(root, cgroup)
+            ino = os.stat(path).st_ino
+            cgroups[ino] = path
+
+    # (memcg ino, path)
+    return cgroups
+
+
+def scan_shrinkers(shrinker_debugfs):
+    shrinkers = []
+
+    for root, subdirs, _ in os.walk(shrinker_debugfs):
+        for shrinker in subdirs:
+            count_memcg_path = os.path.join(root, shrinker, "count_memcg")
+            try:
+                with open(count_memcg_path) as f:
+                    for line in f.readlines():
+                        items = line.split(' ')
+                        ino = int(items[0])
+                        shrinkers.append((int(items[1]), shrinker, ino))
+            except FileNotFoundError:
+                count_path = os.path.join(root, shrinker, "count")
+                with open(count_path) as f:
+                    shrinkers.append((int(f.readline()), shrinker, 0))
+
+    # (count, shrinker, memcg ino)
+    return shrinkers
+
+
+def main():
+    cgroups = scan_cgroups("/sys/fs/cgroup/")
+    shrinkers = scan_shrinkers("/sys/kernel/debug/shrinker/")
+    shrinkers = sorted(shrinkers, reverse = True, key = lambda x: x[0])
+
+    for s in shrinkers:
+        count = s[0]
+        name = s[1]
+        ino = s[2]
+
+        if count == 0:
+            break
+
+        if ino == 0 or ino == 1:
+            cg = "/"
+        else:
+            try:
+                cg = cgroups[ino]
+            except KeyError:
+                cg = "unknown (%d)" % ino
+
+        print("%-8s %-20s %s" % (count, name, cg))
+
+
+if __name__ == '__main__':
+    main()
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs
  2022-04-22 20:26 ` [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs Roman Gushchin
@ 2022-04-23  0:35   ` Hillf Danton
  2022-04-23  1:29     ` Roman Gushchin
  0 siblings, 1 reply; 24+ messages in thread
From: Hillf Danton @ 2022-04-23  0:35 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: Andrew Morton, linux-mm, Dave Chinner, linux-kernel

On Fri, 22 Apr 2022 13:26:40 -0700 Roman Gushchin wrote:
> This commit introduces "count_memcg" and "scan_memcg" interfaces
> for memcg-aware shrinkers.
> 
> Count_memcg using the following format:
> <cgroup inode number1> <count2>
> <cgroup inode number2> <count2>
> ...
> 
> Memory cgroups with 0 associated objects are skipped.
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> ---
>  mm/shrinker_debug.c | 186 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 139 insertions(+), 47 deletions(-)
> 
> diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> index 4df7382a0737..002d44d6ad56 100644
> --- a/mm/shrinker_debug.c
> +++ b/mm/shrinker_debug.c
> @@ -1,8 +1,10 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <linux/idr.h>
> +#include <linux/slab.h>
>  #include <linux/debugfs.h>
>  #include <linux/seq_file.h>
>  #include <linux/shrinker.h>
> +#include <linux/memcontrol.h>
>  
>  /* defined in vmscan.c */
>  extern struct rw_semaphore shrinker_rwsem;
> @@ -11,25 +13,25 @@ extern struct list_head shrinker_list;
>  static DEFINE_IDA(shrinker_debugfs_ida);
>  static struct dentry *shrinker_debugfs_root;
>  
> -static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
> +static unsigned long shrinker_count_objects(struct shrinker *shrinker,
> +					    struct mem_cgroup *memcg,
> +					    unsigned long *count_per_node)
>  {
> -	struct shrinker *shrinker = (struct shrinker *)m->private;
>  	unsigned long nr, total = 0;
> -	int ret, nid;
> -
> -	ret = down_read_killable(&shrinker_rwsem);
> -	if (ret)
> -		return ret;
> +	int nid;
>  
>  	for_each_node(nid) {
>  		struct shrink_control sc = {
>  			.gfp_mask = GFP_KERNEL,
>  			.nid = nid,
> +			.memcg = memcg,
>  		};
>  
>  		nr = shrinker->count_objects(shrinker, &sc);
>  		if (nr == SHRINK_EMPTY)
>  			nr = 0;
> +		if (count_per_node)
> +			count_per_node[nid] = nr;
>  		total += nr;
>  
>  		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> @@ -37,32 +39,17 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
>  
>  		cond_resched();

Nit, add a line in response to signal before schedule, given the
killable above.

>  	}
> -	up_read(&shrinker_rwsem);
> -
> -	seq_printf(m, "%lu\n", total);
>  
> -	return ret;
> +	return total;
>  }
> -DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
>  
> -static ssize_t shrinker_debugfs_scan_write(struct file *file,
> -					   const char __user *buf,
> -					   size_t size, loff_t *pos)
> +static int shrinker_scan_objects(struct shrinker *shrinker,
> +				 struct mem_cgroup *memcg,
> +				 unsigned long nr_to_scan)
>  {
> -	struct shrinker *shrinker = (struct shrinker *)file->private_data;
> -	unsigned long nr, total = 0, nr_to_scan;
> -	unsigned long *count_per_node = NULL;
> -	int nid;
> -	char kbuf[24];
> -	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
> -	ssize_t ret;
> -
> -	if (copy_from_user(kbuf, buf, read_len))
> -		return -EFAULT;
> -	kbuf[read_len] = '\0';
> -
> -	if (kstrtoul(kbuf, 10, &nr_to_scan))
> -		return -EINVAL;
> +	unsigned long *count_per_node;
> +	unsigned long total, nr;
> +	int ret, nid;
>  
>  	ret = down_read_killable(&shrinker_rwsem);
>  	if (ret)
> @@ -80,20 +67,7 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
>  			goto out;
>  		}
>  
> -		for_each_node(nid) {
> -			struct shrink_control sc = {
> -				.gfp_mask = GFP_KERNEL,
> -				.nid = nid,
> -			};
> -
> -			nr = shrinker->count_objects(shrinker, &sc);
> -			if (nr == SHRINK_EMPTY)
> -				nr = 0;
> -			count_per_node[nid] = nr;
> -			total += nr;
> -
> -			cond_resched();
> -		}
> +		total = shrinker_count_objects(shrinker, memcg, count_per_node);
>  	}
>  
>  	for_each_node(nid) {
> @@ -102,13 +76,13 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
>  			.nid = nid,
>  		};
>  
> -		if (shrinker->flags & SHRINKER_NUMA_AWARE) {
> +		if (count_per_node) {
>  			sc.nr_to_scan = nr_to_scan * count_per_node[nid] /
>  				(total ? total : 1);
>  			sc.nr_scanned = sc.nr_to_scan;
>  		} else {
>  			sc.nr_to_scan = nr_to_scan;
> -			sc.nr_scanned = sc.nr_to_scan;
> +			sc.nr_scanned = nr_to_scan;
>  		}
>  
>  		nr = shrinker->scan_objects(shrinker, &sc);
> @@ -119,15 +93,51 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
>  			break;
>  
>  		cond_resched();
> -
>  	}
> -	ret = size;
>  out:
>  	up_read(&shrinker_rwsem);
>  	kfree(count_per_node);
>  	return ret;
>  }
>  
> +static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
> +{
> +	struct shrinker *shrinker = (struct shrinker *)m->private;
> +	int ret;
> +
> +	ret = down_read_killable(&shrinker_rwsem);
> +	if (!ret) {
> +		unsigned long total = shrinker_count_objects(shrinker, NULL, NULL);
> +
> +		up_read(&shrinker_rwsem);
> +		seq_printf(m, "%lu\n", total);
> +	}
> +	return ret;
> +}
> +DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
> +
> +static ssize_t shrinker_debugfs_scan_write(struct file *file,
> +					   const char __user *buf,
> +					   size_t size, loff_t *pos)
> +{
> +	struct shrinker *shrinker = (struct shrinker *)file->private_data;
> +	unsigned long nr_to_scan;
> +	char kbuf[24];
> +	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
> +	ssize_t ret;
> +
> +	if (copy_from_user(kbuf, buf, read_len))
> +		return -EFAULT;
> +	kbuf[read_len] = '\0';
> +
> +	if (kstrtoul(kbuf, 10, &nr_to_scan))
> +		return -EINVAL;
> +
> +	ret = shrinker_scan_objects(shrinker, NULL, nr_to_scan);
> +
> +	return ret ? ret : size;
> +}
> +
>  static int shrinker_debugfs_scan_open(struct inode *inode, struct file *file)
>  {
>  	file->private_data = inode->i_private;
> @@ -140,6 +150,78 @@ static const struct file_operations shrinker_debugfs_scan_fops = {
>  	.write	 = shrinker_debugfs_scan_write,
>  };
>  
> +#ifdef CONFIG_MEMCG
> +static int shrinker_debugfs_count_memcg_show(struct seq_file *m, void *v)
> +{
> +	struct shrinker *shrinker = (struct shrinker *)m->private;
> +	struct mem_cgroup *memcg;
> +	unsigned long total;
> +	int ret;
> +
> +	ret = down_read_killable(&shrinker_rwsem);
> +	if (ret)
> +		return ret;
> +	rcu_read_lock();

A minute ... things like cond_resched() or mutex in individual shrinker
implementation can ruin your nice work within two seconds. The bigger
pain is can we rule them out from coming shrinkers?

Hillf
> +
> +	memcg = mem_cgroup_iter(NULL, NULL, NULL);
> +	do {

Nit, take a look at pending signal.

> +		if (!mem_cgroup_online(memcg))
> +			continue;
> +
> +		total = shrinker_count_objects(shrinker, memcg, NULL);
> +		if (!total)
> +			continue;
> +
> +		seq_printf(m, "%lu %lu\n", mem_cgroup_ino(memcg), total);
> +	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> +
> +	rcu_read_unlock();
> +	up_read(&shrinker_rwsem);
> +
> +	return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count_memcg);


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs
  2022-04-23  0:35   ` Hillf Danton
@ 2022-04-23  1:29     ` Roman Gushchin
  0 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-23  1:29 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Andrew Morton, linux-mm, Dave Chinner, linux-kernel

On Sat, Apr 23, 2022 at 08:35:52AM +0800, Hillf Danton wrote:
> On Fri, 22 Apr 2022 13:26:40 -0700 Roman Gushchin wrote:
> > This commit introduces "count_memcg" and "scan_memcg" interfaces
> > for memcg-aware shrinkers.
> > 
> > Count_memcg using the following format:
> > <cgroup inode number1> <count2>
> > <cgroup inode number2> <count2>
> > ...
> > 
> > Memory cgroups with 0 associated objects are skipped.
> > 
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > ---
> >  mm/shrinker_debug.c | 186 +++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 139 insertions(+), 47 deletions(-)
> > 
> > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> > index 4df7382a0737..002d44d6ad56 100644
> > --- a/mm/shrinker_debug.c
> > +++ b/mm/shrinker_debug.c
> > @@ -1,8 +1,10 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include <linux/idr.h>
> > +#include <linux/slab.h>
> >  #include <linux/debugfs.h>
> >  #include <linux/seq_file.h>
> >  #include <linux/shrinker.h>
> > +#include <linux/memcontrol.h>
> >  
> >  /* defined in vmscan.c */
> >  extern struct rw_semaphore shrinker_rwsem;
> > @@ -11,25 +13,25 @@ extern struct list_head shrinker_list;
> >  static DEFINE_IDA(shrinker_debugfs_ida);
> >  static struct dentry *shrinker_debugfs_root;
> >  
> > -static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
> > +static unsigned long shrinker_count_objects(struct shrinker *shrinker,
> > +					    struct mem_cgroup *memcg,
> > +					    unsigned long *count_per_node)
> >  {
> > -	struct shrinker *shrinker = (struct shrinker *)m->private;
> >  	unsigned long nr, total = 0;
> > -	int ret, nid;
> > -
> > -	ret = down_read_killable(&shrinker_rwsem);
> > -	if (ret)
> > -		return ret;
> > +	int nid;
> >  
> >  	for_each_node(nid) {
> >  		struct shrink_control sc = {
> >  			.gfp_mask = GFP_KERNEL,
> >  			.nid = nid,
> > +			.memcg = memcg,
> >  		};
> >  
> >  		nr = shrinker->count_objects(shrinker, &sc);
> >  		if (nr == SHRINK_EMPTY)
> >  			nr = 0;
> > +		if (count_per_node)
> > +			count_per_node[nid] = nr;
> >  		total += nr;
> >  
> >  		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> > @@ -37,32 +39,17 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
> >  
> >  		cond_resched();
> 
> Nit, add a line in response to signal before schedule, given the
> killable above.

Good point, thanks!

> 
> >  	}
> > -	up_read(&shrinker_rwsem);
> > -
> > -	seq_printf(m, "%lu\n", total);
> >  
> > -	return ret;
> > +	return total;
> >  }
> > -DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
> >  
> > -static ssize_t shrinker_debugfs_scan_write(struct file *file,
> > -					   const char __user *buf,
> > -					   size_t size, loff_t *pos)
> > +static int shrinker_scan_objects(struct shrinker *shrinker,
> > +				 struct mem_cgroup *memcg,
> > +				 unsigned long nr_to_scan)
> >  {
> > -	struct shrinker *shrinker = (struct shrinker *)file->private_data;
> > -	unsigned long nr, total = 0, nr_to_scan;
> > -	unsigned long *count_per_node = NULL;
> > -	int nid;
> > -	char kbuf[24];
> > -	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
> > -	ssize_t ret;
> > -
> > -	if (copy_from_user(kbuf, buf, read_len))
> > -		return -EFAULT;
> > -	kbuf[read_len] = '\0';
> > -
> > -	if (kstrtoul(kbuf, 10, &nr_to_scan))
> > -		return -EINVAL;
> > +	unsigned long *count_per_node;
> > +	unsigned long total, nr;
> > +	int ret, nid;
> >  
> >  	ret = down_read_killable(&shrinker_rwsem);
> >  	if (ret)
> > @@ -80,20 +67,7 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
> >  			goto out;
> >  		}
> >  
> > -		for_each_node(nid) {
> > -			struct shrink_control sc = {
> > -				.gfp_mask = GFP_KERNEL,
> > -				.nid = nid,
> > -			};
> > -
> > -			nr = shrinker->count_objects(shrinker, &sc);
> > -			if (nr == SHRINK_EMPTY)
> > -				nr = 0;
> > -			count_per_node[nid] = nr;
> > -			total += nr;
> > -
> > -			cond_resched();
> > -		}
> > +		total = shrinker_count_objects(shrinker, memcg, count_per_node);
> >  	}
> >  
> >  	for_each_node(nid) {
> > @@ -102,13 +76,13 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
> >  			.nid = nid,
> >  		};
> >  
> > -		if (shrinker->flags & SHRINKER_NUMA_AWARE) {
> > +		if (count_per_node) {
> >  			sc.nr_to_scan = nr_to_scan * count_per_node[nid] /
> >  				(total ? total : 1);
> >  			sc.nr_scanned = sc.nr_to_scan;
> >  		} else {
> >  			sc.nr_to_scan = nr_to_scan;
> > -			sc.nr_scanned = sc.nr_to_scan;
> > +			sc.nr_scanned = nr_to_scan;
> >  		}
> >  
> >  		nr = shrinker->scan_objects(shrinker, &sc);
> > @@ -119,15 +93,51 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
> >  			break;
> >  
> >  		cond_resched();
> > -
> >  	}
> > -	ret = size;
> >  out:
> >  	up_read(&shrinker_rwsem);
> >  	kfree(count_per_node);
> >  	return ret;
> >  }
> >  
> > +static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
> > +{
> > +	struct shrinker *shrinker = (struct shrinker *)m->private;
> > +	int ret;
> > +
> > +	ret = down_read_killable(&shrinker_rwsem);
> > +	if (!ret) {
> > +		unsigned long total = shrinker_count_objects(shrinker, NULL, NULL);
> > +
> > +		up_read(&shrinker_rwsem);
> > +		seq_printf(m, "%lu\n", total);
> > +	}
> > +	return ret;
> > +}
> > +DEFINE_SHOW_ATTRIBUTE(shrinker_debugfs_count);
> > +
> > +static ssize_t shrinker_debugfs_scan_write(struct file *file,
> > +					   const char __user *buf,
> > +					   size_t size, loff_t *pos)
> > +{
> > +	struct shrinker *shrinker = (struct shrinker *)file->private_data;
> > +	unsigned long nr_to_scan;
> > +	char kbuf[24];
> > +	int read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
> > +	ssize_t ret;
> > +
> > +	if (copy_from_user(kbuf, buf, read_len))
> > +		return -EFAULT;
> > +	kbuf[read_len] = '\0';
> > +
> > +	if (kstrtoul(kbuf, 10, &nr_to_scan))
> > +		return -EINVAL;
> > +
> > +	ret = shrinker_scan_objects(shrinker, NULL, nr_to_scan);
> > +
> > +	return ret ? ret : size;
> > +}
> > +
> >  static int shrinker_debugfs_scan_open(struct inode *inode, struct file *file)
> >  {
> >  	file->private_data = inode->i_private;
> > @@ -140,6 +150,78 @@ static const struct file_operations shrinker_debugfs_scan_fops = {
> >  	.write	 = shrinker_debugfs_scan_write,
> >  };
> >  
> > +#ifdef CONFIG_MEMCG
> > +static int shrinker_debugfs_count_memcg_show(struct seq_file *m, void *v)
> > +{
> > +	struct shrinker *shrinker = (struct shrinker *)m->private;
> > +	struct mem_cgroup *memcg;
> > +	unsigned long total;
> > +	int ret;
> > +
> > +	ret = down_read_killable(&shrinker_rwsem);
> > +	if (ret)
> > +		return ret;
> > +	rcu_read_lock();
> 
> A minute ... things like cond_resched() or mutex in individual shrinker
> implementation can ruin your nice work within two seconds. The bigger
> pain is can we rule them out from coming shrinkers?

Hm, why? Isn't it the same path as in reclaim? Can you, please, elaborate?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/7] mm: provide shrinkers with names
  2022-04-22 20:26 ` [PATCH v2 5/7] mm: provide shrinkers with names Roman Gushchin
@ 2022-04-23  7:46       ` Christophe JAILLET
  0 siblings, 0 replies; 24+ messages in thread
From: Christophe JAILLET @ 2022-04-23  7:46 UTC (permalink / raw)
  To: Roman Gushchin, Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet, Hillf Danton

Hi,

Le 22/04/2022 à 22:26, Roman Gushchin a écrit :
> Currently shrinkers are anonymous objects. For debugging purposes they
> can be identified by count/scan function names, but it's not always
> useful: e.g. for superblock's shrinkers it's nice to have at least
> an idea of to which superblock the shrinker belongs.
> 
> This commit adds names to shrinkers. register_shrinker() and
> prealloc_shrinker() functions are extended to take a format and
> arguments to master a name. If CONFIG_SHRINKER_DEBUG is on,
> the name is saved until the corresponding debugfs object is created,
> otherwise it's simple ignored.
> 
> After this change the shrinker debugfs directory looks like:
>    $ cd /sys/kernel/debug/shrinker/
>    $ ls
>      dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
>      kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
>      sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
>      sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
>      sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
>      sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
>      sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> ---

[...]

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 59f91e392a2a..1b326b93155c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7383,7 +7383,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
>   	conf->shrinker.count_objects = raid5_cache_count;
>   	conf->shrinker.batch = 128;
>   	conf->shrinker.flags = 0;
> -	if (register_shrinker(&conf->shrinker)) {
> +	if (register_shrinker(&conf->shrinker, "md")) {

Based on pr_warn below, does it make sense to have something like:
   register_shrinker(&conf->shrinker, "md-%s", mdname(mddev))

>   		pr_warn("md/raid:%s: couldn't register shrinker.\n",
>   			mdname(mddev));
>   		goto abort;

[...]

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 121a54a1602b..6025cfda4932 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -613,7 +613,7 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
>   /*
>    * Add a shrinker callback to be called from the vm.
>    */
> -int prealloc_shrinker(struct shrinker *shrinker)
> +static int __prealloc_shrinker(struct shrinker *shrinker)
>   {
>   	unsigned int size;
>   	int err;
> @@ -637,6 +637,34 @@ int prealloc_shrinker(struct shrinker *shrinker)
>   	return 0;
>   }
>   
> +#ifdef CONFIG_SHRINKER_DEBUG
> +int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	int err;
> +	char buf[64];
> +	va_list ap;
> +
> +	va_start(ap, fmt);
> +	vscnprintf(buf, sizeof(buf), fmt, ap);
> +	va_end(ap);
> +
> +	shrinker->name = kstrdup(buf, GFP_KERNEL);
> +	if (!shrinker->name)
> +		return -ENOMEM;

use kvasprintf_const() (and kfree_const() elsewhere) to simplify code 
and take advantage of kstrdup_const() in most cases?

> +
> +	err = __prealloc_shrinker(shrinker);
> +	if (err)
> +		kfree(shrinker->name);
> +
> +	return err;
> +}
> +#else
> +int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	return __prealloc_shrinker(shrinker);
> +}
> +#endif
> +
>   void free_prealloced_shrinker(struct shrinker *shrinker)
>   {
>   	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
> @@ -648,6 +676,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
>   
>   	kfree(shrinker->nr_deferred);
>   	shrinker->nr_deferred = NULL;
> +#ifdef CONFIG_SHRINKER_DEBUG
> +	kfree(shrinker->name);
> +#endif
>   }
>   
>   void register_shrinker_prepared(struct shrinker *shrinker)
> @@ -659,15 +690,38 @@ void register_shrinker_prepared(struct shrinker *shrinker)
>   	up_write(&shrinker_rwsem);
>   }
>   
> -int register_shrinker(struct shrinker *shrinker)
> +static int __register_shrinker(struct shrinker *shrinker)
>   {
> -	int err = prealloc_shrinker(shrinker);
> +	int err = __prealloc_shrinker(shrinker);
>   
>   	if (err)
>   		return err;
>   	register_shrinker_prepared(shrinker);
>   	return 0;
>   }
> +
> +#ifdef CONFIG_SHRINKER_DEBUG
> +int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	char buf[64];
> +	va_list ap;
> +
> +	va_start(ap, fmt);
> +	vscnprintf(buf, sizeof(buf), fmt, ap);
> +	va_end(ap);
> +
> +	shrinker->name = kstrdup(buf, GFP_KERNEL);
> +	if (!shrinker->name)
> +		return -ENOMEM;
> +

same as above.

> +	return __register_shrinker(shrinker);

Missing error handling and freeing of shrinker->name as done in 
prealloc_shrinker()?

CJ

> +}
> +#else
> +int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	return __register_shrinker(shrinker);
> +}
> +#endif
>   EXPORT_SYMBOL(register_shrinker);
>   
>   /*

[...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/7] mm: provide shrinkers with names
@ 2022-04-23  7:46       ` Christophe JAILLET
  0 siblings, 0 replies; 24+ messages in thread
From: Christophe JAILLET @ 2022-04-23  7:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

Hi,

Le 22/04/2022 à 22:26, Roman Gushchin a écrit :
> Currently shrinkers are anonymous objects. For debugging purposes they
> can be identified by count/scan function names, but it's not always
> useful: e.g. for superblock's shrinkers it's nice to have at least
> an idea of to which superblock the shrinker belongs.
> 
> This commit adds names to shrinkers. register_shrinker() and
> prealloc_shrinker() functions are extended to take a format and
> arguments to master a name. If CONFIG_SHRINKER_DEBUG is on,
> the name is saved until the corresponding debugfs object is created,
> otherwise it's simple ignored.
> 
> After this change the shrinker debugfs directory looks like:
>    $ cd /sys/kernel/debug/shrinker/
>    $ ls
>      dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
>      kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
>      sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
>      sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
>      sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
>      sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
>      sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> ---

[...]

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 59f91e392a2a..1b326b93155c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7383,7 +7383,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
>   	conf->shrinker.count_objects = raid5_cache_count;
>   	conf->shrinker.batch = 128;
>   	conf->shrinker.flags = 0;
> -	if (register_shrinker(&conf->shrinker)) {
> +	if (register_shrinker(&conf->shrinker, "md")) {

Based on pr_warn below, does it make sense to have something like:
   register_shrinker(&conf->shrinker, "md-%s", mdname(mddev))

>   		pr_warn("md/raid:%s: couldn't register shrinker.\n",
>   			mdname(mddev));
>   		goto abort;

[...]

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 121a54a1602b..6025cfda4932 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -613,7 +613,7 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
>   /*
>    * Add a shrinker callback to be called from the vm.
>    */
> -int prealloc_shrinker(struct shrinker *shrinker)
> +static int __prealloc_shrinker(struct shrinker *shrinker)
>   {
>   	unsigned int size;
>   	int err;
> @@ -637,6 +637,34 @@ int prealloc_shrinker(struct shrinker *shrinker)
>   	return 0;
>   }
>   
> +#ifdef CONFIG_SHRINKER_DEBUG
> +int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	int err;
> +	char buf[64];
> +	va_list ap;
> +
> +	va_start(ap, fmt);
> +	vscnprintf(buf, sizeof(buf), fmt, ap);
> +	va_end(ap);
> +
> +	shrinker->name = kstrdup(buf, GFP_KERNEL);
> +	if (!shrinker->name)
> +		return -ENOMEM;

use kvasprintf_const() (and kfree_const() elsewhere) to simplify code 
and take advantage of kstrdup_const() in most cases?

> +
> +	err = __prealloc_shrinker(shrinker);
> +	if (err)
> +		kfree(shrinker->name);
> +
> +	return err;
> +}
> +#else
> +int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	return __prealloc_shrinker(shrinker);
> +}
> +#endif
> +
>   void free_prealloced_shrinker(struct shrinker *shrinker)
>   {
>   	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
> @@ -648,6 +676,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
>   
>   	kfree(shrinker->nr_deferred);
>   	shrinker->nr_deferred = NULL;
> +#ifdef CONFIG_SHRINKER_DEBUG
> +	kfree(shrinker->name);
> +#endif
>   }
>   
>   void register_shrinker_prepared(struct shrinker *shrinker)
> @@ -659,15 +690,38 @@ void register_shrinker_prepared(struct shrinker *shrinker)
>   	up_write(&shrinker_rwsem);
>   }
>   
> -int register_shrinker(struct shrinker *shrinker)
> +static int __register_shrinker(struct shrinker *shrinker)
>   {
> -	int err = prealloc_shrinker(shrinker);
> +	int err = __prealloc_shrinker(shrinker);
>   
>   	if (err)
>   		return err;
>   	register_shrinker_prepared(shrinker);
>   	return 0;
>   }
> +
> +#ifdef CONFIG_SHRINKER_DEBUG
> +int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	char buf[64];
> +	va_list ap;
> +
> +	va_start(ap, fmt);
> +	vscnprintf(buf, sizeof(buf), fmt, ap);
> +	va_end(ap);
> +
> +	shrinker->name = kstrdup(buf, GFP_KERNEL);
> +	if (!shrinker->name)
> +		return -ENOMEM;
> +

same as above.

> +	return __register_shrinker(shrinker);

Missing error handling and freeing of shrinker->name as done in 
prealloc_shrinker()?

CJ

> +}
> +#else
> +int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> +{
> +	return __register_shrinker(shrinker);
> +}
> +#endif
>   EXPORT_SYMBOL(register_shrinker);
>   
>   /*

[...]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers
@ 2022-04-23  7:51     ` Christophe JAILLET
  0 siblings, 0 replies; 24+ messages in thread
From: Christophe JAILLET @ 2022-04-23  7:51 UTC (permalink / raw)
  To: Roman Gushchin, Andrew Morton, linux-mm
  Cc: Dave Chinner, linux-kernel, Yang Shi, Kent Overstreet, Hillf Danton

Le 22/04/2022 à 22:26, Roman Gushchin a écrit :
> This commit introduces the /sys/kernel/debug/shrinker debugfs
> interface which provides an ability to observe the state and interact
> with individual kernel memory shrinkers.
> 
> Because the feature is oriented on kernel developers and adds some
> memory overhead (which shouldn't be large unless there is a huge
> amount of registered shrinkers), it's guarded by a config option
> (disabled by default).
> 
> This commit introduces "count" and "scan" interfaces for each
> shrinker registered in the system.
> 
> Basic usage:
>    1) Get the number of objects
>      $ cat count
> 
>    2) Try to reclaim 500 objects
>      $ echo "500" > scan
> 
> Following commits in the series will add memcg- and numa-specific
> features.
> 
> This commit gives debugfs entries simple numeric names, which are not
> very convenient. The following commit in the series will provide
> shrinkers with more meaningful names.
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> ---
>   include/linux/shrinker.h |  19 +++-
>   lib/Kconfig.debug        |   9 ++
>   mm/Makefile              |   1 +
>   mm/shrinker_debug.c      | 214 +++++++++++++++++++++++++++++++++++++++
>   mm/vmscan.c              |   6 +-
>   5 files changed, 246 insertions(+), 3 deletions(-)
>   create mode 100644 mm/shrinker_debug.c

[...]

> diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> new file mode 100644
> index 000000000000..4df7382a0737
> --- /dev/null
> +++ b/mm/shrinker_debug.c

[...]

> +int shrinker_debugfs_add(struct shrinker *shrinker)
> +{
> +	struct dentry *entry;
> +	char buf[256];
Later, this buffer is filled with a "%s-%d". (patch 5/7)
The %s is generated out of a max 64 bytes long string.

To be consistent with the 2 buffers sizes, maybe this one could be 
reduced a bit to save some stack?

CJ

> +	int id;
> +
> +	lockdep_assert_held(&shrinker_rwsem);
> +
> +	/* debugfs isn't initialized yet, add debugfs entries later. */
> +	if (!shrinker_debugfs_root)
> +		return 0;
> +
> +	id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
> +	if (id < 0)
> +		return id;
> +	shrinker->debugfs_id = id;
> +
> +	snprintf(buf, sizeof(buf), "%d", id);
> +
> +	/* create debugfs entry */
> +	entry = debugfs_create_dir(buf, shrinker_debugfs_root);
> +	if (IS_ERR(entry)) {
> +		ida_free(&shrinker_debugfs_ida, id);
> +		return PTR_ERR(entry);
> +	}

[...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers
@ 2022-04-23  7:51     ` Christophe JAILLET
  0 siblings, 0 replies; 24+ messages in thread
From: Christophe JAILLET @ 2022-04-23  7:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

Le 22/04/2022 à 22:26, Roman Gushchin a écrit :
> This commit introduces the /sys/kernel/debug/shrinker debugfs
> interface which provides an ability to observe the state and interact
> with individual kernel memory shrinkers.
> 
> Because the feature is oriented on kernel developers and adds some
> memory overhead (which shouldn't be large unless there is a huge
> amount of registered shrinkers), it's guarded by a config option
> (disabled by default).
> 
> This commit introduces "count" and "scan" interfaces for each
> shrinker registered in the system.
> 
> Basic usage:
>    1) Get the number of objects
>      $ cat count
> 
>    2) Try to reclaim 500 objects
>      $ echo "500" > scan
> 
> Following commits in the series will add memcg- and numa-specific
> features.
> 
> This commit gives debugfs entries simple numeric names, which are not
> very convenient. The following commit in the series will provide
> shrinkers with more meaningful names.
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> ---
>   include/linux/shrinker.h |  19 +++-
>   lib/Kconfig.debug        |   9 ++
>   mm/Makefile              |   1 +
>   mm/shrinker_debug.c      | 214 +++++++++++++++++++++++++++++++++++++++
>   mm/vmscan.c              |   6 +-
>   5 files changed, 246 insertions(+), 3 deletions(-)
>   create mode 100644 mm/shrinker_debug.c

[...]

> diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> new file mode 100644
> index 000000000000..4df7382a0737
> --- /dev/null
> +++ b/mm/shrinker_debug.c

[...]

> +int shrinker_debugfs_add(struct shrinker *shrinker)
> +{
> +	struct dentry *entry;
> +	char buf[256];
Later, this buffer is filled with a "%s-%d". (patch 5/7)
The %s is generated out of a max 64 bytes long string.

To be consistent with the 2 buffers sizes, maybe this one could be 
reduced a bit to save some stack?

CJ

> +	int id;
> +
> +	lockdep_assert_held(&shrinker_rwsem);
> +
> +	/* debugfs isn't initialized yet, add debugfs entries later. */
> +	if (!shrinker_debugfs_root)
> +		return 0;
> +
> +	id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
> +	if (id < 0)
> +		return id;
> +	shrinker->debugfs_id = id;
> +
> +	snprintf(buf, sizeof(buf), "%d", id);
> +
> +	/* create debugfs entry */
> +	entry = debugfs_create_dir(buf, shrinker_debugfs_root);
> +	if (IS_ERR(entry)) {
> +		ida_free(&shrinker_debugfs_ida, id);
> +		return PTR_ERR(entry);
> +	}

[...]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
                   ` (6 preceding siblings ...)
  2022-04-22 20:26 ` [PATCH v2 7/7] tools: add memcg_shrinker.py Roman Gushchin
@ 2022-04-26  6:02 ` Dave Chinner
  2022-04-26  6:45   ` Kent Overstreet
                     ` (3 more replies)
  7 siblings, 4 replies; 24+ messages in thread
From: Dave Chinner @ 2022-04-26  6:02 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton

On Fri, Apr 22, 2022 at 01:26:37PM -0700, Roman Gushchin wrote:
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under the memory pressure the kernel applies some pressure on each of
> them in the order of which they were created/registered in the system. Some
> of them can contain only few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
> 
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> covering everything though: shrinkers which report 0 objects will never show up,
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. hard to guess which super
> block's shrinker it is having only "super_cache_scan").

In general, I've had no trouble identifying individual shrinker
instances because I'm always looking at individual subsystem
shrinker tracepoints, too.  Hence I've almost always got the
identification information in the traces I need to trace just the
individual shrinker tracepoints and a bit of sed/grep/awk and I've
got something I can feed to gnuplot or a python script to graph...

> They are a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.

IDGI. profiling shrinkers iunder ideal conditions when there isn't
memory pressure is largely a useless exercise because execution
patterns under memory pressure are vastly different.

All the problems with shrinkers show up when progress cannot be made
as fast as memory reclaim wants memory to be reclaimed. How do you
trigger priority windup causing large amounts of deferred processing
because shrinkers are running in GFP_NOFS/GFP_NOIO context? How do
you simulate objects getting dirtied in memory so they can't be
immediately reclaimed so the shrinker can't make any progress at all
until IO completes? How do you simulate the unbound concurrency that
direct reclaim can drive into the shrinkers that causes massive lock
contention on shared structures and locks that need to be accessed
to free objects?

IOWs, if all you want to do is profile shrinkers running in the
absence of memory pressure, then you can do that perfectly well with
the existing 'echo 2 > /proc/sys/vm/drop_caches' mechanism. We don't
need some complex debugfs API just to profile the shrinker
behaviour.

So why do we need any of the complexity and potential for abuse that
comes from exposing control of shrinkers directly to userspace like
these patches do?

> To provide a better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
> similar to /sys/kernel/slab.

/sys/kernel/slab contains read-only usage information - it is
analagous for visibility arguments, but it is not equivalent for
the rest of the "active" functionality you want to add here....

> For each shrinker registered in the system a directory is created. The directory
> contains "count" and "scan" files, which allow to trigger count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow to get per-memcg
> and/or per-node object count and shrink only a specific memcg/node.

Great, but why does the shrinker introspection interface need active
scan control functions like these?

> To make debugging more pleasant, the patchset also names all shrinkers,
> so that debugfs entries can have more meaningful names.
> 
> Usage examples:
> 
> 1) List registered shrinkers:
>   $ cd /sys/kernel/debug/shrinker/
>   $ ls
>     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
>     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
>     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
>     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
>     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
>     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
>     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34

Ouch. That's not going to be useful for humans debugging a system as
there's no way to cross reference a "superblock" with an actual
filesystem mount point. Nor is there any way to reallly know that
all the shrinkers in one filesystem are related.

We normally solve this by ensuring that the fs related object has
the short bdev name appended to them. e.g:

$ pgrep xfs
1 I root          36       2  0  60 -20 -     0 -      Apr19 ?        00:00:10 [kworker/0:1H-xfs-log/dm-3]
1 I root         679       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfsalloc]
1 I root         680       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs_mru_cache]
1 I root         681       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs-buf/dm-1]
.....

Here we have a kworker process running log IO completion work on
dm-3, two global workqueue rescuer tasks (alloc, mru) and a rescuer
task for xfs-buf workqueue on dm-1.

We need the same name discrimination for shrinker information here,
too - just saying "this is an XFS superblock shrinker" is just not
sufficient when there are hundreds of XFS mount points with a
handful of shrinkers each.

> 2) Get information about a specific shrinker:
>   $ cd sb-btrfs-24/
>   $ ls
>     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
> 
> 3) Count objects on the system/root cgroup level
>   $ cat count
>     212
> 
> 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
>   $ cat count_node
>     209 3

So a single space separated line with a number per node?

When you have a few hundred nodes and hundreds of thousands of objects per
node, we overrun the 4kB page size with a single line. What then?

> 5) Count objects for each memcg (output format: cgroup inode, count)
>   $ cat count_memcg
>     1 212
>     20 96
>     53 817
>     2297 2
>     218 13
>     581 30
>     911 124
>     <CUT>

What does "<CUT>" mean?

Also, this now iterates separate memcg per line. A parser now needs
to know the difference between count/count_node and
count_memcg/count_memcg_node because they are subtly different file
formats.  These files should have the same format, otherwise it just
creates needless complexity.

Indeed, why do we even need count/count_node? They are just the
"index 1" memcg output, so are totally redundant.

> 6) Same but with a per-node output
>   $ cat count_memcg_node
>     1 209 3
>     20 96 0
>     53 810 7
>     2297 2 0
>     218 13 0
>     581 30 0
>     911 124 0
>     <CUT>

So now we have a hundred nodes in the machine and thousands of
memcgs. And the information we want is in the numerically largest
memcg that is last in the list. ANd we want to graph it's behaviour
over time at high resolution (say 1Hz). Now we burn huge amounts
of CPU counting memcgs that we don't care about and then throwing
away most of the information. That's highly in-efficient and really
doesn't scale.

[snap active scan interface]

This just seems like a solution looking for a problem to solve.
Can you please describe the problem this infrastructure is going
to solve?

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
@ 2022-04-26  6:45   ` Kent Overstreet
  2022-04-26  8:45   ` Hillf Danton
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Kent Overstreet @ 2022-04-26  6:45 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Roman Gushchin, Andrew Morton, linux-mm, linux-kernel, Yang Shi,
	Hillf Danton

On Tue, Apr 26, 2022 at 04:02:19PM +1000, Dave Chinner wrote:
> This just seems like a solution looking for a problem to solve.
> Can you please describe the problem this infrastructure is going
> to solve?

A point I was making over VC is that memcg is completely irrelevant to debugging
most of these issues; all the issues we've been talking about can be easily
reproduced in a single test VM without memcg.

Yet we don't even have the tooling to debug the simple stuff.

Why are we trying to make big and complicated stuff when we can't even debug the
simple cases? And I've been getting _really_ tired of the stock answer of "that
use case isn't interesting to the big cloud providers".

A: If you're a Linux kernel developer at this level, you have earned a great
deal of trust and it is incumbent upon you to be a good steward of the code you
have been entrusted with, instead of just spending all your time chasing fat
bonuses from your employer while ignoring what's good for the codebase as a
whole. That's pissing all over the commons that came long before you and will
hopefully still be around long after you.

B: Even aside from that, it's incredibly shortsighted and a poor use of time and
resources. When I was at Google I saw, over and over again, people rushing to do
something big and complicated and new because that was how they could get a
promotion, instead of working on basic stuff like refactoring core IO paths (and
it's been my experience over and over again that when you just try to make code
saner and more understandable, you almost always find big performance
improvements along the way... but that's not as exciting as rushing to find the
biggest coolest optimization or all-the-bells-and-whistles interface).

So yeah, this patchset screams of someone looking for a promotion to me.

Meanwhile, the status of visibility into the _basics_ of what goes on in MM is
utter dogshit. There's just too many _basic_ questions that are a pain in the
ass to answer - even just profiling memory usage by file:line number is a
shitshow.

One thing that I run into a lot is people rush to say "tracepoints!" for a lot
of problems - but tracepoints aren't a good answer for a lot of problems because
having them on all the time is problematic.

What I would like to see is more lighter weight collection of statistics, and
some basic library code for things like latency measurements of important
operations broken out by quantiles, with rate & frequence - this is something
that's helped in bcachefs. If anyone's interested, the code for that starts
here:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/bcachefs.h#n322

Specifically for shrinkers, I'd like if we had rolling averages over the past
few seconds for e.g. _rate_ of objects requested to be freed vs. actually freed.
If we collect those kinds of rate measurements (and perhaps latency too, to show
stalls) at various places in the MM code, perhaps we'd be able to see what's
getting stuck when we OOM.

We should have rate of objects getting added, too, and we should be collecting
data from the list_lru code as well, like you were mentioning the other night.

And if we collect this data in such a way that it can be displayed in sysfs, but
done with the to_text() methods I've been talking about, it'll also be trivial
to include that in the show_mem() report when we OOM.

Anyways, that's my two cents.... I can't claim to have any brilliant insights
here, but I hope Roman will start taking ideas from more people (and Dave's been
a real wealth of information on this topic! I'd pick his brain if I were you,
Roman).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
  2022-04-26  6:45   ` Kent Overstreet
@ 2022-04-26  8:45   ` Hillf Danton
  2022-04-26 16:41   ` Roman Gushchin
  2022-04-26 19:05   ` Roman Gushchin
  3 siblings, 0 replies; 24+ messages in thread
From: Hillf Danton @ 2022-04-26  8:45 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Roman Gushchin, Andrew Morton, linux-mm, linux-kernel, Yang Shi,
	Kent Overstreet

On Tue, 26 Apr 2022 16:02:19 +1000 Dave Chinner wrote:
> On Fri, Apr 22, 2022 at 01:26:37PM -0700, Roman Gushchin wrote:
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> > 
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan").
> 
> In general, I've had no trouble identifying individual shrinker
> instances because I'm always looking at individual subsystem
> shrinker tracepoints, too.  Hence I've almost always got the
> identification information in the traces I need to trace just the
> individual shrinker tracepoints and a bit of sed/grep/awk and I've
> got something I can feed to gnuplot or a python script to graph...
> 
> > They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> 
> IDGI. profiling shrinkers iunder ideal conditions when there isn't
> memory pressure is largely a useless exercise because execution
> patterns under memory pressure are vastly different.

Well how many minutes, two or ten, does it take for kswapd to reclaim
100 xfs objects at DEF_PRIORITY-3?

> 
> All the problems with shrinkers show up when progress cannot be made
> as fast as memory reclaim wants memory to be reclaimed. How do you
> trigger priority windup causing large amounts of deferred processing
> because shrinkers are running in GFP_NOFS/GFP_NOIO context? How do
> you simulate objects getting dirtied in memory so they can't be
> immediately reclaimed so the shrinker can't make any progress at all
> until IO completes? How do you simulate the unbound concurrency that
> direct reclaim can drive into the shrinkers that causes massive lock
> contention on shared structures and locks that need to be accessed
> to free objects?
> 
> IOWs, if all you want to do is profile shrinkers running in the
> absence of memory pressure, then you can do that perfectly well with
> the existing 'echo 2 > /proc/sys/vm/drop_caches' mechanism. We don't
> need some complex debugfs API just to profile the shrinker
> behaviour.

Hm ... given ext4, what sense does xfs make? Or vice verse?
Or Given wine, why Coke?

I want to see the minutes recycling ten ext4 objects with xfs intact
before waking kswapd up.

Hillf
> 
> So why do we need any of the complexity and potential for abuse that
> comes from exposing control of shrinkers directly to userspace like
> these patches do?
> 
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> 
> /sys/kernel/slab contains read-only usage information - it is
> analagous for visibility arguments, but it is not equivalent for
> the rest of the "active" functionality you want to add here....
> 
> > For each shrinker registered in the system a directory is created. The directory
> > contains "count" and "scan" files, which allow to trigger count_objects()
> > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > and/or per-node object count and shrink only a specific memcg/node.
> 
> Great, but why does the shrinker introspection interface need active
> scan control functions like these?
> 
> > To make debugging more pleasant, the patchset also names all shrinkers,
> > so that debugfs entries can have more meaningful names.
> > 
> > Usage examples:
> > 
> > 1) List registered shrinkers:
> >   $ cd /sys/kernel/debug/shrinker/
> >   $ ls
> >     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
> >     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
> >     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
> >     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
> >     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
> >     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
> >     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
> 
> Ouch. That's not going to be useful for humans debugging a system as
> there's no way to cross reference a "superblock" with an actual
> filesystem mount point. Nor is there any way to reallly know that
> all the shrinkers in one filesystem are related.
> 
> We normally solve this by ensuring that the fs related object has
> the short bdev name appended to them. e.g:
> 
> $ pgrep xfs
> 1 I root          36       2  0  60 -20 -     0 -      Apr19 ?        00:00:10 [kworker/0:1H-xfs-log/dm-3]
> 1 I root         679       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfsalloc]
> 1 I root         680       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs_mru_cache]
> 1 I root         681       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs-buf/dm-1]
> .....
> 
> Here we have a kworker process running log IO completion work on
> dm-3, two global workqueue rescuer tasks (alloc, mru) and a rescuer
> task for xfs-buf workqueue on dm-1.
> 
> We need the same name discrimination for shrinker information here,
> too - just saying "this is an XFS superblock shrinker" is just not
> sufficient when there are hundreds of XFS mount points with a
> handful of shrinkers each.
> 
> > 2) Get information about a specific shrinker:
> >   $ cd sb-btrfs-24/
> >   $ ls
> >     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
> > 
> > 3) Count objects on the system/root cgroup level
> >   $ cat count
> >     212
> > 
> > 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
> >   $ cat count_node
> >     209 3
> 
> So a single space separated line with a number per node?
> 
> When you have a few hundred nodes and hundreds of thousands of objects per
> node, we overrun the 4kB page size with a single line. What then?
> 
> > 5) Count objects for each memcg (output format: cgroup inode, count)
> >   $ cat count_memcg
> >     1 212
> >     20 96
> >     53 817
> >     2297 2
> >     218 13
> >     581 30
> >     911 124
> >     <CUT>
> 
> What does "<CUT>" mean?
> 
> Also, this now iterates separate memcg per line. A parser now needs
> to know the difference between count/count_node and
> count_memcg/count_memcg_node because they are subtly different file
> formats.  These files should have the same format, otherwise it just
> creates needless complexity.
> 
> Indeed, why do we even need count/count_node? They are just the
> "index 1" memcg output, so are totally redundant.
> 
> > 6) Same but with a per-node output
> >   $ cat count_memcg_node
> >     1 209 3
> >     20 96 0
> >     53 810 7
> >     2297 2 0
> >     218 13 0
> >     581 30 0
> >     911 124 0
> >     <CUT>
> 
> So now we have a hundred nodes in the machine and thousands of
> memcgs. And the information we want is in the numerically largest
> memcg that is last in the list. ANd we want to graph it's behaviour
> over time at high resolution (say 1Hz). Now we burn huge amounts
> of CPU counting memcgs that we don't care about and then throwing
> away most of the information. That's highly in-efficient and really
> doesn't scale.
> 
> [snap active scan interface]
> 
> This just seems like a solution looking for a problem to solve.
> Can you please describe the problem this infrastructure is going
> to solve?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> dchinner@redhat.com
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
  2022-04-26  6:45   ` Kent Overstreet
  2022-04-26  8:45   ` Hillf Danton
@ 2022-04-26 16:41   ` Roman Gushchin
  2022-04-26 18:37     ` Kent Overstreet
  2022-04-27  1:22     ` Dave Chinner
  2022-04-26 19:05   ` Roman Gushchin
  3 siblings, 2 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-26 16:41 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton

On Tue, Apr 26, 2022 at 04:02:19PM +1000, Dave Chinner wrote:
> On Fri, Apr 22, 2022 at 01:26:37PM -0700, Roman Gushchin wrote:
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> > 
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan").
> 
> In general, I've had no trouble identifying individual shrinker
> instances because I'm always looking at individual subsystem
> shrinker tracepoints, too.  Hence I've almost always got the
> identification information in the traces I need to trace just the
> individual shrinker tracepoints and a bit of sed/grep/awk and I've
> got something I can feed to gnuplot or a python script to graph...
> 
> > They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> 
> IDGI. profiling shrinkers iunder ideal conditions when there isn't
> memory pressure is largely a useless exercise because execution
> patterns under memory pressure are vastly different.
> 
> All the problems with shrinkers show up when progress cannot be made
> as fast as memory reclaim wants memory to be reclaimed. How do you
> trigger priority windup causing large amounts of deferred processing
> because shrinkers are running in GFP_NOFS/GFP_NOIO context? How do
> you simulate objects getting dirtied in memory so they can't be
> immediately reclaimed so the shrinker can't make any progress at all
> until IO completes? How do you simulate the unbound concurrency that
> direct reclaim can drive into the shrinkers that causes massive lock
> contention on shared structures and locks that need to be accessed
> to free objects?
> 
> IOWs, if all you want to do is profile shrinkers running in the
> absence of memory pressure, then you can do that perfectly well with
> the existing 'echo 2 > /proc/sys/vm/drop_caches' mechanism. We don't
> need some complex debugfs API just to profile the shrinker
> behaviour.
> 
> So why do we need any of the complexity and potential for abuse that
> comes from exposing control of shrinkers directly to userspace like
> these patches do?
> 
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> 
> /sys/kernel/slab contains read-only usage information - it is
> analagous for visibility arguments, but it is not equivalent for
> the rest of the "active" functionality you want to add here....
> 
> > For each shrinker registered in the system a directory is created. The directory
> > contains "count" and "scan" files, which allow to trigger count_objects()
> > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > and/or per-node object count and shrink only a specific memcg/node.
> 
> Great, but why does the shrinker introspection interface need active
> scan control functions like these?
> 
> > To make debugging more pleasant, the patchset also names all shrinkers,
> > so that debugfs entries can have more meaningful names.
> > 
> > Usage examples:
> > 
> > 1) List registered shrinkers:
> >   $ cd /sys/kernel/debug/shrinker/
> >   $ ls
> >     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
> >     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
> >     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
> >     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
> >     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
> >     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
> >     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
> 
> Ouch. That's not going to be useful for humans debugging a system as
> there's no way to cross reference a "superblock" with an actual
> filesystem mount point. Nor is there any way to reallly know that
> all the shrinkers in one filesystem are related.
> 
> We normally solve this by ensuring that the fs related object has
> the short bdev name appended to them. e.g:
> 
> $ pgrep xfs
> 1 I root          36       2  0  60 -20 -     0 -      Apr19 ?        00:00:10 [kworker/0:1H-xfs-log/dm-3]
> 1 I root         679       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfsalloc]
> 1 I root         680       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs_mru_cache]
> 1 I root         681       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs-buf/dm-1]
> .....
> 
> Here we have a kworker process running log IO completion work on
> dm-3, two global workqueue rescuer tasks (alloc, mru) and a rescuer
> task for xfs-buf workqueue on dm-1.
> 
> We need the same name discrimination for shrinker information here,
> too - just saying "this is an XFS superblock shrinker" is just not
> sufficient when there are hundreds of XFS mount points with a
> handful of shrinkers each.
> 
> > 2) Get information about a specific shrinker:
> >   $ cd sb-btrfs-24/
> >   $ ls
> >     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
> > 
> > 3) Count objects on the system/root cgroup level
> >   $ cat count
> >     212
> > 
> > 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
> >   $ cat count_node
> >     209 3
> 
> So a single space separated line with a number per node?
> 
> When you have a few hundred nodes and hundreds of thousands of objects per
> node, we overrun the 4kB page size with a single line. What then?
> 
> > 5) Count objects for each memcg (output format: cgroup inode, count)
> >   $ cat count_memcg
> >     1 212
> >     20 96
> >     53 817
> >     2297 2
> >     218 13
> >     581 30
> >     911 124
> >     <CUT>
> 
> What does "<CUT>" mean?
> 
> Also, this now iterates separate memcg per line. A parser now needs
> to know the difference between count/count_node and
> count_memcg/count_memcg_node because they are subtly different file
> formats.  These files should have the same format, otherwise it just
> creates needless complexity.
> 
> Indeed, why do we even need count/count_node? They are just the
> "index 1" memcg output, so are totally redundant.
> 
> > 6) Same but with a per-node output
> >   $ cat count_memcg_node
> >     1 209 3
> >     20 96 0
> >     53 810 7
> >     2297 2 0
> >     218 13 0
> >     581 30 0
> >     911 124 0
> >     <CUT>
> 
> So now we have a hundred nodes in the machine and thousands of
> memcgs. And the information we want is in the numerically largest
> memcg that is last in the list. ANd we want to graph it's behaviour
> over time at high resolution (say 1Hz). Now we burn huge amounts
> of CPU counting memcgs that we don't care about and then throwing
> away most of the information. That's highly in-efficient and really
> doesn't scale.
> 
> [snap active scan interface]
> 
> This just seems like a solution looking for a problem to solve.
> Can you please describe the problem this infrastructure is going
> to solve?

Hi Dave!

Thank you for taking a look.

Can you, please, summarize your position, because it's a bit unclear.
You made a lot of good points about some details (e.g. shrinkers naming,
and I totally agree there; machines with hundreds of nodes etc), then
you said the active scanning is useless and then said the whole thing
is useless and we're fine with what we have regarding shrinkers debugging.

My plan is to work on convert shrinkers API to bytes and experiment
with different LRU implementations. I find an ability to easily export
statistics and other data (which doesn't exist now) via debugfs useful
(and way more convenient than changing existing tracepoints), as well as
an ability to trigger scanning of individual shrinkers. If nobody else
seeing any value here, I'm fine to keep these patches private, no reason
to argue about the output format then.

If you (or somebody else) see some value in at least "count" part, I'm happy
to answer all questions and incorporate the feedback in the next version.

Thank you!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26 16:41   ` Roman Gushchin
@ 2022-04-26 18:37     ` Kent Overstreet
  2022-04-27  1:22     ` Dave Chinner
  1 sibling, 0 replies; 24+ messages in thread
From: Kent Overstreet @ 2022-04-26 18:37 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Dave Chinner, Andrew Morton, linux-mm, linux-kernel, Yang Shi,
	Hillf Danton

On Tue, Apr 26, 2022 at 09:41:34AM -0700, Roman Gushchin wrote:
> My plan is to work on convert shrinkers API to bytes and experiment
> with different LRU implementations. I find an ability to easily export
> statistics and other data (which doesn't exist now) via debugfs useful
> (and way more convenient than changing existing tracepoints), as well as
> an ability to trigger scanning of individual shrinkers. If nobody else
> seeing any value here, I'm fine to keep these patches private, no reason
> to argue about the output format then.

I don't think converting the shrinker API to bytes instead of object counts is
such a great idea - that's going to introducing new rounding errors and new
corner cases when we can't free the exact # of bytes requested.

I was thinking along the lines of adding reporting for memory usage in bytes as
either an additional thing the .count_objects reports, or a new callback.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
                     ` (2 preceding siblings ...)
  2022-04-26 16:41   ` Roman Gushchin
@ 2022-04-26 19:05   ` Roman Gushchin
  2022-04-27  1:02     ` Dave Chinner
  3 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-26 19:05 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton

On Tue, Apr 26, 2022 at 04:02:19PM +1000, Dave Chinner wrote:
> On Fri, Apr 22, 2022 at 01:26:37PM -0700, Roman Gushchin wrote:
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> > 
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan").
> 
> In general, I've had no trouble identifying individual shrinker
> instances because I'm always looking at individual subsystem
> shrinker tracepoints, too.  Hence I've almost always got the
> identification information in the traces I need to trace just the
> individual shrinker tracepoints and a bit of sed/grep/awk and I've
> got something I can feed to gnuplot or a python script to graph...

You spent a lot of time working on shrinkers in general and xfs-specific
shrinkers in particular, no questions here. But imagine someone who's not
a core-mm developer and is adding a new shrinker.

> 
> > They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> 
> IDGI. profiling shrinkers iunder ideal conditions when there isn't
> memory pressure is largely a useless exercise because execution
> patterns under memory pressure are vastly different.
> 
> All the problems with shrinkers show up when progress cannot be made
> as fast as memory reclaim wants memory to be reclaimed. How do you
> trigger priority windup causing large amounts of deferred processing
> because shrinkers are running in GFP_NOFS/GFP_NOIO context? How do
> you simulate objects getting dirtied in memory so they can't be
> immediately reclaimed so the shrinker can't make any progress at all
> until IO completes? How do you simulate the unbound concurrency that
> direct reclaim can drive into the shrinkers that causes massive lock
> contention on shared structures and locks that need to be accessed
> to free objects?

These are valid points and I assume we can find ways to emulate some of
these conditions, e.g. by allowing to run scanning using the GFP_NOFS context.
I though about it but decided to left for further improvements.

> 
> IOWs, if all you want to do is profile shrinkers running in the
> absence of memory pressure, then you can do that perfectly well with
> the existing 'echo 2 > /proc/sys/vm/drop_caches' mechanism. We don't
> need some complex debugfs API just to profile the shrinker
> behaviour.

And then we need somehow separate shrinkers in the result?

> 
> So why do we need any of the complexity and potential for abuse that
> comes from exposing control of shrinkers directly to userspace like
> these patches do?

I feel like the added complexity is minimal (unlike slab's sysfs, for
example). If the config option is off (by default), there is no additional
risk and overhead as well.

> 
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> 
> /sys/kernel/slab contains read-only usage information - it is
> analagous for visibility arguments, but it is not equivalent for
> the rest of the "active" functionality you want to add here....
> 
> > For each shrinker registered in the system a directory is created. The directory
> > contains "count" and "scan" files, which allow to trigger count_objects()
> > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > and/or per-node object count and shrink only a specific memcg/node.
> 
> Great, but why does the shrinker introspection interface need active
> scan control functions like these?

It makes testing of (new) shrinkers easier, for example.
For instance, shadow entries shrinker hides associated objects by returning
0 count most of the time (unless the total consumed memory is bigger than a
certain amount of the total memory).
echo 2 > /proc/sys/vm/drop_caches won't even trigger the scanning.

> 
> > To make debugging more pleasant, the patchset also names all shrinkers,
> > so that debugfs entries can have more meaningful names.
> > 
> > Usage examples:
> > 
> > 1) List registered shrinkers:
> >   $ cd /sys/kernel/debug/shrinker/
> >   $ ls
> >     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
> >     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
> >     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
> >     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
> >     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
> >     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
> >     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
> 
> Ouch. That's not going to be useful for humans debugging a system as
> there's no way to cross reference a "superblock" with an actual
> filesystem mount point. Nor is there any way to reallly know that
> all the shrinkers in one filesystem are related.
> 
> We normally solve this by ensuring that the fs related object has
> the short bdev name appended to them. e.g:
> 
> $ pgrep xfs
> 1 I root          36       2  0  60 -20 -     0 -      Apr19 ?        00:00:10 [kworker/0:1H-xfs-log/dm-3]
> 1 I root         679       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfsalloc]
> 1 I root         680       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs_mru_cache]
> 1 I root         681       2  0  60 -20 -     0 -      Apr19 ?        00:00:00 [xfs-buf/dm-1]
> .....
> 
> Here we have a kworker process running log IO completion work on
> dm-3, two global workqueue rescuer tasks (alloc, mru) and a rescuer
> task for xfs-buf workqueue on dm-1.
> 
> We need the same name discrimination for shrinker information here,
> too - just saying "this is an XFS superblock shrinker" is just not
> sufficient when there are hundreds of XFS mount points with a
> handful of shrinkers each.

Good point, I think it's doable, and I really like it.

> 
> > 2) Get information about a specific shrinker:
> >   $ cd sb-btrfs-24/
> >   $ ls
> >     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
> > 
> > 3) Count objects on the system/root cgroup level
> >   $ cat count
> >     212
> > 
> > 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
> >   $ cat count_node
> >     209 3
> 
> So a single space separated line with a number per node?
> 
> When you have a few hundred nodes and hundreds of thousands of objects per
> node, we overrun the 4kB page size with a single line. What then?

With seq_buf api we don't have 4kb limit, do we?

> 
> > 5) Count objects for each memcg (output format: cgroup inode, count)
> >   $ cat count_memcg
> >     1 212
> >     20 96
> >     53 817
> >     2297 2
> >     218 13
> >     581 30
> >     911 124
> >     <CUT>
> 
> What does "<CUT>" mean?

I've just shortened the lengthy output, not a part of the original output.

> 
> Also, this now iterates separate memcg per line. A parser now needs
> to know the difference between count/count_node and
> count_memcg/count_memcg_node because they are subtly different file
> formats.  These files should have the same format, otherwise it just
> creates needless complexity.
> 
> Indeed, why do we even need count/count_node? They are just the
> "index 1" memcg output, so are totally redundant.

Ok, but then we need a flag to indicate that a shrinker is memcg-aware?
But I got your point and I (partially) agree.
But do you think we're fine with just one interface and don't need
an aggregation over nodes? So just count_memcg_node?


> 
> > 6) Same but with a per-node output
> >   $ cat count_memcg_node
> >     1 209 3
> >     20 96 0
> >     53 810 7
> >     2297 2 0
> >     218 13 0
> >     581 30 0
> >     911 124 0
> >     <CUT>
> 
> So now we have a hundred nodes in the machine and thousands of
> memcgs. And the information we want is in the numerically largest
> memcg that is last in the list. ANd we want to graph it's behaviour
> over time at high resolution (say 1Hz). Now we burn huge amounts
> of CPU counting memcgs that we don't care about and then throwing
> away most of the information. That's highly in-efficient and really
> doesn't scale.

For this case we can provide an interface which allows to specify both
node and memcg and get the count. Personally I don't have a machine
with hundred nodes, so it's not on my radar.
If you find it useful, happy to add.

Thanks!

Roman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26 19:05   ` Roman Gushchin
@ 2022-04-27  1:02     ` Dave Chinner
  0 siblings, 0 replies; 24+ messages in thread
From: Dave Chinner @ 2022-04-27  1:02 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton

On Tue, Apr 26, 2022 at 12:05:30PM -0700, Roman Gushchin wrote:
> On Tue, Apr 26, 2022 at 04:02:19PM +1000, Dave Chinner wrote:
> > On Fri, Apr 22, 2022 at 01:26:37PM -0700, Roman Gushchin wrote:
> > > There are 50+ different shrinkers in the kernel, many with their own bells and
> > > whistles. Under the memory pressure the kernel applies some pressure on each of
> > > them in the order of which they were created/registered in the system. Some
> > > of them can contain only few objects, some can be quite large. Some can be
> > > effective at reclaiming memory, some not.
> > > 
> > > The only existing debugging mechanism is a couple of tracepoints in
> > > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > > covering everything though: shrinkers which report 0 objects will never show up,
> > > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > > scan function, which is not always enough (e.g. hard to guess which super
> > > block's shrinker it is having only "super_cache_scan").
> > 
> > In general, I've had no trouble identifying individual shrinker
> > instances because I'm always looking at individual subsystem
> > shrinker tracepoints, too.  Hence I've almost always got the
> > identification information in the traces I need to trace just the
> > individual shrinker tracepoints and a bit of sed/grep/awk and I've
> > got something I can feed to gnuplot or a python script to graph...
> 
> You spent a lot of time working on shrinkers in general and xfs-specific
> shrinkers in particular, no questions here. But imagine someone who's not
> a core-mm developer and is adding a new shrinker.

At which point, they add their own subsystem introspection to
understand what their shrinker implementation is doing.

You keep talking about shrinkers as if they exist in isolation
to the actual subsystems that implement shrinkers. I think that is a
fundamental mistake - you cannot understand how a shrinker is
actually working without understanding something about what the
subsystem that implements the shrinker actually does.

That is, the tracepoints in the shrinker code are largely
supplemental to the subsystem introspection that is really
determining the behaviour of the system.

The shrinker infrastructure is only providing a measure of memory
pressure - most shrinker implementations jsut don't care about what
happens in the shrinker infrastructure - they just count and scan
objects for reclaim, and mostly that just works for them.

> > > They are a passive
> > > mechanism: there is no way to call into counting and scanning of an individual
> > > shrinker and profile it.
> > 
> > IDGI. profiling shrinkers iunder ideal conditions when there isn't
> > memory pressure is largely a useless exercise because execution
> > patterns under memory pressure are vastly different.
> > 
> > All the problems with shrinkers show up when progress cannot be made
> > as fast as memory reclaim wants memory to be reclaimed. How do you
> > trigger priority windup causing large amounts of deferred processing
> > because shrinkers are running in GFP_NOFS/GFP_NOIO context? How do
> > you simulate objects getting dirtied in memory so they can't be
> > immediately reclaimed so the shrinker can't make any progress at all
> > until IO completes? How do you simulate the unbound concurrency that
> > direct reclaim can drive into the shrinkers that causes massive lock
> > contention on shared structures and locks that need to be accessed
> > to free objects?
> 
> These are valid points and I assume we can find ways to emulate some of
> these conditions, e.g. by allowing to run scanning using the GFP_NOFS context.
> I though about it but decided to left for further improvements.
> 
> > 
> > IOWs, if all you want to do is profile shrinkers running in the
> > absence of memory pressure, then you can do that perfectly well with
> > the existing 'echo 2 > /proc/sys/vm/drop_caches' mechanism. We don't
> > need some complex debugfs API just to profile the shrinker
> > behaviour.
> 
> And then we need somehow separate shrinkers in the result?

How do you profile a shrinker in the first place? You have to load
up the slab cache/LRU before you have something you can actually
profile. SO it's as simple as 'drop caches, load up cache to be
profiled, drop caches'. It's trivial to isolate the specific cache
that got loaded up from the tracepoints, and then with other
tracepoints and/or perf profiling, you can extract the profile of
the shrinker that is doing all the reclaim work.

Indeed, you can point perf at the specific task that drops the
caches, and that is all you'll get in the profile. If you can't
isolate the specific shrinker profile from the output of such a
simple test setup, then you should hand in your Kernel Developer
badge....

> > So why do we need any of the complexity and potential for abuse that
> > comes from exposing control of shrinkers directly to userspace like
> > these patches do?
> 
> I feel like the added complexity is minimal (unlike slab's sysfs, for
> example). If the config option is off (by default), there is no additional
> risk and overhead as well.

No. The argument that "if we turn it off there's no overhead" means
one of two things:

1. nobody turns it on and it never gets tested and so bitrots and is
useless, or
2. distro's all turn it on because some tool they ship or customer
they ship to wants it.

Either way, hiding it behind a config option is not an acceptible
solution for mering poorly thought out infrastructure.

> > > To provide a better visibility and debug options for memory shrinkers
> > > this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
> > > similar to /sys/kernel/slab.
> > 
> > /sys/kernel/slab contains read-only usage information - it is
> > analagous for visibility arguments, but it is not equivalent for
> > the rest of the "active" functionality you want to add here....
> > 
> > > For each shrinker registered in the system a directory is created. The directory
> > > contains "count" and "scan" files, which allow to trigger count_objects()
> > > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > > and/or per-node object count and shrink only a specific memcg/node.
> > 
> > Great, but why does the shrinker introspection interface need active
> > scan control functions like these?
> 
> It makes testing of (new) shrinkers easier, for example.
> For instance, shadow entries shrinker hides associated objects by returning
> 0 count most of the time (unless the total consumed memory is bigger than a
> certain amount of the total memory).
> echo 2 > /proc/sys/vm/drop_caches won't even trigger the scanning.

And that's exactly my point above: you cannot test shrinkers in
isolation of the subsystem that loads them up. In this case, you
*aren't testing the shrinker*, you are testing how the shadow entry
subsystem manages the working set. The shrinker is an integrated
part of that subsystem, so any test hooks needed to trigger the
reclaim of shadow entries belong in the ->count method of the
the shrinker implementation, such that it runs whenever the shrinker
is called rather than only when the memory usage threshold is
triggered.

At that point, drop_caches then does exactly what you need.

Shrinkers cannot be tested in isolation of the subsystem they act
on!

> > > 2) Get information about a specific shrinker:
> > >   $ cd sb-btrfs-24/
> > >   $ ls
> > >     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
> > > 
> > > 3) Count objects on the system/root cgroup level
> > >   $ cat count
> > >     212
> > > 
> > > 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
> > >   $ cat count_node
> > >     209 3
> > 
> > So a single space separated line with a number per node?
> > 
> > When you have a few hundred nodes and hundreds of thousands of objects per
> > node, we overrun the 4kB page size with a single line. What then?
> 
> With seq_buf api we don't have 4kb limit, do we?

No idea. Never cared enough about sysfs to need to know.

But that doesn't avoid the issue: verbosity and overhead to
create/parse this information.


> > Also, this now iterates separate memcg per line. A parser now needs
> > to know the difference between count/count_node and
> > count_memcg/count_memcg_node because they are subtly different file
> > formats.  These files should have the same format, otherwise it just
> > creates needless complexity.
> > 
> > Indeed, why do we even need count/count_node? They are just the
> > "index 1" memcg output, so are totally redundant.
> 
> Ok, but then we need a flag to indicate that a shrinker is memcg-aware?
> But I got your point and I (partially) agree.
> But do you think we're fine with just one interface and don't need
> an aggregation over nodes? So just count_memcg_node?

/me puts on the broken record

Shrinker infrastructure needs to stop treating memcgs are something
special and off to the side. We need to integrate the code so there
is a single scan loop that simply treats the "no memcg" case as the
root memcg. Bleeding architectural/implementation deficiencies into
user visible APIs is even worse than just having to put up with them
in the implementation....

> > > 6) Same but with a per-node output
> > >   $ cat count_memcg_node
> > >     1 209 3
> > >     20 96 0
> > >     53 810 7
> > >     2297 2 0
> > >     218 13 0
> > >     581 30 0
> > >     911 124 0
> > >     <CUT>
> > 
> > So now we have a hundred nodes in the machine and thousands of
> > memcgs. And the information we want is in the numerically largest
> > memcg that is last in the list. ANd we want to graph it's behaviour
> > over time at high resolution (say 1Hz). Now we burn huge amounts
> > of CPU counting memcgs that we don't care about and then throwing
> > away most of the information. That's highly in-efficient and really
> > doesn't scale.
> 
> For this case we can provide an interface which allows to specify both
> node and memcg and get the count. Personally I don't have a machine
> with hundred nodes, so it's not on my radar.

Yup, but there are people how do have this sort of machine, which do
use memcgs (in their thousands) and do have many, many superblocks
(in their thousands). Just because you personally don't have such
machines it does not mean you don't have to design for such
machines. Saying "I don't care other people's requirements" is
exactly what Kent had a rant about in the other leg of this thread.

We know that we have these scalability issues in generic
infrastructure, and therefore generic infrastructure has to handle
these issues at a architecture and design level. We don't need the
initial implementation to work well at such levels of scalability,
but we sure as hell need the design, APIs and file formats to scale
out because if it doesn't scale there is no question that *we will
have to fix it*.

So, yeah, you need to think about how to do fine-grained access to
shrinker stats effectively. That might require a complete change of
presentation API. For example, changing the filesystem layout to
be memcg centric rather than shrinker instance centric would make an
awful lot of this file parsing problem go away.

e.g:

/sys/kernel/debug/mm/memcg/<memcg instance>/shrinker/<shrinker instance>/stats

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-26 16:41   ` Roman Gushchin
  2022-04-26 18:37     ` Kent Overstreet
@ 2022-04-27  1:22     ` Dave Chinner
  2022-04-27  2:18       ` Roman Gushchin
  1 sibling, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2022-04-27  1:22 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton

On Tue, Apr 26, 2022 at 09:41:34AM -0700, Roman Gushchin wrote:
> Can you, please, summarize your position, because it's a bit unclear.
> You made a lot of good points about some details (e.g. shrinkers naming,
> and I totally agree there; machines with hundreds of nodes etc), then
> you said the active scanning is useless and then said the whole thing
> is useless and we're fine with what we have regarding shrinkers debugging.

Better introspection the first thing we need. Work on improving
that. I've been making suggestions to help improve introspection
infrastructure.

Before anything else, we need to improve introspection so we can
gain better insight into the problems we have. Once we understand
the problems better and have evidence to back up where the problems
lie and we have a plan to solve them, then we can talk about whether
we need other user accessible shrinker APIs.

For the moment, exposing shrinker control interfaces to userspace
could potentially be very bad because it exposes internal
architectural and implementation details to a user API.  Just
because it is in /sys/kernel/debug it doesn't mean applications
won't start to use it and build dependencies on it.

That doesn't mean I'm opposed to exposing a shrinker control
mechanism to debugfs - I'm still on the fence on that one. However,
I definitely think that an API that directly exposes the internal
implementation to userspace is the wrong way to go about this.

Fine grained shrinker control is not necessary to improve shrinker
introspection and OOM debugging capability, so if you want/need
control interfaces then I think you should separate those out into a
separate line of development where it doesn't derail the discussion
on how to improve shrinker/OOM introspection.

-Dave.
-- 
Dave Chinner
dchinner@redhat.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
  2022-04-27  1:22     ` Dave Chinner
@ 2022-04-27  2:18       ` Roman Gushchin
  0 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-27  2:18 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Kent Overstreet,
	Hillf Danton

On Wed, Apr 27, 2022 at 11:22:55AM +1000, Dave Chinner wrote:
> On Tue, Apr 26, 2022 at 09:41:34AM -0700, Roman Gushchin wrote:
> > Can you, please, summarize your position, because it's a bit unclear.
> > You made a lot of good points about some details (e.g. shrinkers naming,
> > and I totally agree there; machines with hundreds of nodes etc), then
> > you said the active scanning is useless and then said the whole thing
> > is useless and we're fine with what we have regarding shrinkers debugging.
> 
> Better introspection the first thing we need. Work on improving
> that. I've been making suggestions to help improve introspection
> infrastructure.
> 
> Before anything else, we need to improve introspection so we can
> gain better insight into the problems we have. Once we understand
> the problems better and have evidence to back up where the problems
> lie and we have a plan to solve them, then we can talk about whether
> we need other user accessible shrinker APIs.

Ok, at least we do agree here.

This is exactly why I've started with this debugfs stuff.

> 
> For the moment, exposing shrinker control interfaces to userspace
> could potentially be very bad because it exposes internal
> architectural and implementation details to a user API.  Just
> because it is in /sys/kernel/debug it doesn't mean applications
> won't start to use it and build dependencies on it.
> 
> That doesn't mean I'm opposed to exposing a shrinker control
> mechanism to debugfs - I'm still on the fence on that one. However,
> I definitely think that an API that directly exposes the internal
> implementation to userspace is the wrong way to go about this.

Ok, if it's about having memcg-aware and other interfaces, I can
agree here as well.

I actually made an attempt to unify memcg-aware and system-wide
shrinker scanning, not very successful yet, but it's definitely
on my todo list. I'm pretty sure we're iterating over and over
some empty root-level shrinkers without benefiting the bitmap
infrastructure which works for memory cgroups.

> 
> Fine grained shrinker control is not necessary to improve shrinker
> introspection and OOM debugging capability, so if you want/need
> control interfaces then I think you should separate those out into a
> separate line of development where it doesn't derail the discussion
> on how to improve shrinker/OOM introspection.

Ok, no problems here. Btw, tem OOM debugging is a separate topic brought
in by Kent, I'd keep it separate too, as it comes with many OOM-specific
complications.

From your another email:
> So, yeah, you need to think about how to do fine-grained access to
> shrinker stats effectively. That might require a complete change of
> presentation API. For example, changing the filesystem layout to be
> memcg centric rather than shrinker instance centric would make an
> awful lot of this file parsing problem go away.
>
> e.g:
>
> /sys/kernel/debug/mm/memcg/<memcg instance>/shrinker/<shrinker instance>/stats

The problem with this approach (I though about it) is that it comes
with a high memory overhead especially on that machine with thousands cgroups
and mount points. And beside the memory overhead, it's really expensive to
collect system-wide data and get a big picture, as it requires opening
and reading of thousand of files.

Actually, you wrote recently:
"I've thought about it, too, and can see where it could be useful.
However, when I consider the list_lru memcg integration, I suspect
it becomes a "can't see the forest for the trees" problem. We're
going to end up with millions of sysfs objects with no obvious way
to navigate, iterate or search them if we just take the naive "sysfs
object + stats per list_lru instance" approach."

It all makes me think we need both: a way to iterate over all memcgs and dump
all the numbers at once and a way to get a specific per-memcg (per-node) count.

Thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/7] mm: provide shrinkers with names
  2022-04-23  7:46       ` Christophe JAILLET
  (?)
@ 2022-04-28  0:25       ` Roman Gushchin
  -1 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-28  0:25 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: Andrew Morton, linux-mm, Dave Chinner, linux-kernel, Yang Shi,
	Kent Overstreet, Hillf Danton

On Sat, Apr 23, 2022 at 09:46:25AM +0200, Christophe JAILLET wrote:
> Hi,
> 
> Le 22/04/2022 à 22:26, Roman Gushchin a écrit :
> > Currently shrinkers are anonymous objects. For debugging purposes they
> > can be identified by count/scan function names, but it's not always
> > useful: e.g. for superblock's shrinkers it's nice to have at least
> > an idea of to which superblock the shrinker belongs.
> > 
> > This commit adds names to shrinkers. register_shrinker() and
> > prealloc_shrinker() functions are extended to take a format and
> > arguments to master a name. If CONFIG_SHRINKER_DEBUG is on,
> > the name is saved until the corresponding debugfs object is created,
> > otherwise it's simple ignored.
> > 
> > After this change the shrinker debugfs directory looks like:
> >    $ cd /sys/kernel/debug/shrinker/
> >    $ ls
> >      dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
> >      kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
> >      sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
> >      sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
> >      sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
> >      sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
> >      sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
> > 
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > ---
> 
> [...]
> 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 59f91e392a2a..1b326b93155c 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -7383,7 +7383,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
> >   	conf->shrinker.count_objects = raid5_cache_count;
> >   	conf->shrinker.batch = 128;
> >   	conf->shrinker.flags = 0;
> > -	if (register_shrinker(&conf->shrinker)) {
> > +	if (register_shrinker(&conf->shrinker, "md")) {
> 
> Based on pr_warn below, does it make sense to have something like:
>   register_shrinker(&conf->shrinker, "md-%s", mdname(mddev))
> 
> >   		pr_warn("md/raid:%s: couldn't register shrinker.\n",
> >   			mdname(mddev));
> >   		goto abort;

Good idea, will do, thanks!

> 
> [...]
> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 121a54a1602b..6025cfda4932 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -613,7 +613,7 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
> >   /*
> >    * Add a shrinker callback to be called from the vm.
> >    */
> > -int prealloc_shrinker(struct shrinker *shrinker)
> > +static int __prealloc_shrinker(struct shrinker *shrinker)
> >   {
> >   	unsigned int size;
> >   	int err;
> > @@ -637,6 +637,34 @@ int prealloc_shrinker(struct shrinker *shrinker)
> >   	return 0;
> >   }
> > +#ifdef CONFIG_SHRINKER_DEBUG
> > +int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> > +{
> > +	int err;
> > +	char buf[64];
> > +	va_list ap;
> > +
> > +	va_start(ap, fmt);
> > +	vscnprintf(buf, sizeof(buf), fmt, ap);
> > +	va_end(ap);
> > +
> > +	shrinker->name = kstrdup(buf, GFP_KERNEL);
> > +	if (!shrinker->name)
> > +		return -ENOMEM;
> 
> use kvasprintf_const() (and kfree_const() elsewhere) to simplify code and
> take advantage of kstrdup_const() in most cases?

Sure, good point.

> 
> > +
> > +	err = __prealloc_shrinker(shrinker);
> > +	if (err)
> > +		kfree(shrinker->name);
> > +
> > +	return err;
> > +}
> > +#else
> > +int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> > +{
> > +	return __prealloc_shrinker(shrinker);
> > +}
> > +#endif
> > +
> >   void free_prealloced_shrinker(struct shrinker *shrinker)
> >   {
> >   	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
> > @@ -648,6 +676,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
> >   	kfree(shrinker->nr_deferred);
> >   	shrinker->nr_deferred = NULL;
> > +#ifdef CONFIG_SHRINKER_DEBUG
> > +	kfree(shrinker->name);
> > +#endif
> >   }
> >   void register_shrinker_prepared(struct shrinker *shrinker)
> > @@ -659,15 +690,38 @@ void register_shrinker_prepared(struct shrinker *shrinker)
> >   	up_write(&shrinker_rwsem);
> >   }
> > -int register_shrinker(struct shrinker *shrinker)
> > +static int __register_shrinker(struct shrinker *shrinker)
> >   {
> > -	int err = prealloc_shrinker(shrinker);
> > +	int err = __prealloc_shrinker(shrinker);
> >   	if (err)
> >   		return err;
> >   	register_shrinker_prepared(shrinker);
> >   	return 0;
> >   }
> > +
> > +#ifdef CONFIG_SHRINKER_DEBUG
> > +int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
> > +{
> > +	char buf[64];
> > +	va_list ap;
> > +
> > +	va_start(ap, fmt);
> > +	vscnprintf(buf, sizeof(buf), fmt, ap);
> > +	va_end(ap);
> > +
> > +	shrinker->name = kstrdup(buf, GFP_KERNEL);
> > +	if (!shrinker->name)
> > +		return -ENOMEM;
> > +
> 
> same as above.
> 
> > +	return __register_shrinker(shrinker);
> 
> Missing error handling and freeing of shrinker->name as done in
> prealloc_shrinker()?

Will check in the next version.

Thank you for taking a look!

Roman

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-04-28  0:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-22 20:26 [PATCH v2 0/7] mm: introduce shrinker debugfs interface Roman Gushchin
2022-04-22 20:26 ` [PATCH v2 1/7] mm: introduce debugfs interface for kernel memory shrinkers Roman Gushchin
2022-04-23  7:51   ` Christophe JAILLET
2022-04-23  7:51     ` Christophe JAILLET
2022-04-22 20:26 ` [PATCH v2 2/7] mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino() Roman Gushchin
2022-04-22 20:26 ` [PATCH v2 3/7] mm: introduce memcg interfaces for shrinker debugfs Roman Gushchin
2022-04-23  0:35   ` Hillf Danton
2022-04-23  1:29     ` Roman Gushchin
2022-04-22 20:26 ` [PATCH v2 4/7] mm: introduce numa " Roman Gushchin
2022-04-22 20:26 ` [PATCH v2 5/7] mm: provide shrinkers with names Roman Gushchin
2022-04-23  7:46   ` Christophe JAILLET
2022-04-23  7:46     ` Christophe JAILLET
2022-04-23  7:46       ` Christophe JAILLET
2022-04-28  0:25       ` Roman Gushchin
2022-04-22 20:26 ` [PATCH v2 6/7] docs: document shrinker debugfs Roman Gushchin
2022-04-22 20:26 ` [PATCH v2 7/7] tools: add memcg_shrinker.py Roman Gushchin
2022-04-26  6:02 ` [PATCH v2 0/7] mm: introduce shrinker debugfs interface Dave Chinner
2022-04-26  6:45   ` Kent Overstreet
2022-04-26  8:45   ` Hillf Danton
2022-04-26 16:41   ` Roman Gushchin
2022-04-26 18:37     ` Kent Overstreet
2022-04-27  1:22     ` Dave Chinner
2022-04-27  2:18       ` Roman Gushchin
2022-04-26 19:05   ` Roman Gushchin
2022-04-27  1:02     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.