* [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
From: Roman Gushchin @ 2022-04-16  0:27 UTC
  To: linux-mm
  Cc: Andrew Morton, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin

There are 50+ different shrinkers in the kernel, many with their own bells and
whistles. Under memory pressure the kernel applies some pressure to each of
them in the order in which they were created/registered in the system. Some
of them may contain only a few objects, some can be quite large. Some can be
effective at reclaiming memory, some not.

The only existing debugging mechanism is a couple of tracepoints in
do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They don't
cover everything though: shrinkers which report 0 objects never show up, and
there is no support for memcg-aware shrinkers. Shrinkers are identified by their
scan function, which is not always enough (e.g. it's hard to guess which super
block's shrinker it is given only "super_cache_scan"). They are also a passive
mechanism: there is no way to call into counting and scanning of an individual
shrinker and profile it.

To provide better visibility and debugging options for memory shrinkers,
this patchset introduces a /sys/kernel/shrinker interface, to some extent
similar to /sys/kernel/slab.

For each shrinker registered in the system a directory is created. The directory
contains "count" and "scan" files, which trigger the count_objects()
and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
and scan_memcg_node are additionally provided. They make it possible to get
per-memcg and/or per-node object counts and to shrink only a specific memcg/node.

To make debugging more pleasant, the patchset also names all shrinkers,
so that sysfs entries can have more meaningful names.

Usage examples:

1) List registered shrinkers:
  $ cd /sys/kernel/shrinker/
  $ ls
    dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
    kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
    sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
    sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
    sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
    sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
    sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34

2) Get information about a specific shrinker:
  $ cd sb-btrfs-24/
  $ ls
    count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node

3) Count objects on the system/root cgroup level
  $ cat count
    212

4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
  $ cat count_node
    209 3

5) Count objects for each memcg (output format: cgroup inode, count)
  $ cat count_memcg
    1 212
    20 96
    53 817
    2297 2
    218 13
    581 30
    911 124
    <CUT>

6) Same but with a per-node output
  $ cat count_memcg_node
    1 209 3
    20 96 0
    53 810 7
    2297 2 0
    218 13 0
    581 30 0
    911 124 0
    <CUT>

7) Don't display cgroups with fewer than 500 attached objects
  $ echo 500 > count_memcg
  $ cat count_memcg
    53 817
    1868 886
    2396 799
    2462 861

8) Don't display cgroups with fewer than 500 attached objects (sum over all nodes)
  $ echo "500" > count_memcg_node
  $ cat count_memcg_node
    53 810 7
    1868 886 0
    2396 799 0
    2462 861 0

9) Scan system/root shrinker
  $ cat count
    212
  $ echo 100 > scan
  $ cat scan
    97
  $ cat count
    115

10) Scan individual memcg
  $ echo "1868 500" > scan_memcg
  $ cat scan_memcg
    193

11) Scan individual node
  $ echo "1 200" > scan_node
  $ cat scan_node
    2

12) Scan individual memcg and node
  $ echo "1868 0 500" > scan_memcg_node
  $ cat scan_memcg_node
    435

If the output doesn't fit into a single page, "...\n" is printed at the end of
output.
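
A minimal userspace sketch, not part of the patchset, of how the count_memcg
output shown above might be consumed. It assumes the text has already been
read into a string, that each line is "<cgroup inode> <count>", and that a
line consisting of "..." marks truncated output. The helper name
sum_count_memcg() and the 64-byte line limit are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-cgroup counts in text read from a
 * count_memcg file. Sets *truncated to 1 if a "..." marker line is seen.
 * Returns the number of "<inode> <count>" lines parsed.
 */
static int sum_count_memcg(const char *text, unsigned long *sum, int *truncated)
{
	unsigned long ino, count;
	char line[64];
	int lines = 0;

	*sum = 0;
	*truncated = 0;

	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);

		if (len < sizeof(line)) {
			/* Work on a private copy so sscanf() stops at the line end. */
			memcpy(line, text, len);
			line[len] = '\0';

			if (strcmp(line, "...") == 0) {
				*truncated = 1;
			} else if (sscanf(line, "%lu %lu", &ino, &count) == 2) {
				*sum += count;
				lines++;
			}
		}
		text += len + (eol ? 1 : 0);
	}
	return lines;
}
```

Lines that do not match the expected format are skipped rather than treated
as errors, which keeps the sketch tolerant of blank lines.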


Roman Gushchin (5):
  mm: introduce sysfs interface for debugging kernel shrinker
  mm: memcontrol: introduce mem_cgroup_ino() and
    mem_cgroup_get_from_ino()
  mm: introduce memcg interfaces for shrinker sysfs
  mm: introduce numa interfaces for shrinker sysfs
  mm: provide shrinkers with names

 arch/x86/kvm/mmu/mmu.c                        |   2 +-
 drivers/android/binder_alloc.c                |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   3 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c        |   2 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |   2 +-
 drivers/md/bcache/btree.c                     |   2 +-
 drivers/md/dm-bufio.c                         |   2 +-
 drivers/md/dm-zoned-metadata.c                |   2 +-
 drivers/md/raid5.c                            |   2 +-
 drivers/misc/vmw_balloon.c                    |   2 +-
 drivers/virtio/virtio_balloon.c               |   2 +-
 drivers/xen/xenbus/xenbus_probe_backend.c     |   2 +-
 fs/erofs/utils.c                              |   2 +-
 fs/ext4/extents_status.c                      |   3 +-
 fs/f2fs/super.c                               |   2 +-
 fs/gfs2/glock.c                               |   2 +-
 fs/gfs2/main.c                                |   2 +-
 fs/jbd2/journal.c                             |   2 +-
 fs/mbcache.c                                  |   2 +-
 fs/nfs/nfs42xattr.c                           |   7 +-
 fs/nfs/super.c                                |   2 +-
 fs/nfsd/filecache.c                           |   2 +-
 fs/nfsd/nfscache.c                            |   2 +-
 fs/quota/dquot.c                              |   2 +-
 fs/super.c                                    |   2 +-
 fs/ubifs/super.c                              |   2 +-
 fs/xfs/xfs_buf.c                              |   2 +-
 fs/xfs/xfs_icache.c                           |   2 +-
 fs/xfs/xfs_qm.c                               |   2 +-
 include/linux/memcontrol.h                    |   9 +
 include/linux/shrinker.h                      |  25 +-
 kernel/rcu/tree.c                             |   2 +-
 lib/Kconfig.debug                             |   9 +
 mm/Makefile                                   |   1 +
 mm/huge_memory.c                              |   4 +-
 mm/memcontrol.c                               |  23 +
 mm/shrinker_debug.c                           | 792 ++++++++++++++++++
 mm/vmscan.c                                   |  66 +-
 mm/workingset.c                               |   2 +-
 mm/zsmalloc.c                                 |   2 +-
 net/sunrpc/auth.c                             |   2 +-
 42 files changed, 957 insertions(+), 47 deletions(-)
 create mode 100644 mm/shrinker_debug.c

-- 
2.35.1



* [PATCH rfc 1/5] mm: introduce sysfs interface for debugging kernel shrinker
From: Roman Gushchin @ 2022-04-16  0:27 UTC
  To: linux-mm
  Cc: Andrew Morton, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin

This commit introduces the /sys/kernel/shrinker sysfs interface,
which makes it possible to observe the state of individual kernel
memory shrinkers and to interact with them.

Because the feature is aimed at kernel developers and adds some
memory overhead (which shouldn't be large unless there is a huge
number of registered shrinkers), it's guarded by a config option
(disabled by default).

To simplify the code, kobjects are not embedded into shrinker
objects, but are created, linked and unlinked dynamically.

This commit introduces basic "count" and "scan" interfaces.
Basic usage:
  $ cat count                   : get the number of objects
  $ echo "500" > scan           : try to reclaim 500 objects
  $ cat scan                    : get the number of objects reclaimed
The following commits in the series will add memcg- and numa-specific
features.

This commit gives sysfs entries simple numeric names, which are not
very convenient. A later commit in the series will provide
shrinkers with more meaningful names.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/shrinker.h |  20 ++-
 lib/Kconfig.debug        |   9 ++
 mm/Makefile              |   1 +
 mm/shrinker_debug.c      | 294 +++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c              |   6 +-
 5 files changed, 327 insertions(+), 3 deletions(-)
 create mode 100644 mm/shrinker_debug.c

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 76fbf92b04d9..50c0e233ecdd 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -2,6 +2,8 @@
 #ifndef _LINUX_SHRINKER_H
 #define _LINUX_SHRINKER_H
 
+struct shrinker_kobj;
+
 /*
  * This struct is used to pass information from page reclaim to the shrinkers.
  * We consolidate the values for easier extension later.
@@ -72,6 +74,9 @@ struct shrinker {
 #ifdef CONFIG_MEMCG
 	/* ID in shrinker_idr */
 	int id;
+#endif
+#ifdef CONFIG_SHRINKER_DEBUG
+	struct shrinker_kobj *kobj;
 #endif
 	/* objs pending delete, per node */
 	atomic_long_t *nr_deferred;
@@ -94,4 +99,17 @@ extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
-#endif
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int shrinker_init_kobj(struct shrinker *shrinker);
+void shrinker_unlink_kobj(struct shrinker *shrinker);
+#else /* CONFIG_SHRINKER_DEBUG */
+static inline int shrinker_init_kobj(struct shrinker *shrinker)
+{
+	return 0;
+}
+static inline void shrinker_unlink_kobj(struct shrinker *shrinker)
+{
+}
+#endif /* CONFIG_SHRINKER_DEBUG */
+#endif /* _LINUX_SHRINKER_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6bf9cceb7d20..6369fcd9587f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -733,6 +733,15 @@ config SLUB_STATS
 	  out which slabs are relevant to a particular load.
 	  Try running: slabinfo -DA
 
+config SHRINKER_DEBUG
+	default n
+	bool "Enable shrinker debugging support"
+	depends on SYSFS
+	help
+	  Say Y to enable the /sys/kernel/shrinker debug interface which
+	  provides visibility into the kernel memory shrinkers subsystem.
+	  Disable it to avoid an extra memory footprint.
+
 config HAVE_DEBUG_KMEMLEAK
 	bool
 
diff --git a/mm/Makefile b/mm/Makefile
index 6f9ffa968a1a..9a564f836403 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -133,3 +133,4 @@ obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
 obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
+obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
new file mode 100644
index 000000000000..817d578f993c
--- /dev/null
+++ b/mm/shrinker_debug.c
@@ -0,0 +1,294 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/kobject.h>
+#include <linux/shrinker.h>
+
+/* defined in vmscan.c */
+extern struct rw_semaphore shrinker_rwsem;
+extern struct list_head shrinker_list;
+
+static DEFINE_IDA(shrinker_sysfs_ida);
+
+struct shrinker_kobj {
+	struct kobject kobj;
+	struct shrinker *shrinker;
+	int id;
+};
+
+struct shrinker_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct shrinker_kobj *skobj,
+			struct shrinker_attribute *attr, char *buf);
+	ssize_t (*store)(struct shrinker_kobj *skobj,
+			 struct shrinker_attribute *attr, const char *buf,
+			 size_t count);
+	unsigned long private;
+};
+
+#define to_shrinker_kobj(x) container_of(x, struct shrinker_kobj, kobj)
+#define to_shrinker_attr(x) container_of(x, struct shrinker_attribute, attr)
+
+static ssize_t shrinker_attr_show(struct kobject *kobj, struct attribute *attr,
+				  char *buf)
+{
+	struct shrinker_attribute *attribute = to_shrinker_attr(attr);
+	struct shrinker_kobj *skobj = to_shrinker_kobj(kobj);
+
+	if (!attribute->show)
+		return -EIO;
+
+	return attribute->show(skobj, attribute, buf);
+}
+
+static ssize_t shrinker_attr_store(struct kobject *kobj, struct attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct shrinker_attribute *attribute = to_shrinker_attr(attr);
+	struct shrinker_kobj *skobj = to_shrinker_kobj(kobj);
+
+	if (!attribute->store)
+		return -EIO;
+
+	return attribute->store(skobj, attribute, buf, len);
+}
+
+static const struct sysfs_ops shrinker_sysfs_ops = {
+	.show = shrinker_attr_show,
+	.store = shrinker_attr_store,
+};
+
+static void shrinker_kobj_release(struct kobject *kobj)
+{
+	struct shrinker_kobj *skobj = to_shrinker_kobj(kobj);
+
+	WARN_ON(skobj->shrinker);
+	kfree(skobj);
+}
+
+static ssize_t count_show(struct shrinker_kobj *skobj,
+			  struct shrinker_attribute *attr, char *buf)
+{
+	unsigned long nr, total = 0;
+	struct shrinker *shrinker;
+	int nid;
+
+	down_read(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		up_read(&shrinker_rwsem);
+		return -EBUSY;
+	}
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+		};
+
+		nr = shrinker->count_objects(shrinker, &sc);
+		if (nr == SHRINK_EMPTY)
+			nr = 0;
+		total += nr;
+
+		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+			break;
+
+		cond_resched();
+	}
+	up_read(&shrinker_rwsem);
+	return sprintf(buf, "%lu\n", total);
+}
+
+static struct shrinker_attribute count_attribute = __ATTR_RO(count);
+
+static ssize_t scan_show(struct shrinker_kobj *skobj,
+			 struct shrinker_attribute *attr, char *buf)
+{
+	/*
+	 * Display the number of objects freed on the last scan.
+	 */
+	return sprintf(buf, "%lu\n", attr->private);
+}
+
+static ssize_t scan_store(struct shrinker_kobj *skobj,
+			  struct shrinker_attribute *attr,
+			  const char *buf, size_t size)
+{
+	unsigned long nr, total = 0, nr_to_scan = 0, freed = 0;
+	unsigned long *count_per_node = NULL;
+	struct shrinker *shrinker;
+	ssize_t ret = size;
+	int nid;
+
+	if (kstrtoul(buf, 10, &nr_to_scan))
+		return -EINVAL;
+
+	down_read(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		/*
+		 * If the shrinker is numa aware, distribute nr_to_scan
+		 * proportionally.
+		 */
+		count_per_node = kzalloc(sizeof(unsigned long) * nr_node_ids,
+					 GFP_KERNEL);
+		if (!count_per_node) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		for_each_node(nid) {
+			struct shrink_control sc = {
+				.gfp_mask = GFP_KERNEL,
+				.nid = nid,
+			};
+
+			nr = shrinker->count_objects(shrinker, &sc);
+			if (nr == SHRINK_EMPTY)
+				nr = 0;
+			count_per_node[nid] = nr;
+			total += nr;
+
+			cond_resched();
+		}
+	}
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+		};
+
+		if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+			sc.nr_to_scan = nr_to_scan * count_per_node[nid] /
+				(total ? total : 1);
+			sc.nr_scanned = sc.nr_to_scan;
+		} else {
+			sc.nr_to_scan = nr_to_scan;
+			sc.nr_scanned = sc.nr_to_scan;
+		}
+
+		nr = shrinker->scan_objects(shrinker, &sc);
+		if (nr == SHRINK_STOP || nr == SHRINK_EMPTY)
+			nr = 0;
+
+		freed += nr;
+
+		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+			break;
+
+		cond_resched();
+
+	}
+	attr->private = freed;
+out:
+	up_read(&shrinker_rwsem);
+	kfree(count_per_node);
+	return ret;
+}
+
+static struct shrinker_attribute scan_attribute = __ATTR_RW(scan);
+
+static struct attribute *shrinker_default_attrs[] = {
+	&count_attribute.attr,
+	&scan_attribute.attr,
+	NULL,
+};
+
+static const struct attribute_group shrinker_default_group = {
+	.attrs = shrinker_default_attrs,
+};
+
+static const struct attribute_group *shrinker_sysfs_groups[] = {
+	&shrinker_default_group,
+	NULL,
+};
+
+static struct kobj_type shrinker_ktype = {
+	.sysfs_ops = &shrinker_sysfs_ops,
+	.release = shrinker_kobj_release,
+	.default_groups = shrinker_sysfs_groups,
+};
+
+static struct kset *shrinker_kset;
+
+int shrinker_init_kobj(struct shrinker *shrinker)
+{
+	struct shrinker_kobj *skobj;
+	int ret = 0;
+	int id;
+
+	/* Sysfs isn't initialized yet, allocate kobjects later. */
+	if (!shrinker_kset)
+		return 0;
+
+	skobj = kzalloc(sizeof(struct shrinker_kobj), GFP_KERNEL);
+	if (!skobj)
+		return -ENOMEM;
+
+	id = ida_alloc(&shrinker_sysfs_ida, GFP_KERNEL);
+	if (id < 0) {
+		kfree(skobj);
+		return id;
+	}
+
+	skobj->id = id;
+	skobj->kobj.kset = shrinker_kset;
+	skobj->shrinker = shrinker;
+	ret = kobject_init_and_add(&skobj->kobj, &shrinker_ktype, NULL, "%d",
+				   id);
+	if (ret) {
+		ida_free(&shrinker_sysfs_ida, id);
+		kobject_put(&skobj->kobj);
+		return ret;
+	}
+
+	shrinker->kobj = skobj;
+
+	kobject_uevent(&skobj->kobj, KOBJ_ADD);
+
+	return ret;
+}
+
+void shrinker_unlink_kobj(struct shrinker *shrinker)
+{
+	struct shrinker_kobj *skobj;
+
+	if (!shrinker->kobj)
+		return;
+
+	skobj = shrinker->kobj;
+	skobj->shrinker = NULL;
+	ida_free(&shrinker_sysfs_ida, skobj->id);
+	shrinker->kobj = NULL;
+
+	kobject_put(&skobj->kobj);
+}
+
+static int __init shrinker_sysfs_init(void)
+{
+	struct shrinker *shrinker;
+	int ret = 0;
+
+	shrinker_kset = kset_create_and_add("shrinker", NULL, kernel_kobj);
+	if (!shrinker_kset)
+		return -ENOMEM;
+
+	/* Create sysfs entries for shrinkers registered at boot */
+	down_write(&shrinker_rwsem);
+	list_for_each_entry(shrinker, &shrinker_list, list)
+		if (!shrinker->kobj)
+			ret = shrinker_init_kobj(shrinker);
+	up_write(&shrinker_rwsem);
+
+	return ret;
+}
+__initcall(shrinker_sysfs_init);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d4a7d2bd276d..79eaa9cea618 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -201,8 +201,8 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
-static LIST_HEAD(shrinker_list);
-static DECLARE_RWSEM(shrinker_rwsem);
+LIST_HEAD(shrinker_list);
+DECLARE_RWSEM(shrinker_rwsem);
 
 #ifdef CONFIG_MEMCG
 static int shrinker_nr_max;
@@ -666,6 +666,7 @@ void register_shrinker_prepared(struct shrinker *shrinker)
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
 	shrinker->flags |= SHRINKER_REGISTERED;
+	WARN_ON_ONCE(shrinker_init_kobj(shrinker));
 	up_write(&shrinker_rwsem);
 }
 
@@ -693,6 +694,7 @@ void unregister_shrinker(struct shrinker *shrinker)
 	shrinker->flags &= ~SHRINKER_REGISTERED;
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
 		unregister_memcg_shrinker(shrinker);
+	shrinker_unlink_kobj(shrinker);
 	up_write(&shrinker_rwsem);
 
 	kfree(shrinker->nr_deferred);
-- 
2.35.1
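
For NUMA-aware shrinkers, scan_store() in the patch above splits the requested
nr_to_scan across nodes in proportion to each node's object count. The
following is a userspace illustration of that arithmetic only; split_scan() is
a hypothetical name, not a kernel function, and the per-node counts are passed
in rather than obtained from count_objects().

```c
#include <stddef.h>

/*
 * Give each node nr_to_scan * count[nid] / total objects to scan, using the
 * same integer division and the same guard against a zero total as the patch.
 */
static void split_scan(unsigned long nr_to_scan, const unsigned long *count,
		       size_t nr_nodes, unsigned long *share)
{
	unsigned long total = 0;
	size_t nid;

	for (nid = 0; nid < nr_nodes; nid++)
		total += count[nid];

	for (nid = 0; nid < nr_nodes; nid++)
		share[nid] = nr_to_scan * count[nid] / (total ? total : 1);
}
```

Because each share is rounded down, the shares can add up to slightly less
than nr_to_scan. With the per-node counts 209 and 3 from the cover letter and
a request of 100, the shares come out as 98 and 1.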



* [PATCH rfc 2/5] mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino()
From: Roman Gushchin @ 2022-04-16  0:27 UTC
  To: linux-mm
  Cc: Andrew Morton, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin

Shrinker sysfs requires a way to represent memory cgroups without
using full paths, both for displaying information and getting input
from a user.

Cgroup inode numbers are a good fit and are already used by e.g. bpf.

This commit adds a couple of helper functions, which will be used to
represent and interact with memcg-aware shrinkers using shrinker sysfs.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h |  9 +++++++++
 mm/memcontrol.c            | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index cc16ba262464..299472046000 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -838,6 +838,15 @@ static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 }
 struct mem_cgroup *mem_cgroup_from_id(unsigned short id);
 
+#ifdef CONFIG_SHRINKER_DEBUG
+static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg)
+{
+	return cgroup_ino(memcg->css.cgroup);
+}
+
+struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino);
+#endif
+
 static inline struct mem_cgroup *mem_cgroup_from_seq(struct seq_file *m)
 {
 	return mem_cgroup_from_css(seq_css(m));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4a3e1300c5a1..030dd637ec7a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5023,6 +5023,29 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 	return idr_find(&mem_cgroup_idr, id);
 }
 
+#ifdef CONFIG_SHRINKER_DEBUG
+struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino)
+{
+	struct cgroup *cgrp;
+	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
+
+	cgrp = cgroup_get_from_id(ino);
+	if (!cgrp)
+		return ERR_PTR(-ENOENT);
+
+	css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys);
+	if (css)
+		memcg = container_of(css, struct mem_cgroup, css);
+	else
+		memcg = ERR_PTR(-ENOENT);
+
+	cgroup_put(cgrp);
+
+	return memcg;
+}
+#endif
+
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
-- 
2.35.1
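
From userspace, the inode number of a cgroup can be obtained by calling
stat() on the cgroup's directory. The sketch below is not part of the
patchset; path_ino() is a hypothetical helper, and it assumes a 64-bit system
where the directory inode number matches the id that mem_cgroup_get_from_ino()
looks up.

```c
#include <sys/stat.h>

/*
 * Return the inode number of the file or directory at path, or 0 on error.
 * For a cgroup directory this is assumed to be the number that the
 * count_memcg and scan_memcg files print and accept.
 */
static unsigned long path_ino(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;

	return (unsigned long)st.st_ino;
}
```

The result can then be formatted as the first field of a write to scan_memcg.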



* [PATCH rfc 3/5] mm: introduce memcg interfaces for shrinker sysfs
From: Roman Gushchin @ 2022-04-16  0:27 UTC
  To: linux-mm
  Cc: Andrew Morton, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin

This commit introduces "count_memcg" and "scan_memcg" interfaces
for memcg-aware shrinkers.

count_memcg uses the following format:
<cgroup inode number1> <count1>
<cgroup inode number2> <count2>
...

Memory cgroups with 0 associated objects are skipped.

If the output doesn't fit into a page (sysfs limitation), a separate
line with "..." is added at the end.

It's possible to write a minimum number to the "count_memcg" interface
to filter the output.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 mm/shrinker_debug.c | 216 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 817d578f993c..24f78f5feb22 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -3,6 +3,7 @@
 #include <linux/slab.h>
 #include <linux/kobject.h>
 #include <linux/shrinker.h>
+#include <linux/memcontrol.h>
 
 /* defined in vmscan.c */
 extern struct rw_semaphore shrinker_rwsem;
@@ -207,8 +208,223 @@ static const struct attribute_group shrinker_default_group = {
 	.attrs = shrinker_default_attrs,
 };
 
+#ifdef CONFIG_MEMCG
+static ssize_t count_memcg_show(struct shrinker_kobj *skobj,
+				struct shrinker_attribute *attr, char *buf)
+{
+	unsigned long nr, total;
+	struct shrinker *shrinker;
+	struct mem_cgroup *memcg;
+	ssize_t ret = 0;
+	int nid;
+
+	down_read(&shrinker_rwsem);
+	rcu_read_lock();
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		if (!mem_cgroup_online(memcg))
+			continue;
+
+		/*
+		 * Display a PAGE_SIZE of data, reserve last 50 characters
+		 * for "...".
+		 */
+		if (ret > PAGE_SIZE - 50) {
+			ret += sprintf(buf + ret, "...\n");
+			mem_cgroup_iter_break(NULL, memcg);
+			break;
+		}
+
+		total = 0;
+		for_each_node(nid) {
+			struct shrink_control sc = {
+				.gfp_mask = GFP_KERNEL,
+				.nid = nid,
+				.memcg = memcg,
+			};
+
+			nr = shrinker->count_objects(shrinker, &sc);
+			if (nr == SHRINK_EMPTY)
+				nr = 0;
+			total += nr;
+
+			if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+				break;
+
+			cond_resched();
+		}
+
+		if (!total || total < attr->private)
+			continue;
+
+		ret += sprintf(buf + ret, "%lu %lu\n", mem_cgroup_ino(memcg),
+			       total);
+
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+out:
+	rcu_read_unlock();
+	up_read(&shrinker_rwsem);
+	return ret;
+}
+
+static ssize_t count_memcg_store(struct shrinker_kobj *skobj,
+				 struct shrinker_attribute *attr,
+				 const char *buf, size_t size)
+{
+	unsigned long min_count;
+
+	if (kstrtoul(buf, 10, &min_count))
+		return -EINVAL;
+
+	attr->private = min_count;
+
+	return size;
+}
+
+static struct shrinker_attribute count_memcg_attribute = __ATTR_RW(count_memcg);
+
+static ssize_t scan_memcg_show(struct shrinker_kobj *skobj,
+			       struct shrinker_attribute *attr, char *buf)
+{
+	/*
+	 * Display the number of objects freed on the last scan.
+	 */
+	return sprintf(buf, "%lu\n", attr->private);
+}
+
+static ssize_t scan_memcg_store(struct shrinker_kobj *skobj,
+				struct shrinker_attribute *attr,
+				const char *buf, size_t size)
+{
+	unsigned long nr, nr_to_scan = 0, freed = 0, total = 0, ino;
+	unsigned long *count_per_node = NULL;
+	struct mem_cgroup *memcg;
+	struct shrinker *shrinker;
+	ssize_t ret = size;
+	int nid;
+
+	if (sscanf(buf, "%lu %lu", &ino, &nr_to_scan) < 2)
+		return -EINVAL;
+
+	memcg = mem_cgroup_get_from_ino(ino);
+	if (!memcg || IS_ERR(memcg))
+		return -ENOENT;
+
+	if (!mem_cgroup_online(memcg)) {
+		mem_cgroup_put(memcg);
+		return -ENOENT;
+	}
+
+	down_read(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		count_per_node = kzalloc(sizeof(unsigned long) * nr_node_ids,
+					GFP_KERNEL);
+		if (!count_per_node) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		for_each_node(nid) {
+			struct shrink_control sc = {
+				.gfp_mask = GFP_KERNEL,
+				.nid = nid,
+				.memcg = memcg,
+			};
+
+			nr = shrinker->count_objects(shrinker, &sc);
+			if (nr == SHRINK_EMPTY)
+				nr = 0;
+			count_per_node[nid] = nr;
+			total += nr;
+
+			cond_resched();
+		}
+	}
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+			.memcg = memcg,
+		};
+
+		if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+			sc.nr_to_scan = nr_to_scan * count_per_node[nid] /
+				(total ? total : 1);
+			sc.nr_scanned = sc.nr_to_scan;
+		} else {
+			sc.nr_to_scan = nr_to_scan;
+			sc.nr_scanned = sc.nr_to_scan;
+		}
+
+		nr = shrinker->scan_objects(shrinker, &sc);
+		if (nr == SHRINK_STOP || nr == SHRINK_EMPTY)
+			nr = 0;
+
+		freed += nr;
+
+		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+			break;
+
+		cond_resched();
+	}
+	attr->private = freed;
+out:
+	up_read(&shrinker_rwsem);
+	mem_cgroup_put(memcg);
+	kfree(count_per_node);
+	return ret;
+}
+
+static struct shrinker_attribute scan_memcg_attribute = __ATTR_RW(scan_memcg);
+
+static struct attribute *shrinker_memcg_attrs[] = {
+	&count_memcg_attribute.attr,
+	&scan_memcg_attribute.attr,
+	NULL,
+};
+
+static umode_t memcg_attrs_visible(struct kobject *kobj, struct attribute *attr,
+				   int i)
+{
+	struct shrinker_kobj *skobj = to_shrinker_kobj(kobj);
+	struct shrinker *shrinker;
+	int ret = 0;
+
+	lockdep_assert_held(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (shrinker && (shrinker->flags & SHRINKER_MEMCG_AWARE))
+		ret = 0644;
+
+	return ret;
+}
+
+static const struct attribute_group shrinker_memcg_group = {
+	.attrs = shrinker_memcg_attrs,
+	.is_visible = memcg_attrs_visible,
+};
+#endif /* CONFIG_MEMCG */
 static const struct attribute_group *shrinker_sysfs_groups[] = {
 	&shrinker_default_group,
+#ifdef CONFIG_MEMCG
+	&shrinker_memcg_group,
+#endif
 	NULL,
 };
 
-- 
2.35.1
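
The output limiting in count_memcg_show() above can be sketched in userspace
as follows. This is an illustration under assumptions, not the kernel code:
OUT_SIZE stands in for PAGE_SIZE, emit_counts() is a hypothetical name, and
the per-cgroup totals are passed in as arrays instead of being computed by
count_objects().

```c
#include <stdio.h>
#include <string.h>

#define OUT_SIZE	4096	/* stands in for PAGE_SIZE (assumption) */
#define OUT_RESERVE	50	/* same reserve as count_memcg_show() */

/*
 * Append "<ino> <count>\n" lines to buf (OUT_SIZE bytes). Entries with a zero
 * count, or a count below min_count, are skipped. Once fewer than OUT_RESERVE
 * bytes remain, "...\n" is appended and output stops.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static size_t emit_counts(char *buf, const unsigned long *ino,
			  const unsigned long *count, size_t n,
			  unsigned long min_count)
{
	size_t ret = 0, i;

	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		if (ret > OUT_SIZE - OUT_RESERVE) {
			ret += sprintf(buf + ret, "...\n");
			break;
		}

		if (!count[i] || count[i] < min_count)
			continue;

		ret += sprintf(buf + ret, "%lu %lu\n", ino[i], count[i]);
	}
	return ret;
}
```

A line holding two 64-bit decimal numbers needs at most 42 bytes, so the
50-byte reserve leaves room for one more full line plus the "..." marker.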



* [PATCH rfc 4/5] mm: introduce numa interfaces for shrinker sysfs
From: Roman Gushchin @ 2022-04-16  0:27 UTC
  To: linux-mm
  Cc: Andrew Morton, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin

This commit introduces "count_node", "scan_node", "count_memcg_node"
and "scan_memcg_node" interfaces for numa-aware and numa- and
memcg-aware shrinkers.

Usage examples:
1) Get per-node and per-memcg per-node counts:
  $ cat count_node
    209 3
  $ cat count_memcg_node
    1 209 3
    20 96 0
    53 810 7
    2297 2 0
    218 13 0
    581 30 0
    911 124 0
    <CUT>

2) Scan individual node:
  $ echo "1 200" > scan_node
  $ cat scan_node
    2

3) Scan individual memcg and node:
  $ echo "1868 0 500" > scan_memcg_node
  $ cat scan_memcg_node
    435

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 mm/shrinker_debug.c | 279 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 279 insertions(+)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 24f78f5feb22..ae6e434500bc 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -420,10 +420,289 @@ static const struct attribute_group shrinker_memcg_group = {
 	.is_visible = memcg_attrs_visible,
 };
 #endif /* CONFIG_MEMCG */
+
+#ifdef CONFIG_NUMA
+static ssize_t count_node_show(struct shrinker_kobj *skobj,
+			       struct shrinker_attribute *attr, char *buf)
+{
+	struct shrinker *shrinker;
+	unsigned long nr;
+	int nid;
+	ssize_t ret = 0;
+
+	down_read(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	for_each_node(nid) {
+		struct shrink_control sc = {
+			.gfp_mask = GFP_KERNEL,
+			.nid = nid,
+		};
+
+		nr = shrinker->count_objects(shrinker, &sc);
+		if (nr == SHRINK_EMPTY)
+			nr = 0;
+
+		ret += sprintf(buf + ret, "%lu ", nr);
+
+		cond_resched();
+	}
+	ret += sprintf(buf + ret, "\n");
+out:
+	up_read(&shrinker_rwsem);
+	return ret;
+}
+
+static struct shrinker_attribute count_node_attribute = __ATTR_RO(count_node);
+
+static ssize_t scan_node_show(struct shrinker_kobj *skobj,
+			      struct shrinker_attribute *attr, char *buf)
+{
+	/*
+	 * Display the number of objects freed on the last scan.
+	 */
+	return sprintf(buf, "%lu\n", attr->private);
+}
+
+static ssize_t scan_node_store(struct shrinker_kobj *skobj,
+			       struct shrinker_attribute *attr,
+			       const char *buf, size_t size)
+{
+	unsigned long nr, nr_to_scan = 0;
+	struct shrinker *shrinker;
+	ssize_t ret = size;
+	int nid;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_KERNEL,
+	};
+
+	if (sscanf(buf, "%d %lu", &nid, &nr_to_scan) < 2)
+		return -EINVAL;
+
+	if (nid >= nr_node_ids)
+		return -EINVAL;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return -EINVAL;
+
+	down_read(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	sc.nid = nid;
+	sc.nr_to_scan = nr_to_scan;
+	sc.nr_scanned = nr_to_scan;
+
+	nr = shrinker->scan_objects(shrinker, &sc);
+	if (nr == SHRINK_STOP || nr == SHRINK_EMPTY)
+		nr = 0;
+
+	attr->private = nr;
+out:
+	up_read(&shrinker_rwsem);
+	return ret;
+}
+
+static struct shrinker_attribute scan_node_attribute = __ATTR_RW(scan_node);
+
+#ifdef CONFIG_MEMCG
+static ssize_t count_memcg_node_show(struct shrinker_kobj *skobj,
+				     struct shrinker_attribute *attr, char *buf)
+{
+	unsigned long nr, total;
+	unsigned long *count_per_node = NULL;
+	struct shrinker *shrinker;
+	struct mem_cgroup *memcg;
+	ssize_t ret = 0;
+	int nid;
+
+	down_read(&shrinker_rwsem);
+	rcu_read_lock();
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	count_per_node = kzalloc(sizeof(unsigned long) * nr_node_ids, GFP_KERNEL);
+	if (!count_per_node) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		if (!mem_cgroup_online(memcg))
+			continue;
+
+		/*
+		 * Display a PAGE_SIZE of data, reserve last few characters
+		 * for "...".
+		 */
+		if (ret > PAGE_SIZE - (nr_node_ids * 20 + 30)) {
+			ret += sprintf(buf + ret, "...\n");
+			mem_cgroup_iter_break(NULL, memcg);
+			break;
+		}
+
+		total = 0;
+		for_each_node(nid) {
+			struct shrink_control sc = {
+				.gfp_mask = GFP_KERNEL,
+				.nid = nid,
+				.memcg = memcg,
+			};
+
+			nr = shrinker->count_objects(shrinker, &sc);
+			if (nr == SHRINK_EMPTY)
+				nr = 0;
+			count_per_node[nid] = nr;
+			total += nr;
+		}
+		if (!total || total < attr->private)
+			continue;
+
+		ret += sprintf(buf + ret, "%lu ", mem_cgroup_ino(memcg));
+		for_each_node(nid)
+			ret += sprintf(buf + ret, "%lu ", count_per_node[nid]);
+		ret += sprintf(buf + ret, "\n");
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+out:
+	rcu_read_unlock();
+	up_read(&shrinker_rwsem);
+	kfree(count_per_node);
+	return ret;
+}
+
+static ssize_t count_memcg_node_store(struct shrinker_kobj *skobj,
+				      struct shrinker_attribute *attr,
+				      const char *buf, size_t size)
+{
+	unsigned long min_count;
+
+	if (kstrtoul(buf, 10, &min_count))
+		return -EINVAL;
+
+	attr->private = min_count;
+
+	return size;
+}
+
+static struct shrinker_attribute count_memcg_node_attribute =
+	__ATTR_RW(count_memcg_node);
+
+static ssize_t scan_memcg_node_show(struct shrinker_kobj *skobj,
+				    struct shrinker_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", attr->private);
+}
+
+static ssize_t scan_memcg_node_store(struct shrinker_kobj *skobj,
+				     struct shrinker_attribute *attr,
+				     const char *buf, size_t size)
+{
+	unsigned long nr_to_scan = 0, nr, ino;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_KERNEL,
+	};
+	struct mem_cgroup *memcg;
+	struct shrinker *shrinker;
+	ssize_t ret = size;
+	int nid;
+
+	if (sscanf(buf, "%lu %d %lu", &ino, &nid, &nr_to_scan) < 3)
+		return -EINVAL;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return -EINVAL;
+
+	memcg = mem_cgroup_get_from_ino(ino);
+	if (!memcg || IS_ERR(memcg))
+		return -ENOENT;
+
+	if (!mem_cgroup_online(memcg)) {
+		mem_cgroup_put(memcg);
+		return -ENOENT;
+	}
+
+	down_read(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (!shrinker) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	sc.nid = nid;
+	sc.memcg = memcg;
+	sc.nr_to_scan = nr_to_scan;
+	sc.nr_scanned = nr_to_scan;
+
+	nr = shrinker->scan_objects(shrinker, &sc);
+	if (nr == SHRINK_STOP || nr == SHRINK_EMPTY)
+		nr = 0;
+
+	attr->private = nr;
+out:
+	up_read(&shrinker_rwsem);
+	mem_cgroup_put(memcg);
+	return ret;
+}
+
+static struct shrinker_attribute scan_memcg_node_attribute =
+	__ATTR_RW(scan_memcg_node);
+#endif /* CONFIG_MEMCG */
+
+static struct attribute *shrinker_node_attrs[] = {
+	&count_node_attribute.attr,
+	&scan_node_attribute.attr,
+#ifdef CONFIG_MEMCG
+	&count_memcg_node_attribute.attr,
+	&scan_memcg_node_attribute.attr,
+#endif
+	NULL,
+};
+
+static umode_t node_attrs_visible(struct kobject *kobj, struct attribute *attr,
+				  int i)
+{
+	struct shrinker_kobj *skobj = to_shrinker_kobj(kobj);
+	struct shrinker *shrinker;
+	int ret = 0;
+
+	lockdep_assert_held(&shrinker_rwsem);
+
+	shrinker = skobj->shrinker;
+	if (nr_node_ids > 1 && shrinker &&
+	    (shrinker->flags & SHRINKER_NUMA_AWARE))
+		ret = 0644;
+
+	return ret;
+}
+
+static const struct attribute_group shrinker_node_group = {
+	.attrs = shrinker_node_attrs,
+	.is_visible = node_attrs_visible,
+};
+#endif /* CONFIG_NUMA */
+
 static const struct attribute_group *shrinker_sysfs_groups[] = {
 	&shrinker_default_group,
 #ifdef CONFIG_MEMCG
 	&shrinker_memcg_group,
+#endif
+#ifdef CONFIG_NUMA
+	&shrinker_node_group,
 #endif
 	NULL,
 };
-- 
2.35.1



* [PATCH rfc 5/5] mm: provide shrinkers with names
  2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
                   ` (3 preceding siblings ...)
  2022-04-16  0:27 ` [PATCH rfc 4/5] mm: introduce numa " Roman Gushchin
@ 2022-04-16  0:27 ` Roman Gushchin
  2022-04-18  9:27 ` [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Mike Rapoport
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-16  0:27 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin

Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by their count/scan function names, but that is not
always enough: e.g. for superblock shrinkers it helps to know at least
which superblock the shrinker belongs to.

This commit adds names to shrinkers. The register_shrinker() and
prealloc_shrinker() functions are extended to take a format string and
arguments used to construct a name. If CONFIG_SHRINKER_DEBUG is on,
the name is saved until the corresponding kobject is created,
otherwise it is simply ignored.

After this change the shrinker sysfs folder looks like:
  $ cd /sys/kernel/shrinker/
  $ ls
    dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
    kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
    sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
    sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
    sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
    sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
    sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
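
As a rough userspace illustration of how such folder names come together:
the name is rendered from a printf-style format into a bounded buffer and
copied to the heap, and the numeric id is appended when the folder is
created. The function names below are hypothetical, and vsnprintf() and
malloc() stand in for the kernel's vscnprintf() and kstrdup().

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHRINKER_NAME_MAX 64	/* mirrors the 64-byte stack buffer in the patch */

/* Render a printf-style name into a bounded buffer, then heap-copy it. */
static char *shrinker_name_alloc(const char *fmt, ...)
{
	char buf[SHRINKER_NAME_MAX];
	va_list ap;
	size_t len;
	char *name;

	va_start(ap, fmt);
	vsnprintf(buf, sizeof(buf), fmt, ap);	/* over-long names are truncated */
	va_end(ap);

	len = strlen(buf) + 1;
	name = malloc(len);	/* NULL on allocation failure; caller frees */
	if (name)
		memcpy(name, buf, len);
	return name;
}

/* Compose the sysfs folder name "<name>-<id>". */
static int shrinker_dir_name(char *out, size_t len, const char *name, int id)
{
	int n = snprintf(out, len, "%s-%d", name, id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a format of "sb-%s", a filesystem type of "tmpfs" and an id of 42,
this yields "sb-tmpfs-42", matching the entries in the listing above.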

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 arch/x86/kvm/mmu/mmu.c                        |  2 +-
 drivers/android/binder_alloc.c                |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  3 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c        |  2 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |  2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |  2 +-
 drivers/md/bcache/btree.c                     |  2 +-
 drivers/md/dm-bufio.c                         |  2 +-
 drivers/md/dm-zoned-metadata.c                |  2 +-
 drivers/md/raid5.c                            |  2 +-
 drivers/misc/vmw_balloon.c                    |  2 +-
 drivers/virtio/virtio_balloon.c               |  2 +-
 drivers/xen/xenbus/xenbus_probe_backend.c     |  2 +-
 fs/erofs/utils.c                              |  2 +-
 fs/ext4/extents_status.c                      |  3 +-
 fs/f2fs/super.c                               |  2 +-
 fs/gfs2/glock.c                               |  2 +-
 fs/gfs2/main.c                                |  2 +-
 fs/jbd2/journal.c                             |  2 +-
 fs/mbcache.c                                  |  2 +-
 fs/nfs/nfs42xattr.c                           |  7 ++-
 fs/nfs/super.c                                |  2 +-
 fs/nfsd/filecache.c                           |  2 +-
 fs/nfsd/nfscache.c                            |  2 +-
 fs/quota/dquot.c                              |  2 +-
 fs/super.c                                    |  2 +-
 fs/ubifs/super.c                              |  2 +-
 fs/xfs/xfs_buf.c                              |  2 +-
 fs/xfs/xfs_icache.c                           |  2 +-
 fs/xfs/xfs_qm.c                               |  2 +-
 include/linux/shrinker.h                      |  5 +-
 kernel/rcu/tree.c                             |  2 +-
 mm/huge_memory.c                              |  4 +-
 mm/shrinker_debug.c                           |  7 ++-
 mm/vmscan.c                                   | 60 ++++++++++++++++++-
 mm/workingset.c                               |  2 +-
 mm/zsmalloc.c                                 |  2 +-
 net/sunrpc/auth.c                             |  2 +-
 38 files changed, 105 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f9080ee50ffa..7f3abc800621 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6283,7 +6283,7 @@ int kvm_mmu_vendor_module_init(void)
 	if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
 		goto out;
 
-	ret = register_shrinker(&mmu_shrinker);
+	ret = register_shrinker(&mmu_shrinker, "mmu");
 	if (ret)
 		goto out;
 
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 2ac1008a5f39..951343c41ba8 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1084,7 +1084,7 @@ int binder_alloc_shrinker_init(void)
 	int ret = list_lru_init(&binder_alloc_lru);
 
 	if (ret == 0) {
-		ret = register_shrinker(&binder_shrinker);
+		ret = register_shrinker(&binder_shrinker, "binder");
 		if (ret)
 			list_lru_destroy(&binder_alloc_lru);
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 6a6ff98a8746..85524ef92ea4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -426,7 +426,8 @@ void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
 	i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
 	i915->mm.shrinker.seeks = DEFAULT_SEEKS;
 	i915->mm.shrinker.batch = 4096;
-	drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker));
+	drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
+						  "drm_i915_gem"));
 
 	i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
 	drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 086dacf2f26a..2d3cf4f13dfd 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -221,7 +221,7 @@ void msm_gem_shrinker_init(struct drm_device *dev)
 	priv->shrinker.count_objects = msm_gem_shrinker_count;
 	priv->shrinker.scan_objects = msm_gem_shrinker_scan;
 	priv->shrinker.seeks = DEFAULT_SEEKS;
-	WARN_ON(register_shrinker(&priv->shrinker));
+	WARN_ON(register_shrinker(&priv->shrinker, "drm_msm_gem"));
 
 	priv->vmap_notifier.notifier_call = msm_gem_shrinker_vmap;
 	WARN_ON(register_vmap_purge_notifier(&priv->vmap_notifier));
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index 77e7cb6d1ae3..0d028266ee9e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -103,7 +103,7 @@ void panfrost_gem_shrinker_init(struct drm_device *dev)
 	pfdev->shrinker.count_objects = panfrost_gem_shrinker_count;
 	pfdev->shrinker.scan_objects = panfrost_gem_shrinker_scan;
 	pfdev->shrinker.seeks = DEFAULT_SEEKS;
-	WARN_ON(register_shrinker(&pfdev->shrinker));
+	WARN_ON(register_shrinker(&pfdev->shrinker, "drm_panfrost"));
 }
 
 /**
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1bba0a0ed3f9..b8b41d242197 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -722,7 +722,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	mm_shrinker.count_objects = ttm_pool_shrinker_count;
 	mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
 	mm_shrinker.seeks = 1;
-	return register_shrinker(&mm_shrinker);
+	return register_shrinker(&mm_shrinker, "drm_ttm_pool");
 }
 
 /**
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ad9f16689419..c1f734ab86b3 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -812,7 +812,7 @@ int bch_btree_cache_alloc(struct cache_set *c)
 	c->shrink.seeks = 4;
 	c->shrink.batch = c->btree_pages * 2;
 
-	if (register_shrinker(&c->shrink))
+	if (register_shrinker(&c->shrink, "btree"))
 		pr_warn("bcache: %s: could not register shrinker\n",
 				__func__);
 
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index e9cbc70d5a0e..2a2255dc507f 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1807,7 +1807,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
 	c->shrinker.scan_objects = dm_bufio_shrink_scan;
 	c->shrinker.seeks = 1;
 	c->shrinker.batch = 0;
-	r = register_shrinker(&c->shrinker);
+	r = register_shrinker(&c->shrinker, "dm_bufio");
 	if (r)
 		goto bad;
 
diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index d1ea66114d14..05f2fd12066b 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -2944,7 +2944,7 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	zmd->mblk_shrinker.seeks = DEFAULT_SEEKS;
 
 	/* Metadata cache shrinker */
-	ret = register_shrinker(&zmd->mblk_shrinker);
+	ret = register_shrinker(&zmd->mblk_shrinker, "md_meta");
 	if (ret) {
 		dmz_zmd_err(zmd, "Register metadata cache shrinker failed");
 		goto err;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 351d341a1ffa..7a2ee351b67f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7383,7 +7383,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	conf->shrinker.count_objects = raid5_cache_count;
 	conf->shrinker.batch = 128;
 	conf->shrinker.flags = 0;
-	if (register_shrinker(&conf->shrinker)) {
+	if (register_shrinker(&conf->shrinker, "md")) {
 		pr_warn("md/raid:%s: couldn't register shrinker.\n",
 			mdname(mddev));
 		goto abort;
diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index f1d8ba6d4857..6c9ddf1187dd 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -1587,7 +1587,7 @@ static int vmballoon_register_shrinker(struct vmballoon *b)
 	b->shrinker.count_objects = vmballoon_shrinker_count;
 	b->shrinker.seeks = DEFAULT_SEEKS;
 
-	r = register_shrinker(&b->shrinker);
+	r = register_shrinker(&b->shrinker, "vmw_balloon");
 
 	if (r == 0)
 		b->shrinker_registered = true;
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f4c34a2a6b8e..093e06e19d0e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -875,7 +875,7 @@ static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
 	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
 	vb->shrinker.seeks = DEFAULT_SEEKS;
 
-	return register_shrinker(&vb->shrinker);
+	return register_shrinker(&vb->shrinker, "virtio_balloon");
 }
 
 static int virtballoon_probe(struct virtio_device *vdev)
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index 5abded97e1a7..a6c5e344017d 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -305,7 +305,7 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
-	if (register_shrinker(&backend_memory_shrinker))
+	if (register_shrinker(&backend_memory_shrinker, "xen_backend"))
 		pr_warn("shrinker registration failed\n");
 
 	return 0;
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index ec9a1d780dc1..67eb64fadd4f 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -282,7 +282,7 @@ static struct shrinker erofs_shrinker_info = {
 
 int __init erofs_init_shrinker(void)
 {
-	return register_shrinker(&erofs_shrinker_info);
+	return register_shrinker(&erofs_shrinker_info, "erofs");
 }
 
 void erofs_exit_shrinker(void)
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 9a3a8996aacf..a7aa79d580e5 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -1650,11 +1650,10 @@ int ext4_es_register_shrinker(struct ext4_sb_info *sbi)
 	err = percpu_counter_init(&sbi->s_es_stats.es_stats_shk_cnt, 0, GFP_KERNEL);
 	if (err)
 		goto err3;
-
 	sbi->s_es_shrinker.scan_objects = ext4_es_scan;
 	sbi->s_es_shrinker.count_objects = ext4_es_count;
 	sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
-	err = register_shrinker(&sbi->s_es_shrinker);
+	err = register_shrinker(&sbi->s_es_shrinker, "ext4_es");
 	if (err)
 		goto err4;
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index ea939db18f88..ffce794bd4f6 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4609,7 +4609,7 @@ static int __init init_f2fs_fs(void)
 	err = f2fs_init_sysfs();
 	if (err)
 		goto free_garbage_collection_cache;
-	err = register_shrinker(&f2fs_shrinker_info);
+	err = register_shrinker(&f2fs_shrinker_info, "f2fs");
 	if (err)
 		goto free_sysfs;
 	err = register_filesystem(&f2fs_fs_type);
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 26169cedcefc..791c23d9f7e7 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2549,7 +2549,7 @@ int __init gfs2_glock_init(void)
 		return -ENOMEM;
 	}
 
-	ret = register_shrinker(&glock_shrinker);
+	ret = register_shrinker(&glock_shrinker, "gfs2_glock");
 	if (ret) {
 		destroy_workqueue(gfs2_delete_workqueue);
 		destroy_workqueue(glock_workqueue);
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 28d0eb23e18e..dde981b78488 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -150,7 +150,7 @@ static int __init init_gfs2_fs(void)
 	if (!gfs2_trans_cachep)
 		goto fail_cachep8;
 
-	error = register_shrinker(&gfs2_qd_shrinker);
+	error = register_shrinker(&gfs2_qd_shrinker, "gfs2_qd");
 	if (error)
 		goto fail_shrinker;
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index fcacafa4510d..271f418d3dc3 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1418,7 +1418,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
 	if (percpu_counter_init(&journal->j_checkpoint_jh_count, 0, GFP_KERNEL))
 		goto err_cleanup;
 
-	if (register_shrinker(&journal->j_shrinker)) {
+	if (register_shrinker(&journal->j_shrinker, "jbd2_journal")) {
 		percpu_counter_destroy(&journal->j_checkpoint_jh_count);
 		goto err_cleanup;
 	}
diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..379dc5b0b6ad 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -367,7 +367,7 @@ struct mb_cache *mb_cache_create(int bucket_bits)
 	cache->c_shrink.count_objects = mb_cache_count;
 	cache->c_shrink.scan_objects = mb_cache_scan;
 	cache->c_shrink.seeks = DEFAULT_SEEKS;
-	if (register_shrinker(&cache->c_shrink)) {
+	if (register_shrinker(&cache->c_shrink, "mb_cache")) {
 		kfree(cache->c_hash);
 		kfree(cache);
 		goto err_out;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index e7b34f7e0614..147b8a2f2dc6 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -1017,15 +1017,16 @@ int __init nfs4_xattr_cache_init(void)
 	if (ret)
 		goto out2;
 
-	ret = register_shrinker(&nfs4_xattr_cache_shrinker);
+	ret = register_shrinker(&nfs4_xattr_cache_shrinker, "nfs_xattr_cache");
 	if (ret)
 		goto out1;
 
-	ret = register_shrinker(&nfs4_xattr_entry_shrinker);
+	ret = register_shrinker(&nfs4_xattr_entry_shrinker, "nfs_xattr_entry");
 	if (ret)
 		goto out;
 
-	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
+	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker,
+				"nfs_xattr_large_entry");
 	if (!ret)
 		return 0;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 6ab5eeb000dc..c7a2aef911f1 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -149,7 +149,7 @@ int __init register_nfs_fs(void)
 	ret = nfs_register_sysctl();
 	if (ret < 0)
 		goto error_2;
-	ret = register_shrinker(&acl_shrinker);
+	ret = register_shrinker(&acl_shrinker, "nfs_acl");
 	if (ret < 0)
 		goto error_3;
 #ifdef CONFIG_NFS_V4_2
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2c1b027774d4..9c2879a3c3c0 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -666,7 +666,7 @@ nfsd_file_cache_init(void)
 		goto out_err;
 	}
 
-	ret = register_shrinker(&nfsd_file_shrinker);
+	ret = register_shrinker(&nfsd_file_shrinker, "nfsd_filecache");
 	if (ret) {
 		pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
 		goto out_lru;
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 0b3f12aa37ff..f1cfb06d0be5 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -176,7 +176,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
 	nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
 	nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
 	nn->nfsd_reply_cache_shrinker.seeks = 1;
-	status = register_shrinker(&nn->nfsd_reply_cache_shrinker);
+	status = register_shrinker(&nn->nfsd_reply_cache_shrinker, "nfsd_reply");
 	if (status)
 		goto out_stats_destroy;
 
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index a74aef99bd3d..854d2b1d0914 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2985,7 +2985,7 @@ static int __init dquot_init(void)
 	pr_info("VFS: Dquot-cache hash table entries: %ld (order %ld,"
 		" %ld bytes)\n", nr_hash, order, (PAGE_SIZE << order));
 
-	if (register_shrinker(&dqcache_shrinker))
+	if (register_shrinker(&dqcache_shrinker, "dqcache"))
 		panic("Cannot register dquot shrinker");
 
 	return 0;
diff --git a/fs/super.c b/fs/super.c
index f1d4a193602d..8983c264cfc6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -265,7 +265,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
-	if (prealloc_shrinker(&s->s_shrink))
+	if (prealloc_shrinker(&s->s_shrink, "sb-%s", type->name))
 		goto fail;
 	if (list_lru_init_memcg(&s->s_dentry_lru, &s->s_shrink))
 		goto fail;
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index bad67455215f..a3663d201f64 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2430,7 +2430,7 @@ static int __init ubifs_init(void)
 	if (!ubifs_inode_slab)
 		return -ENOMEM;
 
-	err = register_shrinker(&ubifs_shrinker_info);
+	err = register_shrinker(&ubifs_shrinker_info, "ubifs");
 	if (err)
 		goto out_slab;
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e1afb9e503e1..5645e92df0c9 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1986,7 +1986,7 @@ xfs_alloc_buftarg(
 	btp->bt_shrinker.scan_objects = xfs_buftarg_shrink_scan;
 	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
 	btp->bt_shrinker.flags = SHRINKER_NUMA_AWARE;
-	if (register_shrinker(&btp->bt_shrinker))
+	if (register_shrinker(&btp->bt_shrinker, "xfs_buf"))
 		goto error_pcpu;
 	return btp;
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bffd6eb0b298..d0c4e74ff763 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2198,5 +2198,5 @@ xfs_inodegc_register_shrinker(
 	shrink->flags = SHRINKER_NONSLAB;
 	shrink->batch = XFS_INODEGC_SHRINKER_BATCH;
 
-	return register_shrinker(shrink);
+	return register_shrinker(shrink, "xfs_inodegc");
 }
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index f165d1a3de1d..93ded9e81f49 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -686,7 +686,7 @@ xfs_qm_init_quotainfo(
 	qinf->qi_shrinker.seeks = DEFAULT_SEEKS;
 	qinf->qi_shrinker.flags = SHRINKER_NUMA_AWARE;
 
-	error = register_shrinker(&qinf->qi_shrinker);
+	error = register_shrinker(&qinf->qi_shrinker, "xfs_qm");
 	if (error)
 		goto out_free_inos;
 
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 50c0e233ecdd..8b683f141098 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -77,6 +77,7 @@ struct shrinker {
 #endif
 #ifdef CONFIG_SHRINKER_DEBUG
 	struct shrinker_kobj *kobj;
+	char *name;
 #endif
 	/* objs pending delete, per node */
 	atomic_long_t *nr_deferred;
@@ -93,9 +94,9 @@ struct shrinker {
  */
 #define SHRINKER_NONSLAB	(1 << 3)
 
-extern int prealloc_shrinker(struct shrinker *shrinker);
+extern int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...);
 extern void register_shrinker_prepared(struct shrinker *shrinker);
-extern int register_shrinker(struct shrinker *shrinker);
+extern int register_shrinker(struct shrinker *shrinker, const char *fmt, ...);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 29d0fa150721..4cd5f9907f80 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4874,7 +4874,7 @@ static void __init kfree_rcu_batch_init(void)
 		INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
 		krcp->initialized = true;
 	}
-	if (register_shrinker(&kfree_rcu_shrinker))
+	if (register_shrinker(&kfree_rcu_shrinker, "kfree_rcu"))
 		pr_err("Failed to register kfree_rcu() shrinker!\n");
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 72683907c23c..eccaa3bbd467 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -423,10 +423,10 @@ static int __init hugepage_init(void)
 	if (err)
 		goto err_slab;
 
-	err = register_shrinker(&huge_zero_page_shrinker);
+	err = register_shrinker(&huge_zero_page_shrinker, "thp_zero");
 	if (err)
 		goto err_hzp_shrinker;
-	err = register_shrinker(&deferred_split_shrinker);
+	err = register_shrinker(&deferred_split_shrinker, "thp_deferred_split");
 	if (err)
 		goto err_split_shrinker;
 
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index ae6e434500bc..ce74d220bad3 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -738,14 +738,17 @@ int shrinker_init_kobj(struct shrinker *shrinker)
 	skobj->id = id;
 	skobj->kobj.kset = shrinker_kset;
 	skobj->shrinker = shrinker;
-	ret = kobject_init_and_add(&skobj->kobj, &shrinker_ktype, NULL, "%d",
-				   id);
+	ret = kobject_init_and_add(&skobj->kobj, &shrinker_ktype, NULL, "%s-%d",
+				   shrinker->name, id);
 	if (ret) {
 		ida_free(&shrinker_sysfs_ida, id);
 		kobject_put(&skobj->kobj);
 		return ret;
 	}
 
+	/* shrinker->name is not needed anymore, free it */
+	kfree(shrinker->name);
+	shrinker->name = NULL;
 	shrinker->kobj = skobj;
 
 	kobject_uevent(&skobj->kobj, KOBJ_ADD);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79eaa9cea618..4b030c94f3c0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -624,7 +624,7 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
 /*
  * Add a shrinker callback to be called from the vm.
  */
-int prealloc_shrinker(struct shrinker *shrinker)
+static int __prealloc_shrinker(struct shrinker *shrinker)
 {
 	unsigned int size;
 	int err;
@@ -648,6 +648,34 @@ int prealloc_shrinker(struct shrinker *shrinker)
 	return 0;
 }
 
+#ifdef CONFIG_SHRINKER_DEBUG
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	int err;
+	char buf[64];
+	va_list ap;
+
+	va_start(ap, fmt);
+	vscnprintf(buf, sizeof(buf), fmt, ap);
+	va_end(ap);
+
+	shrinker->name = kstrdup(buf, GFP_KERNEL);
+	if (!shrinker->name)
+		return -ENOMEM;
+
+	err = __prealloc_shrinker(shrinker);
+	if (err)
+		kfree(shrinker->name);
+
+	return err;
+}
+#else
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	return __prealloc_shrinker(shrinker);
+}
+#endif
+
 void free_prealloced_shrinker(struct shrinker *shrinker)
 {
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
@@ -659,6 +687,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
 
 	kfree(shrinker->nr_deferred);
 	shrinker->nr_deferred = NULL;
+#ifdef CONFIG_SHRINKER_DEBUG
+	kfree(shrinker->name);
+#endif
 }
 
 void register_shrinker_prepared(struct shrinker *shrinker)
@@ -670,15 +701,38 @@ void register_shrinker_prepared(struct shrinker *shrinker)
 	up_write(&shrinker_rwsem);
 }
 
-int register_shrinker(struct shrinker *shrinker)
+static int __register_shrinker(struct shrinker *shrinker)
 {
-	int err = prealloc_shrinker(shrinker);
+	int err = __prealloc_shrinker(shrinker);
 
 	if (err)
 		return err;
 	register_shrinker_prepared(shrinker);
 	return 0;
 }
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	char buf[64];
+	va_list ap;
+
+	va_start(ap, fmt);
+	vscnprintf(buf, sizeof(buf), fmt, ap);
+	va_end(ap);
+
+	shrinker->name = kstrdup(buf, GFP_KERNEL);
+	if (!shrinker->name)
+		return -ENOMEM;
+
+	return __register_shrinker(shrinker);
+}
+#else
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+	return __register_shrinker(shrinker);
+}
+#endif
 EXPORT_SYMBOL(register_shrinker);
 
 /*
diff --git a/mm/workingset.c b/mm/workingset.c
index 85f7472f07ba..8a16fcb573d8 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -740,7 +740,7 @@ static int __init workingset_init(void)
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
 	       timestamp_bits, max_order, bucket_order);
 
-	ret = prealloc_shrinker(&workingset_shadow_shrinker);
+	ret = prealloc_shrinker(&workingset_shadow_shrinker, "shadow");
 	if (ret)
 		goto err;
 	ret = __list_lru_init(&shadow_nodes, true, &shadow_nodes_key,
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9152fbde33b5..a19de176f604 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2188,7 +2188,7 @@ static int zs_register_shrinker(struct zs_pool *pool)
 	pool->shrinker.batch = 0;
 	pool->shrinker.seeks = DEFAULT_SEEKS;
 
-	return register_shrinker(&pool->shrinker);
+	return register_shrinker(&pool->shrinker, "zspool");
 }
 
 /**
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 682fcd24bf43..a29742a9c3f1 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -874,7 +874,7 @@ int __init rpcauth_init_module(void)
 	err = rpc_init_authunix();
 	if (err < 0)
 		goto out1;
-	err = register_shrinker(&rpc_cred_shrinker);
+	err = register_shrinker(&rpc_cred_shrinker, "rpc_cred");
 	if (err < 0)
 		goto out2;
 	return 0;
-- 
2.35.1



* Re: [PATCH rfc 1/5] mm: introduce sysfs interface for debugging kernel shrinker
  2022-04-16  0:27 ` [PATCH rfc 1/5] mm: introduce sysfs interface for debugging kernel shrinker Roman Gushchin
@ 2022-04-16  1:35   ` Hillf Danton
  0 siblings, 0 replies; 24+ messages in thread
From: Hillf Danton @ 2022-04-16  1:35 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel, Michal Hocko

On Fri, 15 Apr 2022 17:27:52 -0700 Roman Gushchin wrote:
> This commit introduces the /sys/kernel/shrinker sysfs interface
> which provides an ability to observe the state and interact with
> individual kernel memory shrinkers.
> 
> Because the feature is oriented on kernel developers and adds some
> memory overhead (which shouldn't be large unless there is a huge
> amount of registered shrinkers), it's guarded by a config option
> (disabled by default).
> 
> To simplify the code, kobjects are not embedded into shrinkers
> objects, but are created, linked and unlinked dynamically.
> 
> This commit introduces basic "count" and "scan" interfaces.
> Basic usage:
>   $ cat count                   : get the number of objects
>   $ echo "500" > scan           : try to reclaim 500 objects
>   $ cat scan                    : get the number of objects reclaimed

What is nice in the design is the window it opens for scanning an individual
shrinker without having to wake up kswapd, so this is good work right from the
drawing board.

> +
> +static ssize_t scan_store(struct shrinker_kobj *skobj,
> +			  struct shrinker_attribute *attr,
> +			  const char *buf, size_t size)
> +{
> +	unsigned long nr, total = 0, nr_to_scan = 0, freed = 0;
> +	unsigned long *count_per_node = NULL;
> +	struct shrinker *shrinker;
> +	ssize_t ret = size;
> +	int nid;
> +
> +	if (kstrtoul(buf, 10, &nr_to_scan))
> +		return -EINVAL;
> +
> +	down_read(&shrinker_rwsem);

Nit: use down_read_killable() instead, so that the CAP_SYS_ADMIN user can
change their mind on the command line.
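
To make the nit concrete, here is a minimal sketch of how the store path could
look with a killable lock attempt. It is written as userspace C so that it
compiles on its own: struct rw_semaphore, down_read_killable(), up_read() and
shrinker_rwsem below are hypothetical stand-ins for the kernel ones, strtoul()
stands in for kstrtoul(), and scan_store_sketch() is an illustrative name, not
code from the patch. The only point is the control flow: the lock attempt can
fail with -EINTR, so the return value has to be checked before anything
protected by the lock is touched.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical userspace stand-ins so the sketch builds outside the kernel. */
struct rw_semaphore {
	int fatal_signal_pending;	/* models "the writer was killed" */
};

static struct rw_semaphore shrinker_rwsem;

static int down_read_killable(struct rw_semaphore *sem)
{
	/* Like the kernel helper: 0 on success, -EINTR if killed while waiting. */
	return sem->fatal_signal_pending ? -EINTR : 0;
}

static void up_read(struct rw_semaphore *sem)
{
	(void)sem;
}

/* Sketch of the store path with a killable lock attempt. */
static ssize_t scan_store_sketch(const char *buf, size_t size)
{
	unsigned long nr_to_scan;
	char *end;

	errno = 0;
	nr_to_scan = strtoul(buf, &end, 10);	/* stand-in for kstrtoul() */
	if (errno || end == buf)
		return -EINVAL;

	/* Bail out instead of sleeping uninterruptibly on the rwsem. */
	if (down_read_killable(&shrinker_rwsem))
		return -EINTR;

	(void)nr_to_scan;	/* the actual scan would happen here */

	up_read(&shrinker_rwsem);
	return (ssize_t)size;
}
```

With the plain down_read() the early return does not exist, which is why a
process stuck behind a long-held shrinker_rwsem could not be killed.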

Hillf


* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
                   ` (4 preceding siblings ...)
  2022-04-16  0:27 ` [PATCH rfc 5/5] mm: provide shrinkers with names Roman Gushchin
@ 2022-04-18  9:27 ` Mike Rapoport
  2022-04-18 17:27   ` Roman Gushchin
  2022-04-19  4:27 ` Andrew Morton
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Mike Rapoport @ 2022-04-18  9:27 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under the memory pressure the kernel applies some pressure on each of
> them in the order of which they were created/registered in the system. Some
> of them can contain only few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
> 
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> covering everything though: shrinkers which report 0 objects will never show up,
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. hard to guess which super
> block's shrinker it is having only "super_cache_scan"). They are a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.
> 
> To provide a better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.

Wouldn't debugfs better fit the purpose of shrinker debugging?
 
-- 
Sincerely yours,
Mike.

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-18  9:27 ` [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Mike Rapoport
@ 2022-04-18 17:27   ` Roman Gushchin
  2022-04-19  6:33     ` Mike Rapoport
  0 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-18 17:27 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Mon, Apr 18, 2022 at 12:27:36PM +0300, Mike Rapoport wrote:
> On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> > 
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan"). They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> > 
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> 
> Wouldn't debugfs better fit the purpose of shrinker debugging?

I think sysfs fits better, but it's not a very strong opinion.

Even though the interface is likely not very useful for the general
public, big cloud instances might wanna enable it to gather statistics
(and it's certainly what we're gonna do at Facebook) and to provide
additional data when something is off.  They might not have debugfs
mounted. And it's really similar to /sys/kernel/slab.

Are there any reasons why debugfs is preferable?

Thanks!

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
                   ` (5 preceding siblings ...)
  2022-04-18  9:27 ` [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Mike Rapoport
@ 2022-04-19  4:27 ` Andrew Morton
  2022-04-19 17:52   ` Roman Gushchin
  2022-04-19 18:20 ` Kent Overstreet
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2022-04-19  4:27 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi

On Fri, 15 Apr 2022 17:27:51 -0700 Roman Gushchin <roman.gushchin@linux.dev> wrote:

> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under the memory pressure the kernel applies some pressure on each of
> them in the order of which they were created/registered in the system. Some
> of them can contain only few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
> 
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> covering everything though: shrinkers which report 0 objects will never show up,
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. hard to guess which super
> block's shrinker it is having only "super_cache_scan"). They are a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.
> 
> To provide a better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
> 
> For each shrinker registered in the system a folder is created.

Please, "directory".

> The folder
> contains "count" and "scan" files, which allow to trigger count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow to get per-memcg
> and/or per-node object count and shrink only a specific memcg/node.
> 
> To make debugging more pleasant, the patchset also names all shrinkers,
> so that sysfs entries can have more meaningful names.

I also was wondering "why not debugfs".

> Usage examples:
> 
> ...
>
> If the output doesn't fit into a single page, "...\n" is printed at the end of
> output.

Unclear.  At the end of what output?

> 
> Roman Gushchin (5):
>   mm: introduce sysfs interface for debugging kernel shrinker
>   mm: memcontrol: introduce mem_cgroup_ino() and
>     mem_cgroup_get_from_ino()
>   mm: introduce memcg interfaces for shrinker sysfs
>   mm: introduce numa interfaces for shrinker sysfs
>   mm: provide shrinkers with names
> 
>  arch/x86/kvm/mmu/mmu.c                        |   2 +-
>  ...
>

Nothing under Documentation/!

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-18 17:27   ` Roman Gushchin
@ 2022-04-19  6:33     ` Mike Rapoport
  2022-04-19 17:58       ` Roman Gushchin
  0 siblings, 1 reply; 24+ messages in thread
From: Mike Rapoport @ 2022-04-19  6:33 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Mon, Apr 18, 2022 at 10:27:34AM -0700, Roman Gushchin wrote:
> On Mon, Apr 18, 2022 at 12:27:36PM +0300, Mike Rapoport wrote:
> > On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > > There are 50+ different shrinkers in the kernel, many with their own bells and
> > > whistles. Under the memory pressure the kernel applies some pressure on each of
> > > them in the order of which they were created/registered in the system. Some
> > > of them can contain only few objects, some can be quite large. Some can be
> > > effective at reclaiming memory, some not.
> > > 
> > > The only existing debugging mechanism is a couple of tracepoints in
> > > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > > covering everything though: shrinkers which report 0 objects will never show up,
> > > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > > scan function, which is not always enough (e.g. hard to guess which super
> > > block's shrinker it is having only "super_cache_scan"). They are a passive
> > > mechanism: there is no way to call into counting and scanning of an individual
> > > shrinker and profile it.
> > > 
> > > To provide a better visibility and debug options for memory shrinkers
> > > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > > similar to /sys/kernel/slab.
> > 
> > Wouldn't debugfs better fit the purpose of shrinker debugging?
> 
> I think sysfs fits better, but not a very strong opinion.
> 
> Even though the interface is likely not very useful for the general
> public, big cloud instances might wanna enable it to gather statistics
> (and it's certainly what we gonna do at Facebook) and to provide
> additional data when something is off.  They might not have debugfs
> mounted. And it's really similar to /sys/kernel/slab.

And there is also similar /proc/vmallocinfo so why not /proc/shrinker? ;-)

I suspect slab ended up in sysfs because nobody suggested to use debugfs
back then. I've been able to track the transition from /proc/slabinfo to
/proc/slubinfo to /sys/kernel/slab, but could not find why Christoph chose
sysfs in the end.

> Are there any reasons why debugfs is preferable?

debugfs is more flexible because it's not a stable kernel ABI, so if there
is a need or desire to change the layout and content of the files, it can
be done more easily with debugfs.

Is it a real problem for Facebook to mount debugfs? ;-)
 
> Thanks!

-- 
Sincerely yours,
Mike.

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19  4:27 ` Andrew Morton
@ 2022-04-19 17:52   ` Roman Gushchin
  2022-04-19 18:25     ` Andrew Morton
  2022-04-19 18:33     ` Greg KH
  0 siblings, 2 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-19 17:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi

On Mon, Apr 18, 2022 at 09:27:09PM -0700, Andrew Morton wrote:
> On Fri, 15 Apr 2022 17:27:51 -0700 Roman Gushchin <roman.gushchin@linux.dev> wrote:
> 
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> > 
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan"). They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> > 
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> > 
> > For each shrinker registered in the system a folder is created.
> 
> Please, "directory".

Of course, sorry :)

> 
> > The folder
> > contains "count" and "scan" files, which allow to trigger count_objects()
> > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > and/or per-node object count and shrink only a specific memcg/node.
> > 
> > To make debugging more pleasant, the patchset also names all shrinkers,
> > so that sysfs entries can have more meaningful names.
> 
> I also was wondering "why not debugfs".

Fair enough, moving to debugfs in v1.

> 
> > Usage examples:
> > 
> > ...
> >
> > If the output doesn't fit into a single page, "...\n" is printed at the end of
> > output.
> 
> Unclear.  At the end of what output?

This is what it looks like when the output is too long:

[root@eth50-1 sb-btrfs-24]# cat count_memcg
1 226
20 96
53 811
2429 2
218 13
581 29
911 124
1010 3
1043 1
1076 1
1241 60
1274 7
1307 39
1340 3
1406 14
1439 63
1472 54
1505 8
1538 1
1571 6
1604 39
1637 9
1670 8
1703 4
1736 1094
1802 2
1868 2
1901 52
1934 592
1967 32
			< CUT >
18797 1
18830 1
18863 1
18896 1
18929 1
18962 1
18995 1
19028 1
19061 1
19094 1
19127 1
19160 1
19193 1
...

I'll try to make it more obvious from the description.

> 
> > 
> > Roman Gushchin (5):
> >   mm: introduce sysfs interface for debugging kernel shrinker
> >   mm: memcontrol: introduce mem_cgroup_ino() and
> >     mem_cgroup_get_from_ino()
> >   mm: introduce memcg interfaces for shrinker sysfs
> >   mm: introduce numa interfaces for shrinker sysfs
> >   mm: provide shrinkers with names
> > 
> >  arch/x86/kvm/mmu/mmu.c                        |   2 +-
> >  ...
> >
> 
> Nothing under Documentation/!

I planned to add it after the rfc version. Will do.

Thank you for taking a look!

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19  6:33     ` Mike Rapoport
@ 2022-04-19 17:58       ` Roman Gushchin
  0 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-19 17:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Tue, Apr 19, 2022 at 09:33:48AM +0300, Mike Rapoport wrote:
> On Mon, Apr 18, 2022 at 10:27:34AM -0700, Roman Gushchin wrote:
> > On Mon, Apr 18, 2022 at 12:27:36PM +0300, Mike Rapoport wrote:
> > > On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > > > There are 50+ different shrinkers in the kernel, many with their own bells and
> > > > whistles. Under the memory pressure the kernel applies some pressure on each of
> > > > them in the order of which they were created/registered in the system. Some
> > > > of them can contain only few objects, some can be quite large. Some can be
> > > > effective at reclaiming memory, some not.
> > > > 
> > > > The only existing debugging mechanism is a couple of tracepoints in
> > > > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > > > covering everything though: shrinkers which report 0 objects will never show up,
> > > > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > > > scan function, which is not always enough (e.g. hard to guess which super
> > > > block's shrinker it is having only "super_cache_scan"). They are a passive
> > > > mechanism: there is no way to call into counting and scanning of an individual
> > > > shrinker and profile it.
> > > > 
> > > > To provide a better visibility and debug options for memory shrinkers
> > > > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > > > similar to /sys/kernel/slab.
> > > 
> > > Wouldn't debugfs better fit the purpose of shrinker debugging?
> > 
> > I think sysfs fits better, but not a very strong opinion.
> > 
> > Even though the interface is likely not very useful for the general
> > public, big cloud instances might wanna enable it to gather statistics
> > (and it's certainly what we gonna do at Facebook) and to provide
> > additional data when something is off.  They might not have debugfs
> > mounted. And it's really similar to /sys/kernel/slab.
> 
> And there is also similar /proc/vmallocinfo so why not /proc/shrinker? ;-)
> 
> I suspect slab ended up in sysfs because nobody suggested to use debugfs
> back then. I've been able to track the transition from /proc/slabinfo to
> /proc/slubinfo to /sys/kernel/slab, but could not find why Christoph chose
> sysfs in the end.
>
> > Are there any reasons why debugfs is preferable?
> 
> debugfs is more flexible because it's not stable kernel ABI so if there
> will be need/desire to change the layout and content of the files with
> debugfs it can be done more easily.
> 
> Is this a real problem for Facebook to mount debugfs? ;-)

Fair enough, switching to debugfs in the next version.

Thanks!

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
                   ` (6 preceding siblings ...)
  2022-04-19  4:27 ` Andrew Morton
@ 2022-04-19 18:20 ` Kent Overstreet
  2022-04-19 18:58   ` Roman Gushchin
  2022-04-19 18:36 ` Kent Overstreet
  2022-04-20 22:24 ` Yang Shi
  9 siblings, 1 reply; 24+ messages in thread
From: Kent Overstreet @ 2022-04-19 18:20 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under the memory pressure the kernel applies some pressure on each of
> them in the order of which they were created/registered in the system. Some
> of them can contain only few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
> 
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> covering everything though: shrinkers which report 0 objects will never show up,
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. hard to guess which super
> block's shrinker it is having only "super_cache_scan"). They are a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.
> 
> To provide a better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
> 
> For each shrinker registered in the system a folder is created. The folder
> contains "count" and "scan" files, which allow to trigger count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow to get per-memcg
> and/or per-node object count and shrink only a specific memcg/node.

Cool!

I've been starting to sketch out some shrinker improvements of my own, perhaps
we could combine efforts. The issue I've been targeting is that when we hit an
OOM, we currently don't get a lot of useful information - shrinkers ought to be
included, and we really want information on shrinker's internal state (e.g.
object dirtyness) if we're to have a chance at understanding why memory isn't
getting reclaimed.

https://evilpiepirate.org/git/bcachefs.git/log/?h=shrinker_to_text

This adds a .to_text() method - a pretty-printer - that shrinkers can
implement, and then on OOM we report on the top 10 shrinkers by memory usage, in
sorted order.
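
As a rough illustration of the "top 10 by memory usage, in sorted order" part,
here is a small userspace C sketch. It is not taken from the shrinker_to_text
branch: struct shrinker_usage, cmp_usage_desc() and top_shrinkers() are
hypothetical names, and it assumes each shrinker can already report its usage
in bytes. It sorts the entries in descending order and tells the caller how
many of them to print.

```c
#include <stdlib.h>

/* Hypothetical per-shrinker summary: a name plus its usage in bytes. */
struct shrinker_usage {
	const char *name;
	unsigned long bytes;
};

/* qsort() comparator: larger usage sorts first. */
static int cmp_usage_desc(const void *a, const void *b)
{
	const struct shrinker_usage *x = a;
	const struct shrinker_usage *y = b;

	if (x->bytes < y->bytes)
		return 1;
	if (x->bytes > y->bytes)
		return -1;
	return 0;
}

/*
 * Sort v[0..n) by usage, biggest first, and return how many entries
 * the report should print (at most `limit`).
 */
static size_t top_shrinkers(struct shrinker_usage *v, size_t n, size_t limit)
{
	qsort(v, n, sizeof(*v), cmp_usage_desc);
	return n < limit ? n : limit;
}
```

A report would then walk the first top_shrinkers(v, n, 10) entries and call
each shrinker's pretty-printer. Whether sorting and printing like this is
acceptable from the OOM path is a separate question.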

Another thing I'd like to do is have shrinkers report usage not just in object
counts but in bytes; I think it should be obvious why that's desirable.

Maybe we could have a memory-reporting-and-shrinker-improvements session at LSF?
I'd love to do some collective brainstorming and get some real momentum going
in this area.

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 17:52   ` Roman Gushchin
@ 2022-04-19 18:25     ` Andrew Morton
  2022-04-19 18:43       ` Roman Gushchin
  2022-04-19 18:33     ` Greg KH
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2022-04-19 18:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi

On Tue, 19 Apr 2022 10:52:44 -0700 Roman Gushchin <roman.gushchin@linux.dev> wrote:

> > Unclear.  At the end of what output?
> 
> This is how it looks like when the output is too long:
> 
> [root@eth50-1 sb-btrfs-24]# cat count_memcg
> 1 226
> 20 96
> 53 811
> 2429 2
> 218 13
> 581 29
> 911 124
> 1010 3
> 1043 1
> 1076 1
> 1241 60
> 1274 7
> 1307 39
> 1340 3
> 1406 14
> 1439 63
> 1472 54
> 1505 8
> 1538 1
> 1571 6
> 1604 39
> 1637 9
> 1670 8
> 1703 4
> 1736 1094
> 1802 2
> 1868 2
> 1901 52
> 1934 592
> 1967 32
> 			< CUT >
> 18797 1
> 18830 1

We do that in-kernel?  Why?  That just makes parsers harder to write?
If someone has issues then direct them at /usr/bin/less?
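
To show what the extra burden on parsers looks like, here is a minimal
userspace C sketch that reads output in the format quoted above. It assumes
each data line is two unsigned decimal numbers (a cgroup identifier and an
object count) and that a line consisting only of three dots is the truncation
marker. struct memcg_count and parse_count_memcg() are hypothetical names used
for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line: a cgroup identifier and its object count. */
struct memcg_count {
	unsigned long id;
	unsigned long count;
};

/*
 * Parse "<id> <count>\n" lines from text into out[0..max).
 * Returns the number of entries stored and sets *truncated to 1
 * if a "..." marker line was seen, 0 otherwise.
 */
static size_t parse_count_memcg(const char *text, struct memcg_count *out,
				size_t max, int *truncated)
{
	size_t n = 0;

	*truncated = 0;
	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);
		char line[64];

		if (len == 3 && strncmp(text, "...", 3) == 0) {
			/* The special case every consumer has to know about. */
			*truncated = 1;
		} else if (len < sizeof(line) && n < max) {
			/* Copy the line so sscanf() cannot read past its end. */
			memcpy(line, text, len);
			line[len] = '\0';
			if (sscanf(line, "%lu %lu", &out[n].id, &out[n].count) == 2)
				n++;
		}
		text += len + (eol ? 1 : 0);
	}
	return n;
}
```

Without the marker the loop body would be a single sscanf() call per line, and
a truncated result could not be mistaken for a complete one.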

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 17:52   ` Roman Gushchin
  2022-04-19 18:25     ` Andrew Morton
@ 2022-04-19 18:33     ` Greg KH
  1 sibling, 0 replies; 24+ messages in thread
From: Greg KH @ 2022-04-19 18:33 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Tue, Apr 19, 2022 at 10:52:44AM -0700, Roman Gushchin wrote:
> On Mon, Apr 18, 2022 at 09:27:09PM -0700, Andrew Morton wrote:
> > > The folder
> > > contains "count" and "scan" files, which allow to trigger count_objects()
> > > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > > and/or per-node object count and shrink only a specific memcg/node.
> > > 
> > > To make debugging more pleasant, the patchset also names all shrinkers,
> > > so that sysfs entries can have more meaningful names.
> > 
> > I also was wondering "why not debugfs".
> 
> Fair enough, moving to debugfs in v1.

Thank you, that keeps me from complaining about how badly you were
abusing sysfs in this patchset :)


* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
                   ` (7 preceding siblings ...)
  2022-04-19 18:20 ` Kent Overstreet
@ 2022-04-19 18:36 ` Kent Overstreet
  2022-04-19 18:50   ` Roman Gushchin
  2022-04-20 22:24 ` Yang Shi
  9 siblings, 1 reply; 24+ messages in thread
From: Kent Overstreet @ 2022-04-19 18:36 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi,
	Greg Kroah-Hartman

On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> 7) Don't display cgroups with less than 500 attached objects
>   $ echo 500 > count_memcg
>   $ cat count_memcg
>     53 817
>     1868 886
>     2396 799
>     2462 861
> 
> 8) Don't display cgroups with less than 500 attached objects (sum over all nodes)
>   $ echo "500" > count_memcg_node
>   $ cat count_memcg_node
>     53 810 7
>     1868 886 0
>     2396 799 0
>     2462 861 0
> 
> 9) Scan system/root shrinker
>   $ cat count
>     212
>   $ echo 100 > scan
>   $ cat scan
>     97
>   $ cat count
>     115

This part seems entirely overengineered though and a really bad idea - can we
please _not_ store query state in the kernel? It's not thread safe, and it seems
like overengineering before we've done the basics (just getting this stuff in
sysfs is a major improvement!).

I know kmemleak does something kinda sorta like this, but that's a special
purpose debugging tool and this looks to be something more general purpose
that'll get used in production.
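
The thread-safety concern can be modelled in a few lines of userspace C. This
is only a sketch: struct count_file, count_store(), count_show() and
count_filtered() are hypothetical names, and the stateless variant is one
possible alternative, not something proposed in the patchset. The stateful
pair mimics "write a threshold, then read the filtered list". The stateless
function takes the threshold from the caller on every call.

```c
#include <stddef.h>

/* Hypothetical model of a file that remembers the last written threshold. */
struct count_file {
	unsigned long threshold;
};

/* Models "echo N > count_memcg": the state is shared by all users. */
static void count_store(struct count_file *f, unsigned long threshold)
{
	f->threshold = threshold;
}

/* Models "cat count_memcg": how many entries pass the stored threshold. */
static size_t count_show(const struct count_file *f,
			 const unsigned long *counts, size_t n)
{
	size_t shown = 0;

	for (size_t i = 0; i < n; i++)
		if (counts[i] >= f->threshold)
			shown++;
	return shown;
}

/* Stateless alternative: each caller supplies its own threshold. */
static size_t count_filtered(const unsigned long *counts, size_t n,
			     unsigned long threshold)
{
	size_t shown = 0;

	for (size_t i = 0; i < n; i++)
		if (counts[i] >= threshold)
			shown++;
	return shown;
}
```

If user A stores 500 and user B stores 0 before A reads, A's read is filtered
with B's threshold. With the stateless form, where the filtering could equally
be done in userspace on the unfiltered output, there is nothing to race on.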

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 18:25     ` Andrew Morton
@ 2022-04-19 18:43       ` Roman Gushchin
  0 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-19 18:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Dave Chinner, linux-kernel, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Yang Shi

On Tue, Apr 19, 2022 at 11:25:49AM -0700, Andrew Morton wrote:
> On Tue, 19 Apr 2022 10:52:44 -0700 Roman Gushchin <roman.gushchin@linux.dev> wrote:
> 
> > > Unclear.  At the end of what output?
> > 
> > This is how it looks like when the output is too long:
> > 
> > [root@eth50-1 sb-btrfs-24]# cat count_memcg
> > 1 226
> > 20 96
> > 53 811
> > 2429 2
> > 218 13
> > 581 29
> > 911 124
> > 1010 3
> > 1043 1
> > 1076 1
> > 1241 60
> > 1274 7
> > 1307 39
> > 1340 3
> > 1406 14
> > 1439 63
> > 1472 54
> > 1505 8
> > 1538 1
> > 1571 6
> > 1604 39
> > 1637 9
> > 1670 8
> > 1703 4
> > 1736 1094
> > 1802 2
> > 1868 2
> > 1901 52
> > 1934 592
> > 1967 32
> > 			< CUT >
> > 18797 1
> > 18830 1
> 
> We do that in-kernel?  Why?  That just makes parsers harder to write?
> If someone has issues then direct them at /usr/bin/less?

It comes from a sysfs limitation: the output is expected to fit into
PAGE_SIZE. If the number of cgroups (and nodes) is large, that's not always
possible. In theory something like the seq_file API should be used, but I
don't know how hard it is to mix it with the sysfs/debugfs API. I'll try to
figure this out.
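
For illustration, the truncation behaviour described above can be sketched in
userspace C as follows. This is not the code from the patch: emit_counts() and
TRUNC_MARK are hypothetical names, and the buffer capacity is a parameter here,
whereas a sysfs show() callback gets a single PAGE_SIZE buffer. The sketch
always keeps room for the marker, so a truncated result can be told apart from
a complete one.

```c
#include <stdio.h>
#include <string.h>

#define TRUNC_MARK "...\n"

/*
 * Write "<id> <count>\n" lines into buf (capacity cap, NUL included).
 * If not every entry fits, finish the output with TRUNC_MARK.
 * Returns the number of bytes written, not counting the NUL.
 */
static size_t emit_counts(char *buf, size_t cap, const unsigned long *ids,
			  const unsigned long *counts, size_t n)
{
	const size_t mark_len = strlen(TRUNC_MARK);
	size_t used = 0;

	if (cap < mark_len + 1) {
		if (cap)
			buf[0] = '\0';
		return 0;
	}

	for (size_t i = 0; i < n; i++) {
		char line[64];
		int len = snprintf(line, sizeof(line), "%lu %lu\n",
				   ids[i], counts[i]);

		/* Keep space for the marker and the NUL at all times. */
		if (used + (size_t)len + mark_len + 1 > cap) {
			memcpy(buf + used, TRUNC_MARK, mark_len + 1);
			return used + mark_len;
		}
		memcpy(buf + used, line, (size_t)len);
		used += (size_t)len;
	}
	buf[used] = '\0';
	return used;
}
```

A seq_file-based implementation would not need the marker at all, because the
output is produced in chunks and is not limited to a single page.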

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 18:36 ` Kent Overstreet
@ 2022-04-19 18:50   ` Roman Gushchin
  2022-04-19 21:10     ` Kent Overstreet
  0 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-19 18:50 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi,
	Greg Kroah-Hartman

On Tue, Apr 19, 2022 at 02:36:54PM -0400, Kent Overstreet wrote:
> On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > 7) Don't display cgroups with less than 500 attached objects
> >   $ echo 500 > count_memcg
> >   $ cat count_memcg
> >     53 817
> >     1868 886
> >     2396 799
> >     2462 861
> > 
> > 8) Don't display cgroups with less than 500 attached objects (sum over all nodes)
> >   $ echo "500" > count_memcg_node
> >   $ cat count_memcg_node
> >     53 810 7
> >     1868 886 0
> >     2396 799 0
> >     2462 861 0
> > 
> > 9) Scan system/root shrinker
> >   $ cat count
> >     212
> >   $ echo 100 > scan
> >   $ cat scan
> >     97
> >   $ cat count
> >     115
> 
> This part seems entirely overengineered though and a really bad idea - can we
> please _not_ store query state in the kernel? It's not thread safe, and it seems
> like overengineering before we've done the basics (just getting this stuff in
> sysfs is a major improvement!).

Yes, it's not great, but I don't have a better idea yet. How else would we return
the number of freed objects? Do you suggest dropping this functionality altogether,
or are there other options I'm not seeing?

Counting again isn't a good option either: new objects could have been added to
the list during the scan.
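
A tiny worked example of why re-counting is unreliable, as a userspace C
sketch with hypothetical helper names. It uses the numbers from example 9
(count 212, 97 objects freed, count 115 afterwards) and assumes the simple
model: count after = count before - freed + added.

```c
/* Objects that appear to have been freed, judged only from two counts. */
static long freed_by_recount(unsigned long before, unsigned long after)
{
	return (long)before - (long)after;
}

/* Count seen after a scan that freed `freed` objects while `added` arrived. */
static unsigned long count_after_scan(unsigned long before,
				      unsigned long freed,
				      unsigned long added)
{
	return before - freed + added;
}
```

With no concurrent additions the re-count gives 212 - 115 = 97, which matches.
If 10 objects are added during the scan, the second count is 125 and the
re-count reports 87 freed objects instead of 97.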

Thanks!

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 18:20 ` Kent Overstreet
@ 2022-04-19 18:58   ` Roman Gushchin
  2022-04-19 19:46     ` Kent Overstreet
  0 siblings, 1 reply; 24+ messages in thread
From: Roman Gushchin @ 2022-04-19 18:58 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Tue, Apr 19, 2022 at 02:20:30PM -0400, Kent Overstreet wrote:
> On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> > 
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan"). They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> > 
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> > 
> > For each shrinker registered in the system a folder is created. The folder
> > contains "count" and "scan" files, which allow to trigger count_objects()
> > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > and/or per-node object count and shrink only a specific memcg/node.
> 
> Cool!
> 
> I've been starting to sketch out some shrinker improvements of my own, perhaps
> we could combine efforts.

Thanks! Absolutely!

> The issue I've been targeting is that when we hit an
> OOM, we currently don't get a lot of useful information - shrinkers ought to be
> included, and we really want information on shrinker's internal state (e.g.
> object dirtiness) if we're to have a chance at understanding why memory isn't
> getting reclaimed.
> 
> https://evilpiepirate.org/git/bcachefs.git/log/?h=shrinker_to_text
> 
> This adds a .to_text() method - a pretty-printer - that shrinkers can
> implement, and then on OOM we report on the top 10 shrinkers by memory usage, in
> sorted order.

We must be really careful to describe what these callbacks are and are not
allowed to do. In-kernel OOM handling is a last-resort mechanism and it must
be able to make forward progress in really nasty circumstances, so there are
significant (and not very well documented) limitations on what can be done
from OOM context.

> 
> Another thing I'd like to do is have shrinkers report usage not just in object
> counts but in bytes; I think it should be obvious why that's desirable.

I totally agree, it's actually on my short-term todo list.

> 
> Maybe we could have a memory-reporting-and-shrinker-improvements session at LSF?
> I'd love to do some collective brainstorming and get some real momentum going
> in this area.

That would be really nice! I'm planning to work on improving shrinkers and am
gathering ideas and problems, so having a discussion would be really great.

Thanks!

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 18:58   ` Roman Gushchin
@ 2022-04-19 19:46     ` Kent Overstreet
  0 siblings, 0 replies; 24+ messages in thread
From: Kent Overstreet @ 2022-04-19 19:46 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi

On Tue, Apr 19, 2022 at 11:58:00AM -0700, Roman Gushchin wrote:
> On Tue, Apr 19, 2022 at 02:20:30PM -0400, Kent Overstreet wrote:
> > On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > > There are 50+ different shrinkers in the kernel, many with their own bells and
> > > whistles. Under the memory pressure the kernel applies some pressure on each of
> > > them in the order of which they were created/registered in the system. Some
> > > of them can contain only few objects, some can be quite large. Some can be
> > > effective at reclaiming memory, some not.
> > > 
> > > The only existing debugging mechanism is a couple of tracepoints in
> > > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > > covering everything though: shrinkers which report 0 objects will never show up,
> > > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > > scan function, which is not always enough (e.g. hard to guess which super
> > > block's shrinker it is having only "super_cache_scan"). They are a passive
> > > mechanism: there is no way to call into counting and scanning of an individual
> > > shrinker and profile it.
> > > 
> > > To provide a better visibility and debug options for memory shrinkers
> > > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > > similar to /sys/kernel/slab.
> > > 
> > > For each shrinker registered in the system a folder is created. The folder
> > > contains "count" and "scan" files, which allow to trigger count_objects()
> > > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > > and/or per-node object count and shrink only a specific memcg/node.
> > 
> > Cool!
> > 
> > I've been starting to sketch out some shrinker improvements of my own, perhaps
> > we could combine efforts.
> 
> Thanks! Absolutely!
> 
> > The issue I've been targeting is that when we hit an
> > OOM, we currently don't get a lot of useful information - shrinkers ought to be
> > included, and we really want information on shrinker's internal state (e.g.
> > object dirtiness) if we're to have a chance at understanding why memory isn't
> > getting reclaimed.
> > 
> > https://evilpiepirate.org/git/bcachefs.git/log/?h=shrinker_to_text
> > 
> > This adds a .to_text() method - a pretty-printer - that shrinkers can
> > implement, and then on OOM we report on the top 10 shrinkers by memory usage, in
> > sorted order.
> 
> We must be really careful to describe what these callbacks are and are not
> allowed to do. In-kernel OOM handling is a last-resort mechanism and it must
> be able to make forward progress in really nasty circumstances, so there are
> significant (and not very well documented) limitations on what can be done
> from OOM context.

Yep. The only "interesting" thing my patches add is that we heap-allocate the
strings the .to_text methods generate (which is good! it means they can be used
both for printing to the console, and by sysfs code). Memory allocation failure
here is hardly the end of the world; those messages will just get truncated, and
I'm also going to mempool-ify printbufs (might do that today).
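
To make the reporting idea concrete, here is a rough userspace sketch in Python
of "top N shrinkers by memory usage, with output that gets cut short rather than
failing". It is only an illustration: ToyShrinker, oom_report, the bytes_used
and dirty fields, and the max_len parameter are all hypothetical names invented
for this sketch, not the kernel's struct shrinker, the printbuf API, or the
code in the branch linked above.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ToyShrinker:
    # Hypothetical stand-in for a shrinker; not the kernel's struct shrinker.
    name: str
    bytes_used: int
    dirty: int = 0

    def to_text(self):
        # Pretty-print internal state, in the spirit of a .to_text() method.
        return f"{self.name}: {self.bytes_used} bytes, {self.dirty} dirty"


def oom_report(shrinkers, top_n=10, max_len=None):
    """Return report lines for the top_n shrinkers by bytes_used, largest first.

    max_len models a buffer that could not grow: each line is truncated
    instead of the whole report failing.
    """
    top = heapq.nlargest(top_n, shrinkers, key=lambda s: s.bytes_used)
    lines = []
    for s in top:
        text = s.to_text()
        if max_len is not None:
            text = text[:max_len]
        lines.append(text)
    return lines
```

Because to_text() only returns a string, the same output could be sent to the
console or exposed through a file, which is the property described above.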

> > Another thing I'd like to do is have shrinkers report usage not just in object
> > counts but in bytes; I think it should be obvious why that's desirable.
> 
> I totally agree, it's actually on my short-term todo list.

Wonderful. A request I often get is for bcachefs's caches to show up as cached
memory via the free command - a perfectly reasonable request - and reporting
byte counts would make this possible.

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-19 18:50   ` Roman Gushchin
@ 2022-04-19 21:10     ` Kent Overstreet
  0 siblings, 0 replies; 24+ messages in thread
From: Kent Overstreet @ 2022-04-19 21:10 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Dave Chinner, linux-kernel,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Yang Shi,
	Greg Kroah-Hartman

On Tue, Apr 19, 2022 at 11:50:45AM -0700, Roman Gushchin wrote:
> On Tue, Apr 19, 2022 at 02:36:54PM -0400, Kent Overstreet wrote:
> > On Fri, Apr 15, 2022 at 05:27:51PM -0700, Roman Gushchin wrote:
> > > 7) Don't display cgroups with less than 500 attached objects
> > >   $ echo 500 > count_memcg
> > >   $ cat count_memcg
> > >     53 817
> > >     1868 886
> > >     2396 799
> > >     2462 861
> > > 
> > > 8) Don't display cgroups with less than 500 attached objects (sum over all nodes)
> > >   $ echo "500" > count_memcg_node
> > >   $ cat count_memcg_node
> > >     53 810 7
> > >     1868 886 0
> > >     2396 799 0
> > >     2462 861 0
> > > 
> > > 9) Scan system/root shrinker
> > >   $ cat count
> > >     212
> > >   $ echo 100 > scan
> > >   $ cat scan
> > >     97
> > >   $ cat count
> > >     115
> > 
> > This part seems entirely overengineered though and a really bad idea - can we
> > please _not_ store query state in the kernel? It's not thread safe, and it seems
> > like overengineering before we've done the basics (just getting this stuff in
> > sysfs is a major improvement!).
> 
> Yes, it's not great, but I don't have a better idea yet. How to return the number
> of freed objects? Do you suggest dropping this functionality altogether, or are
> there other options I'm not seeing?

I'd just drop all of the stateful stuff - or add an ioctl interface.
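
The thread-safety concern can be shown with a toy userspace model in Python of
the "echo N > scan; cat scan" pattern. This is a sketch under simplifying
assumptions, not kernel code: StatefulScanFile and stateless_scan are
hypothetical names, and the model frees exactly as many objects as requested,
whereas a real shrinker may free fewer (97 out of 100 in the quoted example).

```python
class StatefulScanFile:
    """Write triggers a scan; a later read returns the stored result."""

    def __init__(self, objects):
        self.objects = objects   # objects currently held by the cache
        self.last_freed = 0      # query state shared by every user of the file

    def write(self, nr_to_scan):
        freed = min(nr_to_scan, self.objects)
        self.objects -= freed
        self.last_freed = freed  # overwritten by whoever writes next

    def read(self):
        return self.last_freed


def stateless_scan(cache, nr_to_scan):
    """One call performs the scan and returns its own result (ioctl-like)."""
    freed = min(nr_to_scan, cache["objects"])
    cache["objects"] -= freed
    return freed


def interleaved_demo():
    f = StatefulScanFile(objects=212)
    f.write(100)        # user A asks to scan 100 objects
    f.write(10)         # user B writes before A has read its result
    a_sees = f.read()   # A gets B's result instead of its own
    b_sees = f.read()
    return a_sees, b_sees
```

In this model interleaved_demo() hands user A the result of user B's request,
while stateless_scan() returns each caller's own result because nothing is kept
between the request and the reply.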

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
                   ` (8 preceding siblings ...)
  2022-04-19 18:36 ` Kent Overstreet
@ 2022-04-20 22:24 ` Yang Shi
  2022-04-20 23:23   ` Roman Gushchin
  9 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2022-04-20 22:24 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Linux MM, Andrew Morton, Dave Chinner, Linux Kernel Mailing List,
	Johannes Weiner, Michal Hocko, Shakeel Butt

On Fri, Apr 15, 2022 at 5:28 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under the memory pressure the kernel applies some pressure on each of
> them in the order of which they were created/registered in the system. Some
> of them can contain only few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
>
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> covering everything though: shrinkers which report 0 objects will never show up,
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. hard to guess which super
> block's shrinker it is having only "super_cache_scan"). They are a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.
>
> To provide a better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
>
> For each shrinker registered in the system a folder is created. The folder
> contains "count" and "scan" files, which allow to trigger count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow to get per-memcg
> and/or per-node object count and shrink only a specific memcg/node.
>
> To make debugging more pleasant, the patchset also names all shrinkers,
> so that sysfs entries can have more meaningful names.
>
> Usage examples:

Thanks, Roman. A follow-up question: why do we have to implement this
in the kernel if we just count the objects? It seems userspace tools could
achieve it too, for example, drgn :-). Actually I did write a drgn
script for debugging a problem a few months ago, which iterates over a
specific memcg's lru_list to count the objects by their state.

>
> 1) List registered shrinkers:
>   $ cd /sys/kernel/shrinker/
>   $ ls
>     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
>     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
>     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
>     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
>     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
>     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
>     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
>
> 2) Get information about a specific shrinker:
>   $ cd sb-btrfs-24/
>   $ ls
>     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
>
> 3) Count objects on the system/root cgroup level
>   $ cat count
>     212
>
> 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
>   $ cat count_node
>     209 3
>
> 5) Count objects for each memcg (output format: cgroup inode, count)
>   $ cat count_memcg
>     1 212
>     20 96
>     53 817
>     2297 2
>     218 13
>     581 30
>     911 124
>     <CUT>
>
> 6) Same but with a per-node output
>   $ cat count_memcg_node
>     1 209 3
>     20 96 0
>     53 810 7
>     2297 2 0
>     218 13 0
>     581 30 0
>     911 124 0
>     <CUT>
>
> 7) Don't display cgroups with less than 500 attached objects
>   $ echo 500 > count_memcg
>   $ cat count_memcg
>     53 817
>     1868 886
>     2396 799
>     2462 861
>
> 8) Don't display cgroups with less than 500 attached objects (sum over all nodes)
>   $ echo "500" > count_memcg_node
>   $ cat count_memcg_node
>     53 810 7
>     1868 886 0
>     2396 799 0
>     2462 861 0
>
> 9) Scan system/root shrinker
>   $ cat count
>     212
>   $ echo 100 > scan
>   $ cat scan
>     97
>   $ cat count
>     115
>
> 10) Scan individual memcg
>   $ echo "1868 500" > scan_memcg
>   $ cat scan_memcg
>     193
>
> 11) Scan individual node
>   $ echo "1 200" > scan_node
>   $ cat scan_node
>     2
>
> 12) Scan individual memcg and node
>   $ echo "1868 0 500" > scan_memcg_node
>   $ cat scan_memcg_node
>     435
>
> If the output doesn't fit into a single page, "...\n" is printed at the end of
> output.
>
>
> Roman Gushchin (5):
>   mm: introduce sysfs interface for debugging kernel shrinker
>   mm: memcontrol: introduce mem_cgroup_ino() and
>     mem_cgroup_get_from_ino()
>   mm: introduce memcg interfaces for shrinker sysfs
>   mm: introduce numa interfaces for shrinker sysfs
>   mm: provide shrinkers with names
>
>  arch/x86/kvm/mmu/mmu.c                        |   2 +-
>  drivers/android/binder_alloc.c                |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   3 +-
>  drivers/gpu/drm/msm/msm_gem_shrinker.c        |   2 +-
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   2 +-
>  drivers/gpu/drm/ttm/ttm_pool.c                |   2 +-
>  drivers/md/bcache/btree.c                     |   2 +-
>  drivers/md/dm-bufio.c                         |   2 +-
>  drivers/md/dm-zoned-metadata.c                |   2 +-
>  drivers/md/raid5.c                            |   2 +-
>  drivers/misc/vmw_balloon.c                    |   2 +-
>  drivers/virtio/virtio_balloon.c               |   2 +-
>  drivers/xen/xenbus/xenbus_probe_backend.c     |   2 +-
>  fs/erofs/utils.c                              |   2 +-
>  fs/ext4/extents_status.c                      |   3 +-
>  fs/f2fs/super.c                               |   2 +-
>  fs/gfs2/glock.c                               |   2 +-
>  fs/gfs2/main.c                                |   2 +-
>  fs/jbd2/journal.c                             |   2 +-
>  fs/mbcache.c                                  |   2 +-
>  fs/nfs/nfs42xattr.c                           |   7 +-
>  fs/nfs/super.c                                |   2 +-
>  fs/nfsd/filecache.c                           |   2 +-
>  fs/nfsd/nfscache.c                            |   2 +-
>  fs/quota/dquot.c                              |   2 +-
>  fs/super.c                                    |   2 +-
>  fs/ubifs/super.c                              |   2 +-
>  fs/xfs/xfs_buf.c                              |   2 +-
>  fs/xfs/xfs_icache.c                           |   2 +-
>  fs/xfs/xfs_qm.c                               |   2 +-
>  include/linux/memcontrol.h                    |   9 +
>  include/linux/shrinker.h                      |  25 +-
>  kernel/rcu/tree.c                             |   2 +-
>  lib/Kconfig.debug                             |   9 +
>  mm/Makefile                                   |   1 +
>  mm/huge_memory.c                              |   4 +-
>  mm/memcontrol.c                               |  23 +
>  mm/shrinker_debug.c                           | 792 ++++++++++++++++++
>  mm/vmscan.c                                   |  66 +-
>  mm/workingset.c                               |   2 +-
>  mm/zsmalloc.c                                 |   2 +-
>  net/sunrpc/auth.c                             |   2 +-
>  42 files changed, 957 insertions(+), 47 deletions(-)
>  create mode 100644 mm/shrinker_debug.c
>
> --
> 2.35.1
>

* Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
  2022-04-20 22:24 ` Yang Shi
@ 2022-04-20 23:23   ` Roman Gushchin
  0 siblings, 0 replies; 24+ messages in thread
From: Roman Gushchin @ 2022-04-20 23:23 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux MM, Andrew Morton, Dave Chinner, Linux Kernel Mailing List,
	Johannes Weiner, Michal Hocko, Shakeel Butt

On Wed, Apr 20, 2022 at 03:24:49PM -0700, Yang Shi wrote:
> On Fri, Apr 15, 2022 at 5:28 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > There are 50+ different shrinkers in the kernel, many with their own bells and
> > whistles. Under the memory pressure the kernel applies some pressure on each of
> > them in the order of which they were created/registered in the system. Some
> > of them can contain only few objects, some can be quite large. Some can be
> > effective at reclaiming memory, some not.
> >
> > The only existing debugging mechanism is a couple of tracepoints in
> > do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> > covering everything though: shrinkers which report 0 objects will never show up,
> > there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> > scan function, which is not always enough (e.g. hard to guess which super
> > block's shrinker it is having only "super_cache_scan"). They are a passive
> > mechanism: there is no way to call into counting and scanning of an individual
> > shrinker and profile it.
> >
> > To provide a better visibility and debug options for memory shrinkers
> > this patchset introduces a /sys/kernel/shrinker interface, to some extent
> > similar to /sys/kernel/slab.
> >
> > For each shrinker registered in the system a folder is created. The folder
> > contains "count" and "scan" files, which allow to trigger count_objects()
> > and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> > count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> > and scan_memcg_node are additionally provided. They allow to get per-memcg
> > and/or per-node object count and shrink only a specific memcg/node.
> >
> > To make debugging more pleasant, the patchset also names all shrinkers,
> > so that sysfs entries can have more meaningful names.
> >
> > Usage examples:
> 
> Thanks, Roman. A follow-up question: why do we have to implement this
> in the kernel if we just count the objects? It seems userspace tools could
> achieve it too, for example, drgn :-). Actually I did write a drgn
> script for debugging a problem a few months ago, which iterates over a
> specific memcg's lru_list to count the objects by their state.

Good question! It's because not all shrinkers are lru_list-based,
and even some of the lru_list-based ones implement custom logic on top of it,
e.g. shadow nodes. So there is no simple way to get the count from
a generic shrinker.

Also, I want to be able to trigger reclaim on individual shrinkers from
userspace (e.g. to profile how effective the shrinking is).

Thanks!
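
For reference, the count_memcg_node output quoted above (cgroup inode followed
by one count per node, with a final "..." line when the output does not fit
into a single page) is simple to consume from userspace. Below is a minimal
Python sketch that assumes exactly that format; parse_count_memcg_node and
filter_by_total are hypothetical helper names, and the sample text is copied
from the usage examples rather than read from a real system.

```python
def parse_count_memcg_node(text):
    """Parse lines of '<cgroup inode> <count node0> <count node1> ...'.

    Returns (counts, truncated): counts maps the cgroup inode to its list
    of per-node counts, and truncated is True if a '...' line was seen.
    """
    counts = {}
    truncated = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "...":
            truncated = True
            continue
        fields = line.split()
        counts[int(fields[0])] = [int(x) for x in fields[1:]]
    return counts, truncated


def filter_by_total(counts, threshold):
    """Keep cgroups whose count summed over all nodes is at least threshold."""
    return {ino: per_node for ino, per_node in counts.items()
            if sum(per_node) >= threshold}


SAMPLE = "1 209 3\n20 96 0\n53 810 7\n2297 2 0\n218 13 0\n581 30 0\n911 124 0\n"
counts, truncated = parse_count_memcg_node(SAMPLE)
big = filter_by_total(counts, 500)   # only cgroup 53 (810 + 7) passes
```

Filtering on the client side like this is one way to get the "at least 500
objects" view without writing a threshold into the kernel first.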

Thread overview: 24+ messages
2022-04-16  0:27 [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Roman Gushchin
2022-04-16  0:27 ` [PATCH rfc 1/5] mm: introduce sysfs interface for debugging kernel shrinker Roman Gushchin
2022-04-16  1:35   ` Hillf Danton
2022-04-16  0:27 ` [PATCH rfc 2/5] mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino() Roman Gushchin
2022-04-16  0:27 ` [PATCH rfc 3/5] mm: introduce memcg interfaces for shrinker sysfs Roman Gushchin
2022-04-16  0:27 ` [PATCH rfc 4/5] mm: introduce numa " Roman Gushchin
2022-04-16  0:27 ` [PATCH rfc 5/5] mm: provide shrinkers with names Roman Gushchin
2022-04-18  9:27 ` [PATCH rfc 0/5] mm: introduce shrinker sysfs interface Mike Rapoport
2022-04-18 17:27   ` Roman Gushchin
2022-04-19  6:33     ` Mike Rapoport
2022-04-19 17:58       ` Roman Gushchin
2022-04-19  4:27 ` Andrew Morton
2022-04-19 17:52   ` Roman Gushchin
2022-04-19 18:25     ` Andrew Morton
2022-04-19 18:43       ` Roman Gushchin
2022-04-19 18:33     ` Greg KH
2022-04-19 18:20 ` Kent Overstreet
2022-04-19 18:58   ` Roman Gushchin
2022-04-19 19:46     ` Kent Overstreet
2022-04-19 18:36 ` Kent Overstreet
2022-04-19 18:50   ` Roman Gushchin
2022-04-19 21:10     ` Kent Overstreet
2022-04-20 22:24 ` Yang Shi
2022-04-20 23:23   ` Roman Gushchin
