All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Printbufs & shrinker OOM reporting
@ 2022-04-21 23:48 Kent Overstreet
  2022-04-21 23:48 ` [PATCH 1/4] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
                   ` (13 more replies)
  0 siblings, 14 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, roman.gushchin, hannes

Debugging OOMs has been one of my sources of frustration, so this patch series
is an attempt to do something about it.

The first patch in the series is something I've been slowly evolving in bcachefs
for years: simple heap allocated strings meant for appending to and building up
structured log/error messages. They make it easy and straightforward to write
pretty-printers for everything, which in turn makes good logging and error
messages something that just happens naturally.

We want it here because that means the reporting I'm adding to shrinkers can be
used by both OOM reporting, and for the sysfs (or is it debugfs now) interface
that Roman is adding.

This patch series also:
 - adds OOM reporting on shrinkers, reporting on top 10 shrinkers (in sorted
   order!)
 - changes slab reporting to be always-on, also reporting top 10 slabs in sorted
   order
 - starts centralizing OOM reporting in mm/show_mem.c
 
The last patch in the series is only a demonstration of how to implement the
shrinker .to_text() method, since bcachefs isn't upstream yet.

Kent Overstreet (4):
  lib/printbuf: New data structure for heap-allocated strings
  mm: Add a .to_text() method for shrinkers
  mm: Centralize & improve oom reporting in show_mem.c
  bcachefs: shrinker.to_text() methods

 fs/bcachefs/btree_cache.c     |  18 ++-
 fs/bcachefs/btree_key_cache.c |  18 ++-
 include/linux/printbuf.h      | 140 ++++++++++++++++++
 include/linux/shrinker.h      |   5 +
 lib/Makefile                  |   4 +-
 lib/printbuf.c                | 271 ++++++++++++++++++++++++++++++++++
 mm/Makefile                   |   2 +-
 mm/oom_kill.c                 |  23 ---
 {lib => mm}/show_mem.c        |  14 ++
 mm/slab.h                     |   6 +-
 mm/slab_common.c              |  53 ++++++-
 mm/vmscan.c                   |  75 ++++++++++
 12 files changed, 587 insertions(+), 42 deletions(-)
 create mode 100644 include/linux/printbuf.h
 create mode 100644 lib/printbuf.c
 rename {lib => mm}/show_mem.c (78%)

-- 
2.35.2


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/4] lib/printbuf: New data structure for heap-allocated strings
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH 2/4] mm: Add a .to_text() method for shrinkers Kent Overstreet
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, roman.gushchin, hannes

This adds printbufs: simple heap-allocated strings meant for building up
structured messages, for logging/procfs/sysfs and elsewhere. They've
been heavily used in bcachefs for writing .to_text() functions/methods -
pretty printers, which has in turn greatly improved the overall quality
of error messages.

Basic usage is documented in include/linux/printbuf.h.

The next patches in the series are going to be using printbufs to
implement a .to_text() method for shrinkers, and improving OOM
reporting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 include/linux/printbuf.h | 140 ++++++++++++++++++++
 lib/Makefile             |   2 +-
 lib/printbuf.c           | 271 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 412 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/printbuf.h
 create mode 100644 lib/printbuf.c

diff --git a/include/linux/printbuf.h b/include/linux/printbuf.h
new file mode 100644
index 0000000000..84a271446d
--- /dev/null
+++ b/include/linux/printbuf.h
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: LGPL-2.1+ */
+/* Copyright (C) 2022 Kent Overstreet */
+
+#ifndef _LINUX_PRINTBUF_H
+#define _LINUX_PRINTBUF_H
+
+/*
+ * Printbufs: Simple heap allocated strings, with some features for structered
+ * formatting.
+ *
+ * This code has provisions for use in userspace, to aid in making other code
+ * portable between kernelspace and userspace.
+ *
+ * Basic example:
+ *
+ *   struct printbuf buf = PRINTBUF;
+ *
+ *   pr_buf(&buf, "foo=");
+ *   foo_to_text(&buf, foo);
+ *   printk("%s", buf.buf);
+ *   printbuf_exit(&buf);
+ *
+ * We can now write pretty printers instead of writing code that dumps
+ * everything to the kernel log buffer, and then those pretty-printers can be
+ * used by other code that outputs to kernel log, sysfs, debugfs, etc.
+ *
+ * Memory allocation: Outputing to a printbuf may allocate memory. This
+ * allocation is done with GFP_KERNEL, by default: use the newer
+ * memalloc_*_(save|restore) functions as needed.
+ *
+ * Since no equivalent yet exists for GFP_ATOMIC/GFP_NOWAIT, memory allocations
+ * will be done with GFP_ATOMIC if printbuf->atomic is nonzero.
+ *
+ * Memory allocation failures: We don't return errors directly, because on
+ * memory allocation failure we usually don't want to bail out and unwind - we
+ * want to print what we've got, on a best-effort basis. But code that does want
+ * to return -ENOMEM may check printbuf.allocation_failure.
+ *
+ * Indenting, tabstops:
+ *
+ * To aid is writing multi-line pretty printers spread across multiple
+ * functions, printbufs track the current indent level.
+ *
+ * pr_indent_push() and pr_indent_pop() increase and decrease the current indent
+ * level, respectively.
+ *
+ * To use tabstops, set printbuf->tabstops[]; they are in units of spaces, from
+ * start of line. Once set, pr_tab() will output spaces up to the next tabstop.
+ * pr_tab_rjust() will also advance the current line of text up to the next
+ * tabstop, but it does so by shifting text since the previous tabstop up to the
+ * next tabstop - right justifying it.
+ *
+ * Make sure you use pr_newline() instead of \n in the format string for indent
+ * level and tabstops to work corretly.
+ *
+ * Output units: printbuf->units exists to tell pretty-printers how to output
+ * numbers: a raw value (e.g. directly from a superblock field), as bytes, or as
+ * human readable bytes. pr_units() and pr_sectors() obey it.
+ *
+ * Other helpful functions:
+ *
+ * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
+ * readable units.
+ *
+ * pr_time(): for printing a time_t with strftime in userspace, prints as an
+ * integer number of seconds in the kernel.
+ *
+ * pr_string_option: Given an enumerated value and a string array with names for
+ * each option, prints out the enum names with the selected one indicated with
+ * square brackets.
+ *
+ * pr_bitflags: Given a bitflag and a string array with names for each bit,
+ * prints out the names of the selected bits.
+ */
+
+#include <linux/slab.h>
+
+enum printbuf_units {
+	PRINTBUF_UNITS_RAW,
+	PRINTBUF_UNITS_BYTES,
+	PRINTBUF_UNITS_HUMAN_READABLE,
+};
+
+struct printbuf {
+	char			*buf;
+	unsigned		size;
+	unsigned		pos;
+	unsigned		last_newline;
+	unsigned		last_field;
+	unsigned		indent;
+	enum printbuf_units	units:8;
+	u8			atomic;
+	bool			allocation_failure:1;
+	u8			tabstop;
+	u8			tabstops[4];
+};
+
+#define PRINTBUF ((struct printbuf) { NULL })
+
+/**
+ * printbuf_exit - exit a printbuf, freeing memory it owns and poisoning it
+ * against accidental use.
+ */
+static inline void printbuf_exit(struct printbuf *buf)
+{
+	kfree(buf->buf);
+	buf->buf = ERR_PTR(-EINTR); /* poison value */
+}
+
+/**
+ * printbuf_reset - re-use a printbuf without freeing and re-initializing it:
+ */
+static inline void printbuf_reset(struct printbuf *buf)
+{
+	buf->pos		= 0;
+	buf->last_newline	= 0;
+	buf->last_field		= 0;
+	buf->indent		= 0;
+	buf->tabstop		= 0;
+}
+
+void pr_buf(struct printbuf *out, const char *fmt, ...)
+	__attribute__ ((format (printf, 2, 3)));
+
+void pr_char(struct printbuf *buf, char c);
+void pr_newline(struct printbuf *);
+void pr_indent_push(struct printbuf *, unsigned);
+void pr_indent_pop(struct printbuf *, unsigned);
+void pr_tab(struct printbuf *);
+void pr_tab_rjust(struct printbuf *);
+void pr_human_readable_u64(struct printbuf *, u64);
+void pr_human_readable_s64(struct printbuf *, s64);
+void pr_units(struct printbuf *, s64, s64);
+void pr_sectors(struct printbuf *, u64);
+void pr_time(struct printbuf *, u64);
+void pr_uuid(struct printbuf *, u8 *);
+void pr_string_option(struct printbuf *, const char * const list[], size_t);
+void pr_bitflags(struct printbuf *, const char * const list[], u64);
+
+#endif /* _LINUX_PRINTBUF_H */
diff --git a/lib/Makefile b/lib/Makefile
index c588a126a3..31a3904eda 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -34,7 +34,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
-	 buildid.o
+	 buildid.o printbuf.o
 
 lib-$(CONFIG_PRINTK) += dump_stack.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/printbuf.c b/lib/printbuf.c
new file mode 100644
index 0000000000..1d87de787f
--- /dev/null
+++ b/lib/printbuf.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: LGPL-2.1+
+/* Copyright (C) 2022 Kent Overstreet */
+
+#ifdef __KERNEL__
+#include <linux/export.h>
+#include <linux/kernel.h>
+#else
+#define EXPORT_SYMBOL(x)
+#endif
+
+#include <linux/log2.h>
+#include <linux/printbuf.h>
+
+static inline size_t printbuf_remaining(struct printbuf *buf)
+{
+	return buf->size - buf->pos;
+}
+
+static inline size_t printbuf_linelen(struct printbuf *buf)
+{
+	return buf->pos - buf->last_newline;
+}
+
+static int printbuf_realloc(struct printbuf *out, unsigned extra)
+{
+	unsigned new_size;
+	char *buf;
+
+	if (out->pos + extra + 1 < out->size)
+		return 0;
+
+	new_size = roundup_pow_of_two(out->size + extra);
+	buf = krealloc(out->buf, new_size, !out->atomic ? GFP_KERNEL : GFP_ATOMIC);
+
+	if (!buf) {
+		out->allocation_failure = true;
+		return -ENOMEM;
+	}
+
+	out->buf	= buf;
+	out->size	= new_size;
+	return 0;
+}
+
+void pr_buf(struct printbuf *out, const char *fmt, ...)
+{
+	va_list args;
+	int len;
+
+	do {
+		va_start(args, fmt);
+		len = vsnprintf(out->buf + out->pos, printbuf_remaining(out), fmt, args);
+		va_end(args);
+	} while (len + 1 >= printbuf_remaining(out) &&
+		 !printbuf_realloc(out, len + 1));
+
+	len = min_t(size_t, len,
+		  printbuf_remaining(out) ? printbuf_remaining(out) - 1 : 0);
+	out->pos += len;
+}
+EXPORT_SYMBOL(pr_buf);
+
+void pr_char(struct printbuf *buf, char c)
+{
+	if (!printbuf_realloc(buf, 1)) {
+		buf->buf[buf->pos++] = c;
+		buf->buf[buf->pos] = 0;
+	}
+}
+EXPORT_SYMBOL(pr_char);
+
+void pr_newline(struct printbuf *buf)
+{
+	unsigned i;
+
+	pr_char(buf, '\n');
+
+	buf->last_newline	= buf->pos;
+
+	for (i = 0; i < buf->indent; i++)
+		pr_char(buf, ' ');
+
+	buf->last_field		= buf->pos;
+	buf->tabstop = 0;
+}
+EXPORT_SYMBOL(pr_newline);
+
+void pr_indent_push(struct printbuf *buf, unsigned spaces)
+{
+	buf->indent += spaces;
+	while (spaces--)
+		pr_char(buf, ' ');
+}
+EXPORT_SYMBOL(pr_indent_push);
+
+void pr_indent_pop(struct printbuf *buf, unsigned spaces)
+{
+	if (buf->last_newline + buf->indent == buf->pos) {
+		buf->pos -= spaces;
+		buf->buf[buf->pos] = 0;
+	}
+	buf->indent -= spaces;
+}
+EXPORT_SYMBOL(pr_indent_pop);
+
+void pr_tab(struct printbuf *buf)
+{
+	BUG_ON(buf->tabstop > ARRAY_SIZE(buf->tabstops));
+
+	while (printbuf_remaining(buf) > 1 &&
+	       printbuf_linelen(buf) < buf->tabstops[buf->tabstop])
+		pr_char(buf, ' ');
+
+	buf->last_field = buf->pos;
+	buf->tabstop++;
+}
+EXPORT_SYMBOL(pr_tab);
+
+void pr_tab_rjust(struct printbuf *buf)
+{
+	BUG_ON(buf->tabstop > ARRAY_SIZE(buf->tabstops));
+
+	if (printbuf_linelen(buf) < buf->tabstops[buf->tabstop]) {
+		unsigned move = buf->pos - buf->last_field;
+		unsigned shift = buf->tabstops[buf->tabstop] -
+			printbuf_linelen(buf);
+
+		printbuf_realloc(buf, shift);
+
+		if (buf->last_field + shift + 1 < buf->size) {
+			move = min(move, buf->size - 1 - buf->last_field - shift);
+
+			memmove(buf->buf + buf->last_field + shift,
+				buf->buf + buf->last_field,
+				move);
+			memset(buf->buf + buf->last_field, ' ', shift);
+			buf->pos += shift;
+			buf->buf[buf->pos] = 0;
+		}
+	}
+
+	buf->last_field = buf->pos;
+	buf->tabstop++;
+}
+EXPORT_SYMBOL(pr_tab_rjust);
+
+static const char si_units[] = "?kMGTPEZY";
+
+void pr_human_readable_u64(struct printbuf *buf, u64 v)
+{
+	int u, t = 0;
+
+	for (u = 0; v >= 1024; u++) {
+		t = v & ~(~0U << 10);
+		v >>= 10;
+	}
+
+	pr_buf(buf, "%llu", v);
+
+	/*
+	 * 103 is magic: t is in the range [-1023, 1023] and we want
+	 * to turn it into [-9, 9]
+	 */
+	if (u && t && v < 100 && v > -100)
+		pr_buf(buf, ".%i", t / 103);
+	if (u)
+		pr_char(buf, si_units[u]);
+}
+EXPORT_SYMBOL(pr_human_readable_u64);
+
+void pr_human_readable_s64(struct printbuf *buf, s64 v)
+{
+	if (v < 0)
+		pr_char(buf, '-');
+	pr_human_readable_u64(buf, abs(v));
+}
+EXPORT_SYMBOL(pr_human_readable_s64);
+
+void pr_units(struct printbuf *out, s64 raw, s64 bytes)
+{
+	switch (out->units) {
+	case PRINTBUF_UNITS_RAW:
+		pr_buf(out, "%llu", raw);
+		break;
+	case PRINTBUF_UNITS_BYTES:
+		pr_buf(out, "%llu", bytes);
+		break;
+	case PRINTBUF_UNITS_HUMAN_READABLE:
+		pr_human_readable_s64(out, bytes);
+		break;
+	}
+}
+EXPORT_SYMBOL(pr_units);
+
+void pr_sectors(struct printbuf *out, u64 v)
+{
+	pr_units(out, v, v << 9);
+}
+EXPORT_SYMBOL(pr_sectors);
+
+#ifdef __KERNEL__
+
+void pr_time(struct printbuf *out, u64 time)
+{
+	pr_buf(out, "%llu", time);
+}
+EXPORT_SYMBOL(pr_time);
+
+void pr_uuid(struct printbuf *out, u8 *uuid)
+{
+	pr_buf(out, "%pUb", uuid);
+}
+EXPORT_SYMBOL(pr_uuid);
+
+#else
+
+#include <time.h>
+#include <uuid.h>
+
+void pr_time(struct printbuf *out, u64 _time)
+{
+	char time_str[64];
+	time_t time = _time;
+	struct tm *tm = localtime(&time);
+	size_t err = strftime(time_str, sizeof(time_str), "%c", tm);
+
+	if (!err)
+		pr_buf(out, "(formatting error)");
+	else
+		pr_buf(out, "%s", time_str);
+}
+
+void pr_uuid(struct printbuf *out, u8 *uuid)
+{
+	char uuid_str[40];
+
+	uuid_unparse_lower(uuid, uuid_str);
+	pr_buf(out, uuid_str);
+}
+
+#endif
+
+void pr_string_option(struct printbuf *out,
+		      const char * const list[],
+		      size_t selected)
+{
+	size_t i;
+
+	for (i = 0; list[i]; i++)
+		pr_buf(out, i == selected ? "[%s] " : "%s ", list[i]);
+}
+EXPORT_SYMBOL(pr_string_option);
+
+void pr_bitflags(struct printbuf *out,
+		 const char * const list[], u64 flags)
+{
+	unsigned bit, nr = 0;
+	bool first = true;
+
+	while (list[nr])
+		nr++;
+
+	while (flags && (bit = __ffs(flags)) < nr) {
+		if (!first)
+			pr_buf(out, ",");
+		first = false;
+		pr_buf(out, "%s", list[bit]);
+		flags ^= 1 << bit;
+	}
+}
+EXPORT_SYMBOL(pr_bitflags);
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 2/4] mm: Add a .to_text() method for shrinkers
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
  2022-04-21 23:48 ` [PATCH 1/4] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-22 12:21   ` Michal Hocko
  2022-04-21 23:48 ` [PATCH 3/4] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, roman.gushchin, hannes

This adds a new callback method to shrinkers which they can use to
describe anything relevant to memory reclaim about their internal state,
for example object dirtyness.

This uses the new printbufs to output to heap allocated strings, so that
the .to_text() methods can be used both for messages logged to the
console, and also sysfs/debugfs.

This patch also adds shrinkers_to_text(), which reports on the top 10
shrinkers - by object count - in sorted order, to be used in OOM
reporting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 include/linux/shrinker.h |  5 +++
 mm/vmscan.c              | 75 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 76fbf92b04..b5f411768b 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -2,6 +2,8 @@
 #ifndef _LINUX_SHRINKER_H
 #define _LINUX_SHRINKER_H
 
+struct printbuf;
+
 /*
  * This struct is used to pass information from page reclaim to the shrinkers.
  * We consolidate the values for easier extension later.
@@ -58,10 +60,12 @@ struct shrink_control {
  * @flags determine the shrinker abilities, like numa awareness
  */
 struct shrinker {
+	char name[32];
 	unsigned long (*count_objects)(struct shrinker *,
 				       struct shrink_control *sc);
 	unsigned long (*scan_objects)(struct shrinker *,
 				      struct shrink_control *sc);
+	void (*to_text)(struct printbuf *, struct shrinker *);
 
 	long batch;	/* reclaim batch size, 0 = default */
 	int seeks;	/* seeks to recreate an obj */
@@ -94,4 +98,5 @@ extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
+void shrinkers_to_text(struct printbuf *);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 59b14e0d69..09c483dfd3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -50,6 +50,7 @@
 #include <linux/printk.h>
 #include <linux/dax.h>
 #include <linux/psi.h>
+#include <linux/printbuf.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -702,6 +703,80 @@ void synchronize_shrinkers(void)
 }
 EXPORT_SYMBOL(synchronize_shrinkers);
 
+/**
+ * shrinkers_to_text - Report on shrinkers with highest usage
+ *
+ * This reports on the top 10 shrinkers, by object counts, in sorted order:
+ * intended to be used for OOM reporting.
+ */
+void shrinkers_to_text(struct printbuf *out)
+{
+	struct shrinker *shrinker;
+	struct shrinker_by_mem {
+		struct shrinker	*shrinker;
+		unsigned long	mem;
+	} shrinkers_by_mem[10];
+	int i, nr = 0;
+
+	if (!down_read_trylock(&shrinker_rwsem)) {
+		pr_buf(out, "(couldn't take shrinker lock)");
+		return;
+	}
+
+	list_for_each_entry(shrinker, &shrinker_list, list) {
+		struct shrink_control sc = { .gfp_mask = GFP_KERNEL, };
+		unsigned long mem = shrinker->count_objects(shrinker, &sc);
+
+		if (!mem || mem == SHRINK_STOP || mem == SHRINK_EMPTY)
+			continue;
+
+		for (i = 0; i < nr; i++)
+			if (mem < shrinkers_by_mem[i].mem)
+				break;
+
+		if (nr < ARRAY_SIZE(shrinkers_by_mem)) {
+			memmove(&shrinkers_by_mem[i + 1],
+				&shrinkers_by_mem[i],
+				sizeof(shrinkers_by_mem[0]) * (nr - i));
+			nr++;
+		} else if (i) {
+			i--;
+			memmove(&shrinkers_by_mem[0],
+				&shrinkers_by_mem[1],
+				sizeof(shrinkers_by_mem[0]) * i);
+		} else {
+			continue;
+		}
+
+		shrinkers_by_mem[i] = (struct shrinker_by_mem) {
+			.shrinker = shrinker,
+			.mem = mem,
+		};
+	}
+
+	for (i = nr - 1; i >= 0; --i) {
+		struct shrink_control sc = { .gfp_mask = GFP_KERNEL, };
+		shrinker = shrinkers_by_mem[i].shrinker;
+
+		if (shrinker->name[0])
+			pr_buf(out, "%s", shrinker->name);
+		else
+			pr_buf(out, "%ps:", shrinker->scan_objects);
+
+		pr_buf(out, " objects: %lu", shrinker->count_objects(shrinker, &sc));
+		pr_newline(out);
+
+		if (shrinker->to_text) {
+			pr_indent_push(out, 2);
+			shrinker->to_text(out, shrinker);
+			pr_indent_pop(out, 2);
+			pr_newline(out);
+		}
+	}
+
+	up_read(&shrinker_rwsem);
+}
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 3/4] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
  2022-04-21 23:48 ` [PATCH 1/4] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
  2022-04-21 23:48 ` [PATCH 2/4] mm: Add a .to_text() method for shrinkers Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH 4/4] bcachefs: shrinker.to_text() methods Kent Overstreet
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, roman.gushchin, hannes

This patch:
 - Moves lib/show_mem.c to mm/show_mem.c
 - Changes show_mem() to always report on slab usage
 - Instead of reporting on all slabs, we only report on top 10 slabs,
   and in sorted order
 - Also reports on shrinkers, with the new shrinkers_to_text().

More OOM reporting can be moved to show_mem.c and improved, this patch
is only a small start.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 lib/Makefile           |  2 +-
 mm/Makefile            |  2 +-
 mm/oom_kill.c          | 23 ------------------
 {lib => mm}/show_mem.c | 14 +++++++++++
 mm/slab.h              |  6 +++--
 mm/slab_common.c       | 53 +++++++++++++++++++++++++++++++++++-------
 6 files changed, 65 insertions(+), 35 deletions(-)
 rename {lib => mm}/show_mem.c (78%)

diff --git a/lib/Makefile b/lib/Makefile
index 31a3904eda..c5041d33d0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -30,7 +30,7 @@ endif
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 idr.o extable.o sha1.o irq_regs.o argv_split.o \
-	 flex_proportions.o ratelimit.o show_mem.o \
+	 flex_proportions.o ratelimit.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
diff --git a/mm/Makefile b/mm/Makefile
index 70d4309c9c..97c0be12f3 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -54,7 +54,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   mm_init.o percpu.o slab_common.o \
 			   compaction.o vmacache.o \
 			   interval_tree.o list_lru.o workingset.o \
-			   debug.o gup.o mmap_lock.o $(mmu-y)
+			   debug.o gup.o mmap_lock.o show_mem.o $(mmu-y)
 
 # Give 'page_alloc' its own module-parameter namespace
 page-alloc-y := page_alloc.o
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 832fb33037..659c7d6376 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -171,27 +171,6 @@ static bool oom_unkillable_task(struct task_struct *p)
 	return false;
 }
 
-/*
- * Check whether unreclaimable slab amount is greater than
- * all user memory(LRU pages).
- * dump_unreclaimable_slab() could help in the case that
- * oom due to too much unreclaimable slab used by kernel.
-*/
-static bool should_dump_unreclaim_slab(void)
-{
-	unsigned long nr_lru;
-
-	nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
-		 global_node_page_state(NR_INACTIVE_ANON) +
-		 global_node_page_state(NR_ACTIVE_FILE) +
-		 global_node_page_state(NR_INACTIVE_FILE) +
-		 global_node_page_state(NR_ISOLATED_ANON) +
-		 global_node_page_state(NR_ISOLATED_FILE) +
-		 global_node_page_state(NR_UNEVICTABLE);
-
-	return (global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B) > nr_lru);
-}
-
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
@@ -465,8 +444,6 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 		mem_cgroup_print_oom_meminfo(oc->memcg);
 	else {
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
-		if (should_dump_unreclaim_slab())
-			dump_unreclaimable_slab();
 	}
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(oc);
diff --git a/lib/show_mem.c b/mm/show_mem.c
similarity index 78%
rename from lib/show_mem.c
rename to mm/show_mem.c
index 1c26c14ffb..c9f37f13d6 100644
--- a/lib/show_mem.c
+++ b/mm/show_mem.c
@@ -7,11 +7,15 @@
 
 #include <linux/mm.h>
 #include <linux/cma.h>
+#include <linux/printbuf.h>
+
+#include "slab.h"
 
 void show_mem(unsigned int filter, nodemask_t *nodemask)
 {
 	pg_data_t *pgdat;
 	unsigned long total = 0, reserved = 0, highmem = 0;
+	struct printbuf buf = PRINTBUF;
 
 	printk("Mem-Info:\n");
 	show_free_areas(filter, nodemask);
@@ -41,4 +45,14 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
 #ifdef CONFIG_MEMORY_FAILURE
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
+
+	pr_info("Unreclaimable slab info:\n");
+	dump_unreclaimable_slab(&buf);
+	printk("%s", buf.buf);
+	printbuf_reset(&buf);
+
+	printk("Shrinkers:\n");
+	shrinkers_to_text(&buf);
+	printk("%s", buf.buf);
+	printbuf_exit(&buf);
 }
diff --git a/mm/slab.h b/mm/slab.h
index c7f2abc2b1..abefbf7674 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -788,10 +788,12 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 
 #endif
 
+struct printbuf;
+
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG)
-void dump_unreclaimable_slab(void);
+void dump_unreclaimable_slab(struct printbuf *);
 #else
-static inline void dump_unreclaimable_slab(void)
+static inline void dump_unreclaimable_slab(struct printbuf *out)
 {
 }
 #endif
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23f2ab0713..cb1c548c73 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -24,6 +24,7 @@
 #include <asm/tlbflush.h>
 #include <asm/page.h>
 #include <linux/memcontrol.h>
+#include <linux/printbuf.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/kmem.h>
@@ -1084,10 +1085,15 @@ static int slab_show(struct seq_file *m, void *p)
 	return 0;
 }
 
-void dump_unreclaimable_slab(void)
+void dump_unreclaimable_slab(struct printbuf *out)
 {
 	struct kmem_cache *s;
 	struct slabinfo sinfo;
+	struct slab_by_mem {
+		struct kmem_cache *s;
+		size_t total, active;
+	} slabs_by_mem[10], n;
+	int i, nr = 0;
 
 	/*
 	 * Here acquiring slab_mutex is risky since we don't prefer to get
@@ -1097,12 +1103,11 @@ void dump_unreclaimable_slab(void)
 	 * without acquiring the mutex.
 	 */
 	if (!mutex_trylock(&slab_mutex)) {
-		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
+		pr_buf(out, "excessive unreclaimable slab but cannot dump stats\n");
 		return;
 	}
 
-	pr_info("Unreclaimable slab info:\n");
-	pr_info("Name                      Used          Total\n");
+	buf->atomic++;
 
 	list_for_each_entry(s, &slab_caches, list) {
 		if (s->flags & SLAB_RECLAIM_ACCOUNT)
@@ -1110,11 +1115,43 @@ void dump_unreclaimable_slab(void)
 
 		get_slabinfo(s, &sinfo);
 
-		if (sinfo.num_objs > 0)
-			pr_info("%-17s %10luKB %10luKB\n", s->name,
-				(sinfo.active_objs * s->size) / 1024,
-				(sinfo.num_objs * s->size) / 1024);
+		if (!sinfo.num_objs)
+			continue;
+
+		n.s = s;
+		n.total = sinfo.num_objs * s->size;
+		n.active = sinfo.active_objs * s->size;
+
+		for (i = 0; i < nr; i++)
+			if (n.total < slabs_by_mem[i].total)
+				break;
+
+		if (nr < ARRAY_SIZE(slabs_by_mem)) {
+			memmove(&slabs_by_mem[i + 1],
+				&slabs_by_mem[i],
+				sizeof(slabs_by_mem[0]) * (nr - i));
+			nr++;
+		} else if (i) {
+			i--;
+			memmove(&slabs_by_mem[0],
+				&slabs_by_mem[1],
+				sizeof(slabs_by_mem[0]) * i);
+		} else {
+			continue;
+		}
+
+		slabs_by_mem[i] = n;
+	}
+
+	for (i = nr - 1; i >= 0; --i) {
+		pr_buf(out, "%-17s total: ", slabs_by_mem[i].s->name);
+		pr_human_readable_u64(out, slabs_by_mem[i].total);
+		pr_buf(out, " active: ");
+		pr_human_readable_u64(out, slabs_by_mem[i].active);
+		pr_newline(out);
 	}
+
+	--buf->atomic;
 	mutex_unlock(&slab_mutex);
 }
 
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 4/4] bcachefs: shrinker.to_text() methods
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (2 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH 3/4] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH v2 0/8] Printbufs & improved shrinker debugging Kent Overstreet
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel; +Cc: Kent Overstreet, roman.gushchin

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 fs/bcachefs/btree_cache.c     | 18 +++++++++++++++---
 fs/bcachefs/btree_key_cache.c | 18 +++++++++++++++---
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/bcachefs/btree_cache.c b/fs/bcachefs/btree_cache.c
index 72f0587e4d..75ef3b5462 100644
--- a/fs/bcachefs/btree_cache.c
+++ b/fs/bcachefs/btree_cache.c
@@ -394,6 +394,14 @@ static unsigned long bch2_btree_cache_count(struct shrinker *shrink,
 	return btree_cache_can_free(bc);
 }
 
+static void bch2_btree_cache_shrinker_to_text(struct printbuf *out, struct shrinker *shrink)
+{
+	struct bch_fs *c = container_of(shrink, struct bch_fs,
+					btree_cache.shrink);
+
+	bch2_btree_cache_to_text(out, c);
+}
+
 void bch2_fs_btree_cache_exit(struct bch_fs *c)
 {
 	struct btree_cache *bc = &c->btree_cache;
@@ -477,6 +485,7 @@ int bch2_fs_btree_cache_init(struct bch_fs *c)
 
 	bc->shrink.count_objects	= bch2_btree_cache_count;
 	bc->shrink.scan_objects		= bch2_btree_cache_scan;
+	bc->shrink.to_text		= bch2_btree_cache_shrinker_to_text;
 	bc->shrink.seeks		= 4;
 	ret = register_shrinker(&bc->shrink);
 out:
@@ -1147,7 +1156,10 @@ void bch2_btree_node_to_text(struct printbuf *out, struct bch_fs *c,
 
 void bch2_btree_cache_to_text(struct printbuf *out, struct bch_fs *c)
 {
-	pr_buf(out, "nr nodes:\t\t%u\n", c->btree_cache.used);
-	pr_buf(out, "nr dirty:\t\t%u\n", atomic_read(&c->btree_cache.dirty));
-	pr_buf(out, "cannibalize lock:\t%p\n", c->btree_cache.alloc_lock);
+	pr_buf(out, "nr nodes:          %u", c->btree_cache.used);
+	pr_newline(out);
+	pr_buf(out, "nr dirty:          %u", atomic_read(&c->btree_cache.dirty));
+	pr_newline(out);
+	pr_buf(out, "cannibalize lock:  %p", c->btree_cache.alloc_lock);
+	pr_newline(out);
 }
diff --git a/fs/bcachefs/btree_key_cache.c b/fs/bcachefs/btree_key_cache.c
index a575189f35..32b5cb6042 100644
--- a/fs/bcachefs/btree_key_cache.c
+++ b/fs/bcachefs/btree_key_cache.c
@@ -711,6 +711,14 @@ void bch2_fs_btree_key_cache_init_early(struct btree_key_cache *c)
 	INIT_LIST_HEAD(&c->freed);
 }
 
+static void bch2_btree_key_cache_shrinker_to_text(struct printbuf *out, struct shrinker *shrink)
+{
+	struct btree_key_cache *bc =
+		container_of(shrink, struct btree_key_cache, shrink);
+
+	bch2_btree_key_cache_to_text(out, bc);
+}
+
 int bch2_fs_btree_key_cache_init(struct btree_key_cache *c)
 {
 	int ret;
@@ -724,14 +732,18 @@ int bch2_fs_btree_key_cache_init(struct btree_key_cache *c)
 	c->shrink.seeks			= 1;
 	c->shrink.count_objects		= bch2_btree_key_cache_count;
 	c->shrink.scan_objects		= bch2_btree_key_cache_scan;
+	c->shrink.to_text		= bch2_btree_key_cache_shrinker_to_text;
 	return register_shrinker(&c->shrink);
 }
 
 void bch2_btree_key_cache_to_text(struct printbuf *out, struct btree_key_cache *c)
 {
-	pr_buf(out, "nr_freed:\t%zu\n",	c->nr_freed);
-	pr_buf(out, "nr_keys:\t%lu\n",	atomic_long_read(&c->nr_keys));
-	pr_buf(out, "nr_dirty:\t%lu\n",	atomic_long_read(&c->nr_dirty));
+	pr_buf(out, "nr_freed:  %zu",	c->nr_freed);
+	pr_newline(out);
+	pr_buf(out, "nr_keys:   %zu",	atomic_long_read(&c->nr_keys));
+	pr_newline(out);
+	pr_buf(out, "nr_dirty:  %zu",	atomic_long_read(&c->nr_dirty));
+	pr_newline(out);
 }
 
 void bch2_btree_key_cache_exit(void)
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 0/8] Printbufs & improved shrinker debugging
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (3 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH 4/4] bcachefs: shrinker.to_text() methods Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

v1 of this patch series here:
https://lore.kernel.org/linux-mm/20220419203202.2670193-1-kent.overstreet@gmail.com/T/

Changes since v1:

 - Converted some, not all, seq_buf code to printbufs (thanks Christoph for
   pointing out seq_buf).

   The seq_buf code I didn't convert is arch specific code where the memory
   allocation context is unclear, and they're using seq_buf to do everything on
   the stack. I'm considering adding a mode to printbufs where we point it at an
   external buffer instead - it wouldn't be much code, but OTOH seq_buf isn't
   much code either and it seems to be somewhat tied to tracing infrastructure.
   Deferring a decision on what to do for now.

 - pr_human_readable_u64() now uses string_get_size() (thanks Matthew for
   pointing this one out)

 - added new helpers printbuf_str() for getting a guaranteed-null-terminated
   string, and printbuf_atomic_inc() and printbuf_atomic_dec() for marking
   sections where allocations must be GFP_ATOMIC.

 - Broke out shrinker_to_text(): this new helper could be used by new sysfs or
   debugfs code, for displaying information about a single shrinker (as Roman is
   working on)

 - Added new tracking, per shrinker, for # of objects requested to be freed and
   # actually freed. Shrinkers won't necessarily free all objects requested for
   perfectly legitimate reasons, but if the two numbers are wildly off then
   that's going to lead to memory reclaim issues - these are both also included
   in shrinker_to_text().

Kent Overstreet (8):
  lib/printbuf: New data structure for heap-allocated strings
  Input/joystick/analog: Convert from seq_buf -> printbuf
  mm/memcontrol.c: Convert to printbuf
  clk: tegra: bpmp: Convert to printbuf
  mm: Add a .to_text() method for shrinkers
  mm: Count requests to free & nr freed per shrinker
  mm: Move lib/show_mem.c to mm/
  mm: Centralize & improve oom reporting in show_mem.c

 drivers/clk/tegra/clk-bpmp.c    |  21 ++-
 drivers/input/joystick/analog.c |  37 +++--
 include/linux/printbuf.h        | 164 +++++++++++++++++++
 include/linux/shrinker.h        |   8 +
 lib/Makefile                    |   4 +-
 lib/printbuf.c                  | 274 ++++++++++++++++++++++++++++++++
 mm/Makefile                     |   2 +-
 mm/memcontrol.c                 |  68 ++++----
 mm/oom_kill.c                   |  23 ---
 {lib => mm}/show_mem.c          |  14 ++
 mm/slab.h                       |   6 +-
 mm/slab_common.c                |  53 +++++-
 mm/vmscan.c                     |  88 ++++++++++
 13 files changed, 666 insertions(+), 96 deletions(-)
 create mode 100644 include/linux/printbuf.h
 create mode 100644 lib/printbuf.c
 rename {lib => mm}/show_mem.c (77%)

-- 
2.35.2


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (4 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 0/8] Printbufs & improved shrinker debugging Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-22  4:20   ` Christoph Hellwig
  2022-04-24 23:46   ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Joe Perches
  2022-04-21 23:48 ` [PATCH v2 2/8] Input/joystick/analog: Convert from seq_buf -> printbuf Kent Overstreet
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

This adds printbufs: simple heap-allocated strings meant for building up
structured messages, for logging/procfs/sysfs and elsewhere. They've
been heavily used in bcachefs for writing .to_text() functions/methods -
pretty printers, which has in turn greatly improved the overall quality
of error messages.

Basic usage is documented in include/linux/printbuf.h.

The next patches in the series are going to be using printbufs to
implement a .to_text() method for shrinkers, and improving OOM
reporting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 include/linux/printbuf.h | 164 +++++++++++++++++++++++
 lib/Makefile             |   2 +-
 lib/printbuf.c           | 274 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/printbuf.h
 create mode 100644 lib/printbuf.c

diff --git a/include/linux/printbuf.h b/include/linux/printbuf.h
new file mode 100644
index 0000000000..276cdecf08
--- /dev/null
+++ b/include/linux/printbuf.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: LGPL-2.1+ */
+/* Copyright (C) 2022 Kent Overstreet */
+
+#ifndef _LINUX_PRINTBUF_H
+#define _LINUX_PRINTBUF_H
+
+/*
+ * Printbufs: Simple heap allocated strings, with some features for structered
+ * formatting.
+ *
+ * This code has provisions for use in userspace, to aid in making other code
+ * portable between kernelspace and userspace.
+ *
+ * Basic example:
+ *
+ *   struct printbuf buf = PRINTBUF;
+ *
+ *   pr_buf(&buf, "foo=");
+ *   foo_to_text(&buf, foo);
+ *   printk("%s", buf.buf);
+ *   printbuf_exit(&buf);
+ *
+ * We can now write pretty printers instead of writing code that dumps
+ * everything to the kernel log buffer, and then those pretty-printers can be
+ * used by other code that outputs to kernel log, sysfs, debugfs, etc.
+ *
+ * Memory allocation: Outputing to a printbuf may allocate memory. This
+ * allocation is done with GFP_KERNEL, by default: use the newer
+ * memalloc_*_(save|restore) functions as needed.
+ *
+ * Since no equivalent yet exists for GFP_ATOMIC/GFP_NOWAIT, memory allocations
+ * will be done with GFP_ATOMIC if printbuf->atomic is nonzero.
+ *
+ * Memory allocation failures: We don't return errors directly, because on
+ * memory allocation failure we usually don't want to bail out and unwind - we
+ * want to print what we've got, on a best-effort basis. But code that does want
+ * to return -ENOMEM may check printbuf.allocation_failure.
+ *
+ * Indenting, tabstops:
+ *
+ * To aid is writing multi-line pretty printers spread across multiple
+ * functions, printbufs track the current indent level.
+ *
+ * pr_indent_push() and pr_indent_pop() increase and decrease the current indent
+ * level, respectively.
+ *
+ * To use tabstops, set printbuf->tabstops[]; they are in units of spaces, from
+ * start of line. Once set, pr_tab() will output spaces up to the next tabstop.
+ * pr_tab_rjust() will also advance the current line of text up to the next
+ * tabstop, but it does so by shifting text since the previous tabstop up to the
+ * next tabstop - right justifying it.
+ *
+ * Make sure you use pr_newline() instead of \n in the format string for indent
+ * level and tabstops to work corretly.
+ *
+ * Output units: printbuf->units exists to tell pretty-printers how to output
+ * numbers: a raw value (e.g. directly from a superblock field), as bytes, or as
+ * human readable bytes. pr_units() and pr_sectors() obey it.
+ *
+ * Other helpful functions:
+ *
+ * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
+ * readable units.
+ *
+ * pr_time(): for printing a time_t with strftime in userspace, prints as an
+ * integer number of seconds in the kernel.
+ *
+ * pr_string_option: Given an enumerated value and a string array with names for
+ * each option, prints out the enum names with the selected one indicated with
+ * square brackets.
+ *
+ * pr_bitflags: Given a bitflag and a string array with names for each bit,
+ * prints out the names of the selected bits.
+ */
+
+#include <linux/slab.h>
+#include <linux/string_helpers.h>
+
+enum printbuf_units {
+	PRINTBUF_UNITS_RAW,
+	PRINTBUF_UNITS_BYTES,
+	PRINTBUF_UNITS_HUMAN_READABLE,
+};
+
+struct printbuf {
+	char			*buf;
+	unsigned		size;
+	unsigned		pos;
+	unsigned		last_newline;
+	unsigned		last_field;
+	unsigned		indent;
+	enum printbuf_units	units:8;
+	/*
+	 * If nonzero, allocations will be done with GFP_ATOMIC:
+	 */
+	u8			atomic;
+	bool			allocation_failure:1;
+	/* SI units (10^3), or 2^10: */
+	enum string_size_units	human_readable_units:1;
+	u8			tabstop;
+	u8			tabstops[4];
+};
+
+#define PRINTBUF ((struct printbuf) { .human_readable_units = STRING_UNITS_2 })
+
+/**
+ * printbuf_exit - exit a printbuf, freeing memory it owns and poisoning it
+ * against accidental use.
+ */
+static inline void printbuf_exit(struct printbuf *buf)
+{
+	kfree(buf->buf);
+	buf->buf = ERR_PTR(-EINTR); /* poison value */
+}
+
+/**
+ * printbuf_reset - re-use a printbuf without freeing and re-initializing it:
+ */
+static inline void printbuf_reset(struct printbuf *buf)
+{
+	buf->pos		= 0;
+	buf->last_newline	= 0;
+	buf->last_field		= 0;
+	buf->indent		= 0;
+	buf->tabstop		= 0;
+	buf->allocation_failure	= 0;
+}
+
+/**
+ * printbuf_atomic_inc - mark as entering an atomic section
+ */
+static inline void printbuf_atomic_inc(struct printbuf *buf)
+{
+	buf->atomic++;
+}
+
+/**
+ * printbuf_atomic_inc - mark as leaving an atomic section
+ */
+static inline void printbuf_atomic_dec(struct printbuf *buf)
+{
+	buf->atomic--;
+}
+
+void pr_buf(struct printbuf *out, const char *fmt, ...)
+	__attribute__ ((format (printf, 2, 3)));
+
+void pr_char(struct printbuf *buf, char c);
+void pr_newline(struct printbuf *);
+void pr_indent_push(struct printbuf *, unsigned);
+void pr_indent_pop(struct printbuf *, unsigned);
+void pr_tab(struct printbuf *);
+void pr_tab_rjust(struct printbuf *);
+void pr_human_readable_u64(struct printbuf *, u64);
+void pr_human_readable_s64(struct printbuf *, s64);
+void pr_units(struct printbuf *, s64, s64);
+void pr_sectors(struct printbuf *, u64);
+void pr_time(struct printbuf *, u64);
+void pr_uuid(struct printbuf *, u8 *);
+void pr_string_option(struct printbuf *, const char * const list[], size_t);
+void pr_bitflags(struct printbuf *, const char * const list[], u64);
+const char *printbuf_str(const struct printbuf *);
+
+#endif /* _LINUX_PRINTBUF_H */
diff --git a/lib/Makefile b/lib/Makefile
index c588a126a3..31a3904eda 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -34,7 +34,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
-	 buildid.o
+	 buildid.o printbuf.o
 
 lib-$(CONFIG_PRINTK) += dump_stack.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/printbuf.c b/lib/printbuf.c
new file mode 100644
index 0000000000..e0dfa82cda
--- /dev/null
+++ b/lib/printbuf.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: LGPL-2.1+
+/* Copyright (C) 2022 Kent Overstreet */
+
+#ifdef __KERNEL__
+#include <linux/export.h>
+#include <linux/kernel.h>
+#else
+#define EXPORT_SYMBOL(x)
+#endif
+
+#include <linux/log2.h>
+#include <linux/printbuf.h>
+#include <linux/string_helpers.h>
+
+static inline size_t printbuf_remaining(struct printbuf *buf)
+{
+	return buf->size - buf->pos;
+}
+
+static inline size_t printbuf_linelen(struct printbuf *buf)
+{
+	return buf->pos - buf->last_newline;
+}
+
+static int printbuf_realloc(struct printbuf *out, unsigned extra)
+{
+	unsigned new_size;
+	char *buf;
+
+	if (out->pos + extra + 1 < out->size)
+		return 0;
+
+	new_size = roundup_pow_of_two(out->size + extra);
+	buf = krealloc(out->buf, new_size, !out->atomic ? GFP_KERNEL : GFP_ATOMIC);
+
+	if (!buf) {
+		out->allocation_failure = true;
+		return -ENOMEM;
+	}
+
+	out->buf	= buf;
+	out->size	= new_size;
+	return 0;
+}
+
+void pr_buf(struct printbuf *out, const char *fmt, ...)
+{
+	va_list args;
+	int len;
+
+	do {
+		va_start(args, fmt);
+		len = vsnprintf(out->buf + out->pos, printbuf_remaining(out), fmt, args);
+		va_end(args);
+	} while (len + 1 >= printbuf_remaining(out) &&
+		 !printbuf_realloc(out, len + 1));
+
+	len = min_t(size_t, len,
+		  printbuf_remaining(out) ? printbuf_remaining(out) - 1 : 0);
+	out->pos += len;
+}
+EXPORT_SYMBOL(pr_buf);
+
+void pr_char(struct printbuf *buf, char c)
+{
+	if (!printbuf_realloc(buf, 1)) {
+		buf->buf[buf->pos++] = c;
+		buf->buf[buf->pos] = 0;
+	}
+}
+EXPORT_SYMBOL(pr_char);
+
+void pr_newline(struct printbuf *buf)
+{
+	unsigned i;
+
+	pr_char(buf, '\n');
+
+	buf->last_newline	= buf->pos;
+
+	for (i = 0; i < buf->indent; i++)
+		pr_char(buf, ' ');
+
+	buf->last_field		= buf->pos;
+	buf->tabstop = 0;
+}
+EXPORT_SYMBOL(pr_newline);
+
+void pr_indent_push(struct printbuf *buf, unsigned spaces)
+{
+	buf->indent += spaces;
+	while (spaces--)
+		pr_char(buf, ' ');
+}
+EXPORT_SYMBOL(pr_indent_push);
+
+void pr_indent_pop(struct printbuf *buf, unsigned spaces)
+{
+	if (buf->last_newline + buf->indent == buf->pos) {
+		buf->pos -= spaces;
+		buf->buf[buf->pos] = 0;
+	}
+	buf->indent -= spaces;
+}
+EXPORT_SYMBOL(pr_indent_pop);
+
+void pr_tab(struct printbuf *buf)
+{
+	BUG_ON(buf->tabstop > ARRAY_SIZE(buf->tabstops));
+
+	while (printbuf_remaining(buf) > 1 &&
+	       printbuf_linelen(buf) < buf->tabstops[buf->tabstop])
+		pr_char(buf, ' ');
+
+	buf->last_field = buf->pos;
+	buf->tabstop++;
+}
+EXPORT_SYMBOL(pr_tab);
+
+void pr_tab_rjust(struct printbuf *buf)
+{
+	BUG_ON(buf->tabstop > ARRAY_SIZE(buf->tabstops));
+
+	if (printbuf_linelen(buf) < buf->tabstops[buf->tabstop]) {
+		unsigned move = buf->pos - buf->last_field;
+		unsigned shift = buf->tabstops[buf->tabstop] -
+			printbuf_linelen(buf);
+
+		printbuf_realloc(buf, shift);
+
+		if (buf->last_field + shift + 1 < buf->size) {
+			move = min(move, buf->size - 1 - buf->last_field - shift);
+
+			memmove(buf->buf + buf->last_field + shift,
+				buf->buf + buf->last_field,
+				move);
+			memset(buf->buf + buf->last_field, ' ', shift);
+			buf->pos += shift;
+			buf->buf[buf->pos] = 0;
+		}
+	}
+
+	buf->last_field = buf->pos;
+	buf->tabstop++;
+}
+EXPORT_SYMBOL(pr_tab_rjust);
+
+void pr_human_readable_u64(struct printbuf *buf, u64 v)
+{
+	printbuf_realloc(buf, 10);
+	string_get_size(v, 1, buf->human_readable_units, buf->buf + buf->pos,
+			printbuf_remaining(buf));
+	buf->pos += strlen(buf->buf + buf->pos);
+}
+EXPORT_SYMBOL(pr_human_readable_u64);
+
+void pr_human_readable_s64(struct printbuf *buf, s64 v)
+{
+	if (v < 0)
+		pr_char(buf, '-');
+	pr_human_readable_u64(buf, abs(v));
+}
+EXPORT_SYMBOL(pr_human_readable_s64);
+
+void pr_units(struct printbuf *out, s64 raw, s64 bytes)
+{
+	switch (out->units) {
+	case PRINTBUF_UNITS_RAW:
+		pr_buf(out, "%llu", raw);
+		break;
+	case PRINTBUF_UNITS_BYTES:
+		pr_buf(out, "%llu", bytes);
+		break;
+	case PRINTBUF_UNITS_HUMAN_READABLE:
+		pr_human_readable_s64(out, bytes);
+		break;
+	}
+}
+EXPORT_SYMBOL(pr_units);
+
+void pr_sectors(struct printbuf *out, u64 v)
+{
+	pr_units(out, v, v << 9);
+}
+EXPORT_SYMBOL(pr_sectors);
+
+#ifdef __KERNEL__
+
+void pr_time(struct printbuf *out, u64 time)
+{
+	pr_buf(out, "%llu", time);
+}
+EXPORT_SYMBOL(pr_time);
+
+void pr_uuid(struct printbuf *out, u8 *uuid)
+{
+	pr_buf(out, "%pUb", uuid);
+}
+EXPORT_SYMBOL(pr_uuid);
+
+#else
+
+#include <time.h>
+#include <uuid.h>
+
+void pr_time(struct printbuf *out, u64 _time)
+{
+	char time_str[64];
+	time_t time = _time;
+	struct tm *tm = localtime(&time);
+	size_t err = strftime(time_str, sizeof(time_str), "%c", tm);
+
+	if (!err)
+		pr_buf(out, "(formatting error)");
+	else
+		pr_buf(out, "%s", time_str);
+}
+
+void pr_uuid(struct printbuf *out, u8 *uuid)
+{
+	char uuid_str[40];
+
+	uuid_unparse_lower(uuid, uuid_str);
+	pr_buf(out, uuid_str);
+}
+
+#endif
+
+void pr_string_option(struct printbuf *out,
+		      const char * const list[],
+		      size_t selected)
+{
+	size_t i;
+
+	for (i = 0; list[i]; i++)
+		pr_buf(out, i == selected ? "[%s] " : "%s ", list[i]);
+}
+EXPORT_SYMBOL(pr_string_option);
+
+void pr_bitflags(struct printbuf *out,
+		 const char * const list[], u64 flags)
+{
+	unsigned bit, nr = 0;
+	bool first = true;
+
+	while (list[nr])
+		nr++;
+
+	while (flags && (bit = __ffs(flags)) < nr) {
+		if (!first)
+			pr_buf(out, ",");
+		first = false;
+		pr_buf(out, "%s", list[bit]);
+		flags ^= 1 << bit;
+	}
+}
+EXPORT_SYMBOL(pr_bitflags);
+
+/**
+ * printbuf_str - returns printbuf's buf as a C string, guaranteed to be null
+ * terminated
+ */
+const char *printbuf_str(const struct printbuf *buf)
+{
+	/*
+	 * If we've written to a printbuf then it's guaranteed to be a null
+	 * terminated string - but if we haven't, then we might not have
+	 * allocated a buffer at all:
+	 */
+	return buf->pos
+		? buf->buf
+		: "";
+}
+EXPORT_SYMBOL(printbuf_str);
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 2/8] Input/joystick/analog: Convert from seq_buf -> printbuf
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (5 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH v2 3/8] mm/memcontrol.c: Convert to printbuf Kent Overstreet
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

seq_buf is being deprecated, this converts to printbuf which is similar
but heap allocates the string buffer.

This means we have to consider memory allocation context & failure: Here
we're in device initialization so GFP_KERNEL should be fine, and also as
we're in device initialization returning -ENOMEM is fine.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 drivers/input/joystick/analog.c | 37 ++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/input/joystick/analog.c b/drivers/input/joystick/analog.c
index 3088c5b829..72e1e30d19 100644
--- a/drivers/input/joystick/analog.c
+++ b/drivers/input/joystick/analog.c
@@ -19,7 +19,7 @@
 #include <linux/input.h>
 #include <linux/gameport.h>
 #include <linux/jiffies.h>
-#include <linux/seq_buf.h>
+#include <linux/printbuf.h>
 #include <linux/timex.h>
 #include <linux/timekeeping.h>
 
@@ -337,26 +337,32 @@ static void analog_calibrate_timer(struct analog_port *port)
  * analog_name() constructs a name for an analog joystick.
  */
 
-static void analog_name(struct analog *analog)
+static int analog_name(struct analog *analog)
 {
-	struct seq_buf s;
+	struct printbuf buf = PRINTBUF;
+	int ret = 0;
 
-	seq_buf_init(&s, analog->name, sizeof(analog->name));
-	seq_buf_printf(&s, "Analog %d-axis %d-button",
-		 hweight8(analog->mask & ANALOG_AXES_STD),
-		 hweight8(analog->mask & ANALOG_BTNS_STD) + !!(analog->mask & ANALOG_BTNS_CHF) * 2 +
-		 hweight16(analog->mask & ANALOG_BTNS_GAMEPAD) + !!(analog->mask & ANALOG_HBTN_CHF) * 4);
+	pr_buf(&buf, "Analog %d-axis %d-button",
+	       hweight8(analog->mask & ANALOG_AXES_STD),
+	       hweight8(analog->mask & ANALOG_BTNS_STD) + !!(analog->mask & ANALOG_BTNS_CHF) * 2 +
+	       hweight16(analog->mask & ANALOG_BTNS_GAMEPAD) + !!(analog->mask & ANALOG_HBTN_CHF) * 4);
 
 	if (analog->mask & ANALOG_HATS_ALL)
-		seq_buf_printf(&s, " %d-hat",
-			       hweight16(analog->mask & ANALOG_HATS_ALL));
+		pr_buf(&buf, " %d-hat",
+		       hweight16(analog->mask & ANALOG_HATS_ALL));
 
 	if (analog->mask & ANALOG_HAT_FCS)
-		seq_buf_printf(&s, " FCS");
+		pr_buf(&buf, " FCS");
 	if (analog->mask & ANALOG_ANY_CHF)
-		seq_buf_printf(&s, (analog->mask & ANALOG_SAITEK) ? " Saitek" : " CHF");
+		pr_buf(&buf, (analog->mask & ANALOG_SAITEK) ? " Saitek" : " CHF");
 
-	seq_buf_printf(&s, (analog->mask & ANALOG_GAMEPAD) ? " gamepad" : " joystick");
+	pr_buf(&buf, (analog->mask & ANALOG_GAMEPAD) ? " gamepad" : " joystick");
+
+	ret = buf.allocation_failure ? -ENOMEM : 0;
+	if (!ret)
+		strlcpy(analog->name, buf.buf, sizeof(analog->name));
+	printbuf_exit(&buf);
+	return ret;
 }
 
 /*
@@ -369,7 +375,10 @@ static int analog_init_device(struct analog_port *port, struct analog *analog, i
 	int i, j, t, v, w, x, y, z;
 	int error;
 
-	analog_name(analog);
+	error = analog_name(analog);
+	if (error)
+		return error;
+
 	snprintf(analog->phys, sizeof(analog->phys),
 		 "%s/input%d", port->gameport->phys, index);
 	analog->buttons = (analog->mask & ANALOG_GAMEPAD) ? analog_pad_btn : analog_joy_btn;
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 3/8] mm/memcontrol.c: Convert to printbuf
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (6 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 2/8] Input/joystick/analog: Convert from seq_buf -> printbuf Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-22 12:28   ` Michal Hocko
  2022-04-21 23:48 ` [PATCH v2 4/8] clk: tegra: bpmp: " Kent Overstreet
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

This converts memory_stat_format() from seq_buf to printbuf. Printbuf is
simalar to seq_buf except that it heap allocates the string buffer:
here, we were already heap allocating the buffer with kmalloc() so the
conversion is trivial.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/memcontrol.c | 68 ++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 36e9f38c91..4cb0b7bc1c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -61,7 +61,7 @@
 #include <linux/file.h>
 #include <linux/tracehook.h>
 #include <linux/psi.h>
-#include <linux/seq_buf.h>
+#include <linux/printbuf.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
@@ -1436,13 +1436,9 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
-	struct seq_buf s;
+	struct printbuf buf = PRINTBUF;
 	int i;
 
-	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-	if (!s.buffer)
-		return NULL;
-
 	/*
 	 * Provide statistics on the state of the memory subsystem as
 	 * well as cumulative event counters that show past behavior.
@@ -1459,49 +1455,51 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 		u64 size;
 
 		size = memcg_page_state_output(memcg, memory_stats[i].idx);
-		seq_buf_printf(&s, "%s %llu\n", memory_stats[i].name, size);
+		pr_buf(&buf, "%s %llu\n", memory_stats[i].name, size);
 
 		if (unlikely(memory_stats[i].idx == NR_SLAB_UNRECLAIMABLE_B)) {
 			size += memcg_page_state_output(memcg,
 							NR_SLAB_RECLAIMABLE_B);
-			seq_buf_printf(&s, "slab %llu\n", size);
+			pr_buf(&buf, "slab %llu\n", size);
 		}
 	}
 
 	/* Accumulated memory events */
 
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGFAULT),
-		       memcg_events(memcg, PGFAULT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT),
-		       memcg_events(memcg, PGMAJFAULT));
-	seq_buf_printf(&s, "%s %lu\n",  vm_event_name(PGREFILL),
-		       memcg_events(memcg, PGREFILL));
-	seq_buf_printf(&s, "pgscan %lu\n",
-		       memcg_events(memcg, PGSCAN_KSWAPD) +
-		       memcg_events(memcg, PGSCAN_DIRECT));
-	seq_buf_printf(&s, "pgsteal %lu\n",
-		       memcg_events(memcg, PGSTEAL_KSWAPD) +
-		       memcg_events(memcg, PGSTEAL_DIRECT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGACTIVATE),
-		       memcg_events(memcg, PGACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGDEACTIVATE),
-		       memcg_events(memcg, PGDEACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREE),
-		       memcg_events(memcg, PGLAZYFREE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREED),
-		       memcg_events(memcg, PGLAZYFREED));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(PGFAULT),
+	       memcg_events(memcg, PGFAULT));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(PGMAJFAULT),
+	       memcg_events(memcg, PGMAJFAULT));
+	pr_buf(&buf, "%s %lu\n",  vm_event_name(PGREFILL),
+	       memcg_events(memcg, PGREFILL));
+	pr_buf(&buf, "pgscan %lu\n",
+	       memcg_events(memcg, PGSCAN_KSWAPD) +
+	       memcg_events(memcg, PGSCAN_DIRECT));
+	pr_buf(&buf, "pgsteal %lu\n",
+	       memcg_events(memcg, PGSTEAL_KSWAPD) +
+	       memcg_events(memcg, PGSTEAL_DIRECT));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(PGACTIVATE),
+	       memcg_events(memcg, PGACTIVATE));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(PGDEACTIVATE),
+	       memcg_events(memcg, PGDEACTIVATE));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(PGLAZYFREE),
+	       memcg_events(memcg, PGLAZYFREE));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(PGLAZYFREED),
+	       memcg_events(memcg, PGLAZYFREED));
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
-		       memcg_events(memcg, THP_FAULT_ALLOC));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
-		       memcg_events(memcg, THP_COLLAPSE_ALLOC));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
+	       memcg_events(memcg, THP_FAULT_ALLOC));
+	pr_buf(&buf, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
+	       memcg_events(memcg, THP_COLLAPSE_ALLOC));
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-	/* The above should easily fit into one page */
-	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
+	if (buf.allocation_failure) {
+		printbuf_exit(&buf);
+		return NULL;
+	}
 
-	return s.buffer;
+	return buf.buf;
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 4/8] clk: tegra: bpmp: Convert to printbuf
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (7 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 3/8] mm/memcontrol.c: Convert to printbuf Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH v2 5/8] mm: Add a .to_text() method for shrinkers Kent Overstreet
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

This converts from seq_buf to printbuf, which is similar but heap
allocates the string buffer.

Previously in this code the string buffer was allocated on the stack;
this means we've added a new potential memory allocation failure. This
is fine though since it's only for a dev_printk() message.

Memory allocation context: printbuf doesn't take gfp flags, instead we
prefer the new memalloc_no*_(save|restore) interfaces to be used. Here
the surrounding code is already allocating with GFP_KERNEL, so
everything is fine.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 drivers/clk/tegra/clk-bpmp.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/clk/tegra/clk-bpmp.c b/drivers/clk/tegra/clk-bpmp.c
index 6ecf18f71c..77a8c47806 100644
--- a/drivers/clk/tegra/clk-bpmp.c
+++ b/drivers/clk/tegra/clk-bpmp.c
@@ -5,7 +5,7 @@
 
 #include <linux/clk-provider.h>
 #include <linux/device.h>
-#include <linux/seq_buf.h>
+#include <linux/printbuf.h>
 #include <linux/slab.h>
 
 #include <soc/tegra/bpmp.h>
@@ -360,39 +360,38 @@ static void tegra_bpmp_clk_info_dump(struct tegra_bpmp *bpmp,
 				     const struct tegra_bpmp_clk_info *info)
 {
 	const char *prefix = "";
-	struct seq_buf buf;
+	struct printbuf buf = PRINTBUF;
 	unsigned int i;
-	char flags[64];
-
-	seq_buf_init(&buf, flags, sizeof(flags));
 
 	if (info->flags)
-		seq_buf_printf(&buf, "(");
+		pr_buf(&buf, "(");
 
 	if (info->flags & TEGRA_BPMP_CLK_HAS_MUX) {
-		seq_buf_printf(&buf, "%smux", prefix);
+		pr_buf(&buf, "%smux", prefix);
 		prefix = ", ";
 	}
 
 	if ((info->flags & TEGRA_BPMP_CLK_HAS_SET_RATE) == 0) {
-		seq_buf_printf(&buf, "%sfixed", prefix);
+		pr_buf(&buf, "%sfixed", prefix);
 		prefix = ", ";
 	}
 
 	if (info->flags & TEGRA_BPMP_CLK_IS_ROOT) {
-		seq_buf_printf(&buf, "%sroot", prefix);
+		pr_buf(&buf, "%sroot", prefix);
 		prefix = ", ";
 	}
 
 	if (info->flags)
-		seq_buf_printf(&buf, ")");
+		pr_buf(&buf, ")");
 
 	dev_printk(level, bpmp->dev, "%03u: %s\n", info->id, info->name);
-	dev_printk(level, bpmp->dev, "  flags: %lx %s\n", info->flags, flags);
+	dev_printk(level, bpmp->dev, "  flags: %lx %s\n", info->flags, printbuf_str(&buf));
 	dev_printk(level, bpmp->dev, "  parents: %u\n", info->num_parents);
 
 	for (i = 0; i < info->num_parents; i++)
 		dev_printk(level, bpmp->dev, "    %03u\n", info->parents[i]);
+
+	printbuf_exit(&buf);
 }
 
 static int tegra_bpmp_probe_clocks(struct tegra_bpmp *bpmp,
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 5/8] mm: Add a .to_text() method for shrinkers
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (8 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 4/8] clk: tegra: bpmp: " Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH v2 6/8] mm: Count requests to free & nr freed per shrinker Kent Overstreet
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

This adds a new callback method to shrinkers which they can use to
describe anything relevant to memory reclaim about their internal state,
for example object dirtyness.

This uses the new printbufs to output to heap allocated strings, so that
the .to_text() methods can be used both for messages logged to the
console, and also sysfs/debugfs.

This patch also adds shrinkers_to_text(), which reports on the top 10
shrinkers - by object count - in sorted order, to be used in OOM
reporting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 include/linux/shrinker.h |  5 +++
 mm/vmscan.c              | 78 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 76fbf92b04..b5f411768b 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -2,6 +2,8 @@
 #ifndef _LINUX_SHRINKER_H
 #define _LINUX_SHRINKER_H
 
+struct printbuf;
+
 /*
  * This struct is used to pass information from page reclaim to the shrinkers.
  * We consolidate the values for easier extension later.
@@ -58,10 +60,12 @@ struct shrink_control {
  * @flags determine the shrinker abilities, like numa awareness
  */
 struct shrinker {
+	char name[32];
 	unsigned long (*count_objects)(struct shrinker *,
 				       struct shrink_control *sc);
 	unsigned long (*scan_objects)(struct shrinker *,
 				      struct shrink_control *sc);
+	void (*to_text)(struct printbuf *, struct shrinker *);
 
 	long batch;	/* reclaim batch size, 0 = default */
 	int seeks;	/* seeks to recreate an obj */
@@ -94,4 +98,5 @@ extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
+void shrinkers_to_text(struct printbuf *);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 59b14e0d69..905bc23800 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -50,6 +50,7 @@
 #include <linux/printk.h>
 #include <linux/dax.h>
 #include <linux/psi.h>
+#include <linux/printbuf.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -702,6 +703,83 @@ void synchronize_shrinkers(void)
 }
 EXPORT_SYMBOL(synchronize_shrinkers);
 
+void shrinker_to_text(struct printbuf *out, struct shrinker *shrinker)
+{
+	struct shrink_control sc = { .gfp_mask = GFP_KERNEL, };
+
+	if (shrinker->name[0])
+		pr_buf(out, "%s", shrinker->name);
+	else
+		pr_buf(out, "%ps:", shrinker->scan_objects);
+
+	pr_buf(out, " objects: %lu", shrinker->count_objects(shrinker, &sc));
+	pr_newline(out);
+
+	if (shrinker->to_text) {
+		pr_indent_push(out, 2);
+		shrinker->to_text(out, shrinker);
+		pr_indent_pop(out, 2);
+		pr_newline(out);
+	}
+}
+
+/**
+ * shrinkers_to_text - Report on shrinkers with highest usage
+ *
+ * This reports on the top 10 shrinkers, by object counts, in sorted order:
+ * intended to be used for OOM reporting.
+ */
+void shrinkers_to_text(struct printbuf *out)
+{
+	struct shrinker *shrinker;
+	struct shrinker_by_mem {
+		struct shrinker	*shrinker;
+		unsigned long	mem;
+	} shrinkers_by_mem[10];
+	int i, nr = 0;
+
+	if (!down_read_trylock(&shrinker_rwsem)) {
+		pr_buf(out, "(couldn't take shrinker lock)");
+		return;
+	}
+
+	list_for_each_entry(shrinker, &shrinker_list, list) {
+		struct shrink_control sc = { .gfp_mask = GFP_KERNEL, };
+		unsigned long mem = shrinker->count_objects(shrinker, &sc);
+
+		if (!mem || mem == SHRINK_STOP || mem == SHRINK_EMPTY)
+			continue;
+
+		for (i = 0; i < nr; i++)
+			if (mem < shrinkers_by_mem[i].mem)
+				break;
+
+		if (nr < ARRAY_SIZE(shrinkers_by_mem)) {
+			memmove(&shrinkers_by_mem[i + 1],
+				&shrinkers_by_mem[i],
+				sizeof(shrinkers_by_mem[0]) * (nr - i));
+			nr++;
+		} else if (i) {
+			i--;
+			memmove(&shrinkers_by_mem[0],
+				&shrinkers_by_mem[1],
+				sizeof(shrinkers_by_mem[0]) * i);
+		} else {
+			continue;
+		}
+
+		shrinkers_by_mem[i] = (struct shrinker_by_mem) {
+			.shrinker = shrinker,
+			.mem = mem,
+		};
+	}
+
+	for (i = nr - 1; i >= 0; --i)
+		shrinker_to_text(out, shrinkers_by_mem[i].shrinker);
+
+	up_read(&shrinker_rwsem);
+}
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 6/8] mm: Count requests to free & nr freed per shrinker
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (9 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 5/8] mm: Add a .to_text() method for shrinkers Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-21 23:48 ` [PATCH v2 7/8] mm: Move lib/show_mem.c to mm/ Kent Overstreet
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

The next step in this patch series for improving debugging of shrinker
related issues: keep counts of number of objects we request to free vs.
actually freed, and prints them in shrinker_to_text().

Shrinkers won't necessarily free all objects requested for a variety of
reasons, but if the two counts are wildly different something is likely
amiss.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 include/linux/shrinker.h |  3 +++
 mm/vmscan.c              | 16 +++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index b5f411768b..12967748f9 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -79,6 +79,9 @@ struct shrinker {
 #endif
 	/* objs pending delete, per node */
 	atomic_long_t *nr_deferred;
+
+	atomic_long_t objects_requested_to_free;
+	atomic_long_t objects_freed;
 };
 #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 905bc23800..12562546a7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -711,16 +711,22 @@ void shrinker_to_text(struct printbuf *out, struct shrinker *shrinker)
 		pr_buf(out, "%s", shrinker->name);
 	else
 		pr_buf(out, "%ps:", shrinker->scan_objects);
+	pr_newline(out);
+	pr_indent_push(out, 2);
 
-	pr_buf(out, " objects: %lu", shrinker->count_objects(shrinker, &sc));
+	pr_buf(out, "objects:           %lu", shrinker->count_objects(shrinker, &sc));
+	pr_newline(out);
+	pr_buf(out, "requested to free: %lu", atomic_long_read(&shrinker->objects_requested_to_free));
+	pr_newline(out);
+	pr_buf(out, "objects freed:     %lu", atomic_long_read(&shrinker->objects_freed));
 	pr_newline(out);
 
 	if (shrinker->to_text) {
-		pr_indent_push(out, 2);
 		shrinker->to_text(out, shrinker);
-		pr_indent_pop(out, 2);
 		pr_newline(out);
 	}
+
+	pr_indent_pop(out, 2);
 }
 
 /**
@@ -846,12 +852,16 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		unsigned long ret;
 		unsigned long nr_to_scan = min(batch_size, total_scan);
 
+		atomic_long_add(nr_to_scan, &shrinker->objects_requested_to_free);
+
 		shrinkctl->nr_to_scan = nr_to_scan;
 		shrinkctl->nr_scanned = nr_to_scan;
 		ret = shrinker->scan_objects(shrinker, shrinkctl);
 		if (ret == SHRINK_STOP)
 			break;
+
 		freed += ret;
+		atomic_long_add(ret, &shrinker->objects_freed);
 
 		count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
 		total_scan -= shrinkctl->nr_scanned;
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 7/8] mm: Move lib/show_mem.c to mm/
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (10 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 6/8] mm: Count requests to free & nr freed per shrinker Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-22 12:32   ` Michal Hocko
  2022-04-21 23:48 ` [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
  2022-04-30  4:00   ` Dave Young
  13 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

show_mem.c is really mm specific, and the next patch in the series is
going to require mm/slab.h, so let's move it before doing more work on
it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 lib/Makefile           | 2 +-
 mm/Makefile            | 2 +-
 {lib => mm}/show_mem.c | 0
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename {lib => mm}/show_mem.c (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 31a3904eda..c5041d33d0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -30,7 +30,7 @@ endif
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 idr.o extable.o sha1.o irq_regs.o argv_split.o \
-	 flex_proportions.o ratelimit.o show_mem.o \
+	 flex_proportions.o ratelimit.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
diff --git a/mm/Makefile b/mm/Makefile
index 70d4309c9c..97c0be12f3 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -54,7 +54,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   mm_init.o percpu.o slab_common.o \
 			   compaction.o vmacache.o \
 			   interval_tree.o list_lru.o workingset.o \
-			   debug.o gup.o mmap_lock.o $(mmu-y)
+			   debug.o gup.o mmap_lock.o show_mem.o $(mmu-y)
 
 # Give 'page_alloc' its own module-parameter namespace
 page-alloc-y := page_alloc.o
diff --git a/lib/show_mem.c b/mm/show_mem.c
similarity index 100%
rename from lib/show_mem.c
rename to mm/show_mem.c
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
                   ` (11 preceding siblings ...)
  2022-04-21 23:48 ` [PATCH v2 7/8] mm: Move lib/show_mem.c to mm/ Kent Overstreet
@ 2022-04-21 23:48 ` Kent Overstreet
  2022-04-22 12:58   ` Michal Hocko
  2022-04-30  4:00   ` Dave Young
  13 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-21 23:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Kent Overstreet, hch, hannes, akpm, linux-clk, linux-tegra,
	linux-input, roman.gushchin

This patch:
 - Changes show_mem() to always report on slab usage
 - Instead of reporting on all slabs, we only report on top 10 slabs,
   and in sorted order
 - Also reports on shrinkers, with the new shrinkers_to_text().
   Shrinkers need to be included in OOM/allocation failure reporting
   because they're responsible for memory reclaim - if a shrinker isn't
   giving up its memory, we need to know which one and why.

More OOM reporting can be moved to show_mem.c and improved, this patch
is only a start.

New example output on OOM/memory allocation failure:

00177 Mem-Info:
00177 active_anon:13706 inactive_anon:32266 isolated_anon:16
00177  active_file:1653 inactive_file:1822 isolated_file:0
00177  unevictable:0 dirty:0 writeback:0
00177  slab_reclaimable:6242 slab_unreclaimable:11168
00177  mapped:3824 shmem:3 pagetables:1266 bounce:0
00177  kernel_misc_reclaimable:0
00177  free:4362 free_pcp:35 free_cma:0
00177 Node 0 active_anon:54824kB inactive_anon:129064kB active_file:6612kB inactive_file:7288kB unevictable:0kB isolated(anon):64kB isolated(file):0kB mapped:15296kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB kernel_stack:3392kB pagetables:5064kB all_unreclaimable? no
00177 DMA free:2232kB boost:0kB min:88kB low:108kB high:128kB reserved_highatomic:0KB active_anon:2924kB inactive_anon:6596kB active_file:428kB inactive_file:384kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
00177 lowmem_reserve[]: 0 426 426 426
00177 DMA32 free:15092kB boost:5836kB min:8432kB low:9080kB high:9728kB reserved_highatomic:0KB active_anon:52196kB inactive_anon:122392kB active_file:6176kB inactive_file:7068kB unevictable:0kB writepending:0kB present:507760kB managed:441816kB mlocked:0kB bounce:0kB free_pcp:72kB local_pcp:0kB free_cma:0kB
00177 lowmem_reserve[]: 0 0 0 0
00177 DMA: 284*4kB (UM) 53*8kB (UM) 21*16kB (U) 11*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2248kB
00177 DMA32: 2765*4kB (UME) 375*8kB (UME) 57*16kB (UM) 5*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15132kB
00177 4656 total pagecache pages
00177 1031 pages in swap cache
00177 Swap cache stats: add 6572399, delete 6572173, find 488603/3286476
00177 Free swap  = 509112kB
00177 Total swap = 2097148kB
00177 130938 pages RAM
00177 0 pages HighMem/MovableOnly
00177 16644 pages reserved
00177 Unreclaimable slab info:
00177 9p-fcall-cache    total: 8.25 MiB active: 8.25 MiB
00177 kernfs_node_cache total: 2.15 MiB active: 2.15 MiB
00177 kmalloc-64        total: 2.08 MiB active: 2.07 MiB
00177 task_struct       total: 1.95 MiB active: 1.95 MiB
00177 kmalloc-4k        total: 1.50 MiB active: 1.50 MiB
00177 signal_cache      total: 1.34 MiB active: 1.34 MiB
00177 kmalloc-2k        total: 1.16 MiB active: 1.16 MiB
00177 bch_inode_info    total: 1.02 MiB active: 922 KiB
00177 perf_event        total: 1.02 MiB active: 1.02 MiB
00177 biovec-max        total: 992 KiB active: 960 KiB
00177 Shrinkers:
00177 super_cache_scan: objects: 127
00177 super_cache_scan: objects: 106
00177 jbd2_journal_shrink_scan: objects: 32
00177 ext4_es_scan: objects: 32
00177 bch2_btree_cache_scan: objects: 8
00177   nr nodes:          24
00177   nr dirty:          0
00177   cannibalize lock:  0000000000000000
00177
00177 super_cache_scan: objects: 8
00177 super_cache_scan: objects: 1

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/oom_kill.c    | 23 ---------------------
 mm/show_mem.c    | 14 +++++++++++++
 mm/slab.h        |  6 ++++--
 mm/slab_common.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
 4 files changed, 63 insertions(+), 33 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 832fb33037..659c7d6376 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -171,27 +171,6 @@ static bool oom_unkillable_task(struct task_struct *p)
 	return false;
 }
 
-/*
- * Check whether unreclaimable slab amount is greater than
- * all user memory(LRU pages).
- * dump_unreclaimable_slab() could help in the case that
- * oom due to too much unreclaimable slab used by kernel.
-*/
-static bool should_dump_unreclaim_slab(void)
-{
-	unsigned long nr_lru;
-
-	nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
-		 global_node_page_state(NR_INACTIVE_ANON) +
-		 global_node_page_state(NR_ACTIVE_FILE) +
-		 global_node_page_state(NR_INACTIVE_FILE) +
-		 global_node_page_state(NR_ISOLATED_ANON) +
-		 global_node_page_state(NR_ISOLATED_FILE) +
-		 global_node_page_state(NR_UNEVICTABLE);
-
-	return (global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B) > nr_lru);
-}
-
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
@@ -465,8 +444,6 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 		mem_cgroup_print_oom_meminfo(oc->memcg);
 	else {
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
-		if (should_dump_unreclaim_slab())
-			dump_unreclaimable_slab();
 	}
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(oc);
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 1c26c14ffb..24b662f64d 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -7,11 +7,15 @@
 
 #include <linux/mm.h>
 #include <linux/cma.h>
+#include <linux/printbuf.h>
+
+#include "slab.h"
 
 void show_mem(unsigned int filter, nodemask_t *nodemask)
 {
 	pg_data_t *pgdat;
 	unsigned long total = 0, reserved = 0, highmem = 0;
+	struct printbuf buf = PRINTBUF;
 
 	printk("Mem-Info:\n");
 	show_free_areas(filter, nodemask);
@@ -41,4 +45,14 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
 #ifdef CONFIG_MEMORY_FAILURE
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
+
+	pr_info("Unreclaimable slab info:\n");
+	dump_unreclaimable_slab(&buf);
+	printk("%s", printbuf_str(&buf));
+	printbuf_reset(&buf);
+
+	printk("Shrinkers:\n");
+	shrinkers_to_text(&buf);
+	printk("%s", printbuf_str(&buf));
+	printbuf_exit(&buf);
 }
diff --git a/mm/slab.h b/mm/slab.h
index c7f2abc2b1..abefbf7674 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -788,10 +788,12 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 
 #endif
 
+struct printbuf;
+
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG)
-void dump_unreclaimable_slab(void);
+void dump_unreclaimable_slab(struct printbuf *);
 #else
-static inline void dump_unreclaimable_slab(void)
+static inline void dump_unreclaimable_slab(struct printbuf *out)
 {
 }
 #endif
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23f2ab0713..1209480797 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -24,6 +24,7 @@
 #include <asm/tlbflush.h>
 #include <asm/page.h>
 #include <linux/memcontrol.h>
+#include <linux/printbuf.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/kmem.h>
@@ -1084,10 +1085,15 @@ static int slab_show(struct seq_file *m, void *p)
 	return 0;
 }
 
-void dump_unreclaimable_slab(void)
+void dump_unreclaimable_slab(struct printbuf *out)
 {
 	struct kmem_cache *s;
 	struct slabinfo sinfo;
+	struct slab_by_mem {
+		struct kmem_cache *s;
+		size_t total, active;
+	} slabs_by_mem[10], n;
+	int i, nr = 0;
 
 	/*
 	 * Here acquiring slab_mutex is risky since we don't prefer to get
@@ -1097,12 +1103,11 @@ void dump_unreclaimable_slab(void)
 	 * without acquiring the mutex.
 	 */
 	if (!mutex_trylock(&slab_mutex)) {
-		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
+		pr_buf(out, "excessive unreclaimable slab but cannot dump stats\n");
 		return;
 	}
 
-	pr_info("Unreclaimable slab info:\n");
-	pr_info("Name                      Used          Total\n");
+	printbuf_atomic_inc(out);
 
 	list_for_each_entry(s, &slab_caches, list) {
 		if (s->flags & SLAB_RECLAIM_ACCOUNT)
@@ -1110,11 +1115,43 @@ void dump_unreclaimable_slab(void)
 
 		get_slabinfo(s, &sinfo);
 
-		if (sinfo.num_objs > 0)
-			pr_info("%-17s %10luKB %10luKB\n", s->name,
-				(sinfo.active_objs * s->size) / 1024,
-				(sinfo.num_objs * s->size) / 1024);
+		if (!sinfo.num_objs)
+			continue;
+
+		n.s = s;
+		n.total = sinfo.num_objs * s->size;
+		n.active = sinfo.active_objs * s->size;
+
+		for (i = 0; i < nr; i++)
+			if (n.total < slabs_by_mem[i].total)
+				break;
+
+		if (nr < ARRAY_SIZE(slabs_by_mem)) {
+			memmove(&slabs_by_mem[i + 1],
+				&slabs_by_mem[i],
+				sizeof(slabs_by_mem[0]) * (nr - i));
+			nr++;
+		} else if (i) {
+			i--;
+			memmove(&slabs_by_mem[0],
+				&slabs_by_mem[1],
+				sizeof(slabs_by_mem[0]) * i);
+		} else {
+			continue;
+		}
+
+		slabs_by_mem[i] = n;
+	}
+
+	for (i = nr - 1; i >= 0; --i) {
+		pr_buf(out, "%-17s total: ", slabs_by_mem[i].s->name);
+		pr_human_readable_u64(out, slabs_by_mem[i].total);
+		pr_buf(out, " active: ");
+		pr_human_readable_u64(out, slabs_by_mem[i].active);
+		pr_newline(out);
 	}
+
+	printbuf_atomic_dec(out);
 	mutex_unlock(&slab_mutex);
 }
 
-- 
2.35.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-21 23:48 ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
@ 2022-04-22  4:20   ` Christoph Hellwig
  2022-04-22  5:14     ` Kent Overstreet
  2022-04-24 23:46   ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Joe Perches
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-04-22  4:20 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-mm, linux-fsdevel, hch, hannes, akpm,
	linux-clk, linux-tegra, linux-input, roman.gushchin, rostedt

I still see absolutel no reason to bloat the kernel with a duplicate
of the existing seq_buf functionality.  Please use that and improve it
where needed for your use case.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  4:20   ` Christoph Hellwig
@ 2022-04-22  5:14     ` Kent Overstreet
  2022-04-22  5:22       ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22  5:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-mm, linux-fsdevel, hannes, akpm, linux-clk,
	linux-tegra, linux-input, roman.gushchin, rostedt

On Fri, Apr 22, 2022 at 06:20:17AM +0200, Christoph Hellwig wrote:
> I still see absolutel no reason to bloat the kernel with a duplicate
> of the existing seq_buf functionality.  Please use that and improve it
> where needed for your use case.

Christoph, you have no problem making more work for me but I can't even get you
to look at the bugs you introuduce in your refactorings that I report to you.

Still waiting on you to look at oops you introduced in bio_copy_data_iter...

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  5:14     ` Kent Overstreet
@ 2022-04-22  5:22       ` Christoph Hellwig
  2022-04-22  5:40         ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-04-22  5:22 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin,
	rostedt

On Fri, Apr 22, 2022 at 01:14:48AM -0400, Kent Overstreet wrote:
> Christoph, you have no problem making more work for me but I can't even get you

I think you are misunderstanding this.  You are trying to create more
work for people maintainaing the kernel by creating duplicate
infrastructure.  The burden is always on the submitter.

> to look at the bugs you introuduce in your refactorings that I report to you.
> 
> Still waiting on you to look at oops you introduced in bio_copy_data_iter...

I'm not sure why I shoud care about your out of tree code making
assumptions about block layer helpers.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  5:22       ` Christoph Hellwig
@ 2022-04-22  5:40         ` Kent Overstreet
  2022-04-22  5:52           ` Christoph Hellwig
  2022-04-22 15:37           ` Steven Rostedt
  0 siblings, 2 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22  5:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-mm, linux-fsdevel, hannes, akpm, linux-clk,
	linux-tegra, linux-input, roman.gushchin, rostedt

On Fri, Apr 22, 2022 at 07:22:08AM +0200, Christoph Hellwig wrote:
> On Fri, Apr 22, 2022 at 01:14:48AM -0400, Kent Overstreet wrote:
> > Christoph, you have no problem making more work for me but I can't even get you
> 
> I think you are misunderstanding this.  You are trying to create more
> work for people maintainaing the kernel by creating duplicate
> infrastructure.  The burden is always on the submitter.
>
> > to look at the bugs you introuduce in your refactorings that I report to you.
> > 
> > Still waiting on you to look at oops you introduced in bio_copy_data_iter...
> 
> I'm not sure why I shoud care about your out of tree code making
> assumptions about block layer helpers.

Wasn't just bcachefs, it affected bcache too, as Coly also reported. And I wrote
that code originally (and the whole fucking modern bvec iter infrastracture,
mind you) so please don't lecture me on making assumptions on block layer
helpers.

Here's the thing, I think you and I have somewhat different approaches to
engineering. Personaly, I find good engineering to be about tradeoffs, not
absolutism, and not letting perfect be the enemy of good.

So I'm honestly not super eager to start modifying tricky arch code that I can't
test, and digging into what looked like non trivial interactions between the way
the traceing code using seq_buf (naturally, given that's where it originates).

I like to push out code that I have high confidence in, and the patch series I
pushed out I do have confidence in, given that it's been in use for awhile and
it's well tested in my tree.

Now yes, I _could_ do a wholesale conversion of seq_buf to printbuf and delete
that code, but doing that job right, to be confident that I'm not introducing
bugs, is going to take more time than I really want to invest right now. I
really don't like to play fast and loose with that stuff.

And the reason getting this from you really irks me is that _practically every
single time_ I trip over a something nasty when I rebase and I git bisect or
blame it's something you did. I don't even bother reporting most of them to you.

I don't want to be calling you out for the work you do because on the whole it's
good and appreciated - I saw the patch series go by getting request_queue out of
filesystem land, I'm happy that's getting done. But I've also seen the stuff you
submit get _really_ churny at times for no good reason, and some really nasty,
data corrupting bugs go by, so...

Please chill out a bit if I'm not super in a rush to do it your way.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  5:40         ` Kent Overstreet
@ 2022-04-22  5:52           ` Christoph Hellwig
  2022-04-22  6:06             ` Kent Overstreet
  2022-04-22 15:37           ` Steven Rostedt
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-04-22  5:52 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin,
	rostedt

On Fri, Apr 22, 2022 at 01:40:15AM -0400, Kent Overstreet wrote:
> Wasn't just bcachefs, it affected bcache too, as Coly also reported.

Well, I've not seen a good bug report for that, but I'd gladly look at it.

> And I wrote
> that code originally (and the whole fucking modern bvec iter infrastracture,
> mind you) so please don't lecture me on making assumptions on block layer
> helpers.

Well, most of what we have is really from Ming.  Because your original
idea was awesome, but the code didn't really fit.  Then again I'm not
sure why this even matters.

I'm also relly not sure why you are getting so personal.  

> Now yes, I _could_ do a wholesale conversion of seq_buf to printbuf and delete
> that code, but doing that job right, to be confident that I'm not introducing
> bugs, is going to take more time than I really want to invest right now. I
> really don't like to play fast and loose with that stuff.

Even of that I'd rather see a very good reason first.  seq_bufs have been
in the kernel for a while and seem to work fine.  If you think there are
shortcomings please try to improve it, not replace or duplicate it.
Sometimes there might be a good reason to replace exiting code, but it
rather have to be a very good reason.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  5:52           ` Christoph Hellwig
@ 2022-04-22  6:06             ` Kent Overstreet
  2022-04-22  6:11               ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22  6:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-mm, linux-fsdevel, hannes, akpm, linux-clk,
	linux-tegra, linux-input, roman.gushchin, rostedt

On Fri, Apr 22, 2022 at 07:52:14AM +0200, Christoph Hellwig wrote:
> On Fri, Apr 22, 2022 at 01:40:15AM -0400, Kent Overstreet wrote:
> > Wasn't just bcachefs, it affected bcache too, as Coly also reported.
> 
> Well, I've not seen a good bug report for that, but I'd gladly look at it.

Thanks. It's been awhile but I'll see if I can dig up the original bug report
tomorrow.

> > And I wrote
> > that code originally (and the whole fucking modern bvec iter infrastracture,
> > mind you) so please don't lecture me on making assumptions on block layer
> > helpers.
> 
> Well, most of what we have is really from Ming.  Because your original
> idea was awesome, but the code didn't really fit.  Then again I'm not
> sure why this even matters.

Didn't fit how? And Ming extended it to multipage bvecs based on my proposal.

> I'm also relly not sure why you are getting so personal.  

Put yourself my shoes, I've honestly found you to be hardheaded and exceedingly
difficult to work with for a very long time.

But I'll bite my tongue for now, because if you'll start listening to bug
reports that will go a long way towards easing things, and we've got LSF coming
up so maybe we can hash things out over beers.

> > Now yes, I _could_ do a wholesale conversion of seq_buf to printbuf and delete
> > that code, but doing that job right, to be confident that I'm not introducing
> > bugs, is going to take more time than I really want to invest right now. I
> > really don't like to play fast and loose with that stuff.
> 
> Even of that I'd rather see a very good reason first.  seq_bufs have been
> in the kernel for a while and seem to work fine.  If you think there are
> shortcomings please try to improve it, not replace or duplicate it.
> Sometimes there might be a good reason to replace exiting code, but it
> rather have to be a very good reason.

When the new version is semantically different from the old version it makes it
a lot easier to deal with the merge conflicts later when forward/backporting
stuff by giving the new version a new name.

Anyways, I'll have a chat with Steven Rostedt about it since I believe he wrote
the original code.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  6:06             ` Kent Overstreet
@ 2022-04-22  6:11               ` Christoph Hellwig
  2022-04-22  6:18                 ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-04-22  6:11 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin,
	rostedt

On Fri, Apr 22, 2022 at 02:06:44AM -0400, Kent Overstreet wrote:
> > Well, most of what we have is really from Ming.  Because your original
> > idea was awesome, but the code didn't really fit.  Then again I'm not
> > sure why this even matters.
> 
> Didn't fit how? And Ming extended it to multipage bvecs based on my proposal.

He did all the actual hard work to get it ready to merge and to work
everywhere.  As in he stuck around and actually finished the project
based on your design.

> 
> > I'm also relly not sure why you are getting so personal.  
> 
> Put yourself my shoes, I've honestly found you to be hardheaded and exceedingly
> difficult to work with for a very long time.

Thanks, but I've been walking these shoes for a while..

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  6:11               ` Christoph Hellwig
@ 2022-04-22  6:18                 ` Kent Overstreet
  0 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22  6:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-mm, linux-fsdevel, hannes, akpm, linux-clk,
	linux-tegra, linux-input, roman.gushchin, rostedt

On Fri, Apr 22, 2022 at 08:11:52AM +0200, Christoph Hellwig wrote:
> On Fri, Apr 22, 2022 at 02:06:44AM -0400, Kent Overstreet wrote:
> > > Well, most of what we have is really from Ming.  Because your original
> > > idea was awesome, but the code didn't really fit.  Then again I'm not
> > > sure why this even matters.
> > 
> > Didn't fit how? And Ming extended it to multipage bvecs based on my proposal.
> 
> He did all the actual hard work to get it ready to merge and to work
> everywhere.  As in he stuck around and actually finished the project
> based on your design.

Not sure why you need to keep throwing shade, but..

> > > I'm also relly not sure why you are getting so personal.  
> > 
> > Put yourself my shoes, I've honestly found you to be hardheaded and exceedingly
> > difficult to work with for a very long time.
> 
> Thanks, but I've been walking these shoes for a while..

*snort* Yeah, I know I am too :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/4] mm: Add a .to_text() method for shrinkers
  2022-04-21 23:48 ` [PATCH 2/4] mm: Add a .to_text() method for shrinkers Kent Overstreet
@ 2022-04-22 12:21   ` Michal Hocko
  0 siblings, 0 replies; 59+ messages in thread
From: Michal Hocko @ 2022-04-22 12:21 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-mm, linux-fsdevel, roman.gushchin, hannes

On Thu 21-04-22 19:48:26, Kent Overstreet wrote:
> This adds a new callback method to shrinkers which they can use to
> describe anything relevant to memory reclaim about their internal state,
> for example object dirtyness.
> 
> This uses the new printbufs to output to heap allocated strings, so that
> the .to_text() methods can be used both for messages logged to the
> console, and also sysfs/debugfs.
> 
> This patch also adds shrinkers_to_text(), which reports on the top 10
> shrinkers - by object count - in sorted order, to be used in OOM
> reporting.

Let's put aside whether doing the sorting is useful or not for a moment.
The primary concern I have here is that pr_buf is internally relying on
memory allocations. This makes it really risky to use from the OOM path
which is the primary motivation here AFAICS.

Especially the oom killer path where we _know_ the memory is depleted.
The only way to pursue the allocation is to rely on PF_MEMALLOC and
memory reserves.

If you want to have a generic way to dump shrinker's internal state then
those would really need a prellocated buffer and .to_text would need to
be documented to not depend on any locking that could be directly or
indirectly depending on memory allocations.

> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
> ---
>  include/linux/shrinker.h |  5 +++
>  mm/vmscan.c              | 75 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 3/8] mm/memcontrol.c: Convert to printbuf
  2022-04-21 23:48 ` [PATCH v2 3/8] mm/memcontrol.c: Convert to printbuf Kent Overstreet
@ 2022-04-22 12:28   ` Michal Hocko
  0 siblings, 0 replies; 59+ messages in thread
From: Michal Hocko @ 2022-04-22 12:28 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-mm, linux-fsdevel, hch, hannes, akpm,
	linux-clk, linux-tegra, linux-input, roman.gushchin

On Thu 21-04-22 19:48:32, Kent Overstreet wrote:
> This converts memory_stat_format() from seq_buf to printbuf. Printbuf is
> simalar to seq_buf except that it heap allocates the string buffer:
> here, we were already heap allocating the buffer with kmalloc() so the
> conversion is trivial.

What is the advantage of changing a well tested seq_buf with a different
way to do the same thing here?

I do not see this to be a noticeable simplification of the existing
code. The only advantage I can see is that the string storage allocation
is implicit and it would expand in case we ever overflow over a single
page. But is this really worth the code churn?

> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
> ---
>  mm/memcontrol.c | 68 ++++++++++++++++++++++++-------------------------
>  1 file changed, 33 insertions(+), 35 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 36e9f38c91..4cb0b7bc1c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -61,7 +61,7 @@
>  #include <linux/file.h>
>  #include <linux/tracehook.h>
>  #include <linux/psi.h>
> -#include <linux/seq_buf.h>
> +#include <linux/printbuf.h>
>  #include "internal.h"
>  #include <net/sock.h>
>  #include <net/ip.h>
> @@ -1436,13 +1436,9 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
>  
>  static char *memory_stat_format(struct mem_cgroup *memcg)
>  {
> -	struct seq_buf s;
> +	struct printbuf buf = PRINTBUF;
>  	int i;
>  
> -	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> -	if (!s.buffer)
> -		return NULL;
> -
>  	/*
>  	 * Provide statistics on the state of the memory subsystem as
>  	 * well as cumulative event counters that show past behavior.
> @@ -1459,49 +1455,51 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  		u64 size;
>  
>  		size = memcg_page_state_output(memcg, memory_stats[i].idx);
> -		seq_buf_printf(&s, "%s %llu\n", memory_stats[i].name, size);
> +		pr_buf(&buf, "%s %llu\n", memory_stats[i].name, size);
>  
>  		if (unlikely(memory_stats[i].idx == NR_SLAB_UNRECLAIMABLE_B)) {
>  			size += memcg_page_state_output(memcg,
>  							NR_SLAB_RECLAIMABLE_B);
> -			seq_buf_printf(&s, "slab %llu\n", size);
> +			pr_buf(&buf, "slab %llu\n", size);
>  		}
>  	}
>  
>  	/* Accumulated memory events */
>  
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGFAULT),
> -		       memcg_events(memcg, PGFAULT));
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT),
> -		       memcg_events(memcg, PGMAJFAULT));
> -	seq_buf_printf(&s, "%s %lu\n",  vm_event_name(PGREFILL),
> -		       memcg_events(memcg, PGREFILL));
> -	seq_buf_printf(&s, "pgscan %lu\n",
> -		       memcg_events(memcg, PGSCAN_KSWAPD) +
> -		       memcg_events(memcg, PGSCAN_DIRECT));
> -	seq_buf_printf(&s, "pgsteal %lu\n",
> -		       memcg_events(memcg, PGSTEAL_KSWAPD) +
> -		       memcg_events(memcg, PGSTEAL_DIRECT));
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGACTIVATE),
> -		       memcg_events(memcg, PGACTIVATE));
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGDEACTIVATE),
> -		       memcg_events(memcg, PGDEACTIVATE));
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREE),
> -		       memcg_events(memcg, PGLAZYFREE));
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREED),
> -		       memcg_events(memcg, PGLAZYFREED));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(PGFAULT),
> +	       memcg_events(memcg, PGFAULT));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(PGMAJFAULT),
> +	       memcg_events(memcg, PGMAJFAULT));
> +	pr_buf(&buf, "%s %lu\n",  vm_event_name(PGREFILL),
> +	       memcg_events(memcg, PGREFILL));
> +	pr_buf(&buf, "pgscan %lu\n",
> +	       memcg_events(memcg, PGSCAN_KSWAPD) +
> +	       memcg_events(memcg, PGSCAN_DIRECT));
> +	pr_buf(&buf, "pgsteal %lu\n",
> +	       memcg_events(memcg, PGSTEAL_KSWAPD) +
> +	       memcg_events(memcg, PGSTEAL_DIRECT));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(PGACTIVATE),
> +	       memcg_events(memcg, PGACTIVATE));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(PGDEACTIVATE),
> +	       memcg_events(memcg, PGDEACTIVATE));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(PGLAZYFREE),
> +	       memcg_events(memcg, PGLAZYFREE));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(PGLAZYFREED),
> +	       memcg_events(memcg, PGLAZYFREED));
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
> -		       memcg_events(memcg, THP_FAULT_ALLOC));
> -	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
> -		       memcg_events(memcg, THP_COLLAPSE_ALLOC));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
> +	       memcg_events(memcg, THP_FAULT_ALLOC));
> +	pr_buf(&buf, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
> +	       memcg_events(memcg, THP_COLLAPSE_ALLOC));
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> -	/* The above should easily fit into one page */
> -	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
> +	if (buf.allocation_failure) {
> +		printbuf_exit(&buf);
> +		return NULL;
> +	}
>  
> -	return s.buffer;
> +	return buf.buf;
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> -- 
> 2.35.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 7/8] mm: Move lib/show_mem.c to mm/
  2022-04-21 23:48 ` [PATCH v2 7/8] mm: Move lib/show_mem.c to mm/ Kent Overstreet
@ 2022-04-22 12:32   ` Michal Hocko
  0 siblings, 0 replies; 59+ messages in thread
From: Michal Hocko @ 2022-04-22 12:32 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-mm, linux-fsdevel, hch, hannes, akpm,
	linux-clk, linux-tegra, linux-input, roman.gushchin

On Thu 21-04-22 19:48:36, Kent Overstreet wrote:
> show_mem.c is really mm specific, and the next patch in the series is
> going to require mm/slab.h, so let's move it before doing more work on
> it.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

No real objections on this. I do agree that this is an mm specific
thing. It depends on other things in mm/page_alloc.c so I wouldn't mind
those living close together.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  lib/Makefile           | 2 +-
>  mm/Makefile            | 2 +-
>  {lib => mm}/show_mem.c | 0
>  3 files changed, 2 insertions(+), 2 deletions(-)
>  rename {lib => mm}/show_mem.c (100%)
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index 31a3904eda..c5041d33d0 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -30,7 +30,7 @@ endif
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>  	 rbtree.o radix-tree.o timerqueue.o xarray.o \
>  	 idr.o extable.o sha1.o irq_regs.o argv_split.o \
> -	 flex_proportions.o ratelimit.o show_mem.o \
> +	 flex_proportions.o ratelimit.o \
>  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
>  	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
>  	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
> diff --git a/mm/Makefile b/mm/Makefile
> index 70d4309c9c..97c0be12f3 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -54,7 +54,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
>  			   mm_init.o percpu.o slab_common.o \
>  			   compaction.o vmacache.o \
>  			   interval_tree.o list_lru.o workingset.o \
> -			   debug.o gup.o mmap_lock.o $(mmu-y)
> +			   debug.o gup.o mmap_lock.o show_mem.o $(mmu-y)
>  
>  # Give 'page_alloc' its own module-parameter namespace
>  page-alloc-y := page_alloc.o
> diff --git a/lib/show_mem.c b/mm/show_mem.c
> similarity index 100%
> rename from lib/show_mem.c
> rename to mm/show_mem.c
> -- 
> 2.35.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-21 23:48 ` [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
@ 2022-04-22 12:58   ` Michal Hocko
  2022-04-22 15:09     ` Roman Gushchin
  0 siblings, 1 reply; 59+ messages in thread
From: Michal Hocko @ 2022-04-22 12:58 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-mm, linux-fsdevel, hch, hannes, akpm,
	linux-clk, linux-tegra, linux-input, roman.gushchin

On Thu 21-04-22 19:48:37, Kent Overstreet wrote:
> This patch:
>  - Changes show_mem() to always report on slab usage
>  - Instead of reporting on all slabs, we only report on top 10 slabs,
>    and in sorted order

As I've already pointed out in the email thread for the previous
version, this would be better in its own patch explaining why we want to
make this unconditional and why to limit the number caches to print.
Why the trashold shouldn't be absolute size based?

>  - Also reports on shrinkers, with the new shrinkers_to_text().
>    Shrinkers need to be included in OOM/allocation failure reporting
>    because they're responsible for memory reclaim - if a shrinker isn't
>    giving up its memory, we need to know which one and why.

Again, I do agree that information about shrinkers can be useful but
there are two main things to consider. Do we want to dump that
information unconditionaly? E.g. does it make sense to print for all
allocation requests (even high order, GFP_NOWAIT...)? Should there be
any explicit trigger when to dump this data (like too many shrinkers
failing etc)?

Last but not least let me echo the concern from the other reply. Memory
allocations are not really reasonable to be done from the oom context so
the pr_buf doesn't sound like a good tool here.

Also please make sure to provide an example of the output and _explain_
how the new output is better than the existing one.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-22 12:58   ` Michal Hocko
@ 2022-04-22 15:09     ` Roman Gushchin
  2022-04-22 23:48       ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Roman Gushchin @ 2022-04-22 15:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kent Overstreet, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input

On Fri, Apr 22, 2022 at 02:58:19PM +0200, Michal Hocko wrote:
> On Thu 21-04-22 19:48:37, Kent Overstreet wrote:
> > This patch:
> >  - Changes show_mem() to always report on slab usage
> >  - Instead of reporting on all slabs, we only report on top 10 slabs,
> >    and in sorted order
> 
> As I've already pointed out in the email thread for the previous
> version, this would be better in its own patch explaining why we want to
> make this unconditional and why to limit the number caches to print.
> Why the trashold shouldn't be absolute size based?
> 
> >  - Also reports on shrinkers, with the new shrinkers_to_text().
> >    Shrinkers need to be included in OOM/allocation failure reporting
> >    because they're responsible for memory reclaim - if a shrinker isn't
> >    giving up its memory, we need to know which one and why.
>
> Again, I do agree that information about shrinkers can be useful but
> there are two main things to consider. Do we want to dump that
> information unconditionaly? E.g. does it make sense to print for all
> allocation requests (even high order, GFP_NOWAIT...)? Should there be
> any explicit trigger when to dump this data (like too many shrinkers
> failing etc)?

To add a concern: largest shrinkers are usually memcg-aware. Scanning
over the whole cgroup tree (with potentially hundreds or thousands of cgroups)
and over all shrinkers from the oom context sounds like a bad idea to me.

IMO it's more appropriate to do from userspace by oomd or a similar daemon,
well before the in-kernel OOM kicks in.

> 
> Last but not least let me echo the concern from the other reply. Memory
> allocations are not really reasonable to be done from the oom context so
> the pr_buf doesn't sound like a good tool here.

+1

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22  5:40         ` Kent Overstreet
  2022-04-22  5:52           ` Christoph Hellwig
@ 2022-04-22 15:37           ` Steven Rostedt
  2022-04-22 19:30             ` Kent Overstreet
  1 sibling, 1 reply; 59+ messages in thread
From: Steven Rostedt @ 2022-04-22 15:37 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, 22 Apr 2022 01:40:15 -0400
Kent Overstreet <kent.overstreet@gmail.com> wrote:

> So I'm honestly not super eager to start modifying tricky arch code that I can't
> test, and digging into what looked like non trivial interactions between the way
> the traceing code using seq_buf (naturally, given that's where it originates).

Yes, seq_buf came from the tracing system but was to be used in a more
broader way. I had originally pushed trace_seq into the lib directory, but
Andrew Morton said it was too specific to tracing. Thus, I gutted the
generic parts out of it and created seq_buf, which looks to be something
that you could use. I had patches to convert seq_file to it, but ran out of
time. I probably can pull them out of the closet and start that again.

> 
> Now yes, I _could_ do a wholesale conversion of seq_buf to printbuf and delete
> that code, but doing that job right, to be confident that I'm not introducing
> bugs, is going to take more time than I really want to invest right now. I
> really don't like to play fast and loose with that stuff.

I would be happy to work with you to convert to seq_buf. If there's
something missing from it, I can help you change it so that it doesn't
cause any regressions with the tracing subsystem.

This is how open source programming is suppose to work ;-)

-- Steve

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 15:37           ` Steven Rostedt
@ 2022-04-22 19:30             ` Kent Overstreet
  2022-04-22 19:39               ` Steven Rostedt
  2022-04-22 20:03               ` James Bottomley
  0 siblings, 2 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22 19:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

Hi Steve!

On Fri, Apr 22, 2022 at 11:37:36AM -0400, Steven Rostedt wrote:
> On Fri, 22 Apr 2022 01:40:15 -0400
> Kent Overstreet <kent.overstreet@gmail.com> wrote:
> 
> > So I'm honestly not super eager to start modifying tricky arch code that I can't
> > test, and digging into what looked like non trivial interactions between the way
> > the traceing code using seq_buf (naturally, given that's where it originates).
> 
> Yes, seq_buf came from the tracing system but was to be used in a more
> broader way. I had originally pushed trace_seq into the lib directory, but
> Andrew Morton said it was too specific to tracing. Thus, I gutted the
> generic parts out of it and created seq_buf, which looks to be something
> that you could use. I had patches to convert seq_file to it, but ran out of
> time. I probably can pull them out of the closet and start that again.
> 
> > 
> > Now yes, I _could_ do a wholesale conversion of seq_buf to printbuf and delete
> > that code, but doing that job right, to be confident that I'm not introducing
> > bugs, is going to take more time than I really want to invest right now. I
> > really don't like to play fast and loose with that stuff.
> 
> I would be happy to work with you to convert to seq_buf. If there's
> something missing from it, I can help you change it so that it doesn't
> cause any regressions with the tracing subsystem.
> 
> This is how open source programming is suppose to work ;-)

Is it though? :)

One of the things I've been meaning to talk more about, that
came out of a recent Rust discussion, is that we in the kernel community could
really do a better job with how we interact with the outside world, particularly
with regards to the sharing of code.

The point was made to me when another long standing kernel dev was complaining
about Facebook being a large, insular, difficult to work with organization, that
likes to pretend it is the center of the universe and not bend to the outside
world, while doing the exact same thing with respect to new concerns brought by
the Rust community. The irony was illuminating :)

The reason I bring that up is that in this case, printbuf is the more evolved,
more widely used implementation, and you're asking me to discard it so the
kernel can stick with its more primitive, less widely used implementation.

$ git grep -w seq_buf|wc -l
86

$ git grep -w printbuf|wc -l
366

So, going to have to push back on that one :)

Printbufs aren't new code; everything in them is there because I've found it
valuable, which is why I decided to try promoting them to the kernel proper (and
more importantly, the idea of a standard way to pretty-print anything).

I'm happy to discuss the merits of the code more, and try to convince you why
you'll like them :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 19:30             ` Kent Overstreet
@ 2022-04-22 19:39               ` Steven Rostedt
  2022-04-22 20:30                 ` Kent Overstreet
  2022-04-22 20:03               ` James Bottomley
  1 sibling, 1 reply; 59+ messages in thread
From: Steven Rostedt @ 2022-04-22 19:39 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, 22 Apr 2022 15:30:15 -0400
Kent Overstreet <kent.overstreet@gmail.com> wrote:

> > This is how open source programming is suppose to work ;-)  
> 
> Is it though? :)
> 
> One of the things I've been meaning to talk more about, that
> came out of a recent Rust discussion, is that we in the kernel community could
> really do a better job with how we interact with the outside world, particularly
> with regards to the sharing of code.
> 
> The point was made to me when another long standing kernel dev was complaining
> about Facebook being a large, insular, difficult to work with organization, that
> likes to pretend it is the center of the universe and not bend to the outside
> world, while doing the exact same thing with respect to new concerns brought by
> the Rust community. The irony was illuminating :)

I do not consider Facebook an open source company. One reason I turned them
down.

> 
> The reason I bring that up is that in this case, printbuf is the more evolved,
> more widely used implementation, and you're asking me to discard it so the
> kernel can stick with its more primitive, less widely used implementation.
> 
> $ git grep -w seq_buf|wc -l
> 86
> 
> $ git grep -w printbuf|wc -l
> 366

$ git grep printbuf
drivers/media/i2c/ccs/ccs-reg-access.c:                 char printbuf[(MAX_WRITE_LEN << 1) +
drivers/media/i2c/ccs/ccs-reg-access.c:                 bin2hex(printbuf, regdata, msg.len);
drivers/media/i2c/ccs/ccs-reg-access.c:                         regs->addr + j, printbuf);

I don't see it.

And by your notion:

$ git grep trace_seq | wc -l
1680

Thus we all should be using trace_seq!

> 
> So, going to have to push back on that one :)
> 
> Printbufs aren't new code; everything in them is there because I've found it
> valuable, which is why I decided to try promoting them to the kernel proper (and
> more importantly, the idea of a standard way to pretty-print anything).
> 
> I'm happy to discuss the merits of the code more, and try to convince you why
> you'll like them :)

I'd like to know more to why seq_buf is not good for you. And just telling
me that you never seriously tried to make it work because you were afraid
of causing tracing regressions without ever asking the tracing maintainer
is not going to cut it.

-- Steve

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 19:30             ` Kent Overstreet
  2022-04-22 19:39               ` Steven Rostedt
@ 2022-04-22 20:03               ` James Bottomley
  2022-04-22 21:13                 ` Kent Overstreet
  1 sibling, 1 reply; 59+ messages in thread
From: James Bottomley @ 2022-04-22 20:03 UTC (permalink / raw)
  To: Kent Overstreet, Steven Rostedt
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, 2022-04-22 at 15:30 -0400, Kent Overstreet wrote:
> Hi Steve!
> 
> On Fri, Apr 22, 2022 at 11:37:36AM -0400, Steven Rostedt wrote:
> > On Fri, 22 Apr 2022 01:40:15 -0400
> > Kent Overstreet <kent.overstreet@gmail.com> wrote:
[...]
> > > Now yes, I _could_ do a wholesale conversion of seq_buf to
> > > printbuf and delete that code, but doing that job right, to be
> > > confident that I'm not introducing bugs, is going to take more
> > > time than I really want to invest right now. I really don't like
> > > to play fast and loose with that stuff.
> > 
> > I would be happy to work with you to convert to seq_buf. If there's
> > something missing from it, I can help you change it so that it
> > doesn't cause any regressions with the tracing subsystem.
> > 
> > This is how open source programming is suppose to work ;-)
> 
> Is it though? :)
> 
> One of the things I've been meaning to talk more about, that
> came out of a recent Rust discussion, is that we in the kernel
> community could really do a better job with how we interact with the
> outside world, particularly with regards to the sharing of code.
> 
> The point was made to me when another long standing kernel dev was
> complaining about Facebook being a large, insular, difficult to work
> with organization, that likes to pretend it is the center of the
> universe and not bend to the outside world, while doing the exact
> same thing with respect to new concerns brought by the Rust
> community. The irony was illuminating :)

Hey, I didn't say that at all.  I said vendoring the facebook reference
implementation wouldn't work (it being 74k lines and us using 300) but
that facebook was doing the right thing for us with zstd because they
were maintaining the core code we needed, even if we couldn't vendor it
from their code base:

https://lore.kernel.org/rust-for-linux/ea85b3bce5f172dc73e2be8eb4dbd21fae826fa1.camel@HansenPartnership.com/

You were the one who said all that about facebook, while incorrectly
implying I said it first (which is an interesting variation on the
strawman fallacy):

https://lore.kernel.org/rust-for-linux/20220415203926.pvahugtzrg4dbhcc@moria.home.lan/

James




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 19:39               ` Steven Rostedt
@ 2022-04-22 20:30                 ` Kent Overstreet
  2022-04-22 20:47                   ` Steven Rostedt
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22 20:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, Apr 22, 2022 at 03:39:16PM -0400, Steven Rostedt wrote:
> I do not consider Facebook an open source company. One reason I turned them
> down.

Surely you can see how NIH syndrome isn't something that just happens at
closed-source companies? How a default cultural assumption of "we do things the
way we've always done" leads to things getting insular?

> > The reason I bring that up is that in this case, printbuf is the more evolved,
> > more widely used implementation, and you're asking me to discard it so the
> > kernel can stick with its more primitive, less widely used implementation.
> > 
> > $ git grep -w seq_buf|wc -l
> > 86
> > 
> > $ git grep -w printbuf|wc -l
> > 366
> 
> $ git grep printbuf
> drivers/media/i2c/ccs/ccs-reg-access.c:                 char printbuf[(MAX_WRITE_LEN << 1) +
> drivers/media/i2c/ccs/ccs-reg-access.c:                 bin2hex(printbuf, regdata, msg.len);
> drivers/media/i2c/ccs/ccs-reg-access.c:                         regs->addr + j, printbuf);
> 
> I don't see it.

Here: https://evilpiepirate.org/git/bcachefs.git/

It may not be merged yet, but it is actively developed open source code with
active users that's intended to be merged!

> I'd like to know more to why seq_buf is not good for you. And just telling
> me that you never seriously tried to make it work because you were afraid
> of causing tracing regressions without ever asking the tracing maintainer
> is not going to cut it.

I didn't know about seq_buf until a day or two ago, that's literally all it was.

And Steve, apologies if I've come across as being a dick about this, that wasn't
my intent.  I've got nothing against you or your code - I'd love it if we could
just have a discussion about them on their merits, and if it feels like I'm
making an issue about this unnecessarily that's because I think there's
something about kernel process and culture worth improving that I want to raise,
so I'm sticking my neck out a bit here.

So here's the story of how I got from where seq_buf is now to where printbuf is
now:

 - Printbuf started out as almost an exact duplicate of seq_buf (like I said,
   not intentionally), with an external buffer typically allocated on the stack.

 - As error/log messages got to be bigger and more structured, stack usage
   eventually became an issue, so eventually I added the heap allocations. 

 - This made them a lot more convenient to use, and made possible entirely new
   ways of using them - so I started using them more, and converting everything
   that outputted to strings to them.

 - This lead to the realization that when pretty-printers are easy and
   convenient to write, that leads to writing pretty-printers for _more_ stuff,
   which makes it easy to stay in the habit of adding anything relevant to
   sysfs/debugfs - and log/error messages got a _whole_ lot better when I
   realized instead of writing format strings for every basic C type I can just
   use the .to_text() methods of the high level objects I'm working with.

Basically, my debugging life has gotten _drastically_ easier because of this
change in process and approach - deadlocks that I used to have to attach a
debugger for are now trivial because all the relevant state is in debugfs and
greppable, and filesystem inconsistencies that used to suck to debug I now just
take what's in the error message and grep through the journal for.

I can't understate how invaluable all this stuff has been, and I'm excited to
take the lessons I've learned and apply them to the wider kernel and make other
people's lives easier too.

The shrinkers-OOM-reporting patch was an obvious starting point because
 - shrinkers have internal state that's definitely worth reporting on
 - we shouldn't just be logging this on OOM, we should also make this available
   in sysfs or debugfs.

Feature wise, printbufs have:
 - heap allocation
 - extra state for formatting: indent level, tabstops, and a way of specifying
   units.

That's basically it. Heap allocation adds very little code and eliminates a
_lot_ of headaches in playing the "how much do I need to/can I put on the stack"
game, and you'll want the formatting options as soon as you start formatting
multi line output and writing pretty printers that call other pretty printers.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 20:30                 ` Kent Overstreet
@ 2022-04-22 20:47                   ` Steven Rostedt
  2022-04-22 21:51                     ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Steven Rostedt @ 2022-04-22 20:47 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, 22 Apr 2022 16:30:57 -0400
Kent Overstreet <kent.overstreet@gmail.com> wrote:

> So here's the story of how I got from where seq_buf is now to where printbuf is
> now:
> 
>  - Printbuf started out as almost an exact duplicate of seq_buf (like I said,
>    not intentionally), with an external buffer typically allocated on the stack.

Basically seq_buf is designed to be used as an underlining infrastructure.
That's why it does not allocate any buffer and leaves that to the user
cases. Hence, trace_seq() allocates a page for use, and I even use seq_buf
in user space to dynamically allocate when needed.

> 
>  - As error/log messages got to be bigger and more structured, stack usage
>    eventually became an issue, so eventually I added the heap allocations. 

Which is something you could do on top of seq_buf. Point being, you do not
need to re-implement printbuf, and I have not looked at the code, but
instead, implement printbuf on top of seq_buf, and extend seq_buf where
needed. Like trace_seq does, and the patches I have for seq_file would do.
It would leave the string processing and buffer space management to
seq_buf, as there's ways to see "oh, we need more space, let's allocate
more" and then increase the heap.

> 
>  - This made them a lot more convenient to use, and made possible entirely new
>    ways of using them - so I started using them more, and converting everything
>    that outputted to strings to them.
> 
>  - This lead to the realization that when pretty-printers are easy and
>    convenient to write, that leads to writing pretty-printers for _more_ stuff,
>    which makes it easy to stay in the habit of adding anything relevant to
>    sysfs/debugfs - and log/error messages got a _whole_ lot better when I
>    realized instead of writing format strings for every basic C type I can just
>    use the .to_text() methods of the high level objects I'm working with.
> 
> Basically, my debugging life has gotten _drastically_ easier because of this
> change in process and approach - deadlocks that I used to have to attach a
> debugger for are now trivial because all the relevant state is in debugfs and
> greppable, and filesystem inconsistencies that used to suck to debug I now just
> take what's in the error message and grep through the journal for.
> 
> I can't understate how invaluable all this stuff has been, and I'm excited to
> take the lessons I've learned and apply them to the wider kernel and make other
> people's lives easier too.
> 
> The shrinkers-OOM-reporting patch was an obvious starting point because
>  - shrinkers have internal state that's definitely worth reporting on
>  - we shouldn't just be logging this on OOM, we should also make this available
>    in sysfs or debugfs.
> 
> Feature wise, printbufs have:
>  - heap allocation
>  - extra state for formatting: indent level, tabstops, and a way of specifying
>    units.
> 
> That's basically it. Heap allocation adds very little code and eliminates a
> _lot_ of headaches in playing the "how much do I need to/can I put on the stack"
> game, and you'll want the formatting options as soon as you start formatting
> multi line output and writing pretty printers that call other pretty printers.

I would be more willing to accept a printbuf, if it was built on top of
seq_buf. That is, you don't need to change all your user cases, you just
need to make printbuf an extension of seq_buf by using it underneath, like
trace_seq does. Then it would not be re-inventing the wheel, but just
building on top of it.

-- Steve


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 20:03               ` James Bottomley
@ 2022-04-22 21:13                 ` Kent Overstreet
  2022-04-23 14:16                   ` Rust and Kernel Vendoring [Was Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings] James Bottomley
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22 21:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Steven Rostedt, Christoph Hellwig, linux-kernel, linux-mm,
	linux-fsdevel, hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Fri, Apr 22, 2022 at 04:03:12PM -0400, James Bottomley wrote:
> Hey, I didn't say that at all.  I said vendoring the facebook reference
> implementation wouldn't work (it being 74k lines and us using 300) but
> that facebook was doing the right thing for us with zstd because they
> were maintaining the core code we needed, even if we couldn't vendor it
> from their code base:
> 
> https://lore.kernel.org/rust-for-linux/ea85b3bce5f172dc73e2be8eb4dbd21fae826fa1.camel@HansenPartnership.com/
> 
> You were the one who said all that about facebook, while incorrectly
> implying I said it first (which is an interesting variation on the
> strawman fallacy):

Hey, sorry for picking on you James - I didn't mean to single you out.

Let me explain where I'm coming from - actually, this email I received puts it
better than I could:

From: "John Ericson" <mail@johnericson.me>
To: "Kent Overstreet" <kent.overstreet@gmail.com>
Subject: Nice email on cargo crates in Linux

Hi. I saw your email linked from https://lwn.net/Articles/889924/. Nicely put!

I was involved in some of the allocating try_* function work to avoid nasty
panics, and thus make sure Rust in Linux didn't have to reinvent the wheel
redoing the alloc library. I very much share your goals but suspect this will
happen in a sort of boil-the-frog way over time. The basic portability method of
"incentivize users to write code that's *more* portable than what they currently
need" is a culture shock for user space and kernel space alike. But if we get
all our ducks in a row on the Rust side, eventually it should just be "too
tempting", and the code sharing will happen for economic reasons (rather than
ideological ones about trying to smash down the arbitrary dichotomies :) going
it alone).

The larger issue is major projects closed off unto themselves such as Linux,
PostgreSQL, or even the Glasgow Haskell Compiler (something I am currently
writing a plan to untangle) have fallen too deep in Conway's law's embrace, and
thus are full of bad habits like
https://johno.com/composition-over-configuration is arguing against. Over time,
I hope using external libraries will not only result in less duplicated work,
but also "pry open" their architecture a bit, tamping down on crazy
configurability and replacing it with more principled composition.

I fear this is all way too spicy to bring up on the LKML in 2022, at least
coming from someone like me without any landed kernel patches under his belt,
but I wanted to let you know other people have similar thoughts.

Cheers,

John

-------

Re: Rust in Linux, I get frustrated when I see senior people tell the new Rust
people "don't do that" to things that are standard practice in the outside
world.

I think Linus said recently that Rust in the kernel is something that could
fail, and he's right - but if it fails, it won't just be the failure of the Rust
people to do the required work, it'll be _our_ failure too, a failure to work
with them.

And I think the kernel needs Rust, in the long term. Rust is the biggest advance
towards programming languages with embedded correctness proofs in the systems
space since ever, and that's huge. It's only a first step, it doesn't solve
everything - the next step will probably be dependent types (as in Idris); after
that it's hard to say. But maybe 50, 100 years out, systems programming is going
to look very different.

And that excites me! What I love about programming is being able to sink into
something new, difficult and unsolved, and let my mind wander and explore and
play with ideas until I come across a simple and elegant way of solving it.

What I hate about programming is starting to do that, and then an hour in
getting interrupted by a bug report in an area of the code that I should have
been done with. And then again. And then again.

But that's just the nature of the beast in C, with the current state of the art
of the tools we have. Proving C programs correct can be done (I understand Paul
McKenney is on the bleeding edge of that), but it's too expensive and time
consuming to be practical for most things. We need better languages.

And we've got people trying to work with us, so I hope we work with them and
make it happen. That's going to require us to update our thinking in some areas
- like with cargo.

C people are used to projects being monoliths, but in the modern world things
that in prior days would have required hacking the compiler are now done as
libraries - and we're going to want that code! We don't need to be
reimplementing rust bitvecs, or string parsing, or - I could make up a long,
long list. That code is all going to just work in kernel land, and if we adopt
it it'll lead to a lot less duplication of effort.

Also, what John said in that email about "large projects closed off unto
themselves" - he's spot on about that. When I was at Google, their was this deep
unspoken assumption that anything done at Google was better than anything in the
outside world, and it _rankled_. It was a major reason I departed.

The kernel community has a lot of that going on here. Again, sorry to pick on
you James, but I wanted to make the argument that - maybe the kernel _should_ be
adopting a more structured way of using code from outside repositories, like
cargo, or git submodules (except I've never had a positive experience with git
submodules, so ignore that suggestion, unless I've just been using them wrong,
in which case someone please teach me). To read you and Greg saying "nah, just
copy code from other repos, it's fine" - it felt like being back in the old days
when we were still trying to get people to use source control, and having that
one older colleague who _insisted_ on not using source control of any kind, and
that's a bit disheartening.

I just want to make both the kernel and the wider world of open source software
a better place, and the thing I'm itching to see get better is that currently
it's a massive hassle to share code between kernel space and user space. This is
understandable, because before git we definitely didn't have the tooling, and in
C this kind of code sharing has historically been questionable.

But:
 - We've got a new language coming that makes this sort of thing a lot saner,
   both from a source code management perspective (cargo) and with much stronger
   language guarantees that code won't do something surprising on you

 - Even in C, we're gaining a better appreciation for why someone would want to
   do such a thing.

   Another random case in point: liblwipv6, a userspace TCP/IP library, lets you
   do really cool things with building/simulating networks entirely in userspace
   without root access, just over unix domain sockets. But it's a completely new
   implementation, and thus buggy and incomplete. Imagine if, in some far away
   future, we could someday just reuse the whole Linux network stack in
   userspace, for purposes undreamed of by the likes of us...

And also, personally I find that code gets better when you have to spend the
effort to clean it up, make it more modular, and separate it from the wider
project - that's when you find out what it's really about. There's a ton of
_really effing cool code_ in the kernel that could be seeing wider use in
userspace, too (workqueues!). And, like with pretty printers, once you start
down a good path it often begets more of itself - modularity will lead to more
modularity. If we start making thinks like workqueues more modular and also
usable as a userspace library, that's going to make it easier to port more
kernel code, and more... which will make writing C in userspace a nicer
experience, and maybe even lead to more code sharing in the reverse direction..

Anyways, just one man's dream :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 20:47                   ` Steven Rostedt
@ 2022-04-22 21:51                     ` Kent Overstreet
  2022-04-22 22:20                       ` Steven Rostedt
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22 21:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, Apr 22, 2022 at 04:47:44PM -0400, Steven Rostedt wrote:
> Which is something you could do on top of seq_buf. Point being, you do not
> need to re-implement printbuf, and I have not looked at the code, but
> instead, implement printbuf on top of seq_buf, and extend seq_buf where
> needed. Like trace_seq does, and the patches I have for seq_file would do.
> It would leave the string processing and buffer space management to
> seq_buf, as there's ways to see "oh, we need more space, let's allocate
> more" and then increase the heap.

That sounds like it could work. 

> I would be more willing to accept a printbuf, if it was built on top of
> seq_buf. That is, you don't need to change all your user cases, you just
> need to make printbuf an extension of seq_buf by using it underneath, like
> trace_seq does. Then it would not be re-inventing the wheel, but just
> building on top of it.

Hmm... At first glance, redoing printbuf on top of seq_buf looks like it would
save a pretty trivial amount of code - and my engineering taste these days leans
more toward less layering if it's only slightly more code; I think I might like
printbuf and seq_buf to stay separate things (and both of them are pretty
small).

But it's definitely not an unreasonable idea - I can try it out and see how it
turns out. Would you have any objections to making some changes to seq_buf?

 - You've got size and len as size_t, I've got them as unsigned. Given that we
   need to be checking for overflow anyways for correctens, I like having them
   as u32s.
 - seq_buf->readpos - it looks like this is only used by seq_buf_to_user(), does
   it need to be in seq_buf?
 - in printbufs, I make sure the buffer is always nul-terminated - seems
   simplest, given that we need to make sure there's always room for the
   terminating nul anyways.

A downside of having printbuf on top of seq_buf is that now we've got two apis
that functions can output to - vs. if we modified printbuf by adding a flag for
"this is an external buffer, don't reallocate it". That approach would be less
code overall, for sure.

Could I get you to look over printbuf and share your thoughts on the different
approaches?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-22 21:51                     ` Kent Overstreet
@ 2022-04-22 22:20                       ` Steven Rostedt
  0 siblings, 0 replies; 59+ messages in thread
From: Steven Rostedt @ 2022-04-22 22:20 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, linux-kernel, linux-mm, linux-fsdevel, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Fri, 22 Apr 2022 17:51:46 -0400
Kent Overstreet <kent.overstreet@gmail.com> wrote:

> 
> But it's definitely not an unreasonable idea - I can try it out and see how it
> turns out. Would you have any objections to making some changes to seq_buf?

No I don't mind, and that's why I want the coupled, as enhancements or bug
fixes would happen to both.

> 
>  - You've got size and len as size_t, I've got them as unsigned. Given that we
>    need to be checking for overflow anyways for correctens, I like having them
>    as u32s.

I had it as size_t as I had planned (and still plan to) make seq_file use
seq_buf, and seq_file uses size_t. Who knows, perhaps in the future, we may
have strings that are more than 4GBs. ;-)

>  - seq_buf->readpos - it looks like this is only used by seq_buf_to_user(), does
>    it need to be in seq_buf?

Perhaps.

>  - in printbufs, I make sure the buffer is always nul-terminated - seems
>    simplest, given that we need to make sure there's always room for the
>    terminating nul anyways.

I'm not against that. It was an optimization, but I never actually
benchmarked it. But I'm not sure how many fast paths it is used in to
warrant that kind of optimization over the complexity it can bring for
users.


> 
> A downside of having printbuf on top of seq_buf is that now we've got two apis
> that functions can output to - vs. if we modified printbuf by adding a flag for
> "this is an external buffer, don't reallocate it". That approach would be less
> code overall, for sure.
> 
> Could I get you to look over printbuf and share your thoughts on the different
> approaches?

Sure, but will have to wait till next week.

-- Steve

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-22 15:09     ` Roman Gushchin
@ 2022-04-22 23:48       ` Kent Overstreet
  2022-04-23  0:27         ` Roman Gushchin
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-22 23:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Michal Hocko, linux-kernel, linux-mm, linux-fsdevel, hch, hannes,
	akpm, linux-clk, linux-tegra, linux-input, rostedt

On Fri, Apr 22, 2022 at 08:09:48AM -0700, Roman Gushchin wrote:
> To add a concern: largest shrinkers are usually memcg-aware. Scanning
> over the whole cgroup tree (with potentially hundreds or thousands of cgroups)
> and over all shrinkers from the oom context sounds like a bad idea to me.

Why would we be scanning over the whole cgroup tree? We don't do that in the
vmscan code, nor the new report. If the OOM is for a specific cgroup, we should
probably only be reporting on memory usage for that cgroup (show_mem() is not
currently cgroup aware, but perhaps it should be).

> IMO it's more appropriate to do from userspace by oomd or a similar daemon,
> well before the in-kernel OOM kicks in.

The reason I've been introducing printbufs and the .to_text() method was
specifically to make this code general enough to be available from
sysfs/debugfs - so I see no reasons why a userspace oomd couldn't make use of it
as well.

> > Last but not least let me echo the concern from the other reply. Memory
> > allocations are not really reasonable to be done from the oom context so
> > the pr_buf doesn't sound like a good tool here.
> 
> +1

In my experience, it's rare to be _so_ out of memory that small kmalloc
allocations are failing - we'll be triggering the show_mem() report before that
happens.

However, if this turns out not to be the case in practice, or if there's a
consensus now that we really want to guard against this, I have some thoughts.
We could either:

 - mempool-ify printbufs as a whole

 - reserve some memory just for the show_mem() report, which would mean either
   adding support to printbuf for external buffers (subsuming what seq_buf
   does), or shrinker .to_text() methods would have to output to seq_buf instead
   of printbuf (ew, API fragmentation).

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-22 23:48       ` Kent Overstreet
@ 2022-04-23  0:27         ` Roman Gushchin
  2022-04-23  0:46           ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Roman Gushchin @ 2022-04-23  0:27 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, linux-kernel, linux-mm, linux-fsdevel, hch, hannes,
	akpm, linux-clk, linux-tegra, linux-input, rostedt

On Fri, Apr 22, 2022 at 07:48:20PM -0400, Kent Overstreet wrote:
> On Fri, Apr 22, 2022 at 08:09:48AM -0700, Roman Gushchin wrote:
> > To add a concern: largest shrinkers are usually memcg-aware. Scanning
> > over the whole cgroup tree (with potentially hundreds or thousands of cgroups)
> > and over all shrinkers from the oom context sounds like a bad idea to me.
> 
> Why would we be scanning over the whole cgroup tree? We don't do that in the
> vmscan code, nor the new report. If the OOM is for a specific cgroup, we should
> probably only be reporting on memory usage for that cgroup (show_mem() is not
> currently cgroup aware, but perhaps it should be).

You're scanning over a small portion of all shrinker lists (on a machine with
cgroups), so the top-10 list has little value.
Global ->count_objects() return the number of objects on the system/root_mem_cgroup
level, not the shrinker's total.

> 
> > IMO it's more appropriate to do from userspace by oomd or a similar daemon,
> > well before the in-kernel OOM kicks in.
> 
> The reason I've been introducing printbufs and the .to_text() method was
> specifically to make this code general enough to be available from
> sysfs/debugfs - so I see no reasons why a userspace oomd couldn't make use of it
> as well.

Of course, I've nothing against adding .to_text().

> 
> > > Last but not least let me echo the concern from the other reply. Memory
> > > allocations are not really reasonable to be done from the oom context so
> > > the pr_buf doesn't sound like a good tool here.
> > 
> > +1
> 
> In my experience, it's rare to be _so_ out of memory that small kmalloc
> allocations are failing - we'll be triggering the show_mem() report before that
> happens.

I agree. However the OOM killer _has_ to make the progress even in such rare
circumstances.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-23  0:27         ` Roman Gushchin
@ 2022-04-23  0:46           ` Kent Overstreet
  2022-04-23  1:25             ` Roman Gushchin
  2022-04-25  9:28             ` Michal Hocko
  0 siblings, 2 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-23  0:46 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Michal Hocko, linux-kernel, linux-mm, linux-fsdevel, hch, hannes,
	akpm, linux-clk, linux-tegra, linux-input, rostedt

On Fri, Apr 22, 2022 at 05:27:41PM -0700, Roman Gushchin wrote:
> You're scanning over a small portion of all shrinker lists (on a machine with
> cgroups), so the top-10 list has little value.
> Global ->count_objects() return the number of objects on the system/root_mem_cgroup
> level, not the shrinker's total.

Not quite following what you're saying here...?

If you're complaining that my current top-10-shrinker report isn't memcg aware,
that's valid - I can fix that.

> > In my experience, it's rare to be _so_ out of memory that small kmalloc
> > allocations are failing - we'll be triggering the show_mem() report before that
> > happens.
> 
> I agree. However the OOM killer _has_ to make the progress even in such rare
> circumstances.

Oh, and the concern is allocator recursion? Yeah, that's a good point.

Do you know if using memalloc_noreclaim_(save|restore) is sufficient for that,
or do we want GFP_ATOMIC? I'm already using GFP_ATOMIC for allocations when we
generate the report on slabs, since we're taking the slab mutex there.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-23  0:46           ` Kent Overstreet
@ 2022-04-23  1:25             ` Roman Gushchin
  2022-04-23 11:48               ` Tetsuo Handa
  2022-04-25  9:28             ` Michal Hocko
  1 sibling, 1 reply; 59+ messages in thread
From: Roman Gushchin @ 2022-04-23  1:25 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, linux-kernel, linux-mm, linux-fsdevel, hch, hannes,
	akpm, linux-clk, linux-tegra, linux-input, rostedt

On Fri, Apr 22, 2022 at 08:46:07PM -0400, Kent Overstreet wrote:
> On Fri, Apr 22, 2022 at 05:27:41PM -0700, Roman Gushchin wrote:
> > You're scanning over a small portion of all shrinker lists (on a machine with
> > cgroups), so the top-10 list has little value.
> > Global ->count_objects() return the number of objects on the system/root_mem_cgroup
> > level, not the shrinker's total.
> 
> Not quite following what you're saying here...?
> 
> If you're complaining that my current top-10-shrinker report isn't memcg aware,
> that's valid - I can fix that.

For memcg-aware shrinkers each memcg has it's own LRU (per node).
If you want to print top-10 system-wide lists you need to call
->count_objects() for each shrinker for each memcg for each node.
It's quite a lot of work for an oom context.

> 
> > > In my experience, it's rare to be _so_ out of memory that small kmalloc
> > > allocations are failing - we'll be triggering the show_mem() report before that
> > > happens.
> > 
> > I agree. However the OOM killer _has_ to make the progress even in such rare
> > circumstances.
> 
> Oh, and the concern is allocator recursion? Yeah, that's a good point.

Yes, but not the only problem.

> 
> Do you know if using memalloc_noreclaim_(save|restore) is sufficient for that,
> or do we want GFP_ATOMIC? I'm already using GFP_ATOMIC for allocations when we
> generate the report on slabs, since we're taking the slab mutex there.

And this is another problem: grabbing _any_ locks from the oom context is asking
for trouble: you can potentially enter the oom path doing any allocation, so
now you have to check that no allocations are ever made holding this lock.
And I'm not aware of any reasonable way to test it, so most likely it ends up
introducing some very subtle bags, which will be triggered once a year.

Thanks!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-23  1:25             ` Roman Gushchin
@ 2022-04-23 11:48               ` Tetsuo Handa
  0 siblings, 0 replies; 59+ messages in thread
From: Tetsuo Handa @ 2022-04-23 11:48 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, linux-kernel, linux-mm, linux-fsdevel, hch, hannes,
	akpm, linux-clk, linux-tegra, linux-input, rostedt,
	Roman Gushchin

On 2022/04/23 10:25, Roman Gushchin wrote:
>>> I agree. However the OOM killer _has_ to make the progress even in such rare
>>> circumstances.
>>
>> Oh, and the concern is allocator recursion? Yeah, that's a good point.
> 
> Yes, but not the only problem.
> 
>>
>> Do you know if using memalloc_noreclaim_(save|restore) is sufficient for that,
>> or do we want GFP_ATOMIC? I'm already using GFP_ATOMIC for allocations when we
>> generate the report on slabs, since we're taking the slab mutex there.
> 
> And this is another problem: grabbing _any_ locks from the oom context is asking
> for trouble: you can potentially enter the oom path doing any allocation, so
> now you have to check that no allocations are ever made holding this lock.
> And I'm not aware of any reasonable way to test it, so most likely it ends up
> introducing some very subtle bags, which will be triggered once a year.
> 

You can't allocate memory nor hold locks from OOM context. Since oom_lock mutex
serializes OOM reporting, you could use statically pre-allocated buffer for holding
one line of output. Correlating whole report will be done by the userspace program
with the aid of CONFIG_PRINTK_CALLER=y.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Rust and Kernel Vendoring [Was Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings]
  2022-04-22 21:13                 ` Kent Overstreet
@ 2022-04-23 14:16                   ` James Bottomley
  2022-04-24 20:36                     ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: James Bottomley @ 2022-04-23 14:16 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Steven Rostedt, Christoph Hellwig, linux-kernel, linux-mm,
	linux-fsdevel, hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

[change of subject for an easy thread kill, since I'm sure most people
on cc don't want to follow this]
On Fri, 2022-04-22 at 17:13 -0400, Kent Overstreet wrote:
> On Fri, Apr 22, 2022 at 04:03:12PM -0400, James Bottomley wrote:
> > Hey, I didn't say that at all.  I said vendoring the facebook
> > reference implementation wouldn't work (it being 74k lines and us
> > using 300) but that facebook was doing the right thing for us with
> > zstd because they were maintaining the core code we needed, even if
> > we couldn't vendor it from their code base:
> > 
> > https://lore.kernel.org/rust-for-linux/ea85b3bce5f172dc73e2be8eb4dbd21fae826fa1.camel@HansenPartnership.com/
> > 
> > You were the one who said all that about facebook, while
> > incorrectly implying I said it first (which is an interesting
> > variation on the strawman fallacy):
> 
> Hey, sorry for picking on you James - I didn't mean to single you
> out.
> 
> Let me explain where I'm coming from - actually, this email I
> received puts it better than I could:
> 
> From: "John Ericson" <mail@johnericson.me>
> To: "Kent Overstreet" <kent.overstreet@gmail.com>
> Subject: Nice email on cargo crates in Linux
> 
> Hi. I saw your email linked from https://lwn.net/Articles/889924/.
> Nicely put!
> 
> I was involved in some of the allocating try_* function work to avoid
> nasty panics, and thus make sure Rust in Linux didn't have to
> reinvent the wheel redoing the alloc library. I very much share your
> goals but suspect this will happen in a sort of boil-the-frog way
> over time. The basic portability method of "incentivize users to
> write code that's *more* portable than what they currently
> need" is a culture shock for user space and kernel space alike. But
> if we get all our ducks in a row on the Rust side, eventually it
> should just be "too tempting", and the code sharing will happen for
> economic reasons (rather than ideological ones about trying to smash
> down the arbitrary dichotomies :) going it alone).
> 
> The larger issue is major projects closed off unto themselves such as
> Linux, PostgreSQL, or even the Glasgow Haskell Compiler (something I
> am currently writing a plan to untangle) have fallen too deep in
> Conway's law's embrace, and thus are full of bad habits like
> https://johno.com/composition-over-configuration is arguing against.
> Over time, I hope using external libraries will not only result in
> less duplicated work, but also "pry open" their architecture a bit,
> tamping down on crazy configurability and replacing it with more
> principled composition.
> 
> I fear this is all way too spicy to bring up on the LKML in 2022, at
> least coming from someone like me without any landed kernel patches
> under his belt, but I wanted to let you know other people have
> similar thoughts.
> 
> Cheers,
> 
> John
> 
> -------
> 
> Re: Rust in Linux, I get frustrated when I see senior people tell the
> new Rust people "don't do that" to things that are standard practice
> in the outside world.

You stripped the nuance of that.  I said many no_std crates could be
used in the kernel.  I also said that the async crate couldn't because
the rust compiler itself would have to support the kernel threading
model.

> I think Linus said recently that Rust in the kernel is something that
> could fail, and he's right - but if it fails, it won't just be the
> failure of the Rust people to do the required work, it'll be _our_
> failure too, a failure to work with them.

The big risk is that rust needs to adapt to the kernel environment. 
This isn't rust specific, llvm had similar issues as an alternative C
compiler.  I think rust in the kernel would fail if it were only the
rust kernel people asking.  Fortunately the pressure to support rust in
embedded leading to the rise in no_std crates is a force which can also
get rust in the kernel over the finish line because of the focus it
puts on getting the language and crates to adapt to non standard
environments.

[...]
> The kernel community has a lot of that going on here. Again, sorry to
> pick on you James, but I wanted to make the argument that - maybe the
> kernel _should_ be adopting a more structured way of using code from
> outside repositories, like cargo, or git submodules (except I've
> never had a positive experience with git submodules, so ignore that
> suggestion, unless I've just been using them wrong, in which case
> someone please teach me). To read you and Greg saying "nah, just
> copy code from other repos, it's fine" - it felt like being back in
> the old days when we were still trying to get people to use source
> control, and having that one older colleague who _insisted_ on not
> using source control of any kind, and that's a bit disheartening.

Even in C terms, the kernel is a nostdlib environment.  If a C project
has too much libc dependency it's not going to work directly in the
kernel, nor should it.  Let's look at zstd (which is pretty much a
nostdlib project) as a great example: the facebook people didn't
actually port the top of their tree (1.5) to the kernel, they
backported bug fixes to the 1.4 branch and made a special release
(1.4.10) just for us.  Why did they do this?  It was because the 1.5
version vastly increased stack use to the extent it would run off the
end of the limited kernel stack so couldn't be ported directly into the
kernel.  A lot of C libraries that are nostdlib have problems like this
as well (you can use recursion, but not in the kernel).  There's no
easy way of shimming environmental constraints like this.

The lesson: it is possible to make the core of a project mobile, but
only if you're aware of all the environmental constraints it will run
into as it gets ported.  The list of possible environments is huge:
kernel, embedded, industrial control ..., so naturally not every (or
more accurately hardly any) project wants to do this.

James



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Rust and Kernel Vendoring [Was Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings]
  2022-04-23 14:16                   ` Rust and Kernel Vendoring [Was Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings] James Bottomley
@ 2022-04-24 20:36                     ` Kent Overstreet
  2022-04-26  2:22                       ` James Bottomley
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-24 20:36 UTC (permalink / raw)
  To: James Bottomley
  Cc: Steven Rostedt, Christoph Hellwig, linux-kernel, linux-mm,
	linux-fsdevel, hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Sat, Apr 23, 2022 at 10:16:37AM -0400, James Bottomley wrote:
> You stripped the nuance of that.  I said many no_std crates could be
> used in the kernel.  I also said that the async crate couldn't because
> the rust compiler itself would have to support the kernel threading
> model.

I just scanned through that thread and that's not what you said. What you said
was:

> The above is also the rust crate problem in miniature: the crates grow
> API features the kernel will never care about and importing them
> wholesale is going to take forever because of the internal kernel
> support issue.  In the end, to take rust async as an example, it will
> be much better to do for rust what we've done for zlib: take the core
> that can support the kernel threading model and reimplement that in the
> kernel crate.  The act of doing that will a) prove people care enough
> about the functionality and b) allow us to refine it nicely.
> 
> I also don't think rust would really want to import crates wholesale.
> The reason for no_std is that rust is trying to adapt to embedded
> environments, which the somewhat harsh constraints of the kernel is
> very similar to.

But maybe your position has changed somewhat? It sounds like you've been
arguing against just directly depending on foreign reposotories and for the
staus quo of just ad-hoc copying of code.

I'll help by stating my own position: I think we should be coming up with a
process for how dependencies on other git repositories are going to work,
something better than just cut and paste. Whether or not we vendorize code isn't
really that important, but I'd say that if we are vendorizing code and we're not
including entire sub-repositories (like cargo vendor does) we ought to still
make this a scripted process that takes as an input a list of files we're
pulling and a remote repository we're pulling from, and the file list and the
remote repo (and commit ID we're pulling from) should all be checked in.

I think using cargo would be _great_ because it would handle this part for us
(perhaps minus pulling of individual files? haven't checked) instead of
home-growing our own. However, I'd like something for C repositories too, and I
don't think we want to start depending on cargo for non Rust development... :)

But much more important that the technical details of how we import code is just
having good answers for people who aren't embedded in Linux kernel development
culture, so we aren't just telling them "no" by default because we haven't
thought about this stuff and having them walk away frustrated.

> > I think Linus said recently that Rust in the kernel is something that
> > could fail, and he's right - but if it fails, it won't just be the
> > failure of the Rust people to do the required work, it'll be _our_
> > failure too, a failure to work with them.
> 
> The big risk is that rust needs to adapt to the kernel environment. 
> This isn't rust specific, llvm had similar issues as an alternative C
> compiler.  I think rust in the kernel would fail if it were only the
> rust kernel people asking.  Fortunately the pressure to support rust in
> embedded leading to the rise in no_std crates is a force which can also
> get rust in the kernel over the finish line because of the focus it
> puts on getting the language and crates to adapt to non standard
> environments.

It's both! It's on all of us to make this work.

> > The kernel community has a lot of that going on here. Again, sorry to
> > pick on you James, but I wanted to make the argument that - maybe the
> > kernel _should_ be adopting a more structured way of using code from
> > outside repositories, like cargo, or git submodules (except I've
> > never had a positive experience with git submodules, so ignore that
> > suggestion, unless I've just been using them wrong, in which case
> > someone please teach me). To read you and Greg saying "nah, just
> > copy code from other repos, it's fine" - it felt like being back in
> > the old days when we were still trying to get people to use source
> > control, and having that one older colleague who _insisted_ on not
> > using source control of any kind, and that's a bit disheartening.
> 
> Even in C terms, the kernel is a nostdlib environment.  If a C project
> has too much libc dependency it's not going to work directly in the
> kernel, nor should it.  Let's look at zstd (which is pretty much a
> nostdlib project) as a great example: the facebook people didn't
> actually port the top of their tree (1.5) to the kernel, they
> backported bug fixes to the 1.4 branch and made a special release
> (1.4.10) just for us.  Why did they do this?  It was because the 1.5
> version vastly increased stack use to the extent it would run off the
> end of the limited kernel stack so couldn't be ported directly into the
> kernel.  A lot of C libraries that are nostdlib have problems like this
> as well (you can use recursion, but not in the kernel).  There's no
> easy way of shimming environmental constraints like this.

I wonder if we might have come up with a better solution if there'd been more
cross-project communication and less siloing. Small stacks aren't particular to
the kernel - it's definitely not unheard of to write userspace code where you
want to have a lot of small stacks (especially if you're doing some kind of
coroutine style threading; I've done stuff like this in the past) - and to me,
as someone who's been incrementing on and maintaining a codebase in active use
for 10 years, having multiple older versions in active use that need bugfixes
gives me cold shivers.

I wouldn't be surprised if at some point the zstd people walk back some of their
changes or make it configurable at some point :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-21 23:48 ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
  2022-04-22  4:20   ` Christoph Hellwig
@ 2022-04-24 23:46   ` Joe Perches
  2022-04-25  0:45     ` Kent Overstreet
  2022-04-25  2:44     ` Matthew Wilcox
  1 sibling, 2 replies; 59+ messages in thread
From: Joe Perches @ 2022-04-24 23:46 UTC (permalink / raw)
  To: Kent Overstreet, linux-kernel, linux-mm, linux-fsdevel
  Cc: hch, hannes, akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Thu, 2022-04-21 at 19:48 -0400, Kent Overstreet wrote:
> This adds printbufs: simple heap-allocated strings meant for building up
> structured messages, for logging/procfs/sysfs and elsewhere. They've
> been heavily used in bcachefs for writing .to_text() functions/methods -
> pretty printers, which has in turn greatly improved the overall quality
> of error messages.
> 
> Basic usage is documented in include/linux/printbuf.h.

Given the maximum printk output is less than 1024 bytes, why should
this be allowed to be larger than that or larger than PAGE_SIZE?

> + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> + * readable units.

Why not extend vsprintf for this using something like %pH[8|16|32|64] 
or %pH[c|s|l|ll|uc|us|ul|ull] ?



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-24 23:46   ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Joe Perches
@ 2022-04-25  0:45     ` Kent Overstreet
  2022-04-25  2:44     ` Matthew Wilcox
  1 sibling, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-25  0:45 UTC (permalink / raw)
  To: Joe Perches
  Cc: linux-kernel, linux-mm, linux-fsdevel, hch, hannes, akpm,
	linux-clk, linux-tegra, linux-input, roman.gushchin

On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> On Thu, 2022-04-21 at 19:48 -0400, Kent Overstreet wrote:
> > This adds printbufs: simple heap-allocated strings meant for building up
> > structured messages, for logging/procfs/sysfs and elsewhere. They've
> > been heavily used in bcachefs for writing .to_text() functions/methods -
> > pretty printers, which has in turn greatly improved the overall quality
> > of error messages.
> > 
> > Basic usage is documented in include/linux/printbuf.h.
> 
> Given the maximum printk output is less than 1024 bytes, why should
> this be allowed to be larger than that or larger than PAGE_SIZE?

It's not just used there - in bcachefs I use it for sysfs & debugfs, as well as
userspace code for e.g. printing out the superblock (which gets pretty big when
including all the variable length sections).

> > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > + * readable units.
> 
> Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> or %pH[c|s|l|ll|uc|us|ul|ull] ?

It'd be incompatible with userspace printf. I do like the way we extend printf
is the kernel, but I'm trying to make sure the code I write now is by default
portable between both kernel space and userspace. Glibc has its own mechanism
for extending printf, I've been meaning to look at that more and see if it'd be
possible to do something more generic and extensible that works for both.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-24 23:46   ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Joe Perches
  2022-04-25  0:45     ` Kent Overstreet
@ 2022-04-25  2:44     ` Matthew Wilcox
  2022-04-25  4:19       ` Kent Overstreet
  1 sibling, 1 reply; 59+ messages in thread
From: Matthew Wilcox @ 2022-04-25  2:44 UTC (permalink / raw)
  To: Joe Perches
  Cc: Kent Overstreet, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > + * readable units.
> 
> Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> or %pH[c|s|l|ll|uc|us|ul|ull] ?

The %pX extension we have is _cute_, but ultimately a bad idea.  It
centralises all kinds of unrelated things in vsprintf.c, eg bdev_name()
and clock() and ip_addr_string().

Really, it's working around that we don't have something like Java's
StringBuffer (which I see both seq_buf and printbuf as attempting to
be).  So we have this primitive format string hack instead of exposing
methods like:

void dentry_string(struct strbuf *, struct dentry *);

as an example,
                if (unlikely(ino == dir->i_ino)) {
                        EXT4_ERROR_INODE(dir, "'%pd' linked to parent dir",
                                         dentry);
                        return ERR_PTR(-EFSCORRUPTED);
                }

would become something like:

		if (unlikely(ino == dir->i_ino)) {
			struct strbuf strbuf;
			strbuf_char(strbuf, '\'');
			dentry_string(strbuf, dentry);
			strbuf_string(strbuf, "' linked to parent dir");
			EXT4_ERROR_INODE(dir, strbuf);
			return ERR_PTR(-EFSCORRUPTED);
		}

which isn't terribly nice, but C has sucky syntax for string
construction.  Other languages have done this better, including Rust.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-25  2:44     ` Matthew Wilcox
@ 2022-04-25  4:19       ` Kent Overstreet
  2022-04-25  4:48         ` Joe Perches
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-25  4:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Joe Perches, linux-kernel, linux-mm, linux-fsdevel, hch, hannes,
	akpm, linux-clk, linux-tegra, linux-input, roman.gushchin

On Mon, Apr 25, 2022 at 03:44:34AM +0100, Matthew Wilcox wrote:
> On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> > > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > > + * readable units.
> > 
> > Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> > or %pH[c|s|l|ll|uc|us|ul|ull] ?
> 
> The %pX extension we have is _cute_, but ultimately a bad idea.  It
> centralises all kinds of unrelated things in vsprintf.c, eg bdev_name()
> and clock() and ip_addr_string().

And it's not remotely discoverable. I didn't realize we had bdev_name()
available as a format string until just now or I would've been using it!

> Really, it's working around that we don't have something like Java's
> StringBuffer (which I see both seq_buf and printbuf as attempting to
> be).  So we have this primitive format string hack instead of exposing
> methods like:
> 
> void dentry_string(struct strbuf *, struct dentry *);

Exactly!

> as an example,
>                 if (unlikely(ino == dir->i_ino)) {
>                         EXT4_ERROR_INODE(dir, "'%pd' linked to parent dir",
>                                          dentry);
>                         return ERR_PTR(-EFSCORRUPTED);
>                 }
> 
> would become something like:
> 
> 		if (unlikely(ino == dir->i_ino)) {
> 			struct strbuf strbuf;
> 			strbuf_char(strbuf, '\'');
> 			dentry_string(strbuf, dentry);
> 			strbuf_string(strbuf, "' linked to parent dir");
> 			EXT4_ERROR_INODE(dir, strbuf);
> 			return ERR_PTR(-EFSCORRUPTED);
> 		}
> 
> which isn't terribly nice, but C has sucky syntax for string
> construction.  Other languages have done this better, including Rust.

Over IRC just now you proposed "%p(%p)", dentry_name, dentry - I'm _really_
liking this idea, especially if we can get glibc to take it.

Then your ext4 example becomes just 

	if (unlikely(ino == dir->i_ino)) {
		EXT4_ERROR_INODE(dir, "'%p(%p)' linked to parent dir",
				 dentry_name, dentry);
		return ERR_PTR(-EFSCORRUPTED);
	}

And you can cscope to the pretty-printer! And dentry_name becomes just

void dentry_name(struct printbuf *out, struct dentry *dentry)
{
	...
}

Which is quite a bit simpler than the current definition.

Sweeeeeet.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-25  4:19       ` Kent Overstreet
@ 2022-04-25  4:48         ` Joe Perches
  2022-04-25  4:59           ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Joe Perches @ 2022-04-25  4:48 UTC (permalink / raw)
  To: Kent Overstreet, Matthew Wilcox
  Cc: linux-kernel, linux-mm, linux-fsdevel, hch, hannes, akpm,
	linux-clk, linux-tegra, linux-input, roman.gushchin

On Mon, 2022-04-25 at 00:19 -0400, Kent Overstreet wrote:
> On Mon, Apr 25, 2022 at 03:44:34AM +0100, Matthew Wilcox wrote:
> > On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> > > > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > > > + * readable units.
> > > 
> > > Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> > > or %pH[c|s|l|ll|uc|us|ul|ull] ?
> > 
> > The %pX extension we have is _cute_, but ultimately a bad idea.  It
> > centralises all kinds of unrelated things in vsprintf.c, eg bdev_name()
> > and clock() and ip_addr_string().
> 
> And it's not remotely discoverable. I didn't realize we had bdev_name()
> available as a format string until just now or I would've been using it!

Documentation/core-api/printk-formats.rst



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-25  4:48         ` Joe Perches
@ 2022-04-25  4:59           ` Kent Overstreet
  2022-04-25  5:00             ` Joe Perches
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-25  4:59 UTC (permalink / raw)
  To: Joe Perches
  Cc: Matthew Wilcox, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Sun, Apr 24, 2022 at 09:48:58PM -0700, Joe Perches wrote:
> On Mon, 2022-04-25 at 00:19 -0400, Kent Overstreet wrote:
> > On Mon, Apr 25, 2022 at 03:44:34AM +0100, Matthew Wilcox wrote:
> > > On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> > > > > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > > > > + * readable units.
> > > > 
> > > > Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> > > > or %pH[c|s|l|ll|uc|us|ul|ull] ?
> > > 
> > > The %pX extension we have is _cute_, but ultimately a bad idea.  It
> > > centralises all kinds of unrelated things in vsprintf.c, eg bdev_name()
> > > and clock() and ip_addr_string().
> > 
> > And it's not remotely discoverable. I didn't realize we had bdev_name()
> > available as a format string until just now or I would've been using it!
> 
> Documentation/core-api/printk-formats.rst

Who has time for docs?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-25  4:59           ` Kent Overstreet
@ 2022-04-25  5:00             ` Joe Perches
  2022-04-25  5:56               ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Joe Perches @ 2022-04-25  5:00 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Mon, 2022-04-25 at 00:59 -0400, Kent Overstreet wrote:
> On Sun, Apr 24, 2022 at 09:48:58PM -0700, Joe Perches wrote:
> > On Mon, 2022-04-25 at 00:19 -0400, Kent Overstreet wrote:
> > > On Mon, Apr 25, 2022 at 03:44:34AM +0100, Matthew Wilcox wrote:
> > > > On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> > > > > > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > > > > > + * readable units.
> > > > > 
> > > > > Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> > > > > or %pH[c|s|l|ll|uc|us|ul|ull] ?
> > > > 
> > > > The %pX extension we have is _cute_, but ultimately a bad idea.  It
> > > > centralises all kinds of unrelated things in vsprintf.c, eg bdev_name()
> > > > and clock() and ip_addr_string().
> > > 
> > > And it's not remotely discoverable. I didn't realize we had bdev_name()
> > > available as a format string until just now or I would've been using it!
> > 
> > Documentation/core-api/printk-formats.rst
> 
> Who has time for docs?

The same people that have time to reimplement the already implemented?


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings
  2022-04-25  5:00             ` Joe Perches
@ 2022-04-25  5:56               ` Kent Overstreet
  0 siblings, 0 replies; 59+ messages in thread
From: Kent Overstreet @ 2022-04-25  5:56 UTC (permalink / raw)
  To: Joe Perches
  Cc: Matthew Wilcox, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Sun, Apr 24, 2022 at 10:00:32PM -0700, Joe Perches wrote:
> On Mon, 2022-04-25 at 00:59 -0400, Kent Overstreet wrote:
> > On Sun, Apr 24, 2022 at 09:48:58PM -0700, Joe Perches wrote:
> > > On Mon, 2022-04-25 at 00:19 -0400, Kent Overstreet wrote:
> > > > On Mon, Apr 25, 2022 at 03:44:34AM +0100, Matthew Wilcox wrote:
> > > > > On Sun, Apr 24, 2022 at 04:46:03PM -0700, Joe Perches wrote:
> > > > > > > + * pr_human_readable_u64, pr_human_readable_s64: Print an integer with human
> > > > > > > + * readable units.
> > > > > > 
> > > > > > Why not extend vsprintf for this using something like %pH[8|16|32|64] 
> > > > > > or %pH[c|s|l|ll|uc|us|ul|ull] ?
> > > > > 
> > > > > The %pX extension we have is _cute_, but ultimately a bad idea.  It
> > > > > centralises all kinds of unrelated things in vsprintf.c, eg bdev_name()
> > > > > and clock() and ip_addr_string().
> > > > 
> > > > And it's not remotely discoverable. I didn't realize we had bdev_name()
> > > > available as a format string until just now or I would've been using it!
> > > 
> > > Documentation/core-api/printk-formats.rst
> > 
> > Who has time for docs?
> 
> The same people that have time to reimplement the already implemented?

Touché :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-23  0:46           ` Kent Overstreet
  2022-04-23  1:25             ` Roman Gushchin
@ 2022-04-25  9:28             ` Michal Hocko
  2022-04-25 15:28               ` Kent Overstreet
  1 sibling, 1 reply; 59+ messages in thread
From: Michal Hocko @ 2022-04-25  9:28 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Roman Gushchin, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input, rostedt

On Fri 22-04-22 20:46:07, Kent Overstreet wrote:
> On Fri, Apr 22, 2022 at 05:27:41PM -0700, Roman Gushchin wrote:
[...]
> > > In my experience, it's rare to be _so_ out of memory that small kmalloc
> > > allocations are failing - we'll be triggering the show_mem() report before that
> > > happens.
> > 
> > I agree. However the OOM killer _has_ to make the progress even in such rare
> > circumstances.

Absolutely agreed!

> Oh, and the concern is allocator recursion? Yeah, that's a good point.

No, not really. The oom killer is running with PF_MEMALLOC context so no
reclaim recursion is allowed. As I've already pointed out in other
reply the context will have access to memory reserves without any
constrains so it could deplete them completely resulting in other issues
during the recovery.

> Do you know if using memalloc_noreclaim_(save|restore) is sufficient for that,
> or do we want GFP_ATOMIC? I'm already using GFP_ATOMIC for allocations when we
> generate the report on slabs, since we're taking the slab mutex there.

No it's not. You simply _cannot_ allocate from the oom context.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-25  9:28             ` Michal Hocko
@ 2022-04-25 15:28               ` Kent Overstreet
  2022-04-26  7:17                 ` Michal Hocko
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-25 15:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Roman Gushchin, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input, rostedt

On Mon, Apr 25, 2022 at 11:28:26AM +0200, Michal Hocko wrote:
> 
> > Do you know if using memalloc_noreclaim_(save|restore) is sufficient for that,
> > or do we want GFP_ATOMIC? I'm already using GFP_ATOMIC for allocations when we
> > generate the report on slabs, since we're taking the slab mutex there.
> 
> No it's not. You simply _cannot_ allocate from the oom context.

Hmm, no, that can't be right. I've been using the patch set and it definitely
works, at least in my testing. Do you mean to say that we shouldn't? Can you
explain why?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Rust and Kernel Vendoring [Was Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings]
  2022-04-24 20:36                     ` Kent Overstreet
@ 2022-04-26  2:22                       ` James Bottomley
  0 siblings, 0 replies; 59+ messages in thread
From: James Bottomley @ 2022-04-26  2:22 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Steven Rostedt, Christoph Hellwig, linux-kernel, linux-mm,
	linux-fsdevel, hannes, akpm, linux-clk, linux-tegra, linux-input,
	roman.gushchin

On Sun, 2022-04-24 at 16:36 -0400, Kent Overstreet wrote:
> On Sat, Apr 23, 2022 at 10:16:37AM -0400, James Bottomley wrote:
> > You stripped the nuance of that.  I said many no_std crates could
> > be used in the kernel.  I also said that the async crate couldn't
> > because the rust compiler itself would have to support the kernel
> > threading model.
> 
> I just scanned through that thread and that's not what you said. What
> you said was:
> 
> > The above is also the rust crate problem in miniature: the crates
> > grow API features the kernel will never care about and importing
> > them wholesale is going to take forever because of the internal
> > kernel support issue.  In the end, to take rust async as an
> > example, it will be much better to do for rust what we've done for
> > zlib: take the core that can support the kernel threading model and
> > reimplement that in the kernel crate.  The act of doing that will
> > a) prove people care enough about the functionality and b) allow us
> > to refine it nicely.
> > 
> > I also don't think rust would really want to import crates
> > wholesale.  The reason for no_std is that rust is trying to adapt
> > to embedded environments, which the somewhat harsh constraints of
> > the kernel is very similar to.
> 
> But maybe your position has changed somewhat?

I read those two as saying the same thing just with differing levels of
detail.

>  It sounds like you've been arguing against just directly depending
> on foreign reposotories and for the staus quo of just ad-hoc copying
> of code.

I don't think I've said anything generalised about that either way. 
However, I have noted that most of the external repositories I've
looked at can't (or shouldn't) be imported directly.  Perhaps if we
could find one that could be pulled in directly, we could use that to
drive a discussion of how.

> I'll help by stating my own position: I think we should be coming up
> with a process for how dependencies on other git repositories are
> going to work, something better than just cut and paste. Whether or
> not we vendorize code isn't really that important, but I'd say that
> if we are vendorizing code and we're notincluding entire sub-
> repositories (like cargo vendor does) we ought to still make this a
> scripted process that takes as an input a list of files we're
> pulling and a remote repository we're pulling from, and the file list
> and the remote repo (and commit ID we're pulling from) should all be
> checked in.

Do we have an example of an external piece of code we could do this to?

[...]
> > > The kernel community has a lot of that going on here. Again,
> > > sorry to pick on you James, but I wanted to make the argument
> > > that - maybe the kernel _should_ be adopting a more structured
> > > way of using code from outside repositories, like cargo, or git
> > > submodules (except I've never had a positive experience with git
> > > submodules, so ignore that suggestion, unless I've just been
> > > using them wrong, in which case someone please teach me). To read
> > > you and Greg saying "nah, just copy code from other repos, it's
> > > fine" - it felt like being back in the old days when we were
> > > still trying to get people to use source control, and having that
> > > one older colleague who _insisted_ on not using source control of
> > > any kind, and that's a bit disheartening.
> > 
> > Even in C terms, the kernel is a nostdlib environment.  If a C
> > project has too much libc dependency it's not going to work
> > directly in the kernel, nor should it.  Let's look at zstd (which
> > is pretty much a nostdlib project) as a great example: the facebook
> > people didn't actually port the top of their tree (1.5) to the
> > kernel, they backported bug fixes to the 1.4 branch and made a
> > special release (1.4.10) just for us.  Why did they do this?  It
> > was because the 1.5 version vastly increased stack use to the
> > extent it would run off the end of the limited kernel stack so
> > couldn't be ported directly into the kernel.  A lot of C libraries
> > that are nostdlib have problems like this as well (you can use
> > recursion, but not in the kernel).  There's no easy way of shimming
> > environmental constraints like this.
> 
> I wonder if we might have come up with a better solution if there'd
> been more cross-project communication and less siloing. Small stacks
> aren't particular to the kernel - it's definitely not unheard of to
> write userspace code where you want to have a lot of small stacks
> (especially if you're doing some kind of coroutine style threading;
> I've done stuff like this in the past)

But would you say that every piece of userspace code should reject
recursion and write for small stacks just in case it needs to be reused
in a more extreme environment?

I don't; I think it's fine there are loads of implementations that
would never work in our environment, because mostly there's no reason
to use them in the kernel (or another embedded environment).  I also
understand why people build for the standard userspace environment
first ... it reduces complexity and makes the construction way easier. 

>  - and to me, as someone who's been incrementing on and maintaining a
> codebase in active use for 10 years, having multiple older versions
> in active use that need bugfixes gives me cold shivers.
> 
> I wouldn't be surprised if at some point the zstd people walk back
> some of their changes or make it configurable at some point :)

Many things can happen in the future, but right at the moment I still
think zstd is working fine for both parties.

James



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-25 15:28               ` Kent Overstreet
@ 2022-04-26  7:17                 ` Michal Hocko
  2022-04-26  7:26                   ` Kent Overstreet
  0 siblings, 1 reply; 59+ messages in thread
From: Michal Hocko @ 2022-04-26  7:17 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Roman Gushchin, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input, rostedt

On Mon 25-04-22 11:28:11, Kent Overstreet wrote:
> On Mon, Apr 25, 2022 at 11:28:26AM +0200, Michal Hocko wrote:
> > 
> > > Do you know if using memalloc_noreclaim_(save|restore) is sufficient for that,
> > > or do we want GFP_ATOMIC? I'm already using GFP_ATOMIC for allocations when we
> > > generate the report on slabs, since we're taking the slab mutex there.
> > 
> > No it's not. You simply _cannot_ allocate from the oom context.
> 
> Hmm, no, that can't be right. I've been using the patch set and it definitely
> works, at least in my testing.

Yes, the world will not fall down and it really depends on the workload
what kind of effect this might have.

> Do you mean to say that we shouldn't? Can you explain why?

I have already touched on that but let me reiterate. Allocation context
called from the oom path will have an unbound access to memory reserves.
Those are a last resort emergency pools of memory that are not available
normally and there are areas which really depend on them to make a
further progress to release the memory pressure.

Swap over NFS would be one such example. If some other code path messes
with those reserves the swap IO path could fail with all sorts of
fallouts.

So to be really exact in my statement. You can allocate from the OOM
context but it is _strongly_ discouraged unless there is no other way
around that.

I would even claim that the memory reclaim in general shouldn't rely on
memory allocations (other than mempools). If an allocation is really
necessary then an extra care has to prevent from complete memory
depletion.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-26  7:17                 ` Michal Hocko
@ 2022-04-26  7:26                   ` Kent Overstreet
  2022-04-26  7:40                     ` Michal Hocko
  0 siblings, 1 reply; 59+ messages in thread
From: Kent Overstreet @ 2022-04-26  7:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Roman Gushchin, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input, rostedt

On Tue, Apr 26, 2022 at 09:17:39AM +0200, Michal Hocko wrote:
> I have already touched on that but let me reiterate. Allocation context
> called from the oom path will have an unbound access to memory reserves.
> Those are a last resort emergency pools of memory that are not available
> normally and there are areas which really depend on them to make a
> further progress to release the memory pressure.
> 
> Swap over NFS would be one such example. If some other code path messes
> with those reserves the swap IO path could fail with all sorts of
> fallouts.
> 
> So to be really exact in my statement. You can allocate from the OOM
> context but it is _strongly_ discouraged unless there is no other way
> around that.
> 
> I would even claim that the memory reclaim in general shouldn't rely on
> memory allocations (other than mempools). If an allocation is really
> necessary then an extra care has to prevent from complete memory
> depletion.

100% agreement with this, I've always made sure IO paths I touched were fully
mempool-ified (some of my early work was actually for making sure bio allocation
underneath generic_make_request() won't deadlock - previously allocated bios
won't make forward progress and be freed due to generic_make_request() turning
recursion into iteration, but that's all ancient history).

Anyways, the reason I think this allocation is fine is it's GFP_NOWAIT and it's
completely fine if it fails - all we lose is some diagnostics, and also it's
released right away.

But there's also no need for it to be a point of contention, the way I'm going
with printbufs it'll be trivial to mempool-ify this if we want.

Before I get back to this I'm changing the approach I'm taking with printbufs
and first using it to clean up vsnprintf() and all the related code, which is..
a bit of an undertaking. End result is going to be really cool though.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c
  2022-04-26  7:26                   ` Kent Overstreet
@ 2022-04-26  7:40                     ` Michal Hocko
  0 siblings, 0 replies; 59+ messages in thread
From: Michal Hocko @ 2022-04-26  7:40 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Roman Gushchin, linux-kernel, linux-mm, linux-fsdevel, hch,
	hannes, akpm, linux-clk, linux-tegra, linux-input, rostedt

On Tue 26-04-22 03:26:12, Kent Overstreet wrote:
[...]
> Anyways, the reason I think this allocation is fine is it's GFP_NOWAIT and it's
> completely fine if it fails - all we lose is some diagnostics, and also it's
> released right away.

I think you are still missing the PF_MEMALLOC point. Please have a look
how this leads to no reclaim recursion so GFP_NOWAIT has no meaning when
you are allocating from PF_MEMALLOC context (which the oom killer and
any reclaim path is). Also have a look at how __gfp_pfmemalloc_flags
makes the allocation request from that context ALLOC_NO_WATERMARKS.
See?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/4] Printbufs & shrinker OOM reporting
  2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
@ 2022-04-30  4:00   ` Dave Young
  2022-04-21 23:48 ` [PATCH 2/4] mm: Add a .to_text() method for shrinkers Kent Overstreet
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Dave Young @ 2022-04-30  4:00 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	linux-mm, linux-fsdevel, roman.gushchin, hannes, kexec

Hi Kent,
On Fri, 22 Apr 2022 at 07:56, Kent Overstreet <kent.overstreet@gmail.com> wrote:
>
> Debugging OOMs has been one of my sources of frustration, so this patch series
> is an attempt to do something about it.
>
> The first patch in the series is something I've been slowly evolving in bcachefs
> for years: simple heap allocated strings meant for appending to and building up
> structured log/error messages. They make it easy and straightforward to write
> pretty-printers for everything, which in turn makes good logging and error
> messages something that just happens naturally.
>
> We want it here because that means the reporting I'm adding to shrinkers can be
> used by both OOM reporting, and for the sysfs (or is it debugfs now) interface
> that Roman is adding.
>

I added the kexec list in cc.  It seems like a nice enhancement to oom
reporting.
I suspect kdump tooling need changes to retrieve the kmsg log from
vmcore, could you confirm it?  For example makedumpfile, crash, and
kexec-tools (its vmcore-dmesg tool).


> This patch series also:
>  - adds OOM reporting on shrinkers, reporting on top 10 shrinkers (in sorted
>    order!)
>  - changes slab reporting to be always-on, also reporting top 10 slabs in sorted
>    order
>  - starts centralizing OOM reporting in mm/show_mem.c
>
> The last patch in the series is only a demonstration of how to implement the
> shrinker .to_text() method, since bcachefs isn't upstream yet.
>
> Kent Overstreet (4):
>   lib/printbuf: New data structure for heap-allocated strings
>   mm: Add a .to_text() method for shrinkers
>   mm: Centralize & improve oom reporting in show_mem.c
>   bcachefs: shrinker.to_text() methods
>
>  fs/bcachefs/btree_cache.c     |  18 ++-
>  fs/bcachefs/btree_key_cache.c |  18 ++-
>  include/linux/printbuf.h      | 140 ++++++++++++++++++
>  include/linux/shrinker.h      |   5 +
>  lib/Makefile                  |   4 +-
>  lib/printbuf.c                | 271 ++++++++++++++++++++++++++++++++++
>  mm/Makefile                   |   2 +-
>  mm/oom_kill.c                 |  23 ---
>  {lib => mm}/show_mem.c        |  14 ++
>  mm/slab.h                     |   6 +-
>  mm/slab_common.c              |  53 ++++++-
>  mm/vmscan.c                   |  75 ++++++++++
>  12 files changed, 587 insertions(+), 42 deletions(-)
>  create mode 100644 include/linux/printbuf.h
>  create mode 100644 lib/printbuf.c
>  rename {lib => mm}/show_mem.c (78%)
>
> --
> 2.35.2
>

Thanks
Dave


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 0/4] Printbufs & shrinker OOM reporting
@ 2022-04-30  4:00   ` Dave Young
  0 siblings, 0 replies; 59+ messages in thread
From: Dave Young @ 2022-04-30  4:00 UTC (permalink / raw)
  To: kexec

Hi Kent,
On Fri, 22 Apr 2022 at 07:56, Kent Overstreet <kent.overstreet@gmail.com> wrote:
>
> Debugging OOMs has been one of my sources of frustration, so this patch series
> is an attempt to do something about it.
>
> The first patch in the series is something I've been slowly evolving in bcachefs
> for years: simple heap allocated strings meant for appending to and building up
> structured log/error messages. They make it easy and straightforward to write
> pretty-printers for everything, which in turn makes good logging and error
> messages something that just happens naturally.
>
> We want it here because that means the reporting I'm adding to shrinkers can be
> used by both OOM reporting, and for the sysfs (or is it debugfs now) interface
> that Roman is adding.
>

I added the kexec list in cc.  It seems like a nice enhancement to oom
reporting.
I suspect kdump tooling need changes to retrieve the kmsg log from
vmcore, could you confirm it?  For example makedumpfile, crash, and
kexec-tools (its vmcore-dmesg tool).


> This patch series also:
>  - adds OOM reporting on shrinkers, reporting on top 10 shrinkers (in sorted
>    order!)
>  - changes slab reporting to be always-on, also reporting top 10 slabs in sorted
>    order
>  - starts centralizing OOM reporting in mm/show_mem.c
>
> The last patch in the series is only a demonstration of how to implement the
> shrinker .to_text() method, since bcachefs isn't upstream yet.
>
> Kent Overstreet (4):
>   lib/printbuf: New data structure for heap-allocated strings
>   mm: Add a .to_text() method for shrinkers
>   mm: Centralize & improve oom reporting in show_mem.c
>   bcachefs: shrinker.to_text() methods
>
>  fs/bcachefs/btree_cache.c     |  18 ++-
>  fs/bcachefs/btree_key_cache.c |  18 ++-
>  include/linux/printbuf.h      | 140 ++++++++++++++++++
>  include/linux/shrinker.h      |   5 +
>  lib/Makefile                  |   4 +-
>  lib/printbuf.c                | 271 ++++++++++++++++++++++++++++++++++
>  mm/Makefile                   |   2 +-
>  mm/oom_kill.c                 |  23 ---
>  {lib => mm}/show_mem.c        |  14 ++
>  mm/slab.h                     |   6 +-
>  mm/slab_common.c              |  53 ++++++-
>  mm/vmscan.c                   |  75 ++++++++++
>  12 files changed, 587 insertions(+), 42 deletions(-)
>  create mode 100644 include/linux/printbuf.h
>  create mode 100644 lib/printbuf.c
>  rename {lib => mm}/show_mem.c (78%)
>
> --
> 2.35.2
>

Thanks
Dave



^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2022-04-30  4:01 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-21 23:48 [PATCH 0/4] Printbufs & shrinker OOM reporting Kent Overstreet
2022-04-21 23:48 ` [PATCH 1/4] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
2022-04-21 23:48 ` [PATCH 2/4] mm: Add a .to_text() method for shrinkers Kent Overstreet
2022-04-22 12:21   ` Michal Hocko
2022-04-21 23:48 ` [PATCH 3/4] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
2022-04-21 23:48 ` [PATCH 4/4] bcachefs: shrinker.to_text() methods Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 0/8] Printbufs & improved shrinker debugging Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Kent Overstreet
2022-04-22  4:20   ` Christoph Hellwig
2022-04-22  5:14     ` Kent Overstreet
2022-04-22  5:22       ` Christoph Hellwig
2022-04-22  5:40         ` Kent Overstreet
2022-04-22  5:52           ` Christoph Hellwig
2022-04-22  6:06             ` Kent Overstreet
2022-04-22  6:11               ` Christoph Hellwig
2022-04-22  6:18                 ` Kent Overstreet
2022-04-22 15:37           ` Steven Rostedt
2022-04-22 19:30             ` Kent Overstreet
2022-04-22 19:39               ` Steven Rostedt
2022-04-22 20:30                 ` Kent Overstreet
2022-04-22 20:47                   ` Steven Rostedt
2022-04-22 21:51                     ` Kent Overstreet
2022-04-22 22:20                       ` Steven Rostedt
2022-04-22 20:03               ` James Bottomley
2022-04-22 21:13                 ` Kent Overstreet
2022-04-23 14:16                   ` Rust and Kernel Vendoring [Was Re: [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings] James Bottomley
2022-04-24 20:36                     ` Kent Overstreet
2022-04-26  2:22                       ` James Bottomley
2022-04-24 23:46   ` [PATCH v2 1/8] lib/printbuf: New data structure for heap-allocated strings Joe Perches
2022-04-25  0:45     ` Kent Overstreet
2022-04-25  2:44     ` Matthew Wilcox
2022-04-25  4:19       ` Kent Overstreet
2022-04-25  4:48         ` Joe Perches
2022-04-25  4:59           ` Kent Overstreet
2022-04-25  5:00             ` Joe Perches
2022-04-25  5:56               ` Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 2/8] Input/joystick/analog: Convert from seq_buf -> printbuf Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 3/8] mm/memcontrol.c: Convert to printbuf Kent Overstreet
2022-04-22 12:28   ` Michal Hocko
2022-04-21 23:48 ` [PATCH v2 4/8] clk: tegra: bpmp: " Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 5/8] mm: Add a .to_text() method for shrinkers Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 6/8] mm: Count requests to free & nr freed per shrinker Kent Overstreet
2022-04-21 23:48 ` [PATCH v2 7/8] mm: Move lib/show_mem.c to mm/ Kent Overstreet
2022-04-22 12:32   ` Michal Hocko
2022-04-21 23:48 ` [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
2022-04-22 12:58   ` Michal Hocko
2022-04-22 15:09     ` Roman Gushchin
2022-04-22 23:48       ` Kent Overstreet
2022-04-23  0:27         ` Roman Gushchin
2022-04-23  0:46           ` Kent Overstreet
2022-04-23  1:25             ` Roman Gushchin
2022-04-23 11:48               ` Tetsuo Handa
2022-04-25  9:28             ` Michal Hocko
2022-04-25 15:28               ` Kent Overstreet
2022-04-26  7:17                 ` Michal Hocko
2022-04-26  7:26                   ` Kent Overstreet
2022-04-26  7:40                     ` Michal Hocko
2022-04-30  4:00 ` [PATCH 0/4] Printbufs & shrinker OOM reporting Dave Young
2022-04-30  4:00   ` Dave Young

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.