linux-mm.kvack.org archive mirror
* [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter
@ 2020-06-02 14:15 Vlastimil Babka
  2020-06-02 14:15 ` [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks Vlastimil Babka
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-02 14:15 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vlastimil Babka, Jann Horn, Vijayanand Jitta

Hi,

we recently had reports [1,2] of runtime writes to various SLUB per-cache
sysfs files related to slub_debug or tuning leading to crashes. I have
inspected all those writable files and the rather unfortunate result is that
most of them are made read-only by this patchset, as fixing the issues doesn't
seem viable. Details for each are given in each patch. I expect some
resistance, hence the RFC.

The runtime toggles were, however, necessary for the use case described in [3],
so the first patch extends the slub_debug boot parameter syntax to achieve the
same configuration without runtime toggles. That should hopefully make the
changes more feasible.

[1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org
[2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
[3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org

Vlastimil Babka (5):
  mm, slub: extend slub_debug syntax for multiple blocks
  mm, slub: make some slub_debug related attributes read-only
  mm, slub: remove runtime allocation order changes
  mm, slub: make remaining slub_debug related attributes read-only
  mm, slub: make reclaim_account attribute read-only

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/vm/slub.rst                     |  36 +-
 mm/slub.c                                     | 315 ++++++++----------
 3 files changed, 163 insertions(+), 190 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks
  2020-06-02 14:15 [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter Vlastimil Babka
@ 2020-06-02 14:15 ` Vlastimil Babka
  2020-06-05 21:06   ` Kees Cook
  2020-06-02 14:15 ` [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only Vlastimil Babka
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-02 14:15 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vlastimil Babka, Jann Horn, Vijayanand Jitta

The slub_debug kernel boot parameter can either apply a single set of options
to all caches or a list of caches. There is a use case where debugging is
applied for all caches and then disabled at runtime for specific caches, for
performance and memory consumption reasons [1]. As runtime changes are
dangerous, extend the boot parameter syntax so that multiple blocks of either
global or slab-specific options can be specified, with blocks delimited by ';'.
This will also support the use case of [1] without runtime changes.

For details see the updated Documentation/vm/slub.rst

[1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org
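[Editorial note: the block-matching semantics described above can be sketched
as a small Python model. This is purely illustrative, not kernel code; the
function names are invented, and fnmatch() stands in as an approximation of
the kernel's simpler prefix-up-to-'*' glob matching.]

```python
import fnmatch

def parse_blocks(spec):
    """Split a slub_debug value like 'FZ;-,zs_handle,zspage' into
    (flags, [slab names]) tuples; an empty name list means 'all slabs',
    and '-' means no debug flags."""
    blocks = []
    for block in filter(None, spec.split(';')):
        flags, _, slabs = block.partition(',')
        names = [s for s in slabs.split(',') if s]
        blocks.append(('' if flags == '-' else flags, names))
    return blocks

def flags_for(spec, cache_name):
    """Flags applied to cache_name: the first matching 'select slabs'
    block wins; otherwise the last 'all slabs' block applies (or no
    flags at all if only slab-specific blocks were given)."""
    global_flags = ''
    chosen = None
    for flags, names in parse_blocks(spec):
        if not names:
            global_flags = flags          # last global block wins
        elif chosen is None and any(fnmatch.fnmatch(cache_name, n)
                                    for n in names):
            chosen = flags                # first matching block wins
    return chosen if chosen is not None else global_flags
```

With the examples from the updated documentation,
flags_for("FZ;-,zs_handle,zspage", "dentry") yields "FZ" while the zsmalloc
caches get no flags, matching the intended semantics.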

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/vm/slub.rst                     |  17 ++
 mm/slub.c                                     | 177 +++++++++++++-----
 3 files changed, 144 insertions(+), 52 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6253849afac2..6ea00c8dd635 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4613,7 +4613,7 @@
 			fragmentation.  Defaults to 1 for systems with
 			more than 32MB of RAM, 0 otherwise.
 
-	slub_debug[=options[,slabs]]	[MM, SLUB]
+	slub_debug[=options[,slabs][;[options[,slabs]]...]]	[MM, SLUB]
 			Enabling slub_debug allows one to determine the
 			culprit if slab objects become corrupted. Enabling
 			slub_debug can create guard zones around objects and
diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 4eee598555c9..f1154f707e06 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -41,6 +41,11 @@ slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
 	Enable options only for select slabs (no spaces
 	after a comma)
 
+Multiple blocks of options for all slabs or selected slabs can be given, with
+blocks of options delimited by ';'. The last of the "all slabs" blocks is applied
+to all slabs except those that match one of the "select slabs" blocks. Options
+of the first "select slabs" block that matches the slab's name are applied.
+
 Possible debug options are::
 
 	F		Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
@@ -83,6 +88,18 @@ in low memory situations or if there's high fragmentation of memory.  To
 
 	slub_debug=O
 
+You can apply different options to different lists of slab names, using blocks
+of options. This will enable red zoning for dentry and user tracking for
+kmalloc. All other slabs will not get any debugging enabled::
+
+	slub_debug=Z,dentry;U,kmalloc-*
+
+You can also enable options (e.g. sanity checks and poisoning) for all caches
+except some that are deemed too performance critical and don't need to be
+debugged::
+
+	slub_debug=FZ;-,zs_handle,zspage
+
 In case you forgot to enable debugging on the kernel command line: It is
 possible to enable debugging manually when the kernel is up. Look at the
 contents of::
diff --git a/mm/slub.c b/mm/slub.c
index 03e063cd979f..47430aad9a65 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -499,7 +499,7 @@ static slab_flags_t slub_debug = DEBUG_DEFAULT_FLAGS;
 static slab_flags_t slub_debug;
 #endif
 
-static char *slub_debug_slabs;
+static char *slub_debug_string;
 static int disable_higher_order_debug;
 
 /*
@@ -1262,68 +1262,132 @@ static noinline int free_debug_processing(
 	return ret;
 }
 
-static int __init setup_slub_debug(char *str)
+/*
+ * Parse a block of slub_debug options. Blocks are delimited by ';'
+ *
+ * @str:    start of block
+ * @flags:  returns parsed flags, or DEBUG_DEFAULT_FLAGS if none specified
+ * @slabs:  return start of list of slabs, or NULL when there's no list
+ * @init:   assume this is initial parsing and not per-kmem-create parsing
+ *
+ * returns the start of next block if there's any, or NULL
+ */
+char *
+parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init)
 {
-	slub_debug = DEBUG_DEFAULT_FLAGS;
-	if (*str++ != '=' || !*str)
-		/*
-		 * No options specified. Switch on full debugging.
-		 */
-		goto out;
+	bool higher_order_disable = false;
 
-	if (*str == ',')
+	/* Skip any completely empty blocks */
+	while (*str && *str == ';')
+		str++;
+
+	if (*str == ',') {
 		/*
 		 * No options but restriction on slabs. This means full
 		 * debugging for slabs matching a pattern.
 		 */
+		*flags = DEBUG_DEFAULT_FLAGS;
 		goto check_slabs;
+	}
+	*flags = 0;
 
-	slub_debug = 0;
-	if (*str == '-')
-		/*
-		 * Switch off all debugging measures.
-		 */
-		goto out;
-
-	/*
-	 * Determine which debug features should be switched on
-	 */
-	for (; *str && *str != ','; str++) {
+	/* Determine which debug features should be switched on */
+	for (; *str && *str != ',' && *str != ';'; str++) {
 		switch (tolower(*str)) {
+		case '-':
+			*flags = 0;
+			break;
 		case 'f':
-			slub_debug |= SLAB_CONSISTENCY_CHECKS;
+			*flags |= SLAB_CONSISTENCY_CHECKS;
 			break;
 		case 'z':
-			slub_debug |= SLAB_RED_ZONE;
+			*flags |= SLAB_RED_ZONE;
 			break;
 		case 'p':
-			slub_debug |= SLAB_POISON;
+			*flags |= SLAB_POISON;
 			break;
 		case 'u':
-			slub_debug |= SLAB_STORE_USER;
+			*flags |= SLAB_STORE_USER;
 			break;
 		case 't':
-			slub_debug |= SLAB_TRACE;
+			*flags |= SLAB_TRACE;
 			break;
 		case 'a':
-			slub_debug |= SLAB_FAILSLAB;
+			*flags |= SLAB_FAILSLAB;
 			break;
 		case 'o':
 			/*
 			 * Avoid enabling debugging on caches if its minimum
 			 * order would increase as a result.
 			 */
-			disable_higher_order_debug = 1;
+			higher_order_disable = true;
 			break;
 		default:
-			pr_err("slub_debug option '%c' unknown. skipped\n",
-			       *str);
+			if (init)
+				pr_err("slub_debug option '%c' unknown. skipped\n", *str);
 		}
 	}
-
 check_slabs:
 	if (*str == ',')
-		slub_debug_slabs = str + 1;
+		*slabs = ++str;
+	else
+		*slabs = NULL;
+
+	/* Skip over the slab list */
+	while (*str && *str != ';')
+		str++;
+
+	/* Skip any completely empty blocks */
+	while (*str && *str == ';')
+		str++;
+
+	if (init && higher_order_disable)
+		disable_higher_order_debug = 1;
+
+	if (*str)
+		return str;
+	else
+		return NULL;
+}
+
+static int __init setup_slub_debug(char *str)
+{
+	slab_flags_t flags;
+	char *saved_str;
+	char *slab_list;
+	bool global_slub_debug_changed = false;
+	bool slab_list_specified = false;
+
+	slub_debug = DEBUG_DEFAULT_FLAGS;
+	if (*str++ != '=' || !*str)
+		/*
+		 * No options specified. Switch on full debugging.
+		 */
+		goto out;
+
+	saved_str = str;
+	while (str) {
+		str = parse_slub_debug_flags(str, &flags, &slab_list, true);
+
+		if (!slab_list) {
+			slub_debug = flags;
+			global_slub_debug_changed = true;
+		} else {
+			slab_list_specified = true;
+		}
+	}
+
+	/*
+	 * For backwards compatibility, a single list of flags with list of
+	 * slabs means debugging is only enabled for those slabs, so the global
+	 * slub_debug should be 0. We can extend that to multiple lists as
+	 * long as there is no option specifying flags without a slab list.
+	 */
+	if (slab_list_specified) {
+		if (!global_slub_debug_changed)
+			slub_debug = 0;
+		slub_debug_string = saved_str;
+	}
 out:
 	if ((static_branch_unlikely(&init_on_alloc) ||
 	     static_branch_unlikely(&init_on_free)) &&
@@ -1352,36 +1416,47 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
 {
 	char *iter;
 	size_t len;
+	char *next_block;
+	slab_flags_t block_flags;
 
 	/* If slub_debug = 0, it folds into the if conditional. */
-	if (!slub_debug_slabs)
+	if (!slub_debug_string)
 		return flags | slub_debug;
 
 	len = strlen(name);
-	iter = slub_debug_slabs;
-	while (*iter) {
-		char *end, *glob;
-		size_t cmplen;
-
-		end = strchrnul(iter, ',');
+	next_block = slub_debug_string;
+	/* Go through all blocks of debug options, see if any matches our slab's name */
+	while (next_block) {
+		next_block = parse_slub_debug_flags(next_block, &block_flags, &iter, false);
+		if (!iter)
+			continue;
+		/* Found a block that has a slab list, search it */
+		while (*iter) {
+			char *end, *glob;
+			size_t cmplen;
+
+			end = strchrnul(iter, ',');
+			if (next_block && next_block < end)
+				end = next_block - 1;
+
+			glob = strnchr(iter, end - iter, '*');
+			if (glob)
+				cmplen = glob - iter;
+			else
+				cmplen = max_t(size_t, len, (end - iter));
 
-		glob = strnchr(iter, end - iter, '*');
-		if (glob)
-			cmplen = glob - iter;
-		else
-			cmplen = max_t(size_t, len, (end - iter));
+			if (!strncmp(name, iter, cmplen)) {
+				flags |= block_flags;
+				return flags;
+			}
 
-		if (!strncmp(name, iter, cmplen)) {
-			flags |= slub_debug;
-			break;
+			if (!*end || *end == ';')
+				break;
+			iter = end + 1;
 		}
-
-		if (!*end)
-			break;
-		iter = end + 1;
 	}
 
-	return flags;
+	return slub_debug;
 }
 #else /* !CONFIG_SLUB_DEBUG */
 static inline void setup_object_debug(struct kmem_cache *s,
-- 
2.26.2




* [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only
  2020-06-02 14:15 [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter Vlastimil Babka
  2020-06-02 14:15 ` [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks Vlastimil Babka
@ 2020-06-02 14:15 ` Vlastimil Babka
  2020-06-05 21:06   ` Kees Cook
  2020-06-06  0:32   ` Roman Gushchin
  2020-06-02 14:15 ` [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes Vlastimil Babka
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-02 14:15 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vlastimil Babka, Vijayanand Jitta, Jann Horn

SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be
read to check if the respective debugging options are enabled for given cache.
The options can also be toggled at runtime by writing into the files. Some of
those, namely red_zone, poison, and store_user, can be toggled only when no
objects yet exist in the cache.

Vijayanand reports [1] that there is a problem with freelist randomization if
changing the debugging option's state results in a different number of objects
per page, and the random sequence cache thus needs to be recomputed.

However, another problem is that the check for "no objects yet exist in the
cache" is racy, as noted by Jann [2], and fixing that would add overhead or
otherwise complicate the allocation/freeing paths. Thus it would be much
simpler just to remove the runtime toggling support. The documentation
describes it as "In case you forgot to enable debugging on the kernel command
line", but the necessity of having no objects limits its usefulness anyway for
many caches.

Vijayanand describes a use case [3] where debugging is enabled for all but
zram caches for memory overhead reasons, and using the runtime toggles was the
only way to achieve such a configuration. After the previous patch it's now
possible to do that directly with the kernel boot parameter, so we can remove
the dangerous runtime toggles by making the /sys attribute files read-only.

While updating it, also improve the documentation of the debugging /sys files.

[1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org
[2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
[3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org
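[Editorial note: the race noted by Jann can be replayed deterministically with
a toy Python model. This is not kernel code; the class and its methods are
invented stand-ins for the emptiness check and the flag update, which are two
separate, unsynchronized steps.]

```python
class CacheModel:
    """Toy stand-in for a kmem_cache with one togglable debug flag."""
    def __init__(self):
        self.objects = 0        # live objects in the cache
        self.red_zone = False   # debug flag that changes object layout

    def any_slab_objects(self):
        return self.objects > 0

    def alloc(self):
        # each object is laid out according to the flags at alloc time
        self.objects += 1
        return self.red_zone    # layout the new object was given

cache = CacheModel()

# CPU 0 wants to enable red zoning and first checks the cache is empty:
assert not cache.any_slab_objects()

# CPU 1 allocates in the window between the check and the update:
old_layout = cache.alloc()

# CPU 0 proceeds with the toggle, still believing the cache is empty:
cache.red_zone = True

# Result: a live object laid out per the old flags in a cache whose
# flags now describe a different layout.
layout_mismatch = (old_layout != cache.red_zone)
```

Closing the window would require synchronizing the check and the update with
the hot allocation/free paths, which is the overhead the commit message
declines to add.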

Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/vm/slub.rst | 28 ++++++++++++++----------
 mm/slub.c                 | 46 +++------------------------------------
 2 files changed, 20 insertions(+), 54 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index f1154f707e06..61805e984a0d 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -100,20 +100,26 @@ except some that are deemed too performance critical and don't need to be
 
 	slub_debug=FZ;-,zs_handle,zspage
 
-In case you forgot to enable debugging on the kernel command line: It is
-possible to enable debugging manually when the kernel is up. Look at the
-contents of::
+The state of each debug option for a slab can be found in the respective files
+under::
 
 	/sys/kernel/slab/<slab name>/
 
-Look at the writable files. Writing 1 to them will enable the
-corresponding debug option. All options can be set on a slab that does
-not contain objects. If the slab already contains objects then sanity checks
-and tracing may only be enabled. The other options may cause the realignment
-of objects.
-
-Careful with tracing: It may spew out lots of information and never stop if
-used on the wrong slab.
+If the file contains 1, the option is enabled, 0 means disabled. The debug
+options from the ``slub_debug`` parameter translate to the following files::
+
+	F	sanity_checks
+	Z	red_zone
+	P	poison
+	U	store_user
+	T	trace
+	A	failslab
+
+The sanity_checks, trace and failslab files are writable, so writing 1 or 0
+will enable or disable the option at runtime. The writes to trace and failslab
+may return -EINVAL if the cache is subject to slab merging. Careful with
+tracing: It may spew out lots of information and never stop if used on the
+wrong slab.
 
 Slab merging
 ============
diff --git a/mm/slub.c b/mm/slub.c
index 47430aad9a65..ac198202dbb0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5351,61 +5351,21 @@ static ssize_t red_zone_show(struct kmem_cache *s, char *buf)
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_RED_ZONE));
 }
 
-static ssize_t red_zone_store(struct kmem_cache *s,
-				const char *buf, size_t length)
-{
-	if (any_slab_objects(s))
-		return -EBUSY;
-
-	s->flags &= ~SLAB_RED_ZONE;
-	if (buf[0] == '1') {
-		s->flags |= SLAB_RED_ZONE;
-	}
-	calculate_sizes(s, -1);
-	return length;
-}
-SLAB_ATTR(red_zone);
+SLAB_ATTR_RO(red_zone);
 
 static ssize_t poison_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_POISON));
 }
 
-static ssize_t poison_store(struct kmem_cache *s,
-				const char *buf, size_t length)
-{
-	if (any_slab_objects(s))
-		return -EBUSY;
-
-	s->flags &= ~SLAB_POISON;
-	if (buf[0] == '1') {
-		s->flags |= SLAB_POISON;
-	}
-	calculate_sizes(s, -1);
-	return length;
-}
-SLAB_ATTR(poison);
+SLAB_ATTR_RO(poison);
 
 static ssize_t store_user_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_STORE_USER));
 }
 
-static ssize_t store_user_store(struct kmem_cache *s,
-				const char *buf, size_t length)
-{
-	if (any_slab_objects(s))
-		return -EBUSY;
-
-	s->flags &= ~SLAB_STORE_USER;
-	if (buf[0] == '1') {
-		s->flags &= ~__CMPXCHG_DOUBLE;
-		s->flags |= SLAB_STORE_USER;
-	}
-	calculate_sizes(s, -1);
-	return length;
-}
-SLAB_ATTR(store_user);
+SLAB_ATTR_RO(store_user);
 
 static ssize_t validate_show(struct kmem_cache *s, char *buf)
 {
-- 
2.26.2




* [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes
  2020-06-02 14:15 [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter Vlastimil Babka
  2020-06-02 14:15 ` [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks Vlastimil Babka
  2020-06-02 14:15 ` [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only Vlastimil Babka
@ 2020-06-02 14:15 ` Vlastimil Babka
  2020-06-05 21:06   ` Kees Cook
  2020-06-06  0:32   ` Roman Gushchin
  2020-06-02 14:15 ` [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only Vlastimil Babka
  2020-06-02 14:15 ` [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only Vlastimil Babka
  4 siblings, 2 replies; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-02 14:15 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vlastimil Babka, Jann Horn, Vijayanand Jitta

SLUB allows runtime changing of page allocation order by writing into the
/sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface
allows the order to be set too small, leading to crashes.

While it's possible to fix the immediate issue, closer inspection reveals
potential races. Storing the new order calls calculate_sizes() which
non-atomically updates a lot of kmem_cache fields while the cache is still in
use. Unexpected behavior might occur even if the fields are set to the same
value as they were.

This could be fixed by splitting out the part of calculate_sizes() that depends
on forced_order, so that we only update the kmem_cache.oo field. This could
still race with init_cache_random_seq(), shuffle_freelist() and
allocate_slab(). While it might be possible to audit these and e.g. add some
READ_ONCE/WRITE_ONCE accesses, it's easier just to remove the runtime order
changes, which is what this patch does. If there are valid use cases for
per-cache order setting, we could e.g. extend the boot parameters to do that.

[1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
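[Editorial note: for context, the kmem_cache.oo field mentioned above packs
the page allocation order and the objects-per-slab count into a single word
(OO_SHIFT is 16 in mm/slub.c), which is one reason a non-atomic rewrite while
the cache is in use is hazardous. A rough Python model of the packing helpers,
mirroring oo_make()/oo_order()/oo_objects():]

```python
OO_SHIFT = 16
OO_MASK = (1 << OO_SHIFT) - 1

def oo_make(order, objects):
    # pack page-allocation order (high bits) and objects per slab
    # (low bits) into one word, like oo_make() in mm/slub.c
    return (order << OO_SHIFT) | objects

def oo_order(oo):
    return oo >> OO_SHIFT

def oo_objects(oo):
    return oo & OO_MASK

# e.g. order-3 slabs holding 292 objects each
oo = oo_make(3, 292)
```

Readers of s->oo thus see order and object count together, but writers of the
surrounding derived fields (size, inuse, offset, ...) get no such atomicity.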

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ac198202dbb0..58c1e9e7b3b3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5111,28 +5111,11 @@ static ssize_t objs_per_slab_show(struct kmem_cache *s, char *buf)
 }
 SLAB_ATTR_RO(objs_per_slab);
 
-static ssize_t order_store(struct kmem_cache *s,
-				const char *buf, size_t length)
-{
-	unsigned int order;
-	int err;
-
-	err = kstrtouint(buf, 10, &order);
-	if (err)
-		return err;
-
-	if (order > slub_max_order || order < slub_min_order)
-		return -EINVAL;
-
-	calculate_sizes(s, order);
-	return length;
-}
-
 static ssize_t order_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%u\n", oo_order(s->oo));
 }
-SLAB_ATTR(order);
+SLAB_ATTR_RO(order);
 
 static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
 {
-- 
2.26.2




* [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only
  2020-06-02 14:15 [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter Vlastimil Babka
                   ` (2 preceding siblings ...)
  2020-06-02 14:15 ` [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes Vlastimil Babka
@ 2020-06-02 14:15 ` Vlastimil Babka
  2020-06-05 21:07   ` Kees Cook
  2020-06-06  0:33   ` Roman Gushchin
  2020-06-02 14:15 ` [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only Vlastimil Babka
  4 siblings, 2 replies; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-02 14:15 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vlastimil Babka, Jann Horn, Vijayanand Jitta

SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be
read to check if the respective debugging options are enabled for given cache.
Some options, namely sanity_checks, trace, and failslab, can also be enabled
and disabled at runtime by writing into the files.

The runtime toggling is racy. Enabling some of the options disables
__CMPXCHG_DOUBLE, which means that with concurrent allocations, some can still
use __CMPXCHG_DOUBLE and some not, leading to potential corruption. The
s->flags field is also not updated or checked atomically. The simplest
solution is to remove the runtime toggling. The extended slub_debug boot
parameter syntax introduced by an earlier patch should allow fine-tuning the
debugging configuration during boot with the same granularity.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/vm/slub.rst |  7 ++---
 mm/slub.c                 | 62 ++-------------------------------------
 2 files changed, 5 insertions(+), 64 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 61805e984a0d..f240292589bd 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -115,11 +115,8 @@ If the file contains 1, the option is enabled, 0 means disabled. The debug
 	T	trace
 	A	failslab
 
-The sanity_checks, trace and failslab files are writable, so writing 1 or 0
-will enable or disable the option at runtime. The writes to trace and failslab
-may return -EINVAL if the cache is subject to slab merging. Careful with
-tracing: It may spew out lots of information and never stop if used on the
-wrong slab.
+Careful with tracing: It may spew out lots of information and never stop if
+used on the wrong slab.
 
 Slab merging
 ============
diff --git a/mm/slub.c b/mm/slub.c
index 58c1e9e7b3b3..38dd6f3ebb04 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5056,20 +5056,6 @@ static ssize_t show_slab_objects(struct kmem_cache *s,
 	return x + sprintf(buf + x, "\n");
 }
 
-#ifdef CONFIG_SLUB_DEBUG
-static int any_slab_objects(struct kmem_cache *s)
-{
-	int node;
-	struct kmem_cache_node *n;
-
-	for_each_kmem_cache_node(s, node, n)
-		if (atomic_long_read(&n->total_objects))
-			return 1;
-
-	return 0;
-}
-#endif
-
 #define to_slab_attr(n) container_of(n, struct slab_attribute, attr)
 #define to_slab(n) container_of(n, struct kmem_cache, kobj)
 
@@ -5291,43 +5277,13 @@ static ssize_t sanity_checks_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_CONSISTENCY_CHECKS));
 }
-
-static ssize_t sanity_checks_store(struct kmem_cache *s,
-				const char *buf, size_t length)
-{
-	s->flags &= ~SLAB_CONSISTENCY_CHECKS;
-	if (buf[0] == '1') {
-		s->flags &= ~__CMPXCHG_DOUBLE;
-		s->flags |= SLAB_CONSISTENCY_CHECKS;
-	}
-	return length;
-}
-SLAB_ATTR(sanity_checks);
+SLAB_ATTR_RO(sanity_checks);
 
 static ssize_t trace_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TRACE));
 }
-
-static ssize_t trace_store(struct kmem_cache *s, const char *buf,
-							size_t length)
-{
-	/*
-	 * Tracing a merged cache is going to give confusing results
-	 * as well as cause other issues like converting a mergeable
-	 * cache into an umergeable one.
-	 */
-	if (s->refcount > 1)
-		return -EINVAL;
-
-	s->flags &= ~SLAB_TRACE;
-	if (buf[0] == '1') {
-		s->flags &= ~__CMPXCHG_DOUBLE;
-		s->flags |= SLAB_TRACE;
-	}
-	return length;
-}
-SLAB_ATTR(trace);
+SLAB_ATTR_RO(trace);
 
 static ssize_t red_zone_show(struct kmem_cache *s, char *buf)
 {
@@ -5391,19 +5347,7 @@ static ssize_t failslab_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_FAILSLAB));
 }
-
-static ssize_t failslab_store(struct kmem_cache *s, const char *buf,
-							size_t length)
-{
-	if (s->refcount > 1)
-		return -EINVAL;
-
-	s->flags &= ~SLAB_FAILSLAB;
-	if (buf[0] == '1')
-		s->flags |= SLAB_FAILSLAB;
-	return length;
-}
-SLAB_ATTR(failslab);
+SLAB_ATTR_RO(failslab);
 #endif
 
 static ssize_t shrink_show(struct kmem_cache *s, char *buf)
-- 
2.26.2




* [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only
  2020-06-02 14:15 [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter Vlastimil Babka
                   ` (3 preceding siblings ...)
  2020-06-02 14:15 ` [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only Vlastimil Babka
@ 2020-06-02 14:15 ` Vlastimil Babka
  2020-06-05 21:07   ` Kees Cook
  2020-06-06  0:34   ` Roman Gushchin
  4 siblings, 2 replies; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-02 14:15 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vlastimil Babka, Jann Horn, Vijayanand Jitta

The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why
this attribute was writable in the first place; as it's tied to how the cache
is used by its creator, it's not a user tunable. Furthermore:

- it affects slab merging, but that's not being checked when it is toggled
- it affects whether the __GFP_RECLAIMABLE flag is used to allocate pages, but
  the runtime toggle doesn't update allocflags
- it affects cache_vmstat_idx(), so a runtime toggle might lead to
  inconsistency between NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE

Thus make it read-only.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 38dd6f3ebb04..d4a9a097da50 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5223,16 +5223,7 @@ static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_RECLAIM_ACCOUNT));
 }
-
-static ssize_t reclaim_account_store(struct kmem_cache *s,
-				const char *buf, size_t length)
-{
-	s->flags &= ~SLAB_RECLAIM_ACCOUNT;
-	if (buf[0] == '1')
-		s->flags |= SLAB_RECLAIM_ACCOUNT;
-	return length;
-}
-SLAB_ATTR(reclaim_account);
+SLAB_ATTR_RO(reclaim_account);
 
 static ssize_t hwcache_align_show(struct kmem_cache *s, char *buf)
 {
-- 
2.26.2




* Re: [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks
  2020-06-02 14:15 ` [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks Vlastimil Babka
@ 2020-06-05 21:06   ` Kees Cook
  2020-06-08 16:58     ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2020-06-05 21:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Matthew Garrett,
	Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:15PM +0200, Vlastimil Babka wrote:
> The slub_debug kernel boot parameter can either apply a single set of options
> to all caches or a list of caches. There is a use case where debugging is
> applied for all caches and then disabled at runtime for specific caches, for
> performance and memory consumption reasons [1]. As runtime changes are
> dangerous, extend the boot parameter syntax so that multiple blocks of either
> global or slab-specific options can be specified, with blocks delimited by ';'.
> This will also support the use case of [1] without runtime changes.
> 
> For details see the updated Documentation/vm/slub.rst
> 
> [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  .../admin-guide/kernel-parameters.txt         |   2 +-
>  Documentation/vm/slub.rst                     |  17 ++
>  mm/slub.c                                     | 177 +++++++++++++-----
>  3 files changed, 144 insertions(+), 52 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6253849afac2..6ea00c8dd635 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4613,7 +4613,7 @@
>  			fragmentation.  Defaults to 1 for systems with
>  			more than 32MB of RAM, 0 otherwise.
>  
> -	slub_debug[=options[,slabs]]	[MM, SLUB]
> +	slub_debug[=options[,slabs][;[options[,slabs]]...]]	[MM, SLUB]
>  			Enabling slub_debug allows one to determine the
>  			culprit if slab objects become corrupted. Enabling
>  			slub_debug can create guard zones around objects and
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index 4eee598555c9..f1154f707e06 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -41,6 +41,11 @@ slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
>  	Enable options only for select slabs (no spaces
>  	after a comma)
>  
> +Multiple blocks of options for all slabs or selected slabs can be given, with
> +blocks of options delimited by ';'. The last of the "all slabs" blocks is applied
> +to all slabs except those that match one of the "select slabs" blocks. Options
> +of the first "select slabs" block that matches the slab's name are applied.
> +
>  Possible debug options are::
>  
>  	F		Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
> @@ -83,6 +88,18 @@ in low memory situations or if there's high fragmentation of memory.  To
>  
>  	slub_debug=O
>  
> +You can apply different options to different lists of slab names, using blocks
> +of options. This will enable red zoning for dentry and user tracking for
> +kmalloc. All other slabs will not get any debugging enabled::
> +
> +	slub_debug=Z,dentry;U,kmalloc-*
> +
> +You can also enable options (e.g. sanity checks and poisoning) for all caches
> +except some that are deemed too performance critical and don't need to be
> +debugged::

Just for more clarity, how about:

 ... debugged by starting the list with "-" (to mean "all except the
following")::

> +
> +	slub_debug=FZ;-,zs_handle,zspage
> +
>  In case you forgot to enable debugging on the kernel command line: It is
>  possible to enable debugging manually when the kernel is up. Look at the
>  contents of::

Everything else looks great; very nice! :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only
  2020-06-02 14:15 ` [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only Vlastimil Babka
@ 2020-06-05 21:06   ` Kees Cook
  2020-06-06  0:32   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Kees Cook @ 2020-06-05 21:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Matthew Garrett,
	Vijayanand Jitta, Jann Horn

On Tue, Jun 02, 2020 at 04:15:16PM +0200, Vlastimil Babka wrote:
> SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be
> read to check if the respective debugging options are enabled for a given cache.
> The options can also be toggled at runtime by writing into the files. Some of
> those, namely red_zone, poison, and store_user can be toggled only when no
> objects yet exist in the cache.
> 
> Vijayanand reports [1] that there is a problem with freelist randomization if
> changing the debugging option's state results in a different number of objects
> per page, and the random sequence cache thus needs to be recomputed.
> 
> However, another problem is that the check for "no objects yet exist in the
> cache" is racy, as noted by Jann [2] and fixing that would add overhead or
> otherwise complicate the allocation/freeing paths. Thus it would be much
> simpler just to remove the runtime toggling support. The documentation
> describes it as "In case you forgot to enable debugging on the kernel command
> line", but the necessity of having no objects limits its usefulness anyway for
> many caches.
> 
> Vijayanand describes a use case [3] where debugging is enabled for all but
> zram caches for memory overhead reasons, and using the runtime toggles was the
> only way to achieve such configuration. After the previous patch it's now
> possible to do that directly from the kernel boot option, so we can remove the
> dangerous runtime toggles by making the /sys attribute files read-only.
> 
> While updating it, also improve the documentation of the debugging /sys files.
> 
> [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org
> [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
> [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org
> 
> Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook



* Re: [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes
  2020-06-02 14:15 ` [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes Vlastimil Babka
@ 2020-06-05 21:06   ` Kees Cook
  2020-06-06  0:32   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Kees Cook @ 2020-06-05 21:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Matthew Garrett,
	Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:17PM +0200, Vlastimil Babka wrote:
> SLUB allows runtime changing of page allocation order by writing into the
> /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface
> allows the order to be set too small, leading to crashes.
> 
> While it's possible to fix the immediate issue, closer inspection reveals
> potential races. Storing the new order calls calculate_sizes() which
> non-atomically updates a lot of kmem_cache fields while the cache is still in
> use. Unexpected behavior might occur even if the fields are set to the same
> value as they were.
> 
> This could be fixed by splitting out the part of calculate_sizes() that depends
> on forced_order, so that we only update the kmem_cache.oo field. This could
> still race with init_cache_random_seq(), shuffle_freelist(), and
> allocate_slab(). Perhaps it's possible to audit and e.g. add some
> READ_ONCE/WRITE_ONCE accesses, but it might be easier just to remove the
> runtime order changes, which is what this patch does. If there are valid
> use cases for per-cache order setting, we could
> e.g. extend the boot parameters to do that.
> 
> [1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
> 
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook



* Re: [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only
  2020-06-02 14:15 ` [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only Vlastimil Babka
@ 2020-06-05 21:07   ` Kees Cook
  2020-06-06  0:33   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Kees Cook @ 2020-06-05 21:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Matthew Garrett,
	Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:18PM +0200, Vlastimil Babka wrote:
> SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be
> read to check if the respective debugging options are enabled for a given cache.
> Some options, namely sanity_checks, trace, and failslab, can also be enabled and
> disabled at runtime by writing into the files.
> 
> The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when
> enabled, which means that in case of concurrent allocations, some can still use
> __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags
> field is also not updated or checked atomically. The simplest solution is to
> remove the runtime toggling. The extended slub_debug boot parameter syntax
> introduced by the earlier patch should allow fine-tuning the debugging
> configuration during boot with the same granularity.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
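
The lost-update hazard described in the patch above can be shown with a
deterministic toy model (Python, purely illustrative — the kernel's s->flags
is a plain machine word updated without locking; the flag values here are
arbitrary stand-ins, not the real SLAB_* constants):

```python
# Two writers do a read-modify-write on the same flags word without any
# atomicity; interleaving the steps silently drops one of the updates.
SLAB_TRACE    = 0x1  # stand-in value, not the kernel's actual bit
SLAB_FAILSLAB = 0x2  # stand-in value, not the kernel's actual bit

flags = 0
a = flags              # writer A loads s->flags
b = flags              # writer B loads s->flags (same stale value)
a |= SLAB_TRACE        # A sets its bit in its private copy
b |= SLAB_FAILSLAB     # B sets its bit in its private copy
flags = a              # A stores its copy back
flags = b              # B stores last -- A's SLAB_TRACE bit is lost
```

After this interleaving, `flags` holds only SLAB_FAILSLAB, which is exactly
the kind of silent inconsistency that motivates making the files read-only.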

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook



* Re: [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only
  2020-06-02 14:15 ` [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only Vlastimil Babka
@ 2020-06-05 21:07   ` Kees Cook
  2020-06-06  0:34   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Kees Cook @ 2020-06-05 21:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Matthew Garrett,
	Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:19PM +0200, Vlastimil Babka wrote:
> The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why
> this attribute was writable in the first place, as it's tied to how the cache
> is used by its creator; it's not a user tunable. Furthermore:
> 
> - it affects slab merging, but that's not being checked while toggled
> - it affects whether the __GFP_RECLAIMABLE flag is used to allocate the page, but
>   the runtime toggle doesn't update allocflags
> - it affects cache_vmstat_idx(), so a runtime toggle might lead to inconsistency
>   of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE
> 
> Thus make it read-only.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook



* Re: [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only
  2020-06-02 14:15 ` [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only Vlastimil Babka
  2020-06-05 21:06   ` Kees Cook
@ 2020-06-06  0:32   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Roman Gushchin @ 2020-06-06  0:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Vijayanand Jitta, Jann Horn

On Tue, Jun 02, 2020 at 04:15:16PM +0200, Vlastimil Babka wrote:
> SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be
> read to check if the respective debugging options are enabled for a given cache.
> The options can also be toggled at runtime by writing into the files. Some of
> those, namely red_zone, poison, and store_user can be toggled only when no
> objects yet exist in the cache.
> 
> Vijayanand reports [1] that there is a problem with freelist randomization if
> changing the debugging option's state results in a different number of objects
> per page, and the random sequence cache thus needs to be recomputed.
> 
> However, another problem is that the check for "no objects yet exist in the
> cache" is racy, as noted by Jann [2] and fixing that would add overhead or
> otherwise complicate the allocation/freeing paths. Thus it would be much
> simpler just to remove the runtime toggling support. The documentation
> describes it as "In case you forgot to enable debugging on the kernel command
> line", but the necessity of having no objects limits its usefulness anyway for
> many caches.
> 
> Vijayanand describes a use case [3] where debugging is enabled for all but
> zram caches for memory overhead reasons, and using the runtime toggles was the
> only way to achieve such configuration. After the previous patch it's now
> possible to do that directly from the kernel boot option, so we can remove the
> dangerous runtime toggles by making the /sys attribute files read-only.
> 
> While updating it, also improve the documentation of the debugging /sys files.
> 
> [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org
> [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
> [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org
> 
> Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> ---
>  Documentation/vm/slub.rst | 28 ++++++++++++++----------
>  mm/slub.c                 | 46 +++------------------------------------
>  2 files changed, 20 insertions(+), 54 deletions(-)
> 
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index f1154f707e06..61805e984a0d 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -100,20 +100,26 @@ except some that are deemed too performance critical and don't need to be
>  
>  	slub_debug=FZ;-,zs_handle,zspage
>  
> -In case you forgot to enable debugging on the kernel command line: It is
> -possible to enable debugging manually when the kernel is up. Look at the
> -contents of::
> +The state of each debug option for a slab can be found in the respective files
> +under::
>  
>  	/sys/kernel/slab/<slab name>/
>  
> -Look at the writable files. Writing 1 to them will enable the
> -corresponding debug option. All options can be set on a slab that does
> -not contain objects. If the slab already contains objects then sanity checks
> -and tracing may only be enabled. The other options may cause the realignment
> -of objects.
> -
> -Careful with tracing: It may spew out lots of information and never stop if
> -used on the wrong slab.
> +If the file contains 1, the option is enabled; 0 means disabled. The debug
> +options from the ``slub_debug`` parameter translate to the following files::
> +
> +	F	sanity_checks
> +	Z	red_zone
> +	P	poison
> +	U	store_user
> +	T	trace
> +	A	failslab
> +
> +The sanity_checks, trace and failslab files are writable, so writing 1 or 0
> +will enable or disable the option at runtime. The writes to trace and failslab
> +may return -EINVAL if the cache is subject to slab merging. Careful with
> +tracing: It may spew out lots of information and never stop if used on the
> +wrong slab.
>  
>  Slab merging
>  ============
> diff --git a/mm/slub.c b/mm/slub.c
> index 47430aad9a65..ac198202dbb0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5351,61 +5351,21 @@ static ssize_t red_zone_show(struct kmem_cache *s, char *buf)
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_RED_ZONE));
>  }
>  
> -static ssize_t red_zone_store(struct kmem_cache *s,
> -				const char *buf, size_t length)
> -{
> -	if (any_slab_objects(s))
> -		return -EBUSY;
> -
> -	s->flags &= ~SLAB_RED_ZONE;
> -	if (buf[0] == '1') {
> -		s->flags |= SLAB_RED_ZONE;
> -	}
> -	calculate_sizes(s, -1);
> -	return length;
> -}
> -SLAB_ATTR(red_zone);
> +SLAB_ATTR_RO(red_zone);
>  
>  static ssize_t poison_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_POISON));
>  }
>  
> -static ssize_t poison_store(struct kmem_cache *s,
> -				const char *buf, size_t length)
> -{
> -	if (any_slab_objects(s))
> -		return -EBUSY;
> -
> -	s->flags &= ~SLAB_POISON;
> -	if (buf[0] == '1') {
> -		s->flags |= SLAB_POISON;
> -	}
> -	calculate_sizes(s, -1);
> -	return length;
> -}
> -SLAB_ATTR(poison);
> +SLAB_ATTR_RO(poison);
>  
>  static ssize_t store_user_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_STORE_USER));
>  }
>  
> -static ssize_t store_user_store(struct kmem_cache *s,
> -				const char *buf, size_t length)
> -{
> -	if (any_slab_objects(s))
> -		return -EBUSY;
> -
> -	s->flags &= ~SLAB_STORE_USER;
> -	if (buf[0] == '1') {
> -		s->flags &= ~__CMPXCHG_DOUBLE;
> -		s->flags |= SLAB_STORE_USER;
> -	}
> -	calculate_sizes(s, -1);
> -	return length;
> -}
> -SLAB_ATTR(store_user);
> +SLAB_ATTR_RO(store_user);
>  
>  static ssize_t validate_show(struct kmem_cache *s, char *buf)
>  {
> -- 
> 2.26.2
> 
> 



* Re: [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes
  2020-06-02 14:15 ` [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes Vlastimil Babka
  2020-06-05 21:06   ` Kees Cook
@ 2020-06-06  0:32   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Roman Gushchin @ 2020-06-06  0:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:17PM +0200, Vlastimil Babka wrote:
> SLUB allows runtime changing of page allocation order by writing into the
> /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface
> allows the order to be set too small, leading to crashes.
> 
> While it's possible to fix the immediate issue, closer inspection reveals
> potential races. Storing the new order calls calculate_sizes() which
> non-atomically updates a lot of kmem_cache fields while the cache is still in
> use. Unexpected behavior might occur even if the fields are set to the same
> value as they were.
> 
> This could be fixed by splitting out the part of calculate_sizes() that depends
> on forced_order, so that we only update the kmem_cache.oo field. This could
> still race with init_cache_random_seq(), shuffle_freelist(), and
> allocate_slab(). Perhaps it's possible to audit and e.g. add some
> READ_ONCE/WRITE_ONCE accesses, but it might be easier just to remove the
> runtime order changes, which is what this patch does. If there are valid
> use cases for per-cache order setting, we could
> e.g. extend the boot parameters to do that.
> 
> [1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
> 
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> ---
>  mm/slub.c | 19 +------------------
>  1 file changed, 1 insertion(+), 18 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index ac198202dbb0..58c1e9e7b3b3 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5111,28 +5111,11 @@ static ssize_t objs_per_slab_show(struct kmem_cache *s, char *buf)
>  }
>  SLAB_ATTR_RO(objs_per_slab);
>  
> -static ssize_t order_store(struct kmem_cache *s,
> -				const char *buf, size_t length)
> -{
> -	unsigned int order;
> -	int err;
> -
> -	err = kstrtouint(buf, 10, &order);
> -	if (err)
> -		return err;
> -
> -	if (order > slub_max_order || order < slub_min_order)
> -		return -EINVAL;
> -
> -	calculate_sizes(s, order);
> -	return length;
> -}
> -
>  static ssize_t order_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", oo_order(s->oo));
>  }
> -SLAB_ATTR(order);
> +SLAB_ATTR_RO(order);
>  
>  static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
>  {
> -- 
> 2.26.2
> 
> 



* Re: [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only
  2020-06-02 14:15 ` [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only Vlastimil Babka
  2020-06-05 21:07   ` Kees Cook
@ 2020-06-06  0:33   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Roman Gushchin @ 2020-06-06  0:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:18PM +0200, Vlastimil Babka wrote:
> SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be
> read to check if the respective debugging options are enabled for a given cache.
> Some options, namely sanity_checks, trace, and failslab, can also be enabled and
> disabled at runtime by writing into the files.
> 
> The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when
> enabled, which means that in case of concurrent allocations, some can still use
> __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags
> field is also not updated or checked atomically. The simplest solution is to
> remove the runtime toggling. The extended slub_debug boot parameter syntax
> introduced by the earlier patch should allow fine-tuning the debugging
> configuration during boot with the same granularity.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> ---
>  Documentation/vm/slub.rst |  7 ++---
>  mm/slub.c                 | 62 ++-------------------------------------
>  2 files changed, 5 insertions(+), 64 deletions(-)
> 
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index 61805e984a0d..f240292589bd 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -115,11 +115,8 @@ If the file contains 1, the option is enabled, 0 means disabled. The debug
>  	T	trace
>  	A	failslab
>  
> -The sanity_checks, trace and failslab files are writable, so writing 1 or 0
> -will enable or disable the option at runtime. The writes to trace and failslab
> -may return -EINVAL if the cache is subject to slab merging. Careful with
> -tracing: It may spew out lots of information and never stop if used on the
> -wrong slab.
> +Careful with tracing: It may spew out lots of information and never stop if
> +used on the wrong slab.
>  
>  Slab merging
>  ============
> diff --git a/mm/slub.c b/mm/slub.c
> index 58c1e9e7b3b3..38dd6f3ebb04 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5056,20 +5056,6 @@ static ssize_t show_slab_objects(struct kmem_cache *s,
>  	return x + sprintf(buf + x, "\n");
>  }
>  
> -#ifdef CONFIG_SLUB_DEBUG
> -static int any_slab_objects(struct kmem_cache *s)
> -{
> -	int node;
> -	struct kmem_cache_node *n;
> -
> -	for_each_kmem_cache_node(s, node, n)
> -		if (atomic_long_read(&n->total_objects))
> -			return 1;
> -
> -	return 0;
> -}
> -#endif
> -
>  #define to_slab_attr(n) container_of(n, struct slab_attribute, attr)
>  #define to_slab(n) container_of(n, struct kmem_cache, kobj)
>  
> @@ -5291,43 +5277,13 @@ static ssize_t sanity_checks_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_CONSISTENCY_CHECKS));
>  }
> -
> -static ssize_t sanity_checks_store(struct kmem_cache *s,
> -				const char *buf, size_t length)
> -{
> -	s->flags &= ~SLAB_CONSISTENCY_CHECKS;
> -	if (buf[0] == '1') {
> -		s->flags &= ~__CMPXCHG_DOUBLE;
> -		s->flags |= SLAB_CONSISTENCY_CHECKS;
> -	}
> -	return length;
> -}
> -SLAB_ATTR(sanity_checks);
> +SLAB_ATTR_RO(sanity_checks);
>  
>  static ssize_t trace_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TRACE));
>  }
> -
> -static ssize_t trace_store(struct kmem_cache *s, const char *buf,
> -							size_t length)
> -{
> -	/*
> -	 * Tracing a merged cache is going to give confusing results
> -	 * as well as cause other issues like converting a mergeable
> -	 * cache into an umergeable one.
> -	 */
> -	if (s->refcount > 1)
> -		return -EINVAL;
> -
> -	s->flags &= ~SLAB_TRACE;
> -	if (buf[0] == '1') {
> -		s->flags &= ~__CMPXCHG_DOUBLE;
> -		s->flags |= SLAB_TRACE;
> -	}
> -	return length;
> -}
> -SLAB_ATTR(trace);
> +SLAB_ATTR_RO(trace);
>  
>  static ssize_t red_zone_show(struct kmem_cache *s, char *buf)
>  {
> @@ -5391,19 +5347,7 @@ static ssize_t failslab_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_FAILSLAB));
>  }
> -
> -static ssize_t failslab_store(struct kmem_cache *s, const char *buf,
> -							size_t length)
> -{
> -	if (s->refcount > 1)
> -		return -EINVAL;
> -
> -	s->flags &= ~SLAB_FAILSLAB;
> -	if (buf[0] == '1')
> -		s->flags |= SLAB_FAILSLAB;
> -	return length;
> -}
> -SLAB_ATTR(failslab);
> +SLAB_ATTR_RO(failslab);
>  #endif
>  
>  static ssize_t shrink_show(struct kmem_cache *s, char *buf)
> -- 
> 2.26.2
> 
> 



* Re: [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only
  2020-06-02 14:15 ` [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only Vlastimil Babka
  2020-06-05 21:07   ` Kees Cook
@ 2020-06-06  0:34   ` Roman Gushchin
  1 sibling, 0 replies; 16+ messages in thread
From: Roman Gushchin @ 2020-06-06  0:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Kees Cook,
	Matthew Garrett, Jann Horn, Vijayanand Jitta

On Tue, Jun 02, 2020 at 04:15:19PM +0200, Vlastimil Babka wrote:
> The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why
> this attribute was writable in the first place, as it's tied to how the cache
> is used by its creator; it's not a user tunable. Furthermore:
> 
> - it affects slab merging, but that's not being checked while toggled
> - it affects whether the __GFP_RECLAIMABLE flag is used to allocate the page, but
>   the runtime toggle doesn't update allocflags
> - it affects cache_vmstat_idx(), so a runtime toggle might lead to inconsistency
>   of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE
> 
> Thus make it read-only.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Roman Gushchin <guro@fb.com>

> ---
>  mm/slub.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 38dd6f3ebb04..d4a9a097da50 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5223,16 +5223,7 @@ static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%d\n", !!(s->flags & SLAB_RECLAIM_ACCOUNT));
>  }
> -
> -static ssize_t reclaim_account_store(struct kmem_cache *s,
> -				const char *buf, size_t length)
> -{
> -	s->flags &= ~SLAB_RECLAIM_ACCOUNT;
> -	if (buf[0] == '1')
> -		s->flags |= SLAB_RECLAIM_ACCOUNT;
> -	return length;
> -}
> -SLAB_ATTR(reclaim_account);
> +SLAB_ATTR_RO(reclaim_account);
>  
>  static ssize_t hwcache_align_show(struct kmem_cache *s, char *buf)
>  {
> -- 
> 2.26.2
> 
> 



* Re: [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks
  2020-06-05 21:06   ` Kees Cook
@ 2020-06-08 16:58     ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2020-06-08 16:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel, kernel-team, vinmenon, Matthew Garrett,
	Jann Horn, Vijayanand Jitta


On 6/5/20 11:06 PM, Kees Cook wrote:
>> @@ -83,6 +88,18 @@ in low memory situations or if there's high fragmentation of memory.  To
>>  
>>  	slub_debug=O
>>  
>> +You can apply different options to different lists of slab names, using blocks
>> +of options. This will enable red zoning for dentry and user tracking for
>> +kmalloc. All other slabs will not get any debugging enabled::
>> +
>> +	slub_debug=Z,dentry;U,kmalloc-*
>> +
>> +You can also enable options (e.g. sanity checks and poisoning) for all caches
>> +except some that are deemed too performance critical and don't need to be
>> +debugged::
> 
> Just for more clarity, how about:
> 
>  ... debugged by starting the list with "-" (to mean "all except the
> following")::

Hmm, "-" is not exactly "all except the following", but "no debugging", as
explained in the list of debug options earlier in the file.

So I'm updating it to this:

You can also enable options (e.g. sanity checks and poisoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

        slub_debug=FZ;-,zs_handle,zspage
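
For clarity, the matching rule restated above can be sketched outside the
kernel (Python, purely illustrative — the function name and parsing details
are assumptions for illustration, not the actual mm/slub.c parser):

```python
from fnmatch import fnmatch

def debug_options_for(param: str, slab: str) -> str:
    """Return the option letters a slub_debug= value would apply to `slab`:
    the first "select slabs" block whose name list matches wins; otherwise
    the last "all slabs" block applies ("" if there is none)."""
    global_opts = ""
    for block in param.split(";"):
        opts, _, names = block.partition(",")
        if names:  # a "select slabs" block: first match wins
            if any(fnmatch(slab, pat) for pat in names.split(",")):
                return opts
        else:      # an "all slabs" block: the last one seen applies
            global_opts = opts
    return global_opts
```

With this sketch, "FZ;-,zs_handle,zspage" yields "-" (i.e. no debugging) for
zs_handle and zspage, and "FZ" for every other slab, matching the semantics
described above.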


>> +
>> +	slub_debug=FZ;-,zs_handle,zspage
>> +
>>  In case you forgot to enable debugging on the kernel command line: It is
>>  possible to enable debugging manually when the kernel is up. Look at the
>>  contents of::
> 
> Everything else looks great; very nice! :)
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
 
Thanks!



end of thread, other threads:[~2020-06-08 16:59 UTC | newest]

Thread overview: 16+ messages
2020-06-02 14:15 [RFC PATCH 0/5] replace runtime slub_debug toggling with more capable boot parameter Vlastimil Babka
2020-06-02 14:15 ` [RFC PATCH 1/5] mm, slub: extend slub_debug syntax for multiple blocks Vlastimil Babka
2020-06-05 21:06   ` Kees Cook
2020-06-08 16:58     ` Vlastimil Babka
2020-06-02 14:15 ` [RFC PATCH 2/5] mm, slub: make some slub_debug related attributes read-only Vlastimil Babka
2020-06-05 21:06   ` Kees Cook
2020-06-06  0:32   ` Roman Gushchin
2020-06-02 14:15 ` [RFC PATCH 3/5] mm, slub: remove runtime allocation order changes Vlastimil Babka
2020-06-05 21:06   ` Kees Cook
2020-06-06  0:32   ` Roman Gushchin
2020-06-02 14:15 ` [RFC PATCH 4/5] mm, slub: make remaining slub_debug related attributes read-only Vlastimil Babka
2020-06-05 21:07   ` Kees Cook
2020-06-06  0:33   ` Roman Gushchin
2020-06-02 14:15 ` [RFC PATCH 5/5] mm, slub: make reclaim_account attribute read-only Vlastimil Babka
2020-06-05 21:07   ` Kees Cook
2020-06-06  0:34   ` Roman Gushchin
