linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
@ 2021-09-07 14:13 Marco Elver
  2021-09-07 14:13 ` [PATCH 1/6] lib/stackdepot: include gfp.h Marco Elver
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver, Andrew Morton
  Cc: Shuah Khan, Tejun Heo, Lai Jiangshan, Andrey Konovalov,
	Walter Wu, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	linux-kernel, linux-mm, Aleksandr Nogikh, Taras Madan

Shuah Khan reported [1]:

 | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
 | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
 | it tries to allocate memory attempting to acquire spinlock in page
 | allocation code while holding workqueue pool raw_spinlock.
 |
 | There are several instances of this problem when block layer tries
 | to __queue_work(). Call trace from one of these instances is below:
 |
 |     kblockd_mod_delayed_work_on()
 |       mod_delayed_work_on()
 |         __queue_delayed_work()
 |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
 |             insert_work()
 |               kasan_record_aux_stack()
 |                 kasan_save_stack()
 |                   stack_depot_save()
 |                     alloc_pages()
 |                       __alloc_pages()
 |                         get_page_from_freelist()
 |                           rm_queue()
 |                             rm_queue_pcplist()
 |                               local_lock_irqsave(&pagesets.lock, flags);
 |                               [ BUG: Invalid wait context triggered ]

[1] https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org

PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
rules are being violated. More generally, memory is being allocated from
a non-preemptive context (raw_spin_lock'd c-s) where it is not allowed.

To properly fix this, we must prevent stackdepot from replenishing its
"stack slab" pool if memory allocations cannot be done in the current
context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
non-preemptive contexts, including raw_spin_locks (see gfp.h and
ab00db216c9c7).

The only downside is that saving a stack trace may fail if: stackdepot
runs out of space AND the same stack trace has not been recorded before.
I expect this to be unlikely, and a simple experiment (boot the kernel)
didn't result in any failure to record stack trace from insert_work().

The series includes a few minor fixes to stackdepot that I noticed in
preparing the series. It then introduces __stack_depot_save(), which
exposes the option to force stackdepot to not allocate any memory.
Finally, KASAN is changed to use the new stackdepot interface and
provide kasan_record_aux_stack_noalloc(), which is then used by
workqueue code.

Marco Elver (6):
  lib/stackdepot: include gfp.h
  lib/stackdepot: remove unused function argument
  lib/stackdepot: introduce __stack_depot_save()
  kasan: common: provide can_alloc in kasan_save_stack()
  kasan: generic: introduce kasan_record_aux_stack_noalloc()
  workqueue, kasan: avoid alloc_pages() when recording stack

 include/linux/kasan.h      |  2 ++
 include/linux/stackdepot.h |  6 +++++
 kernel/workqueue.c         |  2 +-
 lib/stackdepot.c           | 51 ++++++++++++++++++++++++++++++--------
 mm/kasan/common.c          |  6 ++---
 mm/kasan/generic.c         | 14 +++++++++--
 mm/kasan/kasan.h           |  2 +-
 7 files changed, 65 insertions(+), 18 deletions(-)

-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] lib/stackdepot: include gfp.h
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
@ 2021-09-07 14:13 ` Marco Elver
  2021-09-14 12:11   ` Alexander Potapenko
  2021-09-07 14:13 ` [PATCH 2/6] lib/stackdepot: remove unused function argument Marco Elver
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

<linux/stackdepot.h> refers to gfp_t, but doesn't include gfp.h.

Fix it by including <linux/gfp.h>.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/stackdepot.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 6bb4bc1a5f54..97b36dc53301 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -11,6 +11,8 @@
 #ifndef _LINUX_STACKDEPOT_H
 #define _LINUX_STACKDEPOT_H
 
+#include <linux/gfp.h>
+
 typedef u32 depot_stack_handle_t;
 
 depot_stack_handle_t stack_depot_save(unsigned long *entries,
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/6] lib/stackdepot: remove unused function argument
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
  2021-09-07 14:13 ` [PATCH 1/6] lib/stackdepot: include gfp.h Marco Elver
@ 2021-09-07 14:13 ` Marco Elver
  2021-09-14 12:12   ` Alexander Potapenko
  2021-09-07 14:13 ` [PATCH 3/6] lib/stackdepot: introduce __stack_depot_save() Marco Elver
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

alloc_flags in depot_alloc_stack() is no longer used; remove it.

Signed-off-by: Marco Elver <elver@google.com>
---
 lib/stackdepot.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0a2e417f83cb..c80a9f734253 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -102,8 +102,8 @@ static bool init_stack_slab(void **prealloc)
 }
 
 /* Allocation of a new stack in raw storage */
-static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
-		u32 hash, void **prealloc, gfp_t alloc_flags)
+static struct stack_record *
+depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 {
 	struct stack_record *stack;
 	size_t required_size = struct_size(stack, entries, size);
@@ -309,9 +309,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 
 	found = find_stack(*bucket, entries, nr_entries, hash);
 	if (!found) {
-		struct stack_record *new =
-			depot_alloc_stack(entries, nr_entries,
-					  hash, &prealloc, alloc_flags);
+		struct stack_record *new = depot_alloc_stack(entries, nr_entries, hash, &prealloc);
+
 		if (new) {
 			new->next = *bucket;
 			/*
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/6] lib/stackdepot: introduce __stack_depot_save()
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
  2021-09-07 14:13 ` [PATCH 1/6] lib/stackdepot: include gfp.h Marco Elver
  2021-09-07 14:13 ` [PATCH 2/6] lib/stackdepot: remove unused function argument Marco Elver
@ 2021-09-07 14:13 ` Marco Elver
  2021-09-10 14:08   ` Sebastian Andrzej Siewior
  2021-09-07 14:13 ` [PATCH 4/6] kasan: common: provide can_alloc in kasan_save_stack() Marco Elver
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

Add __stack_depot_save(), which provides more fine-grained control over
stackdepot's memory allocation behaviour, in case stackdepot runs out of
"stack slabs".

Normally stackdepot uses alloc_pages() in case it runs out of space;
passing can_alloc==false to __stack_depot_save() prohibits this, at the
cost of more likely failure to record a stack trace.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/stackdepot.h |  4 ++++
 lib/stackdepot.c           | 42 ++++++++++++++++++++++++++++++++------
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 97b36dc53301..b2f7e7c6ba54 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,6 +15,10 @@
 
 typedef u32 depot_stack_handle_t;
 
+depot_stack_handle_t __stack_depot_save(unsigned long *entries,
+					unsigned int nr_entries,
+					gfp_t gfp_flags, bool can_alloc);
+
 depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries, gfp_t gfp_flags);
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index c80a9f734253..cab6cf117290 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -248,17 +248,28 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
 /**
- * stack_depot_save - Save a stack trace from an array
+ * __stack_depot_save - Save a stack trace from an array
  *
  * @entries:		Pointer to storage array
  * @nr_entries:		Size of the storage array
  * @alloc_flags:	Allocation gfp flags
+ * @can_alloc:		Allocate stack slabs (increased chance of failure if false)
+ *
+ * Saves a stack trace from @entries array of size @nr_entries. If @can_alloc is
+ * %true, is allowed to replenish the stack slab pool in case no space is left
+ * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
+ * any allocations and will fail if no space is left to store the stack trace.
+ *
+ * Context: Any context, but setting @can_alloc to %false is required if
+ *          alloc_pages() cannot be used from the current context. Currently
+ *          this is the case from contexts where neither %GFP_ATOMIC nor
+ *          %GFP_NOWAIT can be used (NMI, raw_spin_lock).
  *
- * Return: The handle of the stack struct stored in depot
+ * Return: The handle of the stack struct stored in depot, 0 on failure.
  */
-depot_stack_handle_t stack_depot_save(unsigned long *entries,
-				      unsigned int nr_entries,
-				      gfp_t alloc_flags)
+depot_stack_handle_t __stack_depot_save(unsigned long *entries,
+					unsigned int nr_entries,
+					gfp_t alloc_flags, bool can_alloc)
 {
 	struct stack_record *found = NULL, **bucket;
 	depot_stack_handle_t retval = 0;
@@ -291,7 +302,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 	 * The smp_load_acquire() here pairs with smp_store_release() to
 	 * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
 	 */
-	if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+	if (unlikely(can_alloc && !smp_load_acquire(&next_slab_inited))) {
 		/*
 		 * Zero out zone modifiers, as we don't have specific zone
 		 * requirements. Keep the flags related to allocation in atomic
@@ -339,6 +350,25 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 fast_exit:
 	return retval;
 }
+EXPORT_SYMBOL_GPL(__stack_depot_save);
+
+/**
+ * stack_depot_save - Save a stack trace from an array
+ *
+ * @entries:		Pointer to storage array
+ * @nr_entries:		Size of the storage array
+ * @alloc_flags:	Allocation gfp flags
+ *
+ * Context: Contexts where allocations via alloc_pages() are allowed.
+ *
+ * Return: The handle of the stack struct stored in depot, 0 on failure.
+ */
+depot_stack_handle_t stack_depot_save(unsigned long *entries,
+				      unsigned int nr_entries,
+				      gfp_t alloc_flags)
+{
+	return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+}
 EXPORT_SYMBOL_GPL(stack_depot_save);
 
 static inline int in_irqentry_text(unsigned long ptr)
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/6] kasan: common: provide can_alloc in kasan_save_stack()
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (2 preceding siblings ...)
  2021-09-07 14:13 ` [PATCH 3/6] lib/stackdepot: introduce __stack_depot_save() Marco Elver
@ 2021-09-07 14:13 ` Marco Elver
  2021-09-07 14:13 ` [PATCH 5/6] kasan: generic: introduce kasan_record_aux_stack_noalloc() Marco Elver
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

Add another argument, can_alloc, to kasan_save_stack() which is passed
as-is to __stack_depot_save().

No functional change intended.

Signed-off-by: Marco Elver <elver@google.com>
---
 mm/kasan/common.c  | 6 +++---
 mm/kasan/generic.c | 2 +-
 mm/kasan/kasan.h   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 2baf121fb8c5..3e0999892c36 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -30,20 +30,20 @@
 #include "kasan.h"
 #include "../slab.h"
 
-depot_stack_handle_t kasan_save_stack(gfp_t flags)
+depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
 {
 	unsigned long entries[KASAN_STACK_DEPTH];
 	unsigned int nr_entries;
 
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
 	nr_entries = filter_irq_stacks(entries, nr_entries);
-	return stack_depot_save(entries, nr_entries, flags);
+	return __stack_depot_save(entries, nr_entries, flags, can_alloc);
 }
 
 void kasan_set_track(struct kasan_track *track, gfp_t flags)
 {
 	track->pid = current->pid;
-	track->stack = kasan_save_stack(flags);
+	track->stack = kasan_save_stack(flags, true);
 }
 
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index c3f5ba7a294a..2a8e59e6326d 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -345,7 +345,7 @@ void kasan_record_aux_stack(void *addr)
 		return;
 
 	alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
-	alloc_meta->aux_stack[0] = kasan_save_stack(GFP_NOWAIT);
+	alloc_meta->aux_stack[0] = kasan_save_stack(GFP_NOWAIT, true);
 }
 
 void kasan_set_free_info(struct kmem_cache *cache,
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index fa02c88b6948..e442d94a8f6e 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -260,7 +260,7 @@ void kasan_report_invalid_free(void *object, unsigned long ip);
 
 struct page *kasan_addr_to_page(const void *addr);
 
-depot_stack_handle_t kasan_save_stack(gfp_t flags);
+depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc);
 void kasan_set_track(struct kasan_track *track, gfp_t flags);
 void kasan_set_free_info(struct kmem_cache *cache, void *object, u8 tag);
 struct kasan_track *kasan_get_free_track(struct kmem_cache *cache,
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/6] kasan: generic: introduce kasan_record_aux_stack_noalloc()
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (3 preceding siblings ...)
  2021-09-07 14:13 ` [PATCH 4/6] kasan: common: provide can_alloc in kasan_save_stack() Marco Elver
@ 2021-09-07 14:13 ` Marco Elver
  2021-09-07 14:13 ` [PATCH 6/6] workqueue, kasan: avoid alloc_pages() when recording stack Marco Elver
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

Introduce a variant of kasan_record_aux_stack() that does not do any
memory allocation through stackdepot. This will permit using it in
contexts that cannot allocate any memory.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/kasan.h |  2 ++
 mm/kasan/generic.c    | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index dd874a1ee862..736d7b458996 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -370,12 +370,14 @@ static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
 void kasan_cache_shrink(struct kmem_cache *cache);
 void kasan_cache_shutdown(struct kmem_cache *cache);
 void kasan_record_aux_stack(void *ptr);
+void kasan_record_aux_stack_noalloc(void *ptr);
 
 #else /* CONFIG_KASAN_GENERIC */
 
 static inline void kasan_cache_shrink(struct kmem_cache *cache) {}
 static inline void kasan_cache_shutdown(struct kmem_cache *cache) {}
 static inline void kasan_record_aux_stack(void *ptr) {}
+static inline void kasan_record_aux_stack_noalloc(void *ptr) {}
 
 #endif /* CONFIG_KASAN_GENERIC */
 
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 2a8e59e6326d..84a038b07c6f 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -328,7 +328,7 @@ DEFINE_ASAN_SET_SHADOW(f3);
 DEFINE_ASAN_SET_SHADOW(f5);
 DEFINE_ASAN_SET_SHADOW(f8);
 
-void kasan_record_aux_stack(void *addr)
+static void __kasan_record_aux_stack(void *addr, bool can_alloc)
 {
 	struct page *page = kasan_addr_to_page(addr);
 	struct kmem_cache *cache;
@@ -345,7 +345,17 @@ void kasan_record_aux_stack(void *addr)
 		return;
 
 	alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
-	alloc_meta->aux_stack[0] = kasan_save_stack(GFP_NOWAIT, true);
+	alloc_meta->aux_stack[0] = kasan_save_stack(GFP_NOWAIT, can_alloc);
+}
+
+void kasan_record_aux_stack(void *addr)
+{
+	return __kasan_record_aux_stack(addr, true);
+}
+
+void kasan_record_aux_stack_noalloc(void *addr)
+{
+	return __kasan_record_aux_stack(addr, false);
 }
 
 void kasan_set_free_info(struct kmem_cache *cache,
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/6] workqueue, kasan: avoid alloc_pages() when recording stack
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (4 preceding siblings ...)
  2021-09-07 14:13 ` [PATCH 5/6] kasan: generic: introduce kasan_record_aux_stack_noalloc() Marco Elver
@ 2021-09-07 14:13 ` Marco Elver
  2021-09-07 16:39   ` Tejun Heo
  2021-09-07 14:17 ` [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:13 UTC (permalink / raw)
  To: elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

Shuah Khan reported:

 | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
 | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
 | it tries to allocate memory attempting to acquire spinlock in page
 | allocation code while holding workqueue pool raw_spinlock.
 |
 | There are several instances of this problem when block layer tries
 | to __queue_work(). Call trace from one of these instances is below:
 |
 |     kblockd_mod_delayed_work_on()
 |       mod_delayed_work_on()
 |         __queue_delayed_work()
 |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
 |             insert_work()
 |               kasan_record_aux_stack()
 |                 kasan_save_stack()
 |                   stack_depot_save()
 |                     alloc_pages()
 |                       __alloc_pages()
 |                         get_page_from_freelist()
 |                           rm_queue()
 |                             rm_queue_pcplist()
 |                               local_lock_irqsave(&pagesets.lock, flags);
 |                               [ BUG: Invalid wait context triggered ]

The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, it is not even possible to use either GFP_ATOMIC
nor GFP_NOWAIT in certain non-preemptive contexts, including
raw_spin_locks (see gfp.h and ab00db216c9c7).

Fix it by instructing stackdepot to not expand stack storage via
alloc_pages() in case it runs out by using kasan_record_aux_stack_noalloc().

While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit). For frequent calls from the same
location, it therefore becomes extremely unlikely that
kasan_record_aux_stack_noalloc() fails.

Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
Reported-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 50142fc08902..0681774e6908 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1329,7 +1329,7 @@ static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
 	struct worker_pool *pool = pwq->pool;
 
 	/* record the work call stack in order to print it in KASAN reports */
-	kasan_record_aux_stack(work);
+	kasan_record_aux_stack_noalloc(work);
 
 	/* we own @work, set data and link */
 	set_work_pwq(work, pwq, extra_flags);
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (5 preceding siblings ...)
  2021-09-07 14:13 ` [PATCH 6/6] workqueue, kasan: avoid alloc_pages() when recording stack Marco Elver
@ 2021-09-07 14:17 ` Marco Elver
  2021-09-07 20:05 ` Shuah Khan
  2021-09-14 12:55 ` Alexander Potapenko
  8 siblings, 0 replies; 17+ messages in thread
From: Marco Elver @ 2021-09-07 14:17 UTC (permalink / raw)
  To: elver, Andrew Morton
  Cc: Shuah Khan, Tejun Heo, Lai Jiangshan, Andrey Konovalov,
	Walter Wu, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	linux-kernel, linux-mm, Aleksandr Nogikh, Taras Madan,
	Thomas Gleixner, Sebastian Andrzej Siewior

[+Cc: Thomas, Sebastian]

Sorry, forgot to Cc you... :-/

On Tue, 7 Sept 2021 at 16:14, Marco Elver <elver@google.com> wrote:
>
> Shuah Khan reported [1]:
>
>  | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
>  | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
>  | it tries to allocate memory attempting to acquire spinlock in page
>  | allocation code while holding workqueue pool raw_spinlock.
>  |
>  | There are several instances of this problem when block layer tries
>  | to __queue_work(). Call trace from one of these instances is below:
>  |
>  |     kblockd_mod_delayed_work_on()
>  |       mod_delayed_work_on()
>  |         __queue_delayed_work()
>  |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
>  |             insert_work()
>  |               kasan_record_aux_stack()
>  |                 kasan_save_stack()
>  |                   stack_depot_save()
>  |                     alloc_pages()
>  |                       __alloc_pages()
>  |                         get_page_from_freelist()
>  |                           rm_queue()
>  |                             rm_queue_pcplist()
>  |                               local_lock_irqsave(&pagesets.lock, flags);
>  |                               [ BUG: Invalid wait context triggered ]
>
> [1] https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
>
> PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
> rules are being violated. More generally, memory is being allocated from
> a non-preemptive context (raw_spin_lock'd c-s) where it is not allowed.
>
> To properly fix this, we must prevent stackdepot from replenishing its
> "stack slab" pool if memory allocations cannot be done in the current
> context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
> non-preemptive contexts, including raw_spin_locks (see gfp.h and
> ab00db216c9c7).
>
> The only downside is that saving a stack trace may fail if: stackdepot
> runs out of space AND the same stack trace has not been recorded before.
> I expect this to be unlikely, and a simple experiment (boot the kernel)
> didn't result in any failure to record stack trace from insert_work().
>
> The series includes a few minor fixes to stackdepot that I noticed in
> preparing the series. It then introduces __stack_depot_save(), which
> exposes the option to force stackdepot to not allocate any memory.
> Finally, KASAN is changed to use the new stackdepot interface and
> provide kasan_record_aux_stack_noalloc(), which is then used by
> workqueue code.
>
> Marco Elver (6):
>   lib/stackdepot: include gfp.h
>   lib/stackdepot: remove unused function argument
>   lib/stackdepot: introduce __stack_depot_save()
>   kasan: common: provide can_alloc in kasan_save_stack()
>   kasan: generic: introduce kasan_record_aux_stack_noalloc()
>   workqueue, kasan: avoid alloc_pages() when recording stack
>
>  include/linux/kasan.h      |  2 ++
>  include/linux/stackdepot.h |  6 +++++
>  kernel/workqueue.c         |  2 +-
>  lib/stackdepot.c           | 51 ++++++++++++++++++++++++++++++--------
>  mm/kasan/common.c          |  6 ++---
>  mm/kasan/generic.c         | 14 +++++++++--
>  mm/kasan/kasan.h           |  2 +-
>  7 files changed, 65 insertions(+), 18 deletions(-)
>
> --
> 2.33.0.153.gba50c8fa24-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] workqueue, kasan: avoid alloc_pages() when recording stack
  2021-09-07 14:13 ` [PATCH 6/6] workqueue, kasan: avoid alloc_pages() when recording stack Marco Elver
@ 2021-09-07 16:39   ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2021-09-07 16:39 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Shuah Khan, Lai Jiangshan, Andrey Konovalov,
	Walter Wu, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	linux-kernel, linux-mm, Aleksandr Nogikh, Taras Madan

On Tue, Sep 07, 2021 at 04:13:07PM +0200, Marco Elver wrote:
> Shuah Khan reported:
> 
>  | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
>  | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
>  | it tries to allocate memory attempting to acquire spinlock in page
>  | allocation code while holding workqueue pool raw_spinlock.
>  |
>  | There are several instances of this problem when block layer tries
>  | to __queue_work(). Call trace from one of these instances is below:
>  |
>  |     kblockd_mod_delayed_work_on()
>  |       mod_delayed_work_on()
>  |         __queue_delayed_work()
>  |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
>  |             insert_work()
>  |               kasan_record_aux_stack()
>  |                 kasan_save_stack()
>  |                   stack_depot_save()
>  |                     alloc_pages()
>  |                       __alloc_pages()
>  |                         get_page_from_freelist()
>  |                           rm_queue()
>  |                             rm_queue_pcplist()
>  |                               local_lock_irqsave(&pagesets.lock, flags);
>  |                               [ BUG: Invalid wait context triggered ]
> 
> The default kasan_record_aux_stack() calls stack_depot_save() with
> GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
> In general, however, it is not even possible to use either GFP_ATOMIC
> nor GFP_NOWAIT in certain non-preemptive contexts, including
> raw_spin_locks (see gfp.h and ab00db216c9c7).
> 
> Fix it by instructing stackdepot to not expand stack storage via
> alloc_pages() in case it runs out by using kasan_record_aux_stack_noalloc().
> 
> While there is an increased risk of failing to insert the stack trace,
> this is typically unlikely, especially if the same insertion had already
> succeeded previously (stack depot hit). For frequent calls from the same
> location, it therefore becomes extremely unlikely that
> kasan_record_aux_stack_noalloc() fails.
> 
> Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
> Reported-by: Shuah Khan <skhan@linuxfoundation.org>
> Signed-off-by: Marco Elver <elver@google.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (6 preceding siblings ...)
  2021-09-07 14:17 ` [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
@ 2021-09-07 20:05 ` Shuah Khan
  2021-09-10 10:50   ` Vlastimil Babka
  2021-09-14 12:55 ` Alexander Potapenko
  8 siblings, 1 reply; 17+ messages in thread
From: Shuah Khan @ 2021-09-07 20:05 UTC (permalink / raw)
  To: Marco Elver, Andrew Morton
  Cc: Tejun Heo, Lai Jiangshan, Andrey Konovalov, Walter Wu,
	Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	linux-kernel, linux-mm, Aleksandr Nogikh, Taras Madan,
	Shuah Khan

On 9/7/21 8:13 AM, Marco Elver wrote:
> Shuah Khan reported [1]:
> 
>   | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
>   | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
>   | it tries to allocate memory attempting to acquire spinlock in page
>   | allocation code while holding workqueue pool raw_spinlock.
>   |
>   | There are several instances of this problem when block layer tries
>   | to __queue_work(). Call trace from one of these instances is below:
>   |
>   |     kblockd_mod_delayed_work_on()
>   |       mod_delayed_work_on()
>   |         __queue_delayed_work()
>   |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
>   |             insert_work()
>   |               kasan_record_aux_stack()
>   |                 kasan_save_stack()
>   |                   stack_depot_save()
>   |                     alloc_pages()
>   |                       __alloc_pages()
>   |                         get_page_from_freelist()
>   |                           rm_queue()
>   |                             rm_queue_pcplist()
>   |                               local_lock_irqsave(&pagesets.lock, flags);
>   |                               [ BUG: Invalid wait context triggered ]
> 
> [1] https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
> 
> PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
> rules are being violated. More generally, memory is being allocated from
> a non-preemptive context (raw_spin_lock'd c-s) where it is not allowed.
> 
> To properly fix this, we must prevent stackdepot from replenishing its
> "stack slab" pool if memory allocations cannot be done in the current
> context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
> non-preemptive contexts, including raw_spin_locks (see gfp.h and
> ab00db216c9c7).
> 
> The only downside is that saving a stack trace may fail if: stackdepot
> runs out of space AND the same stack trace has not been recorded before.
> I expect this to be unlikely, and a simple experiment (boot the kernel)
> didn't result in any failure to record stack trace from insert_work().
> 
> The series includes a few minor fixes to stackdepot that I noticed in
> preparing the series. It then introduces __stack_depot_save(), which
> exposes the option to force stackdepot to not allocate any memory.
> Finally, KASAN is changed to use the new stackdepot interface and
> provide kasan_record_aux_stack_noalloc(), which is then used by
> workqueue code.
> 
> Marco Elver (6):
>    lib/stackdepot: include gfp.h
>    lib/stackdepot: remove unused function argument
>    lib/stackdepot: introduce __stack_depot_save()
>    kasan: common: provide can_alloc in kasan_save_stack()
>    kasan: generic: introduce kasan_record_aux_stack_noalloc()
>    workqueue, kasan: avoid alloc_pages() when recording stack
> 
>   include/linux/kasan.h      |  2 ++
>   include/linux/stackdepot.h |  6 +++++
>   kernel/workqueue.c         |  2 +-
>   lib/stackdepot.c           | 51 ++++++++++++++++++++++++++++++--------
>   mm/kasan/common.c          |  6 ++---
>   mm/kasan/generic.c         | 14 +++++++++--
>   mm/kasan/kasan.h           |  2 +-
>   7 files changed, 65 insertions(+), 18 deletions(-)
> 

Thank you. Tested all the 6 patches in this series on Linux 5.14. This problem
exists in 5.13 and needs to be marked for both 5.14 and 5.13 stable releases.

Here is my

Tested-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
  2021-09-07 20:05 ` Shuah Khan
@ 2021-09-10 10:50   ` Vlastimil Babka
  2021-09-10 15:28     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2021-09-10 10:50 UTC (permalink / raw)
  To: Shuah Khan, Marco Elver, Andrew Morton
  Cc: Tejun Heo, Lai Jiangshan, Andrey Konovalov, Walter Wu,
	Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	linux-kernel, linux-mm, Aleksandr Nogikh, Taras Madan,
	Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior

On 9/7/21 22:05, Shuah Khan wrote:
> On 9/7/21 8:13 AM, Marco Elver wrote:
>> Shuah Khan reported [1]:
>>
>>   | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
>>   | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
>>   | it tries to allocate memory attempting to acquire spinlock in page
>>   | allocation code while holding workqueue pool raw_spinlock.
>>   |
>>   | There are several instances of this problem when block layer tries
>>   | to __queue_work(). Call trace from one of these instances is below:
>>   |
>>   |     kblockd_mod_delayed_work_on()
>>   |       mod_delayed_work_on()
>>   |         __queue_delayed_work()
>>   |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
>>   |             insert_work()
>>   |               kasan_record_aux_stack()
>>   |                 kasan_save_stack()
>>   |                   stack_depot_save()
>>   |                     alloc_pages()
>>   |                       __alloc_pages()
>>   |                         get_page_from_freelist()
>>   |                           rm_queue()
>>   |                             rm_queue_pcplist()
>>   |                               local_lock_irqsave(&pagesets.lock, flags);
>>   |                               [ BUG: Invalid wait context triggered ]
>>
>> [1]
>> https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
>>
>> PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
>> rules are being violated. More generally, memory is being allocated from
>> a non-preemptive context (raw_spin_lock'd c-s) where it is not allowed.
>>
>> To properly fix this, we must prevent stackdepot from replenishing its
>> "stack slab" pool if memory allocations cannot be done in the current
>> context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
>> non-preemptive contexts, including raw_spin_locks (see gfp.h and
>> ab00db216c9c7).
>>
>> The only downside is that saving a stack trace may fail if: stackdepot
>> runs out of space AND the same stack trace has not been recorded before.
>> I expect this to be unlikely, and a simple experiment (boot the kernel)
>> didn't result in any failure to record stack trace from insert_work().
>>
>> The series includes a few minor fixes to stackdepot that I noticed in
>> preparing the series. It then introduces __stack_depot_save(), which
>> exposes the option to force stackdepot to not allocate any memory.
>> Finally, KASAN is changed to use the new stackdepot interface and
>> provide kasan_record_aux_stack_noalloc(), which is then used by
>> workqueue code.
>>
>> Marco Elver (6):
>>    lib/stackdepot: include gfp.h
>>    lib/stackdepot: remove unused function argument
>>    lib/stackdepot: introduce __stack_depot_save()
>>    kasan: common: provide can_alloc in kasan_save_stack()
>>    kasan: generic: introduce kasan_record_aux_stack_noalloc()
>>    workqueue, kasan: avoid alloc_pages() when recording stack
>>
>>   include/linux/kasan.h      |  2 ++
>>   include/linux/stackdepot.h |  6 +++++
>>   kernel/workqueue.c         |  2 +-
>>   lib/stackdepot.c           | 51 ++++++++++++++++++++++++++++++--------
>>   mm/kasan/common.c          |  6 ++---
>>   mm/kasan/generic.c         | 14 +++++++++--
>>   mm/kasan/kasan.h           |  2 +-
>>   7 files changed, 65 insertions(+), 18 deletions(-)
>>
> 
> Thank you. Tested all the 6 patches in this series on Linux 5.14. This problem
> exists in 5.13 and needs to be marked for both 5.14 and 5.13 stable releases.

I think if this problem manifests only with CONFIG_PROVE_RAW_LOCK_NESTING
then it shouldn't be backported to stable. CONFIG_PROVE_RAW_LOCK_NESTING is
an experimental/development option to earlier discover what will collide
with RT lock semantics, without needing the full RT tree.
Thus, good to fix going forward, but not necessary to stable backport.

> Here is my
> 
> Tested-by: Shuah Khan <skhan@linuxfoundation.org>
> 
> thanks,
> -- Shuah
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] lib/stackdepot: introduce __stack_depot_save()
  2021-09-07 14:13 ` [PATCH 3/6] lib/stackdepot: introduce __stack_depot_save() Marco Elver
@ 2021-09-10 14:08   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-10 14:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan

On 2021-09-07 16:13:04 [+0200], Marco Elver wrote:
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index c80a9f734253..cab6cf117290 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -339,6 +350,25 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>  fast_exit:
>  	return retval;
>  }
> +EXPORT_SYMBOL_GPL(__stack_depot_save);
> +
> +/**
> + * stack_depot_save - Save a stack trace from an array
> + *
> + * @entries:		Pointer to storage array
> + * @nr_entries:		Size of the storage array
> + * @alloc_flags:	Allocation gfp flags
> + *
> + * Context: Contexts where allocations via alloc_pages() are allowed.

Could we add here something like (see __stack_depot_save() for details)
since it has more verbose.

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
  2021-09-10 10:50   ` Vlastimil Babka
@ 2021-09-10 15:28     ` Sebastian Andrzej Siewior
  2021-09-13  6:00       ` Marco Elver
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-10 15:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Shuah Khan, Marco Elver, Andrew Morton, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan, Thomas Gleixner,
	Peter Zijlstra

On 2021-09-10 12:50:51 [+0200], Vlastimil Babka wrote:
> > Thank you. Tested all the 6 patches in this series on Linux 5.14. This problem
> > exists in 5.13 and needs to be marked for both 5.14 and 5.13 stable releases.
> 
> I think if this problem manifests only with CONFIG_PROVE_RAW_LOCK_NESTING
> then it shouldn't be backported to stable. CONFIG_PROVE_RAW_LOCK_NESTING is
> an experimental/development option to earlier discover what will collide
> with RT lock semantics, without needing the full RT tree.
> Thus, good to fix going forward, but not necessary to stable backport.

  Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
for the series. Thank you.

As for the backport I agree here with Vlastimil.

I pulled it into my RT tree for some testing and it looked good. I had
to
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3030,7 +3030,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
        head->func = func;
        head->next = NULL;
        local_irq_save(flags);
-       kasan_record_aux_stack(head);
+       kasan_record_aux_stack_noalloc(head);
        rdp = this_cpu_ptr(&rcu_data);
 
        /* Add the callback to our list. */

We could move kasan_record_aux_stack() before that local_irq_save() but
then call_rcu() can be called preempt-disabled section so we would have
the same problem.

The second warning came from kasan_quarantine_remove_cache(). At the end
per_cpu_remove_cache() -> qlist_free_all() will free memory with
disabled interrupts (due to that smp-function call).
Moving it to kworker would solve the problem. I don't mind keeping that
smp_function call assuming that it is all debug-code and it increases
overall latency anyway. But then could we maybe move all those objects
to a single list which freed after on_each_cpu()?

Otherwise I haven't seen any new warnings showing up with KASAN enabled.

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
  2021-09-10 15:28     ` Sebastian Andrzej Siewior
@ 2021-09-13  6:00       ` Marco Elver
  0 siblings, 0 replies; 17+ messages in thread
From: Marco Elver @ 2021-09-13  6:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, Shuah Khan, Andrew Morton, Tejun Heo,
	Lai Jiangshan, Andrey Konovalov, Walter Wu, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vijayanand Jitta,
	Vinayak Menon, Gustavo A. R. Silva, kasan-dev, linux-kernel,
	linux-mm, Aleksandr Nogikh, Taras Madan, Thomas Gleixner,
	Peter Zijlstra

On Fri, 10 Sept 2021 at 17:28, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2021-09-10 12:50:51 [+0200], Vlastimil Babka wrote:
> > > Thank you. Tested all the 6 patches in this series on Linux 5.14. This problem
> > > exists in 5.13 and needs to be marked for both 5.14 and 5.13 stable releases.
> >
> > I think if this problem manifests only with CONFIG_PROVE_RAW_LOCK_NESTING
> > then it shouldn't be backported to stable. CONFIG_PROVE_RAW_LOCK_NESTING is
> > an experimental/development option to earlier discover what will collide
> > with RT lock semantics, without needing the full RT tree.
> > Thus, good to fix going forward, but not necessary to stable backport.
>
>   Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> for the series. Thank you.

Thank you. I'll send v2 with Acks/Tested-by added and the comment
addition you suggested.

> As for the backport I agree here with Vlastimil.
>
> I pulled it into my RT tree for some testing and it looked good. I had
> to
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3030,7 +3030,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>         head->func = func;
>         head->next = NULL;
>         local_irq_save(flags);
> -       kasan_record_aux_stack(head);
> +       kasan_record_aux_stack_noalloc(head);
>         rdp = this_cpu_ptr(&rcu_data);
>
>         /* Add the callback to our list. */
>
> We could move kasan_record_aux_stack() before that local_irq_save() but
> then call_rcu() can be called preempt-disabled section so we would have
> the same problem.
>
> The second warning came from kasan_quarantine_remove_cache(). At the end
> per_cpu_remove_cache() -> qlist_free_all() will free memory with
> disabled interrupts (due to that smp-function call).
> Moving it to kworker would solve the problem. I don't mind keeping that
> smp_function call assuming that it is all debug-code and it increases
> overall latency anyway. But then could we maybe move all those objects
> to a single list which freed after on_each_cpu()?

The quarantine is per-CPU, and I think what you suggest would
fundamentally change its design. If you have something that works on
RT without a fundamental change would be ideal (it is all debug code
and not used on non-KASAN kernels).


> Otherwise I haven't seen any new warnings showing up with KASAN enabled.
>
> Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] lib/stackdepot: include gfp.h
  2021-09-07 14:13 ` [PATCH 1/6] lib/stackdepot: include gfp.h Marco Elver
@ 2021-09-14 12:11   ` Alexander Potapenko
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Potapenko @ 2021-09-14 12:11 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	LKML, Linux Memory Management List, Aleksandr Nogikh,
	Taras Madan

On Tue, Sep 7, 2021 at 4:14 PM Marco Elver <elver@google.com> wrote:
>
> <linux/stackdepot.h> refers to gfp_t, but doesn't include gfp.h.
>
> Fix it by including <linux/gfp.h>.
>
> Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

> ---
>  include/linux/stackdepot.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 6bb4bc1a5f54..97b36dc53301 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -11,6 +11,8 @@
>  #ifndef _LINUX_STACKDEPOT_H
>  #define _LINUX_STACKDEPOT_H
>
> +#include <linux/gfp.h>
> +
>  typedef u32 depot_stack_handle_t;
>
>  depot_stack_handle_t stack_depot_save(unsigned long *entries,
> --
> 2.33.0.153.gba50c8fa24-goog
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] lib/stackdepot: remove unused function argument
  2021-09-07 14:13 ` [PATCH 2/6] lib/stackdepot: remove unused function argument Marco Elver
@ 2021-09-14 12:12   ` Alexander Potapenko
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Potapenko @ 2021-09-14 12:12 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	LKML, Linux Memory Management List, Aleksandr Nogikh,
	Taras Madan

On Tue, Sep 7, 2021 at 4:14 PM Marco Elver <elver@google.com> wrote:
>
> alloc_flags in depot_alloc_stack() is no longer used; remove it.
>
> Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

> ---
>  lib/stackdepot.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 0a2e417f83cb..c80a9f734253 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -102,8 +102,8 @@ static bool init_stack_slab(void **prealloc)
>  }
>
>  /* Allocation of a new stack in raw storage */
> -static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
> -               u32 hash, void **prealloc, gfp_t alloc_flags)
> +static struct stack_record *
> +depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>  {
>         struct stack_record *stack;
>         size_t required_size = struct_size(stack, entries, size);
> @@ -309,9 +309,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>
>         found = find_stack(*bucket, entries, nr_entries, hash);
>         if (!found) {
> -               struct stack_record *new =
> -                       depot_alloc_stack(entries, nr_entries,
> -                                         hash, &prealloc, alloc_flags);
> +               struct stack_record *new = depot_alloc_stack(entries, nr_entries, hash, &prealloc);
> +
>                 if (new) {
>                         new->next = *bucket;
>                         /*
> --
> 2.33.0.153.gba50c8fa24-goog
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock
  2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
                   ` (7 preceding siblings ...)
  2021-09-07 20:05 ` Shuah Khan
@ 2021-09-14 12:55 ` Alexander Potapenko
  8 siblings, 0 replies; 17+ messages in thread
From: Alexander Potapenko @ 2021-09-14 12:55 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, Shuah Khan, Tejun Heo, Lai Jiangshan,
	Andrey Konovalov, Walter Wu, Andrey Ryabinin, Dmitry Vyukov,
	Vijayanand Jitta, Vinayak Menon, Gustavo A. R. Silva, kasan-dev,
	LKML, Linux Memory Management List, Aleksandr Nogikh,
	Taras Madan

On Tue, Sep 7, 2021 at 4:14 PM Marco Elver <elver@google.com> wrote:
>
> Shuah Khan reported [1]:
>
>  | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
>  | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
>  | it tries to allocate memory attempting to acquire spinlock in page
>  | allocation code while holding workqueue pool raw_spinlock.
>  |
>  | There are several instances of this problem when block layer tries
>  | to __queue_work(). Call trace from one of these instances is below:
>  |
>  |     kblockd_mod_delayed_work_on()
>  |       mod_delayed_work_on()
>  |         __queue_delayed_work()
>  |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
>  |             insert_work()
>  |               kasan_record_aux_stack()
>  |                 kasan_save_stack()
>  |                   stack_depot_save()
>  |                     alloc_pages()
>  |                       __alloc_pages()
>  |                         get_page_from_freelist()
>  |                           rm_queue()
>  |                             rm_queue_pcplist()
>  |                               local_lock_irqsave(&pagesets.lock, flags);
>  |                               [ BUG: Invalid wait context triggered ]
>
> [1] https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
>
> PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
> rules are being violated. More generally, memory is being allocated from
> a non-preemptive context (raw_spin_lock'd c-s) where it is not allowed.
>
> To properly fix this, we must prevent stackdepot from replenishing its
> "stack slab" pool if memory allocations cannot be done in the current
> context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
> non-preemptive contexts, including raw_spin_locks (see gfp.h and
> ab00db216c9c7).
>
> The only downside is that saving a stack trace may fail if: stackdepot
> runs out of space AND the same stack trace has not been recorded before.
> I expect this to be unlikely, and a simple experiment (boot the kernel)
> didn't result in any failure to record stack trace from insert_work().
>
> The series includes a few minor fixes to stackdepot that I noticed in
> preparing the series. It then introduces __stack_depot_save(), which
> exposes the option to force stackdepot to not allocate any memory.
> Finally, KASAN is changed to use the new stackdepot interface and
> provide kasan_record_aux_stack_noalloc(), which is then used by
> workqueue code.
>
> Marco Elver (6):
>   lib/stackdepot: include gfp.h
>   lib/stackdepot: remove unused function argument
>   lib/stackdepot: introduce __stack_depot_save()
>   kasan: common: provide can_alloc in kasan_save_stack()
>   kasan: generic: introduce kasan_record_aux_stack_noalloc()
>   workqueue, kasan: avoid alloc_pages() when recording stack

Acked-by: Alexander Potapenko <glider@google.com>

for the whole series.

>
>  include/linux/kasan.h      |  2 ++
>  include/linux/stackdepot.h |  6 +++++
>  kernel/workqueue.c         |  2 +-
>  lib/stackdepot.c           | 51 ++++++++++++++++++++++++++++++--------
>  mm/kasan/common.c          |  6 ++---
>  mm/kasan/generic.c         | 14 +++++++++--
>  mm/kasan/kasan.h           |  2 +-
>  7 files changed, 65 insertions(+), 18 deletions(-)
>
> --
> 2.33.0.153.gba50c8fa24-goog
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-09-14 12:56 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-07 14:13 [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
2021-09-07 14:13 ` [PATCH 1/6] lib/stackdepot: include gfp.h Marco Elver
2021-09-14 12:11   ` Alexander Potapenko
2021-09-07 14:13 ` [PATCH 2/6] lib/stackdepot: remove unused function argument Marco Elver
2021-09-14 12:12   ` Alexander Potapenko
2021-09-07 14:13 ` [PATCH 3/6] lib/stackdepot: introduce __stack_depot_save() Marco Elver
2021-09-10 14:08   ` Sebastian Andrzej Siewior
2021-09-07 14:13 ` [PATCH 4/6] kasan: common: provide can_alloc in kasan_save_stack() Marco Elver
2021-09-07 14:13 ` [PATCH 5/6] kasan: generic: introduce kasan_record_aux_stack_noalloc() Marco Elver
2021-09-07 14:13 ` [PATCH 6/6] workqueue, kasan: avoid alloc_pages() when recording stack Marco Elver
2021-09-07 16:39   ` Tejun Heo
2021-09-07 14:17 ` [PATCH 0/6] stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock Marco Elver
2021-09-07 20:05 ` Shuah Khan
2021-09-10 10:50   ` Vlastimil Babka
2021-09-10 15:28     ` Sebastian Andrzej Siewior
2021-09-13  6:00       ` Marco Elver
2021-09-14 12:55 ` Alexander Potapenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).