linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] kernel/fork: beware of __put_task_struct calling context
@ 2023-05-16 19:14 Wander Lairson Costa
  2023-05-16 19:24 ` Matthew Wilcox
  2023-05-17 15:26 ` Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Wander Lairson Costa @ 2023-05-16 19:14 UTC (permalink / raw)
  To: Oleg Nesterov, Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Wander Lairson Costa, Andrew Morton, Liam R. Howlett,
	Vlastimil Babka, Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list
  Cc: Hu Chunyu, Valentin Schneider, Sebastian Andrzej Siewior,
	Steven Rostedt, Luis Goncalves

Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from an non-preemptible context.

One practical example is splat inside inactive_task_timer(), which is
called in a interrupt context:

CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
 Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
 Call Trace:
 dump_stack_lvl+0x57/0x7d
 mark_lock_irq.cold+0x33/0xba
 ? stack_trace_save+0x4b/0x70
 ? save_trace+0x55/0x150
 mark_lock+0x1e7/0x400
 mark_usage+0x11d/0x140
 __lock_acquire+0x30d/0x930
 lock_acquire.part.0+0x9c/0x210
 ? refill_obj_stock+0x3d/0x3a0
 ? rcu_read_lock_sched_held+0x3f/0x70
 ? trace_lock_acquire+0x38/0x140
 ? lock_acquire+0x30/0x80
 ? refill_obj_stock+0x3d/0x3a0
 rt_spin_lock+0x27/0xe0
 ? refill_obj_stock+0x3d/0x3a0
 refill_obj_stock+0x3d/0x3a0
 ? inactive_task_timer+0x1ad/0x340
 kmem_cache_free+0x357/0x560
 inactive_task_timer+0x1ad/0x340
 ? switched_from_dl+0x2d0/0x2d0
 __run_hrtimer+0x8a/0x1a0
 __hrtimer_run_queues+0x91/0x130
 hrtimer_interrupt+0x10f/0x220
 __sysvec_apic_timer_interrupt+0x7b/0xd0
 sysvec_apic_timer_interrupt+0x4f/0xd0
 ? asm_sysvec_apic_timer_interrupt+0xa/0x20
 asm_sysvec_apic_timer_interrupt+0x12/0x20
 RIP: 0033:0x7fff196bf6f5

Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.

Changelog
=========

v1:
* Initial implementation fixing the splat.

v2:
* Isolate the logic in its own function.
* Fix two more cases caught in review.

v3:
* Change __put_task_struct() to handle the issue internally.

v4:
* Explain why call_rcu() is safe to call from interrupt context.

v5:
* Explain why __put_task_struct() doesn't conflict with
  put_task_sruct_rcu_user.

v6:
* As per Sebastian's review, revert back the implementation of v2
  with a distinct function.
* Add a check in put_task_struct() to warning when called from a
  non-sleepable context.
* Address more call sites.

v7:
* Fix typos.
* Add an explanation why the new function doesn't conflict with
  delayed_free_task().

v8:
* Bring back v5.
* Fix coding style.

v9:
* Reorganize to not need ___put_task_struct() by Oleg's suggestion.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reported-by: Hu Chunyu <chuhu@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Luis Goncalves <lgoncalv@redhat.com>
---
 include/linux/sched/task.h | 28 +++++++++++++++++++++++++++-
 kernel/fork.c              |  8 ++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index d6c48163c6de..9bcb9535d4e1 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -112,10 +112,36 @@ static inline struct task_struct *get_task_struct(struct task_struct *t)
 }
 
 extern void __put_task_struct(struct task_struct *t);
+extern void __put_task_struct_rcu_cb(struct rcu_head *rhp);
 
 static inline void put_task_struct(struct task_struct *t)
 {
-	if (refcount_dec_and_test(&t->usage))
+	if (!refcount_dec_and_test(&t->usage))
+		return;
+
+	/*
+	 * under PREEMPT_RT, we can't call put_task_struct
+	 * in atomic context because it will indirectly
+	 * acquire sleeping locks.
+	 *
+	 * call_rcu() will schedule delayed_put_task_struct_rcu()
+	 * to be called in process context.
+	 *
+	 * __put_task_struct() is called when
+	 * refcount_dec_and_test(&t->usage) succeeds.
+	 *
+	 * This means that it can't "conflict" with
+	 * put_task_struct_rcu_user() which abuses ->rcu the same
+	 * way; rcu_users has a reference so task->usage can't be
+	 * zero after rcu_users 1 -> 0 transition.
+	 *
+	 * delayed_free_task() also uses ->rcu, but it is only called
+	 * when it fails to fork a process. Therefore, there is no
+	 * way it can conflict with put_task_struct().
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
+	else
 		__put_task_struct(t);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..fd3bb4a554c4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -846,6 +846,14 @@ void __put_task_struct(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(__put_task_struct);
 
+void __put_task_struct_rcu_cb(struct rcu_head *rhp)
+{
+	struct task_struct *task = container_of(rhp, struct task_struct, rcu);
+
+	__put_task_struct(task);
+}
+EXPORT_SYMBOL_GPL(__put_task_struct_rcu_cb);
+
 void __init __weak arch_task_cache_init(void) { }
 
 /*
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-16 19:14 [PATCH v9] kernel/fork: beware of __put_task_struct calling context Wander Lairson Costa
@ 2023-05-16 19:24 ` Matthew Wilcox
  2023-05-16 21:05   ` Andrew Morton
  2023-05-17 15:26 ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-05-16 19:24 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Oleg Nesterov, Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On Tue, May 16, 2023 at 04:14:41PM -0300, Wander Lairson Costa wrote:
> +void __put_task_struct_rcu_cb(struct rcu_head *rhp)
> +{
> +	struct task_struct *task = container_of(rhp, struct task_struct, rcu);
> +
> +	__put_task_struct(task);
> +}
> +EXPORT_SYMBOL_GPL(__put_task_struct_rcu_cb);

Why does this need to be exported when its only caller is within the
main kernel and cannot possibly be built as a module?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-16 19:24 ` Matthew Wilcox
@ 2023-05-16 21:05   ` Andrew Morton
  2023-05-16 21:41     ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2023-05-16 21:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wander Lairson Costa, Oleg Nesterov, Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Liam R. Howlett, Vlastimil Babka, Eric W. Biederman,
	Andrei Vagin, Peter Zijlstra, Paul E. McKenney,
	Daniel Bristot de Oliveira, Yu Zhao, Alexey Gladkov,
	Mike Kravetz, Yang Shi, open list, Hu Chunyu, Valentin Schneider,
	Sebastian Andrzej Siewior, Steven Rostedt, Luis Goncalves

On Tue, 16 May 2023 20:24:04 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 16, 2023 at 04:14:41PM -0300, Wander Lairson Costa wrote:
> > +void __put_task_struct_rcu_cb(struct rcu_head *rhp)
> > +{
> > +	struct task_struct *task = container_of(rhp, struct task_struct, rcu);
> > +
> > +	__put_task_struct(task);
> > +}
> > +EXPORT_SYMBOL_GPL(__put_task_struct_rcu_cb);
> 
> Why does this need to be exported when its only caller is within the
> main kernel and cannot possibly be built as a module?

It's referenced by inlined put_task_struct(), which is called from all
over.

However I believe the above definition could be inside #ifdef
CONFIG_PREEMPT_RT, to save a scrap of resources?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-16 21:05   ` Andrew Morton
@ 2023-05-16 21:41     ` Matthew Wilcox
  2023-05-16 22:50       ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-05-16 21:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wander Lairson Costa, Oleg Nesterov, Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Liam R. Howlett, Vlastimil Babka, Eric W. Biederman,
	Andrei Vagin, Peter Zijlstra, Paul E. McKenney,
	Daniel Bristot de Oliveira, Yu Zhao, Alexey Gladkov,
	Mike Kravetz, Yang Shi, open list, Hu Chunyu, Valentin Schneider,
	Sebastian Andrzej Siewior, Steven Rostedt, Luis Goncalves

On Tue, May 16, 2023 at 02:05:55PM -0700, Andrew Morton wrote:
> On Tue, 16 May 2023 20:24:04 +0100 Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Tue, May 16, 2023 at 04:14:41PM -0300, Wander Lairson Costa wrote:
> > > +void __put_task_struct_rcu_cb(struct rcu_head *rhp)
> > > +{
> > > +	struct task_struct *task = container_of(rhp, struct task_struct, rcu);
> > > +
> > > +	__put_task_struct(task);
> > > +}
> > > +EXPORT_SYMBOL_GPL(__put_task_struct_rcu_cb);
> > 
> > Why does this need to be exported when its only caller is within the
> > main kernel and cannot possibly be built as a module?
> 
> It's referenced by inlined put_task_struct(), which is called from all
> over.

Oh, I missed that put_task_struct() was still inlined.  Should it be?
It seems quite large now.

> However I believe the above definition could be inside #ifdef
> CONFIG_PREEMPT_RT, to save a scrap of resources?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-16 21:41     ` Matthew Wilcox
@ 2023-05-16 22:50       ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2023-05-16 22:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wander Lairson Costa, Oleg Nesterov, Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Liam R. Howlett, Vlastimil Babka, Eric W. Biederman,
	Andrei Vagin, Peter Zijlstra, Paul E. McKenney,
	Daniel Bristot de Oliveira, Yu Zhao, Alexey Gladkov,
	Mike Kravetz, Yang Shi, open list, Hu Chunyu, Valentin Schneider,
	Sebastian Andrzej Siewior, Steven Rostedt, Luis Goncalves

On Tue, 16 May 2023 22:41:18 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> 
> Oh, I missed that put_task_struct() was still inlined.  Should it be?
> It seems quite large now.

It's not significantly worse because of this patch.  In fact, it's
unchanged for non-RT kernels.

Possibly put_task_struct() *should* be uninlined, because it made the
mistake of using the dang refcount stuff, which never saw a byte which
it couldn't consume :(


I mean...

--- a/fs/open.c~a
+++ a/fs/open.c
@@ -1572,3 +1572,9 @@ int stream_open(struct inode *inode, str
 }
 
 EXPORT_SYMBOL(stream_open);
+
+#include <linux/refcount.h>
+bool foo(refcount_t *r)
+{
+	return refcount_dec_and_test(r);
+}
_

is worth

339 bytes of text for an allmodconfig build 
67 bytes of text for an allnoconfig build 
77 bytes of text for a defconfig build 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-16 19:14 [PATCH v9] kernel/fork: beware of __put_task_struct calling context Wander Lairson Costa
  2023-05-16 19:24 ` Matthew Wilcox
@ 2023-05-17 15:26 ` Oleg Nesterov
  2023-05-17 16:57   ` Wander Lairson Costa
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2023-05-17 15:26 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On 05/16, Wander Lairson Costa wrote:
>
>  static inline void put_task_struct(struct task_struct *t)
>  {
> -	if (refcount_dec_and_test(&t->usage))
> +	if (!refcount_dec_and_test(&t->usage))
> +		return;
> +
> +	/*
> +	 * under PREEMPT_RT, we can't call put_task_struct
> +	 * in atomic context because it will indirectly
> +	 * acquire sleeping locks.
> +	 *
> +	 * call_rcu() will schedule delayed_put_task_struct_rcu()
> +	 * to be called in process context.
> +	 *
> +	 * __put_task_struct() is called when
> +	 * refcount_dec_and_test(&t->usage) succeeds.
> +	 *
> +	 * This means that it can't "conflict" with
> +	 * put_task_struct_rcu_user() which abuses ->rcu the same
> +	 * way; rcu_users has a reference so task->usage can't be
> +	 * zero after rcu_users 1 -> 0 transition.
> +	 *
> +	 * delayed_free_task() also uses ->rcu, but it is only called
> +	 * when it fails to fork a process. Therefore, there is no
> +	 * way it can conflict with put_task_struct().
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
> +		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
> +	else
>  		__put_task_struct(t);
>  }

LGTM but we still need to understand the possible problems with CONFIG_PROVE_RAW_LOCK_NESTING ...

Again, I'll try to investigate when I have time although I am not sure I can really help.

Perhaps you too can try to do this ? ;)

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-17 15:26 ` Oleg Nesterov
@ 2023-05-17 16:57   ` Wander Lairson Costa
  2023-05-29 12:22     ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2023-05-17 16:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On Wed, May 17, 2023 at 12:26 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/16, Wander Lairson Costa wrote:
> >
> >  static inline void put_task_struct(struct task_struct *t)
> >  {
> > -     if (refcount_dec_and_test(&t->usage))
> > +     if (!refcount_dec_and_test(&t->usage))
> > +             return;
> > +
> > +     /*
> > +      * under PREEMPT_RT, we can't call put_task_struct
> > +      * in atomic context because it will indirectly
> > +      * acquire sleeping locks.
> > +      *
> > +      * call_rcu() will schedule delayed_put_task_struct_rcu()
> > +      * to be called in process context.
> > +      *
> > +      * __put_task_struct() is called when
> > +      * refcount_dec_and_test(&t->usage) succeeds.
> > +      *
> > +      * This means that it can't "conflict" with
> > +      * put_task_struct_rcu_user() which abuses ->rcu the same
> > +      * way; rcu_users has a reference so task->usage can't be
> > +      * zero after rcu_users 1 -> 0 transition.
> > +      *
> > +      * delayed_free_task() also uses ->rcu, but it is only called
> > +      * when it fails to fork a process. Therefore, there is no
> > +      * way it can conflict with put_task_struct().
> > +      */
> > +     if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
> > +             call_rcu(&t->rcu, __put_task_struct_rcu_cb);
> > +     else
> >               __put_task_struct(t);
> >  }
>
> LGTM but we still need to understand the possible problems with CONFIG_PROVE_RAW_LOCK_NESTING ...
>
> Again, I'll try to investigate when I have time although I am not sure I can really help.
>
> Perhaps you too can try to do this ? ;)
>

FWIW, I tested this patch with CONFIG_PROVE_LOCK_NESTING in RT and
stock kernels. No splat happened.

> Oleg.
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-17 16:57   ` Wander Lairson Costa
@ 2023-05-29 12:22     ` Oleg Nesterov
  2023-06-01 17:45       ` Wander Lairson Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2023-05-29 12:22 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On 05/17, Wander Lairson Costa wrote:
>
> On Wed, May 17, 2023 at 12:26 PM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > LGTM but we still need to understand the possible problems with CONFIG_PROVE_RAW_LOCK_NESTING ...
> >
> > Again, I'll try to investigate when I have time although I am not sure I can really help.
> >
> > Perhaps you too can try to do this ? ;)
> >
>
> FWIW, I tested this patch with CONFIG_PROVE_LOCK_NESTING in RT and
> stock kernels. No splat happened.

Strange... FYI, I am running the kernel with this patch

	diff --git a/kernel/sys.c b/kernel/sys.c
	index 339fee3eff6a..3169cceddf3b 100644
	--- a/kernel/sys.c
	+++ b/kernel/sys.c
	@@ -2412,6 +2412,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
	 
		error = 0;
		switch (option) {
	+	case 666: {
	+		static DEFINE_SPINLOCK(l);
	+		static DEFINE_RAW_SPINLOCK(r);
	+
	+		raw_spin_lock(&r);
	+		spin_lock(&l);
	+		spin_unlock(&l);
	+		raw_spin_unlock(&r);
	+
	+		break;
	+	}
		case PR_SET_PDEATHSIG:
			if (!valid_signal(arg2)) {
				error = -EINVAL;

applied (because I am too lazy to compile a module ;) and

	# perl -e 'syscall 157,666'

triggers the lockdep bug

	=============================
	[ BUG: Invalid wait context ]
	6.4.0-rc2-00018-g4d6d4c7f541d-dirty #1176 Not tainted
	-----------------------------
	perl/35 is trying to lock:
	ffffffff81c4cc18 (l){....}-{3:3}, at: __do_sys_prctl+0x21b/0x87b
	other info that might help us debug this:
	context-{5:5}
	...

as expected.

Looks like your testing was wrong... Or maybe you missed another lockdep problem ?
Did you check dmesg? Perhaps lockdep detected another bug,say, even at boot time ?
In this case debug_locks_off() sets debug_locks = 0 and this disables lockdep.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-05-29 12:22     ` Oleg Nesterov
@ 2023-06-01 17:45       ` Wander Lairson Costa
  2023-06-01 18:13         ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2023-06-01 17:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On Mon, May 29, 2023 at 9:23 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/17, Wander Lairson Costa wrote:
> >
> > On Wed, May 17, 2023 at 12:26 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > LGTM but we still need to understand the possible problems with CONFIG_PROVE_RAW_LOCK_NESTING ...
> > >
> > > Again, I'll try to investigate when I have time although I am not sure I can really help.
> > >
> > > Perhaps you too can try to do this ? ;)
> > >
> >
> > FWIW, I tested this patch with CONFIG_PROVE_LOCK_NESTING in RT and
> > stock kernels. No splat happened.
>
> Strange... FYI, I am running the kernel with this patch
>
>         diff --git a/kernel/sys.c b/kernel/sys.c
>         index 339fee3eff6a..3169cceddf3b 100644
>         --- a/kernel/sys.c
>         +++ b/kernel/sys.c
>         @@ -2412,6 +2412,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>
>                 error = 0;
>                 switch (option) {
>         +       case 666: {
>         +               static DEFINE_SPINLOCK(l);
>         +               static DEFINE_RAW_SPINLOCK(r);
>         +
>         +               raw_spin_lock(&r);
>         +               spin_lock(&l);
>         +               spin_unlock(&l);
>         +               raw_spin_unlock(&r);
>         +
>         +               break;
>         +       }
>                 case PR_SET_PDEATHSIG:
>                         if (!valid_signal(arg2)) {
>                                 error = -EINVAL;
>
> applied (because I am too lazy to compile a module ;) and
>

FWIW, I converted it to a module [1]

>         # perl -e 'syscall 157,666'
>
> triggers the lockdep bug
>
>         =============================
>         [ BUG: Invalid wait context ]
>         6.4.0-rc2-00018-g4d6d4c7f541d-dirty #1176 Not tainted
>         -----------------------------
>         perl/35 is trying to lock:
>         ffffffff81c4cc18 (l){....}-{3:3}, at: __do_sys_prctl+0x21b/0x87b
>         other info that might help us debug this:
>         context-{5:5}
>         ...
>
> as expected.
>

Yeah, I tried it here and I had the same results, but only in the RT
kernel. But running the reproducer for put_task_struct(), works fine.

> Looks like your testing was wrong... Or maybe you missed another lockdep problem ?
> Did you check dmesg? Perhaps lockdep detected another bug,say, even at boot time ?
> In this case debug_locks_off() sets debug_locks = 0 and this disables lockdep.
>
> Oleg.
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-01 17:45       ` Wander Lairson Costa
@ 2023-06-01 18:13         ` Oleg Nesterov
  2023-06-01 18:23           ` Wander Lairson Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2023-06-01 18:13 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On 06/01, Wander Lairson Costa wrote:
>
> On Mon, May 29, 2023 at 9:23 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 05/17, Wander Lairson Costa wrote:
> > >
> > > On Wed, May 17, 2023 at 12:26 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > > >
> > > > LGTM but we still need to understand the possible problems with CONFIG_PROVE_RAW_LOCK_NESTING ...
> > > >
> > > > Again, I'll try to investigate when I have time although I am not sure I can really help.
> > > >
> > > > Perhaps you too can try to do this ? ;)
> > > >
> > >
> > > FWIW, I tested this patch with CONFIG_PROVE_LOCK_NESTING in RT and
> > > stock kernels. No splat happened.
> >
> > Strange... FYI, I am running the kernel with this patch
> >
> >         diff --git a/kernel/sys.c b/kernel/sys.c
> >         index 339fee3eff6a..3169cceddf3b 100644
> >         --- a/kernel/sys.c
> >         +++ b/kernel/sys.c
> >         @@ -2412,6 +2412,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >
> >                 error = 0;
> >                 switch (option) {
> >         +       case 666: {
> >         +               static DEFINE_SPINLOCK(l);
> >         +               static DEFINE_RAW_SPINLOCK(r);
> >         +
> >         +               raw_spin_lock(&r);
> >         +               spin_lock(&l);
> >         +               spin_unlock(&l);
> >         +               raw_spin_unlock(&r);
> >         +
> >         +               break;
> >         +       }
> >                 case PR_SET_PDEATHSIG:
> >                         if (!valid_signal(arg2)) {
> >                                 error = -EINVAL;
> >
> > applied (because I am too lazy to compile a module ;) and
> >
>
> FWIW, I converted it to a module [1]

where is [1] ?  not that I think this matters though...

> >         # perl -e 'syscall 157,666'
> >
> > triggers the lockdep bug
> >
> >         =============================
> >         [ BUG: Invalid wait context ]
> >         6.4.0-rc2-00018-g4d6d4c7f541d-dirty #1176 Not tainted
> >         -----------------------------
> >         perl/35 is trying to lock:
> >         ffffffff81c4cc18 (l){....}-{3:3}, at: __do_sys_prctl+0x21b/0x87b
> >         other info that might help us debug this:
> >         context-{5:5}
> >         ...
> >
> > as expected.
> >
>
> Yeah, I tried it here and I had the same results,

OK,

> but only in the RT kernel

this again suggests that your testing was wrong or I am totally confused (quite
possible, I know nothing about RT). I did the testing without CONFIG_PREEMPT_RT.

> But running the reproducer for put_task_struct(), works fine.

which reproducer ?

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-01 18:13         ` Oleg Nesterov
@ 2023-06-01 18:23           ` Wander Lairson Costa
  2023-06-02 17:34             ` Oleg Nesterov
  2023-06-02 17:39             ` Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Wander Lairson Costa @ 2023-06-01 18:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On Thu, Jun 1, 2023 at 3:14 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/01, Wander Lairson Costa wrote:
> >
> > On Mon, May 29, 2023 at 9:23 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > On 05/17, Wander Lairson Costa wrote:
> > > >
> > > > On Wed, May 17, 2023 at 12:26 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > > > >
> > > > > LGTM but we still need to understand the possible problems with CONFIG_PROVE_RAW_LOCK_NESTING ...
> > > > >
> > > > > Again, I'll try to investigate when I have time although I am not sure I can really help.
> > > > >
> > > > > Perhaps you too can try to do this ? ;)
> > > > >
> > > >
> > > > FWIW, I tested this patch with CONFIG_PROVE_LOCK_NESTING in RT and
> > > > stock kernels. No splat happened.
> > >
> > > Strange... FYI, I am running the kernel with this patch
> > >
> > >         diff --git a/kernel/sys.c b/kernel/sys.c
> > >         index 339fee3eff6a..3169cceddf3b 100644
> > >         --- a/kernel/sys.c
> > >         +++ b/kernel/sys.c
> > >         @@ -2412,6 +2412,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > >
> > >                 error = 0;
> > >                 switch (option) {
> > >         +       case 666: {
> > >         +               static DEFINE_SPINLOCK(l);
> > >         +               static DEFINE_RAW_SPINLOCK(r);
> > >         +
> > >         +               raw_spin_lock(&r);
> > >         +               spin_lock(&l);
> > >         +               spin_unlock(&l);
> > >         +               raw_spin_unlock(&r);
> > >         +
> > >         +               break;
> > >         +       }
> > >                 case PR_SET_PDEATHSIG:
> > >                         if (!valid_signal(arg2)) {
> > >                                 error = -EINVAL;
> > >
> > > applied (because I am too lazy to compile a module ;) and
> > >
> >
> > FWIW, I converted it to a module [1]
>
> where is [1] ?  not that I think this matters though...
>
> > >         # perl -e 'syscall 157,666'
> > >
> > > triggers the lockdep bug
> > >
> > >         =============================
> > >         [ BUG: Invalid wait context ]
> > >         6.4.0-rc2-00018-g4d6d4c7f541d-dirty #1176 Not tainted
> > >         -----------------------------
> > >         perl/35 is trying to lock:
> > >         ffffffff81c4cc18 (l){....}-{3:3}, at: __do_sys_prctl+0x21b/0x87b
> > >         other info that might help us debug this:
> > >         context-{5:5}
> > >         ...
> > >
> > > as expected.
> > >
> >
> > Yeah, I tried it here and I had the same results,
>
> OK,
>
> > but only in the RT kernel
>
> this again suggests that your testing was wrong or I am totally confused (quite
> possible, I know nothing about RT). I did the testing without CONFIG_PREEMPT_RT.
>

Hrm, could you please share your .config?

> > But running the reproducer for put_task_struct(), works fine.
>
> which reproducer ?
>

Only now I noticed I didn't add the reproducer to the commit message:

while true; do
    stress-ng --sched deadline --sched-period 1000000000
--sched-runtime 800000000 --sched-deadline 1000000000 --mmapfork 23 -t
20
done


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-01 18:23           ` Wander Lairson Costa
@ 2023-06-02 17:34             ` Oleg Nesterov
  2023-06-05 11:24               ` Wander Lairson Costa
  2023-06-02 17:39             ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2023-06-02 17:34 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On 06/01, Wander Lairson Costa wrote:
>
> On Thu, Jun 1, 2023 at 3:14 PM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > but only in the RT kernel
> >
> > this again suggests that your testing was wrong or I am totally confused (quite
> > possible, I know nothing about RT). I did the testing without CONFIG_PREEMPT_RT.
> >
>
> Hrm, could you please share your .config?

Sure. I do not want to spam the list, I'll send you a private email.

Can you share your kernel module code?

Did you verify that debug_locks != 0 as I asked in my previous email ?

> > > But running the reproducer for put_task_struct(), works fine.
> >
> > which reproducer ?
> >
>
> Only now I noticed I didn't add the reproducer to the commit message:
>
> while true; do
>     stress-ng --sched deadline --sched-period 1000000000
> --sched-runtime 800000000 --sched-deadline 1000000000 --mmapfork 23 -t
> 20
> done

Cough ;) I think we need a more simple one to enssure that
refcount_sub_and_test(nr, &t->usage) returns true under raw_spin_lock()
and then __put_task_struct() actually takes spin_lock().

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-01 18:23           ` Wander Lairson Costa
  2023-06-02 17:34             ` Oleg Nesterov
@ 2023-06-02 17:39             ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2023-06-02 17:39 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On 06/01, Wander Lairson Costa wrote:
>
> On Thu, Jun 1, 2023 at 3:14 PM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > but only in the RT kernel
> >
> > this again suggests that your testing was wrong or I am totally confused (quite
> > possible, I know nothing about RT). I did the testing without CONFIG_PREEMPT_RT.
> >
>
> Hrm, could you please share your .config?

Sure. I do not want to spam the list, I'll send you a private email.

Can you share your kernel module code?

Did you verify that debug_locks != 0 as I asked in my previous email ?

> > > But running the reproducer for put_task_struct(), works fine.
> >
> > which reproducer ?
> >
>
> Only now I noticed I didn't add the reproducer to the commit message:
>
> while true; do
>     stress-ng --sched deadline --sched-period 1000000000
> --sched-runtime 800000000 --sched-deadline 1000000000 --mmapfork 23 -t
> 20
> done

Cough ;) I think we need something more simple to ensure that
refcount_sub_and_test(nr, &t->usage) returns true under raw_spin_lock()
and then __put_task_struct() actually takes spin_lock().

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-02 17:34             ` Oleg Nesterov
@ 2023-06-05 11:24               ` Wander Lairson Costa
  2023-06-06 20:39                 ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2023-06-05 11:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On Fri, Jun 2, 2023 at 2:34 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/01, Wander Lairson Costa wrote:
> >
> > On Thu, Jun 1, 2023 at 3:14 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > > but only in the RT kernel
> > >
> > > this again suggests that your testing was wrong or I am totally confused (quite
> > > possible, I know nothing about RT). I did the testing without CONFIG_PREEMPT_RT.
> > >
> >
> > Hrm, could you please share your .config?
>
> Sure. I do not want to spam the list, I'll send you a private email.
>

Thanks. I found an unrelated earlier splat in the console code. That's
why I couldn't reproduce it in the stock kernel.

> Can you share your kernel module code?
>

*facepalm* I forgot to post the link: https://github.com/walac/test-prove-lock/

> Did you verify that debug_locks != 0 as I asked in my previous email ?
>
> > > > But running the reproducer for put_task_struct(), works fine.
> > >
> > > which reproducer ?
> > >
> >
> > Only now I noticed I didn't add the reproducer to the commit message:
> >
> > while true; do
> >     stress-ng --sched deadline --sched-period 1000000000
> > --sched-runtime 800000000 --sched-deadline 1000000000 --mmapfork 23 -t
> > 20
> > done
>
> Cough ;) I think we need a more simple one to enssure that
> refcount_sub_and_test(nr, &t->usage) returns true under raw_spin_lock()
> and then __put_task_struct() actually takes spin_lock().
>
> Oleg.
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-05 11:24               ` Wander Lairson Costa
@ 2023-06-06 20:39                 ` Oleg Nesterov
  2023-06-09 19:07                   ` Wander Lairson Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2023-06-06 20:39 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On 06/05, Wander Lairson Costa wrote:
>
> Thanks. I found an unrelated earlier splat in the console code. That's
> why I couldn't reproduce it in the stock kernel.

As expected...

So... Not sure what can I say ;) can you verify that this patch doesn't solve
the issues with CONFIG_PROVE_RAW_LOCK_NESTING pointed out by Sebastian? Using
stress-ng or anything else.

This is not that bad, unless I am totally confused the current code (without
your patch) has the same problem (otherwise we wouldn't need this fix).

But perhaps you can make 2/2 which adds the DEFINE_WAIT_OVERRIDE_MAP() hack
as Peter suggested?

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v9] kernel/fork: beware of __put_task_struct calling context
  2023-06-06 20:39                 ` Oleg Nesterov
@ 2023-06-09 19:07                   ` Wander Lairson Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Wander Lairson Costa @ 2023-06-09 19:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Russell King (Oracle),
	Brian Cain, Michael Ellerman, Stafford Horne, Kefeng Wang,
	Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Matthew Wilcox (Oracle),
	Eric W. Biederman, Andrei Vagin, Peter Zijlstra,
	Paul E. McKenney, Daniel Bristot de Oliveira, Yu Zhao,
	Alexey Gladkov, Mike Kravetz, Yang Shi, open list, Hu Chunyu,
	Valentin Schneider, Sebastian Andrzej Siewior, Steven Rostedt,
	Luis Goncalves

On Tue, Jun 6, 2023 at 5:40 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/05, Wander Lairson Costa wrote:
> >
> > Thanks. I found an unrelated earlier splat in the console code. That's
> > why I couldn't reproduce it in the stock kernel.
>
> As expected...
>
> So... Not sure what can I say ;) can you verify that this patch doesn't solve
> the issues with CONFIG_PROVE_RAW_LOCK_NESTING pointed out by Sebastian? Using
> stress-ng or anything else.
>

I managed to test it without a console. No issues happened in the stock kernel.

> This is not that bad, unless I am totally confused the current code (without
> your patch) has the same problem (otherwise we wouldn't need this fix).
>

That's my understanding as well.

> But perhaps you can make 2/2 which adds the DEFINE_WAIT_OVERRIDE_MAP() hack
> as Peter suggested?
>

Yes, sure. I would like to get the issue reproduced in practice to
make sure I am really fixing the problem. But I can live with that.

> Oleg.
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-06-09 19:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-16 19:14 [PATCH v9] kernel/fork: beware of __put_task_struct calling context Wander Lairson Costa
2023-05-16 19:24 ` Matthew Wilcox
2023-05-16 21:05   ` Andrew Morton
2023-05-16 21:41     ` Matthew Wilcox
2023-05-16 22:50       ` Andrew Morton
2023-05-17 15:26 ` Oleg Nesterov
2023-05-17 16:57   ` Wander Lairson Costa
2023-05-29 12:22     ` Oleg Nesterov
2023-06-01 17:45       ` Wander Lairson Costa
2023-06-01 18:13         ` Oleg Nesterov
2023-06-01 18:23           ` Wander Lairson Costa
2023-06-02 17:34             ` Oleg Nesterov
2023-06-05 11:24               ` Wander Lairson Costa
2023-06-06 20:39                 ` Oleg Nesterov
2023-06-09 19:07                   ` Wander Lairson Costa
2023-06-02 17:39             ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).