All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] signal: break out of wait loops on kthread_stop()
@ 2022-06-27 12:00 Jason A. Donenfeld
  2022-06-27 13:27 ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27 12:00 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz, ebiederm
  Cc: Jason A. Donenfeld, Toke Høiland-Jørgensen, Kalle Valo,
	Johannes Berg

I was recently surprised to learn that msleep_interruptible(),
wait_for_completion_interruptible_timeout(), and related functions
simply hung when I called kthread_stop() on kthreads using them. The
solution to fixing the case with msleep_interruptible() was more simply
to move to schedule_timeout_interruptible(). Why?

The reason is that msleep_interruptible(), and many functions just like
it, has a loop like this:

        while (timeout && !signal_pending(current))
                timeout = schedule_timeout_interruptible(timeout);

The call to kthread_stop() woke up the thread, so schedule_timeout_
interruptible() returned early, but because signal_pending() returned
true, it went back into another timeout, which was never woken up.

This wait loop pattern is common to various pieces of code, and I
suspect that subtle misuse in a kthread that caused a deadlock in the
code I looked at last week is also found elsewhere.

So this commit causes signal_pending() to return true when
kthread_stop() is called. This is already what's done for
TIF_NOTIFY_SIGNAL, for these same purposes of breaking out of wait
loops, so a similar KTHREAD_SHOULD_STOP check isn't too much different.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 include/linux/kthread.h      | 1 +
 include/linux/sched/signal.h | 9 +++++++++
 kernel/kthread.c             | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 30e5bec81d2b..7061dde23237 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -87,6 +87,7 @@ void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 bool kthread_should_stop(void);
+bool __kthread_should_stop(struct task_struct *k);
 bool kthread_should_park(void);
 bool __kthread_should_park(struct task_struct *k);
 bool kthread_freezable_should_stop(bool *was_frozen);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index cafbe03eed01..08700c65b806 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -11,6 +11,7 @@
 #include <linux/refcount.h>
 #include <linux/posix-timers.h>
 #include <linux/mm_types.h>
+#include <linux/kthread.h>
 #include <asm/ptrace.h>
 
 /*
@@ -397,6 +398,14 @@ static inline int signal_pending(struct task_struct *p)
 	 */
 	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
 		return 1;
+
+	/*
+	 * Likewise, KTHREAD_SHOULD_STOP isn't really a signal, but it also
+	 * requires the same behavior, lest wait loops go forever.
+	 */
+	if (unlikely(__kthread_should_stop(p)))
+		return 1;
+
 	return task_sigpending(p);
 }
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3c677918d8f2..7e0743330cd4 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -145,6 +145,12 @@ void free_kthread_struct(struct task_struct *k)
 	kfree(kthread);
 }
 
+bool __kthread_should_stop(struct task_struct *k)
+{
+	return (k->flags & PF_KTHREAD) &&
+	       test_bit(KTHREAD_SHOULD_STOP, &to_kthread(k)->flags);
+}
+
 /**
  * kthread_should_stop - should this kthread return now?
  *
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] signal: break out of wait loops on kthread_stop()
  2022-06-27 12:00 [PATCH] signal: break out of wait loops on kthread_stop() Jason A. Donenfeld
@ 2022-06-27 13:27 ` Peter Zijlstra
  2022-06-27 14:54   ` Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2022-06-27 13:27 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, mingo, ebiederm, Toke Høiland-Jørgensen,
	Kalle Valo, Johannes Berg

On Mon, Jun 27, 2022 at 02:00:20PM +0200, Jason A. Donenfeld wrote:

> +bool __kthread_should_stop(struct task_struct *k)
> +{
> +	return (k->flags & PF_KTHREAD) &&
> +	       test_bit(KTHREAD_SHOULD_STOP, &to_kthread(k)->flags);
> +}

This used to be a racy pattern; not sure it still is since Eric poked at
this last, but please use __to_kthread() instead.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] signal: break out of wait loops on kthread_stop()
  2022-06-27 13:27 ` Peter Zijlstra
@ 2022-06-27 14:54   ` Jason A. Donenfeld
  2022-06-27 14:57     ` [PATCH v2] " Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27 14:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, ebiederm, Toke Høiland-Jørgensen,
	Kalle Valo, Johannes Berg

On Mon, Jun 27, 2022 at 03:27:03PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 27, 2022 at 02:00:20PM +0200, Jason A. Donenfeld wrote:
> 
> > +bool __kthread_should_stop(struct task_struct *k)
> > +{
> > +	return (k->flags & PF_KTHREAD) &&
> > +	       test_bit(KTHREAD_SHOULD_STOP, &to_kthread(k)->flags);
> > +}
> 
> This used to be a racy pattern; not sure it still is since Eric poked at
> this last, but please use __to_kthread() instead.

Ah, indeed. Will send a v2.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] signal: break out of wait loops on kthread_stop()
  2022-06-27 14:54   ` Jason A. Donenfeld
@ 2022-06-27 14:57     ` Jason A. Donenfeld
  2022-06-27 19:16       ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27 14:57 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz, ebiederm
  Cc: Jason A. Donenfeld, Toke Høiland-Jørgensen, Kalle Valo,
	Johannes Berg

I was recently surprised to learn that msleep_interruptible(),
wait_for_completion_interruptible_timeout(), and related functions
simply hung when I called kthread_stop() on kthreads using them. The
solution to fixing the case with msleep_interruptible() was more simply
to move to schedule_timeout_interruptible(). Why?

The reason is that msleep_interruptible(), and many functions just like
it, has a loop like this:

        while (timeout && !signal_pending(current))
                timeout = schedule_timeout_interruptible(timeout);

The call to kthread_stop() woke up the thread, so schedule_timeout_
interruptible() returned early, but because signal_pending() returned
true, it went back into another timeout, which was never woken up.

This wait loop pattern is common to various pieces of code, and I
suspect that subtle misuse in a kthread that caused a deadlock in the
code I looked at last week is also found elsewhere.

So this commit causes signal_pending() to return true when
kthread_stop() is called. This is already what's done for
TIF_NOTIFY_SIGNAL, for these same purposes of breaking out of wait
loops, so a similar KTHREAD_SHOULD_STOP check isn't too much different.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 include/linux/kthread.h      | 1 +
 include/linux/sched/signal.h | 9 +++++++++
 kernel/kthread.c             | 8 ++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 30e5bec81d2b..7061dde23237 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -87,6 +87,7 @@ void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 bool kthread_should_stop(void);
+bool __kthread_should_stop(struct task_struct *k);
 bool kthread_should_park(void);
 bool __kthread_should_park(struct task_struct *k);
 bool kthread_freezable_should_stop(bool *was_frozen);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index cafbe03eed01..08700c65b806 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -11,6 +11,7 @@
 #include <linux/refcount.h>
 #include <linux/posix-timers.h>
 #include <linux/mm_types.h>
+#include <linux/kthread.h>
 #include <asm/ptrace.h>
 
 /*
@@ -397,6 +398,14 @@ static inline int signal_pending(struct task_struct *p)
 	 */
 	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
 		return 1;
+
+	/*
+	 * Likewise, KTHREAD_SHOULD_STOP isn't really a signal, but it also
+	 * requires the same behavior, lest wait loops go forever.
+	 */
+	if (unlikely(__kthread_should_stop(p)))
+		return 1;
+
 	return task_sigpending(p);
 }
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3c677918d8f2..80f6ba323060 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -145,6 +145,14 @@ void free_kthread_struct(struct task_struct *k)
 	kfree(kthread);
 }
 
+bool __kthread_should_stop(struct task_struct *k)
+{
+	struct kthread *kthread = __to_kthread(k);
+
+	return kthread && test_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
+}
+EXPORT_SYMBOL_GPL(__kthread_should_stop);
+
 /**
  * kthread_should_stop - should this kthread return now?
  *
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] signal: break out of wait loops on kthread_stop()
  2022-06-27 14:57     ` [PATCH v2] " Jason A. Donenfeld
@ 2022-06-27 19:16       ` Eric W. Biederman
  2022-06-28 15:59         ` Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2022-06-27 19:16 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, mingo, peterz, Toke Høiland-Jørgensen,
	Kalle Valo, Johannes Berg

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> I was recently surprised to learn that msleep_interruptible(),
> wait_for_completion_interruptible_timeout(), and related functions
> simply hung when I called kthread_stop() on kthreads using them. The
> solution to fixing the case with msleep_interruptible() was more simply
> to move to schedule_timeout_interruptible(). Why?
>
> The reason is that msleep_interruptible(), and many functions just like
> it, has a loop like this:
>
>         while (timeout && !signal_pending(current))
>                 timeout = schedule_timeout_interruptible(timeout);
>
> The call to kthread_stop() woke up the thread, so schedule_timeout_
> interruptible() returned early, but because signal_pending() returned
> true, it went back into another timeout, which was never woken up.
>
> This wait loop pattern is common to various pieces of code, and I
> suspect that subtle misuse in a kthread that caused a deadlock in the
> code I looked at last week is also found elsewhere.
>
> So this commit causes signal_pending() to return true when
> kthread_stop() is called. This is already what's done for
> TIF_NOTIFY_SIGNAL, for these same purposes of breaking out of wait
> loops, so a similar KTHREAD_SHOULD_STOP check isn't too much
> different.

Semantically this makes a lot of sense.

Bloating up signal_pending which is mainly called in non-kthread
contexts is undesirable.

Instead could you modify kthread_stop to call set_notify_signal().

That is exactly what set_notify_signal is there for.  When you don't
actually have a signal but you want to break out of an interruptible
loop.  My last round of work in the area decoupled set_notify_signal
from any other semantics.


It would be nice to get everything down so that we only need to test
TIF_NOTIFY_SIGNAL in signal_pending.  Unfortunately to do that I need
to do something with task_sigpending, and it hasn't been important
enough to weed through all of those details yet.

Eric



> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  include/linux/kthread.h      | 1 +
>  include/linux/sched/signal.h | 9 +++++++++
>  kernel/kthread.c             | 8 ++++++++
>  3 files changed, 18 insertions(+)
>
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 30e5bec81d2b..7061dde23237 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -87,6 +87,7 @@ void kthread_bind(struct task_struct *k, unsigned int cpu);
>  void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
>  int kthread_stop(struct task_struct *k);
>  bool kthread_should_stop(void);
> +bool __kthread_should_stop(struct task_struct *k);
>  bool kthread_should_park(void);
>  bool __kthread_should_park(struct task_struct *k);
>  bool kthread_freezable_should_stop(bool *was_frozen);
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index cafbe03eed01..08700c65b806 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -11,6 +11,7 @@
>  #include <linux/refcount.h>
>  #include <linux/posix-timers.h>
>  #include <linux/mm_types.h>
> +#include <linux/kthread.h>
>  #include <asm/ptrace.h>
>  
>  /*
> @@ -397,6 +398,14 @@ static inline int signal_pending(struct task_struct *p)
>  	 */
>  	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
>  		return 1;
> +
> +	/*
> +	 * Likewise, KTHREAD_SHOULD_STOP isn't really a signal, but it also
> +	 * requires the same behavior, lest wait loops go forever.
> +	 */
> +	if (unlikely(__kthread_should_stop(p)))
> +		return 1;
> +
>  	return task_sigpending(p);
>  }
>  
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3c677918d8f2..80f6ba323060 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -145,6 +145,14 @@ void free_kthread_struct(struct task_struct *k)
>  	kfree(kthread);
>  }
>  
> +bool __kthread_should_stop(struct task_struct *k)
> +{
> +	struct kthread *kthread = __to_kthread(k);
> +
> +	return kthread && test_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
> +}
> +EXPORT_SYMBOL_GPL(__kthread_should_stop);
> +
>  /**
>   * kthread_should_stop - should this kthread return now?
>   *

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] signal: break out of wait loops on kthread_stop()
  2022-06-27 19:16       ` Eric W. Biederman
@ 2022-06-28 15:59         ` Jason A. Donenfeld
  2022-06-28 16:14           ` [PATCH v3] " Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 15:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, mingo, peterz, Toke Høiland-Jørgensen,
	Kalle Valo, Johannes Berg

Hi Eric,

On Mon, Jun 27, 2022 at 02:16:08PM -0500, Eric W. Biederman wrote:
> Semantically this makes a lot of sense.
> 
> Bloating up signal_pending which is mainly called in non-kthread
> contexts is undesirable.

I guess I understand that concern, but does it really matter here? This
is called by code that waits anyway, so it's not like performance
matters at all, right?

> Instead could you modify kthread_stop to call set_notify_signal().
> 
> That is exactly what set_notify_signal is there for.  When you don't
> actually have a signal but you want to break out of an interruptible
> loop.  My last round of work in the area decoupled set_notify_signal
> from any other semantics.

This sounds like the best option here, if in fact it does work. I'll
send in a patch for that and we can see how it interacts with the other
work you're doing.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] signal: break out of wait loops on kthread_stop()
  2022-06-28 15:59         ` Jason A. Donenfeld
@ 2022-06-28 16:14           ` Jason A. Donenfeld
  2022-07-04 12:22             ` Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 16:14 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Jason A. Donenfeld, Ingo Molnar, Peter Zijlstra,
	Toke Høiland-Jørgensen, Kalle Valo, Johannes Berg

I was recently surprised to learn that msleep_interruptible(),
wait_for_completion_interruptible_timeout(), and related functions
simply hung when I called kthread_stop() on kthreads using them. The
solution to fixing the case with msleep_interruptible() was more simply
to move to schedule_timeout_interruptible(). Why?

The reason is that msleep_interruptible(), and many functions just like
it, has a loop like this:

        while (timeout && !signal_pending(current))
                timeout = schedule_timeout_interruptible(timeout);

The call to kthread_stop() woke up the thread, so schedule_timeout_
interruptible() returned early, but because signal_pending() returned
true, it went back into another timeout, which was never woken up.

This wait loop pattern is common to various pieces of code, and I
suspect that the subtle misuse in a kthread that caused a deadlock in
the code I looked at last week is also found elsewhere.

So this commit causes signal_pending() to return true when
kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.

The same also applies to the similar kthread_park() functionality.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 kernel/kthread.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3c677918d8f2..63d5a1f4cb93 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -661,12 +661,14 @@ int kthread_park(struct task_struct *k)
 
 	set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
 	if (k != current) {
+		test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 		wake_up_process(k);
 		/*
 		 * Wait for __kthread_parkme() to complete(), this means we
 		 * _will_ have TASK_PARKED and are about to call schedule().
 		 */
 		wait_for_completion(&kthread->parked);
+		clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 		/*
 		 * Now wait for that schedule() to complete and the task to
 		 * get scheduled out.
@@ -704,8 +706,10 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 	kthread_unpark(k);
+	test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	wake_up_process(k);
 	wait_for_completion(&kthread->exited);
+	clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	ret = kthread->result;
 	put_task_struct(k);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] signal: break out of wait loops on kthread_stop()
  2022-06-28 16:14           ` [PATCH v3] " Jason A. Donenfeld
@ 2022-07-04 12:22             ` Jason A. Donenfeld
  2022-07-11 17:53               ` Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-07-04 12:22 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Toke Høiland-Jørgensen,
	Kalle Valo, Johannes Berg

Hey Eric,

On Tue, Jun 28, 2022 at 06:14:41PM +0200, Jason A. Donenfeld wrote:
> I was recently surprised to learn that msleep_interruptible(),
> wait_for_completion_interruptible_timeout(), and related functions
> simply hung when I called kthread_stop() on kthreads using them. The
> solution to fixing the case with msleep_interruptible() was more simply
> to move to schedule_timeout_interruptible(). Why?
> 
> The reason is that msleep_interruptible(), and many functions just like
> it, has a loop like this:
> 
>         while (timeout && !signal_pending(current))
>                 timeout = schedule_timeout_interruptible(timeout);
> 
> The call to kthread_stop() woke up the thread, so schedule_timeout_
> interruptible() returned early, but because signal_pending() returned
> true, it went back into another timeout, which was never woken up.
> 
> This wait loop pattern is common to various pieces of code, and I
> suspect that the subtle misuse in a kthread that caused a deadlock in
> the code I looked at last week is also found elsewhere.
> 
> So this commit causes signal_pending() to return true when
> kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
> 
> The same also applies to the similar kthread_park() functionality.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  kernel/kthread.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3c677918d8f2..63d5a1f4cb93 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -661,12 +661,14 @@ int kthread_park(struct task_struct *k)
>  
>  	set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
>  	if (k != current) {
> +		test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  		wake_up_process(k);
>  		/*
>  		 * Wait for __kthread_parkme() to complete(), this means we
>  		 * _will_ have TASK_PARKED and are about to call schedule().
>  		 */
>  		wait_for_completion(&kthread->parked);
> +		clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  		/*
>  		 * Now wait for that schedule() to complete and the task to
>  		 * get scheduled out.
> @@ -704,8 +706,10 @@ int kthread_stop(struct task_struct *k)
>  	kthread = to_kthread(k);
>  	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>  	kthread_unpark(k);
> +	test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  	wake_up_process(k);
>  	wait_for_completion(&kthread->exited);
> +	clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  	ret = kthread->result;
>  	put_task_struct(k);
>  
> -- 
> 2.35.1
> 

Is this more to the tune of what you had in mind in your message [1]?

Jason

[1] https://lore.kernel.org/lkml/877d51udc7.fsf@email.froward.int.ebiederm.org/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] signal: break out of wait loops on kthread_stop()
  2022-07-04 12:22             ` Jason A. Donenfeld
@ 2022-07-11 17:53               ` Jason A. Donenfeld
  2022-07-11 18:57                 ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-07-11 17:53 UTC (permalink / raw)
  To: Eric W. Biederman, LKML

Hi Eric,

On Mon, Jul 4, 2022 at 2:22 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey Eric,
>
> On Tue, Jun 28, 2022 at 06:14:41PM +0200, Jason A. Donenfeld wrote:
> > I was recently surprised to learn that msleep_interruptible(),
> > wait_for_completion_interruptible_timeout(), and related functions
> > simply hung when I called kthread_stop() on kthreads using them. The
> > solution to fixing the case with msleep_interruptible() was more simply
> > to move to schedule_timeout_interruptible(). Why?
> >
> > The reason is that msleep_interruptible(), and many functions just like
> > it, has a loop like this:
> >
> >         while (timeout && !signal_pending(current))
> >                 timeout = schedule_timeout_interruptible(timeout);
> >
> > The call to kthread_stop() woke up the thread, so schedule_timeout_
> > interruptible() returned early, but because signal_pending() returned
> > true, it went back into another timeout, which was never woken up.
> >
> > This wait loop pattern is common to various pieces of code, and I
> > suspect that the subtle misuse in a kthread that caused a deadlock in
> > the code I looked at last week is also found elsewhere.
> >
> > So this commit causes signal_pending() to return true when
> > kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
> >
> > The same also applies to the similar kthread_park() functionality.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > Cc: Kalle Valo <kvalo@kernel.org>
> > Cc: Johannes Berg <johannes@sipsolutions.net>
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > ---
> >  kernel/kthread.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 3c677918d8f2..63d5a1f4cb93 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -661,12 +661,14 @@ int kthread_park(struct task_struct *k)
> >
> >       set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
> >       if (k != current) {
> > +             test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
> >               wake_up_process(k);
> >               /*
> >                * Wait for __kthread_parkme() to complete(), this means we
> >                * _will_ have TASK_PARKED and are about to call schedule().
> >                */
> >               wait_for_completion(&kthread->parked);
> > +             clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
> >               /*
> >                * Now wait for that schedule() to complete and the task to
> >                * get scheduled out.
> > @@ -704,8 +706,10 @@ int kthread_stop(struct task_struct *k)
> >       kthread = to_kthread(k);
> >       set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
> >       kthread_unpark(k);
> > +     test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
> >       wake_up_process(k);
> >       wait_for_completion(&kthread->exited);
> > +     clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
> >       ret = kthread->result;
> >       put_task_struct(k);
> >
> > --
> > 2.35.1
> >
>
> Is this more to the tune of what you had in mind in your message [1]?
>
> Jason
>
> [1] https://lore.kernel.org/lkml/877d51udc7.fsf@email.froward.int.ebiederm.org/

Paging again...

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] signal: break out of wait loops on kthread_stop()
  2022-07-11 17:53               ` Jason A. Donenfeld
@ 2022-07-11 18:57                 ` Eric W. Biederman
  2022-07-11 20:18                   ` Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2022-07-11 18:57 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: LKML

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Hi Eric,
>
> On Mon, Jul 4, 2022 at 2:22 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Hey Eric,
>>
>> On Tue, Jun 28, 2022 at 06:14:41PM +0200, Jason A. Donenfeld wrote:
>> > I was recently surprised to learn that msleep_interruptible(),
>> > wait_for_completion_interruptible_timeout(), and related functions
>> > simply hung when I called kthread_stop() on kthreads using them. The
>> > solution to fixing the case with msleep_interruptible() was more simply
>> > to move to schedule_timeout_interruptible(). Why?
>> >
>> > The reason is that msleep_interruptible(), and many functions just like
>> > it, has a loop like this:
>> >
>> >         while (timeout && !signal_pending(current))
>> >                 timeout = schedule_timeout_interruptible(timeout);
>> >
>> > The call to kthread_stop() woke up the thread, so schedule_timeout_
>> > interruptible() returned early, but because signal_pending() returned
>> > true, it went back into another timeout, which was never woken up.
>> >
>> > This wait loop pattern is common to various pieces of code, and I
>> > suspect that the subtle misuse in a kthread that caused a deadlock in
>> > the code I looked at last week is also found elsewhere.
>> >
>> > So this commit causes signal_pending() to return true when
>> > kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
>> >
>> > The same also applies to the similar kthread_park() functionality.
>> >
>> > Cc: Ingo Molnar <mingo@redhat.com>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Eric W. Biederman <ebiederm@xmission.com>
>> > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
>> > Cc: Kalle Valo <kvalo@kernel.org>
>> > Cc: Johannes Berg <johannes@sipsolutions.net>
>> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>> > ---
>> >  kernel/kthread.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/kernel/kthread.c b/kernel/kthread.c
>> > index 3c677918d8f2..63d5a1f4cb93 100644
>> > --- a/kernel/kthread.c
>> > +++ b/kernel/kthread.c
>> > @@ -661,12 +661,14 @@ int kthread_park(struct task_struct *k)
>> >
>> >       set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
>> >       if (k != current) {
>> > +             test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >               wake_up_process(k);
>> >               /*
>> >                * Wait for __kthread_parkme() to complete(), this means we
>> >                * _will_ have TASK_PARKED and are about to call schedule().
>> >                */
>> >               wait_for_completion(&kthread->parked);
>> > +             clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >               /*
>> >                * Now wait for that schedule() to complete and the task to
>> >                * get scheduled out.
>> > @@ -704,8 +706,10 @@ int kthread_stop(struct task_struct *k)
>> >       kthread = to_kthread(k);
>> >       set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>> >       kthread_unpark(k);
>> > +     test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >       wake_up_process(k);
>> >       wait_for_completion(&kthread->exited);
>> > +     clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >       ret = kthread->result;
>> >       put_task_struct(k);
>> >
>> > --
>> > 2.35.1
>> >
>>
>> Is this more to the tune of what you had in mind in your message [1]?
>>
>> Jason
>>
>> [1] https://lore.kernel.org/lkml/877d51udc7.fsf@email.froward.int.ebiederm.org/
>
> Paging again...

Overall it looks reasonable.

For kthread_stop clearing TIF_NOTIFY_SIGNAL seems pointless as the
process has exited.

I wonder if we should clear TIF_NOTIFY_SIGNAL in kthread_parkme.

Ordinarily TIF_NOTIFY_SIGNAL gets cleared when the target process
calls get_signal.  Which doesn't happen for kernel threads.
So I am not certain what the best pattern is, but my sense is that
keeping things as close to how TIF_NOTIFY_SIGNAL is processed
in the rest of the kernel seems the least error prone pattern.

So I am thinking the code should do something like:

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 544fd4097406..c80e8d44e405 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -266,6 +266,7 @@ static void __kthread_parkme(struct kthread *self)
 		 * changin from TASK_PARKED and us failing the
 		 * wait_task_inactive() in kthread_park().
 		 */
+		clear_notify_signal();
 		set_special_state(TASK_PARKED);
 		if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
 			break;


To ensure that TIF_NOTIFY_SIGNAL is not set.

I am trying to think about how things interact if two threads call
kthread_park.  If the target thread is not responsible for clearing
TIF_NOTIFY_SIGNAL it feels like two threads could get into a race.  A
race where one thread sees the target thread has parked itself right
after another thread calls kthread_park and sets TIF_NOTIFY_SIGNAL.

Given the nature of kthread_park that scenario is probably completely
ridiculous, but it is all I can think of at the moment to demonstrate
my concerns.

In practice kthread_park is pretty specialized.  Do you have any
evidence that we need to break out of interruptible loops to make it
more reliable.  Perhaps it is best just to handle kthread_stop, and
leave kthread_park for when it actually matters.  Which would ensure
there are examples that people care about to help sort through exactly
how the code should behave in the kthread_park case.

Which I guess my long way of saying I think you can just change
kthread_stop to say:

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 544fd4097406..52e9b3432496 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 	kthread_unpark(k);
+	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	wake_up_process(k);
 	wait_for_completion(&kthread->exited);
 	ret = kthread->result;


Your patch is correct that most of set_notify_signal is redundant with
wake_up_process(k).  Further if you aren't going to use the return value
like set_notify_signal does there is no need to test if the flag gets
set.

Eric


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] signal: break out of wait loops on kthread_stop()
  2022-07-11 18:57                 ` Eric W. Biederman
@ 2022-07-11 20:18                   ` Jason A. Donenfeld
  2022-07-11 20:21                     ` [PATCH v4] " Jason A. Donenfeld
  2022-07-11 22:04                     ` [PATCH v3] " Eric W. Biederman
  0 siblings, 2 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-07-11 20:18 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: LKML

Hi Eric,

Thanks for the review.

On Mon, Jul 11, 2022 at 01:57:55PM -0500, Eric W. Biederman wrote:
> For kthread_stop clearing TIF_NOTIFY_SIGNAL seems pointless as the
> process has exited.

Alright, I'll get rid of the clear there.

> I wonder if we should clear TIF_NOTIFY_SIGNAL in kthread_parkme.
> 
> Ordinarily TIF_NOTIFY_SIGNAL gets cleared when the target process
> calls get_signal.  Which doesn't happen for kernel threads.
> So I am not certain what the best pattern is, but my sense is that
> keeping things as close to how TIF_NOTIFY_SIGNAL is processed
> in the rest of the kernel seems the least error prone pattern.
> 
> So I am thinking the code should do something like:
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 544fd4097406..c80e8d44e405 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -266,6 +266,7 @@ static void __kthread_parkme(struct kthread *self)
>  		 * changin from TASK_PARKED and us failing the
>  		 * wait_task_inactive() in kthread_park().
>  		 */
> +		clear_notify_signal();
>  		set_special_state(TASK_PARKED);
>  		if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
>  			break;

That seems at least semantically correct for how parkers usually work.

> I am trying to think about how things interact if two threads call
> kthread_park.  If the target thread is not responsible for clearing
> TIF_NOTIFY_SIGNAL it feels like two threads could get into a race.  A
> race where one thread sees the target thread has parked itself right
> after another thread calls kthread_park and sets TIF_NOTIFY_SIGNAL.
> 
> Given the nature of kthread_park that scenario is probably completely
> ridiculous, but it is all I can think of at the moment to demonstrate
> my concerns.

Right, it's a bit of a stretch to consider two threads racing on
kthread_park(), and were it to happen, maybe the right response would
be, "take a mutex so you don't race for that weird case." But it still
seems semantically correct to clear the flag as your diff does.

> In practice kthread_park is pretty specialized.  Do you have any
> evidence that we need to break out of interruptible loops to make it
> more reliable.  Perhaps it is best just to handle kthread_stop, and
> leave kthread_park for when it actually matters.  Which would ensure
> there are examples that people care about to help sort through exactly
> how the code should behave in the kthread_park case.

Oh, okay. I guess I can do that and we'll address the park case sometime
later if it ever comes up? But actually, upon very very cursory look, it
might actually be an issue now... The uses of it currently break down
into:

Core kernel things:
- kernel/stop_machine.c
- kernel/smpboot.c
- kernel/cpu.c

Drivers:
- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
- drivers/gpu/drm/msm/adreno/adreno_device.c
- drivers/gpu/drm/scheduler/sched_main.c
- drivers/net/wireless/mediatek/mt76/util.h
- drivers/firmware/psci/psci_checker.c
- drivers/md/raid5-cache.c
- fs/btrfs/disk-io.c

btrfs makes calls to _interruptible() functions (though I'm not sure how
the context stack traces work out). The drm scheduler appears to use
wait_event_interruptible() in the first couple lines of the thread's
loop, though it's wait condition does check for kthread_should_stop(). 

So I don't know...

> Which I guess my long way of saying I think you can just change
> kthread_stop to say:
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 544fd4097406..52e9b3432496 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
>  	kthread = to_kthread(k);
>  	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>  	kthread_unpark(k);
> +	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  	wake_up_process(k);
>  	wait_for_completion(&kthread->exited);
>  	ret = kthread->result;
> 

Okay. I'll constrain it to just kthread_stop(). But please file away in
the back of your mind the potential for kthread_park() to be problematic
down the line, in case we have to fix that later.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4] signal: break out of wait loops on kthread_stop()
  2022-07-11 20:18                   ` Jason A. Donenfeld
@ 2022-07-11 20:21                     ` Jason A. Donenfeld
  2022-07-11 22:05                       ` Eric W. Biederman
  2022-07-11 22:04                     ` [PATCH v3] " Eric W. Biederman
  1 sibling, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-07-11 20:21 UTC (permalink / raw)
  To: linux-kernel, ebiederm; +Cc: Jason A. Donenfeld

I was recently surprised to learn that msleep_interruptible(),
wait_for_completion_interruptible_timeout(), and related functions
simply hung when I called kthread_stop() on kthreads using them. The
solution to fixing the case with msleep_interruptible() was more simply
to move to schedule_timeout_interruptible(). Why?

The reason is that msleep_interruptible(), and many functions just like
it, has a loop like this:

        while (timeout && !signal_pending(current))
                timeout = schedule_timeout_interruptible(timeout);

The call to kthread_stop() woke up the thread, so schedule_timeout_
interruptible() returned early, but because signal_pending() returned
true, it went back into another timeout, which was never woken up.

This wait loop pattern is common to various pieces of code, and I
suspect that the subtle misuse in a kthread that caused a deadlock in
the code I looked at last week is also found elsewhere.

So this commit causes signal_pending() to return true when
kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.

The same also probably applies to the similar kthread_park()
functionality, but that can be addressed later, as its semantics are
slightly different.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v3->v4:
- Don't address park() for now.
- Don't bother clearing the flag, since the task is about to be freed
  anyway.

 kernel/kthread.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3c677918d8f2..8888987f2b25 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 	kthread_unpark(k);
+	test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	wake_up_process(k);
 	wait_for_completion(&kthread->exited);
 	ret = kthread->result;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] signal: break out of wait loops on kthread_stop()
  2022-07-11 20:18                   ` Jason A. Donenfeld
  2022-07-11 20:21                     ` [PATCH v4] " Jason A. Donenfeld
@ 2022-07-11 22:04                     ` Eric W. Biederman
  1 sibling, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2022-07-11 22:04 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: LKML

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Hi Eric,
>
> Thanks for the review.
>
>> Which I guess my long way of saying I think you can just change
>> kthread_stop to say:
>> 
>> diff --git a/kernel/kthread.c b/kernel/kthread.c
>> index 544fd4097406..52e9b3432496 100644
>> --- a/kernel/kthread.c
>> +++ b/kernel/kthread.c
>> @@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
>>  	kthread = to_kthread(k);
>>  	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>>  	kthread_unpark(k);
>> +	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>>  	wake_up_process(k);
>>  	wait_for_completion(&kthread->exited);
>>  	ret = kthread->result;
>> 
>
> Okay. I'll constrain it to just kthread_stop(). But please file away in
> the back of your mind the potential for kthread_park() to be problematic
> down the line, in case we have to fix that later.

Definitely.  Right now I am certain you are motivated to test and make
certain the kthread_stop case will work.  I just have the feeling that
we don't care enough about kthread_park, and so attempting to solve it
now is as likely to cause problems as solve them.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4] signal: break out of wait loops on kthread_stop()
  2022-07-11 20:21                     ` [PATCH v4] " Jason A. Donenfeld
@ 2022-07-11 22:05                       ` Eric W. Biederman
  2022-07-11 23:21                         ` [PATCH v5] " Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2022-07-11 22:05 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> I was recently surprised to learn that msleep_interruptible(),
> wait_for_completion_interruptible_timeout(), and related functions
> simply hung when I called kthread_stop() on kthreads using them. The
> solution to fixing the case with msleep_interruptible() was more simply
> to move to schedule_timeout_interruptible(). Why?
>
> The reason is that msleep_interruptible(), and many functions just like
> it, has a loop like this:
>
>         while (timeout && !signal_pending(current))
>                 timeout = schedule_timeout_interruptible(timeout);
>
> The call to kthread_stop() woke up the thread, so schedule_timeout_
> interruptible() returned early, but because signal_pending() returned
> true, it went back into another timeout, which was never woken up.
>
> This wait loop pattern is common to various pieces of code, and I
> suspect that the subtle misuse in a kthread that caused a deadlock in
> the code I looked at last week is also found elsewhere.
>
> So this commit causes signal_pending() to return true when
> kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
>
> The same also probably applies to the similar kthread_park()
> functionality, but that can be addressed later, as its semantics are
> slightly different.
>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v3->v4:
> - Don't address park() for now.
> - Don't bother clearing the flag, since the task is about to be freed
>   anyway.
>
>  kernel/kthread.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3c677918d8f2..8888987f2b25 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
>  	kthread = to_kthread(k);
>  	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>  	kthread_unpark(k);
> +	test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  	wake_up_process(k);
>  	wait_for_completion(&kthread->exited);
>  	ret = kthread->result;

Minor it.  Unless I have missed something that should just be
set_tsk_thread_flag.  You aren't using the return value so I don't
think there is any point in testing the previous state of
TIF_NOTIFY_SIGNAL.

Eric


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5] signal: break out of wait loops on kthread_stop()
  2022-07-11 22:05                       ` Eric W. Biederman
@ 2022-07-11 23:21                         ` Jason A. Donenfeld
  2022-07-12  0:00                           ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-07-11 23:21 UTC (permalink / raw)
  To: ebiederm, linux-kernel; +Cc: Jason A. Donenfeld

I was recently surprised to learn that msleep_interruptible(),
wait_for_completion_interruptible_timeout(), and related functions
simply hung when I called kthread_stop() on kthreads using them. The
solution to fixing the case with msleep_interruptible() was more simply
to move to schedule_timeout_interruptible(). Why?

The reason is that msleep_interruptible(), and many functions just like
it, has a loop like this:

        while (timeout && !signal_pending(current))
                timeout = schedule_timeout_interruptible(timeout);

The call to kthread_stop() woke up the thread, so schedule_timeout_
interruptible() returned early, but because signal_pending() returned
true, it went back into another timeout, which was never woken up.

This wait loop pattern is common to various pieces of code, and I
suspect that the subtle misuse in a kthread that caused a deadlock in
the code I looked at last week is also found elsewhere.

So this commit causes signal_pending() to return true when
kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.

The same also probably applies to the similar kthread_park()
functionality, but that can be addressed later, as its semantics are
slightly different.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v4->v5:
- Use set_tsk_thread_flag instead of test_and_set_tsk_thread_flag. Must
  have been a copy and paste mistarget.
Changes v3->v4:
- Don't address park() for now.
- Don't bother clearing the flag, since the task is about to be freed
  anyway.

 kernel/kthread.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3c677918d8f2..7243a010f433 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 	kthread_unpark(k);
+	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	wake_up_process(k);
 	wait_for_completion(&kthread->exited);
 	ret = kthread->result;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5] signal: break out of wait loops on kthread_stop()
  2022-07-11 23:21                         ` [PATCH v5] " Jason A. Donenfeld
@ 2022-07-12  0:00                           ` Eric W. Biederman
  2022-07-12  0:18                             ` Jason A. Donenfeld
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2022-07-12  0:00 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> I was recently surprised to learn that msleep_interruptible(),
> wait_for_completion_interruptible_timeout(), and related functions
> simply hung when I called kthread_stop() on kthreads using them. The
> solution to fixing the case with msleep_interruptible() was more simply
> to move to schedule_timeout_interruptible(). Why?
>
> The reason is that msleep_interruptible(), and many functions just like
> it, has a loop like this:
>
>         while (timeout && !signal_pending(current))
>                 timeout = schedule_timeout_interruptible(timeout);
>
> The call to kthread_stop() woke up the thread, so schedule_timeout_
> interruptible() returned early, but because signal_pending() returned
> true, it went back into another timeout, which was never woken up.
>
> This wait loop pattern is common to various pieces of code, and I
> suspect that the subtle misuse in a kthread that caused a deadlock in
> the code I looked at last week is also found elsewhere.
>
> So this commit causes signal_pending() to return true when
> kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
>
> The same also probably applies to the similar kthread_park()
> functionality, but that can be addressed later, as its semantics are
> slightly different.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Do I need to pick this up and put it on a topic branch?
Or does this patch have another patch to get merged?


Eric

> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v4->v5:
> - Use set_tsk_thread_flag instead of test_and_set_tsk_thread_flag. Must
>   have been a copy and paste mistarget.
> Changes v3->v4:
> - Don't address park() for now.
> - Don't bother clearing the flag, since the task is about to be freed
>   anyway.
>
>  kernel/kthread.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3c677918d8f2..7243a010f433 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
>  	kthread = to_kthread(k);
>  	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>  	kthread_unpark(k);
> +	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>  	wake_up_process(k);
>  	wait_for_completion(&kthread->exited);
>  	ret = kthread->result;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5] signal: break out of wait loops on kthread_stop()
  2022-07-12  0:00                           ` Eric W. Biederman
@ 2022-07-12  0:18                             ` Jason A. Donenfeld
  0 siblings, 0 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-07-12  0:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, gregerwin256, toke, kvalo, rsalvaterra, linux-wireless

Hi Eric,

On Mon, Jul 11, 2022 at 07:00:25PM -0500, Eric W. Biederman wrote:
> "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> 
> > I was recently surprised to learn that msleep_interruptible(),
> > wait_for_completion_interruptible_timeout(), and related functions
> > simply hung when I called kthread_stop() on kthreads using them. The
> > solution to fixing the case with msleep_interruptible() was more simply
> > to move to schedule_timeout_interruptible(). Why?
> >
> > The reason is that msleep_interruptible(), and many functions just like
> > it, has a loop like this:
> >
> >         while (timeout && !signal_pending(current))
> >                 timeout = schedule_timeout_interruptible(timeout);
> >
> > The call to kthread_stop() woke up the thread, so schedule_timeout_
> > interruptible() returned early, but because signal_pending() returned
> > true, it went back into another timeout, which was never woken up.
> >
> > This wait loop pattern is common to various pieces of code, and I
> > suspect that the subtle misuse in a kthread that caused a deadlock in
> > the code I looked at last week is also found elsewhere.
> >
> > So this commit causes signal_pending() to return true when
> > kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
> >
> > The same also probably applies to the similar kthread_park()
> > functionality, but that can be addressed later, as its semantics are
> > slightly different.
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Do I need to pick this up and put it on a topic branch?
> Or does this patch have another patch to get merged?

I think it'd make sense to put this in a topic branch.

I discovered this in the process of developing [1] (which could use an
Ack for the wake_up_state EXPORT_SYMBOL, by the way). That's marked
for stable@ because the breakage there is kind of bad -- people can't put
their laptops to sleep right now. After both this patch and that one
land, I may then revisit the ath9k one and make changes for non-stable@.
That is, of course, if [1] even lands; right now I fear it might be
relegated to scheduler bikeshed purgatory and I don't have the time
right now to deal with that.

Longer term, this patch here will let me rework [1] to get rid of the
set_notify_signal() trick and use a proper condition variable wait with
`wait_for_condition_interruptible_timeout`, since this patch makes that
function work right for both contexts in which hwrng devices are called
(kthread and process). But that will mean reworking the hwrng API, which
means writing patches for every single hwrng driver, so that's work for
another time.

In the mean time, [1] is a good way forward (provided it gets acked).
And then this patch puts things in a good position to improve down the
line. I did a bunch of testing of this patch when trying out different
candidates for [1] before settling on [1] as a good-enough intermediate
solution. 

So, anyway, feel free to put this in a topic branch.

Jason

[1] https://lore.kernel.org/lkml/20220629114240.946411-1-Jason@zx2c4.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-07-12  0:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-27 12:00 [PATCH] signal: break out of wait loops on kthread_stop() Jason A. Donenfeld
2022-06-27 13:27 ` Peter Zijlstra
2022-06-27 14:54   ` Jason A. Donenfeld
2022-06-27 14:57     ` [PATCH v2] " Jason A. Donenfeld
2022-06-27 19:16       ` Eric W. Biederman
2022-06-28 15:59         ` Jason A. Donenfeld
2022-06-28 16:14           ` [PATCH v3] " Jason A. Donenfeld
2022-07-04 12:22             ` Jason A. Donenfeld
2022-07-11 17:53               ` Jason A. Donenfeld
2022-07-11 18:57                 ` Eric W. Biederman
2022-07-11 20:18                   ` Jason A. Donenfeld
2022-07-11 20:21                     ` [PATCH v4] " Jason A. Donenfeld
2022-07-11 22:05                       ` Eric W. Biederman
2022-07-11 23:21                         ` [PATCH v5] " Jason A. Donenfeld
2022-07-12  0:00                           ` Eric W. Biederman
2022-07-12  0:18                             ` Jason A. Donenfeld
2022-07-11 22:04                     ` [PATCH v3] " Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.