linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] signal: Last two bits for PREEMPT_RT
@ 2022-07-20 15:44 Sebastian Andrzej Siewior
  2022-07-20 15:44 ` [PATCH 1/2] signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT Sebastian Andrzej Siewior
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-20 15:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric W. Biederman, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Oleg Nesterov, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot

These are the last two signal/ptrace-related bits needed to get PREEMPT_RT
working.

Sebastian



* [PATCH 1/2] signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT.
  2022-07-20 15:44 [PATCH 0/2] signal: Last two bits for PREEMPT_RT Sebastian Andrzej Siewior
@ 2022-07-20 15:44 ` Sebastian Andrzej Siewior
  2022-07-20 15:44 ` [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive() Sebastian Andrzej Siewior
  2022-07-25 15:46 ` [PATCH 0/2] signal: Last two bits for PREEMPT_RT Sebastian Andrzej Siewior
  2 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-20 15:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric W. Biederman, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Oleg Nesterov, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Sebastian Andrzej Siewior

Commit
   53da1d9456fe7 ("fix ptrace slowness")

is just a band aid around the problem.
The invocation of do_notify_parent_cldstop() wakes the parent and makes
it runnable. The scheduler then wants to replace this still-running task
with the parent. With the read_lock() acquired this is not possible
because preemption is disabled, so the switch is deferred until
read_unlock().
This scheduling point is undesired and is avoided by disabling preemption
around the unlock operation; preemption is enabled again before the
schedule() invocation, but without a preemption point
(preempt_enable_no_resched()).
This is undesired only because the parent otherwise sleeps a cycle in
wait_task_inactive() until the traced task leaves the run-queue in
schedule(). It is not a correctness issue; the preempt_disable() is just a
band aid to avoid the visible delay, which adds up over multiple
invocations.
The task can still be preempted if an interrupt occurs between
preempt_enable_no_resched() and freezable_schedule() because scheduling
_will_ happen on the IRQ-exit path of the interrupt. This is ignored since
it does not happen very often.

On PREEMPT_RT keeping preemption disabled during the invocation of
cgroup_enter_frozen() becomes a problem because the function acquires
css_set_lock, which is a sleeping lock on PREEMPT_RT and must not be
acquired with preemption disabled.
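
For orientation, the sequence in question in ptrace_stop() before this
change is roughly the following (a simplified sketch of the hunk changed
below, not the complete function):

	/* tracee, already set to TASK_TRACED via set_special_state() */
	do_notify_parent_cldstop(...);	/* wakes the parent, makes it runnable */

	preempt_disable();		/* skip the preemption point ... */
	read_unlock(&tasklist_lock);	/* ... that would otherwise trigger here */
	cgroup_enter_frozen();		/* takes css_set_lock: sleeps on PREEMPT_RT */
	preempt_enable_no_resched();	/* re-enable without a preemption point */
	freezable_schedule();		/* the intended single scheduling point */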

Don't disable preemption on PREEMPT_RT. Remove the TODO regarding adding
read_unlock_no_resched() as there is no need for it and it would cause
harm.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/signal.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2297,13 +2297,13 @@ static int ptrace_stop(int exit_code, in
 	/*
 	 * Don't want to allow preemption here, because
 	 * sys_ptrace() needs this task to be inactive.
-	 *
-	 * XXX: implement read_unlock_no_resched().
 	 */
-	preempt_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	read_unlock(&tasklist_lock);
 	cgroup_enter_frozen();
-	preempt_enable_no_resched();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable_no_resched();
 	freezable_schedule();
 	cgroup_leave_frozen(true);
 


* [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive().
  2022-07-20 15:44 [PATCH 0/2] signal: Last two bits for PREEMPT_RT Sebastian Andrzej Siewior
  2022-07-20 15:44 ` [PATCH 1/2] signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT Sebastian Andrzej Siewior
@ 2022-07-20 15:44 ` Sebastian Andrzej Siewior
  2022-07-25 17:47   ` Valentin Schneider
  2022-07-25 15:46 ` [PATCH 0/2] signal: Last two bits for PREEMPT_RT Sebastian Andrzej Siewior
  2 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-20 15:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric W. Biederman, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Oleg Nesterov, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Sebastian Andrzej Siewior

Ptrace is using wait_task_inactive() to wait for the tracee to reach a
certain task state. On PREEMPT_RT that state may be stored in
task_struct::saved_state while the tracee blocks on a sleeping lock and
task_struct::__state is set to TASK_RTLOCK_WAIT.
It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the task
is blocked on a sleeping lock because during wake-up (after the sleeping lock
has been acquired) the task state is set to TASK_RUNNING. Once the task is on
a CPU and has acquired the pi_lock it will reset the state accordingly, but
until then TASK_RUNNING will be observed (with the desired state saved in
saved_state).
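
To illustrate that window, a rough sketch of the tracee side on PREEMPT_RT
(illustrative only, not the exact call chain; the lock helpers are the ones
discussed later in this thread):

	set_special_state(TASK_TRACED);		/* __state = TASK_TRACED */
	...
	/* a contended spinlock_t/rwlock_t on the way to schedule(): */
	current_save_and_set_rtlock_wait_state();
						/* saved_state = TASK_TRACED,
						 * __state = TASK_RTLOCK_WAIT */
	schedule_rtlock();
	/* wake-up once the lock is available: __state = TASK_RUNNING,
	 * saved_state still TASK_TRACED until the task runs and takes
	 * its pi_lock */
	current_restore_rtlock_saved_state();	/* __state = TASK_TRACED again */
	...
	schedule();		/* what wait_task_inactive() is waiting for */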

Also check task_struct::saved_state if the desired match was not found in
task_struct::__state on PREEMPT_RT. If the state was found in saved_state,
wait until the task is idle and the state is visible in task_struct::__state.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c |   46 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3257,6 +3257,40 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+#ifdef CONFIG_PREEMPT_RT
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+	unsigned long flags;
+	bool mismatch;
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	mismatch = READ_ONCE(p->__state) != match_state &&
+		READ_ONCE(p->saved_state) != match_state;
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	return mismatch;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+					bool *wait)
+{
+	if (READ_ONCE(p->__state) == match_state)
+		return true;
+	if (READ_ONCE(p->saved_state) != match_state)
+		return false;
+	*wait = true;
+	return true;
+}
+#else
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+	return READ_ONCE(p->__state) != match_state;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+					bool *wait)
+{
+	return READ_ONCE(p->__state) == match_state;
+}
+#endif
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3275,7 +3309,7 @@ int migrate_swap(struct task_struct *cur
  */
 unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
-	int running, queued;
+	bool running, wait;
 	struct rq_flags rf;
 	unsigned long ncsw;
 	struct rq *rq;
@@ -3301,7 +3335,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && state_mismatch(p, match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3314,10 +3348,12 @@ unsigned long wait_task_inactive(struct
 		rq = task_rq_lock(p, &rf);
 		trace_sched_wait_task(p);
 		running = task_running(rq, p);
-		queued = task_on_rq_queued(p);
+		wait = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+
+		if (!match_state || state_match(p, match_state, &wait))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+
 		task_rq_unlock(rq, p, &rf);
 
 		/*
@@ -3346,7 +3382,7 @@ unsigned long wait_task_inactive(struct
 		 * running right now), it's preempted, and we should
 		 * yield - it could be a while.
 		 */
-		if (unlikely(queued)) {
+		if (unlikely(wait)) {
 			ktime_t to = NSEC_PER_SEC / HZ;
 
 			set_current_state(TASK_UNINTERRUPTIBLE);


* Re: [PATCH 0/2] signal: Last two bits for PREEMPT_RT
  2022-07-20 15:44 [PATCH 0/2] signal: Last two bits for PREEMPT_RT Sebastian Andrzej Siewior
  2022-07-20 15:44 ` [PATCH 1/2] signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT Sebastian Andrzej Siewior
  2022-07-20 15:44 ` [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive() Sebastian Andrzej Siewior
@ 2022-07-25 15:46 ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-25 15:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric W. Biederman, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Oleg Nesterov, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot

On 2022-07-20 17:44:33 [+0200], To linux-kernel@vger.kernel.org wrote:
> These are the last two signal/ptrace-related bits needed to get PREEMPT_RT
> working.

A polite ping for the series. This is one of two road blocks to getting RT
enabled in v5.20. I don't want to add any pressure, just to point out that I
can't sit still for days since the end is near ;)

Sebastian


* Re: [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive().
  2022-07-20 15:44 ` [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive() Sebastian Andrzej Siewior
@ 2022-07-25 17:47   ` Valentin Schneider
  2022-07-26  6:17     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2022-07-25 17:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: Eric W. Biederman, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Oleg Nesterov, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Vincent Guittot, Sebastian Andrzej Siewior

On 20/07/22 17:44, Sebastian Andrzej Siewior wrote:
> Ptrace is using wait_task_inactive() to wait for the tracee to reach a
> certain task state. On PREEMPT_RT that state may be stored in
> task_struct::saved_state while the tracee blocks on a sleeping lock and
> task_struct::__state is set to TASK_RTLOCK_WAIT.
> It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the task
> is blocked on a sleeping lock because during wake-up (after the sleeping lock
> has been acquired) the task state is set to TASK_RUNNING. Once the task is on
> a CPU and has acquired the pi_lock it will reset the state accordingly, but
> until then TASK_RUNNING will be observed (with the desired state saved in
> saved_state).
>
> Also check task_struct::saved_state if the desired match was not found in
> task_struct::__state on PREEMPT_RT. If the state was found in saved_state,
> wait until the task is idle and the state is visible in task_struct::__state.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I have a suggestion and a comment below, but other than that this looks OK.

Reviewed-by: Valentin Schneider <vschneid@redhat.com>

> ---
>  kernel/sched/core.c |   46 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 41 insertions(+), 5 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3257,6 +3257,40 @@ int migrate_swap(struct task_struct *cur
>  }
>  #endif /* CONFIG_NUMA_BALANCING */
>
> +#ifdef CONFIG_PREEMPT_RT

Would something like the below be useful?

/*
 * If p->saved_state is anything else than TASK_RUNNING, then p blocked on an
 * rtlock *before* voluntarily calling into schedule() after setting its state
 * to X. For things like ptrace (X=TASK_TRACED), the task could have more work
 * to do upon acquiring the lock before whoever called wait_task_inactive()
 * should return. IOW, we have to wait for:
 *
 *   p.saved_state = TASK_RUNNING
 *   p.__state     = X
 *
 * which implies the task isn't blocked on an RT lock and got to schedule() by
 * itself.
 *
 * Also see comments in ttwu_state_match().
 */

> +static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
> +{
> +	unsigned long flags;
> +	bool mismatch;
> +
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	mismatch = READ_ONCE(p->__state) != match_state &&
> +		READ_ONCE(p->saved_state) != match_state;
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +	return mismatch;
> +}
> +static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
> +					bool *wait)
> +{
> +	if (READ_ONCE(p->__state) == match_state)
> +		return true;
> +	if (READ_ONCE(p->saved_state) != match_state)
> +		return false;
> +	*wait = true;
> +	return true;
> +}
> +#else
> +static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
> +{
> +	return READ_ONCE(p->__state) != match_state;
> +}
> +static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
> +					bool *wait)
> +{
> +	return READ_ONCE(p->__state) == match_state;
> +}
> +#endif
> +
>  /*
>   * wait_task_inactive - wait for a thread to unschedule.
>   *
> @@ -3346,7 +3382,7 @@ unsigned long wait_task_inactive(struct
>                * running right now), it's preempted, and we should
>                * yield - it could be a while.
>                */
> -		if (unlikely(queued)) {
> +		if (unlikely(wait)) {

We could be repeatedly doing this for as long as the task is blocked on the
rtlock, but IIUC that's the same story on !PREEMPT_RT: if it's just a queued
task preempted by a higher-prio task, it may take a while for it to
schedule() and dequeue...

>                       ktime_t to = NSEC_PER_SEC / HZ;
>
>                       set_current_state(TASK_UNINTERRUPTIBLE);



* Re: [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive().
  2022-07-25 17:47   ` Valentin Schneider
@ 2022-07-26  6:17     ` Sebastian Andrzej Siewior
  2022-07-26 10:18       ` Valentin Schneider
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-26  6:17 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Eric W. Biederman, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

On 2022-07-25 18:47:58 [+0100], Valentin Schneider wrote:
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3257,6 +3257,40 @@ int migrate_swap(struct task_struct *cur
> >  }
> >  #endif /* CONFIG_NUMA_BALANCING */
> >
> > +#ifdef CONFIG_PREEMPT_RT
> 
> Would something like the below be useful?
> 
> /*
>  * If p->saved_state is anything else than TASK_RUNNING, then p blocked on an
>  * rtlock *before* voluntarily calling into schedule() after setting its state
>  * to X. For things like ptrace (X=TASK_TRACED), the task could have more work
>  * to do upon acquiring the lock before whoever called wait_task_inactive()
>  * should return. IOW, we have to wait for:
>  *
>  *   p.saved_state = TASK_RUNNING
>  *   p.__state     = X
>  *
>  * which implies the task isn't blocked on an RT lock and got to schedule() by
>  * itself.
>  *
>  * Also see comments in ttwu_state_match().
>  */

This sums up the code. I would s/schedule/schedule_rtlock/ since there
are two entrypoints.

…
> > @@ -3346,7 +3382,7 @@ unsigned long wait_task_inactive(struct
> >                * running right now), it's preempted, and we should
> >                * yield - it could be a while.
> >                */
> > -		if (unlikely(queued)) {
> > +		if (unlikely(wait)) {
> 
> We could be repeatedly doing this for as long as the task is blocked on the
> rtlock, but IIUC that's the same story on !PREEMPT_RT: if it's just a queued
> task preempted by a higher-prio task, it may take a while for it to
> schedule() and dequeue...

Yes.

> >                       ktime_t to = NSEC_PER_SEC / HZ;
> >
> >                       set_current_state(TASK_UNINTERRUPTIBLE);

Sebastian


* Re: [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive().
  2022-07-26  6:17     ` Sebastian Andrzej Siewior
@ 2022-07-26 10:18       ` Valentin Schneider
  2022-07-26 13:16         ` [PATCH 2/2 v2] " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2022-07-26 10:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Eric W. Biederman, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

On 26/07/22 08:17, Sebastian Andrzej Siewior wrote:
> On 2022-07-25 18:47:58 [+0100], Valentin Schneider wrote:
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -3257,6 +3257,40 @@ int migrate_swap(struct task_struct *cur
>> >  }
>> >  #endif /* CONFIG_NUMA_BALANCING */
>> >
>> > +#ifdef CONFIG_PREEMPT_RT
>>
>> Would something like the below be useful?
>>
>> /*
>>  * If p->saved_state is anything else than TASK_RUNNING, then p blocked on an
>>  * rtlock *before* voluntarily calling into schedule() after setting its state
>>  * to X. For things like ptrace (X=TASK_TRACED), the task could have more work
>>  * to do upon acquiring the lock before whoever called wait_task_inactive()
>>  * should return. IOW, we have to wait for:
>>  *
>>  *   p.saved_state = TASK_RUNNING
>>  *   p.__state     = X
>>  *
>>  * which implies the task isn't blocked on an RT lock and got to schedule() by
>>  * itself.
>>  *
>>  * Also see comments in ttwu_state_match().
>>  */
>
> This sums up the code. I would s/schedule/schedule_rtlock/ since there
> are two entrypoints.

Right, is this any better?

/*
 * Consider:
 *
 *  set_special_state(X);
 *
 *  do_things()
 *    // Somewhere in there is an rtlock that can be contended:
 *    current_save_and_set_rtlock_wait_state();
 *    [...]
 *    schedule_rtlock(); (A)
 *    [...]
 *    current_restore_rtlock_saved_state();
 *
 *  schedule(); (B)
 *
 * If p->saved_state is anything else than TASK_RUNNING, then p blocked on an
 * rtlock (A) *before* voluntarily calling into schedule() (B) after setting its
 * state to X. For things like ptrace (X=TASK_TRACED), the task could have more
 * work to do upon acquiring the lock in do_things() before whoever called
 * wait_task_inactive() should return. IOW, we have to wait for:
 *
 *   p.saved_state = TASK_RUNNING
 *   p.__state     = X
 *
 * which implies the task isn't blocked on an RT lock and got to schedule() (B).
 *
 * Also see comments in ttwu_state_match().
 */



* [PATCH 2/2 v2] sched: Consider task_struct::saved_state in wait_task_inactive().
  2022-07-26 10:18       ` Valentin Schneider
@ 2022-07-26 13:16         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-26 13:16 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Eric W. Biederman, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

Ptrace is using wait_task_inactive() to wait for the tracee to reach a
certain task state. On PREEMPT_RT that state may be stored in
task_struct::saved_state while the tracee blocks on a sleeping lock and
task_struct::__state is set to TASK_RTLOCK_WAIT.
It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the task
is blocked on a sleeping lock because during wake-up (after the sleeping lock
has been acquired) the task state is set to TASK_RUNNING. Once the task is on
a CPU and has acquired the pi_lock it will reset the state accordingly, but
until then TASK_RUNNING will be observed (with the desired state saved in
saved_state).

Also check task_struct::saved_state if the desired match was not found in
task_struct::__state on PREEMPT_RT. If the state was found in saved_state,
wait until the task is idle and the state is visible in task_struct::__state.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
---
v1…v2:
   - Add a comment suggested by Valentin Schneider.

 kernel/sched/core.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 71 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3257,6 +3257,70 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+#ifdef CONFIG_PREEMPT_RT
+
+/*
+ * Consider:
+ *
+ *  set_special_state(X);
+ *
+ *  do_things()
+ *    // Somewhere in there is an rtlock that can be contended:
+ *    current_save_and_set_rtlock_wait_state();
+ *    [...]
+ *    schedule_rtlock(); (A)
+ *    [...]
+ *    current_restore_rtlock_saved_state();
+ *
+ *  schedule(); (B)
+ *
+ * If p->saved_state is anything else than TASK_RUNNING, then p blocked on an
+ * rtlock (A) *before* voluntarily calling into schedule() (B) after setting its
+ * state to X. For things like ptrace (X=TASK_TRACED), the task could have more
+ * work to do upon acquiring the lock in do_things() before whoever called
+ * wait_task_inactive() should return. IOW, we have to wait for:
+ *
+ *   p.saved_state = TASK_RUNNING
+ *   p.__state     = X
+ *
+ * which implies the task isn't blocked on an RT lock and got to schedule() (B).
+ *
+ * Also see comments in ttwu_state_match().
+ */
+
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+	unsigned long flags;
+	bool mismatch;
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	mismatch = READ_ONCE(p->__state) != match_state &&
+		READ_ONCE(p->saved_state) != match_state;
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	return mismatch;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+					bool *wait)
+{
+	if (READ_ONCE(p->__state) == match_state)
+		return true;
+	if (READ_ONCE(p->saved_state) != match_state)
+		return false;
+	*wait = true;
+	return true;
+}
+#else
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+	return READ_ONCE(p->__state) != match_state;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+					bool *wait)
+{
+	return READ_ONCE(p->__state) == match_state;
+}
+#endif
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3275,7 +3339,7 @@ int migrate_swap(struct task_struct *cur
  */
 unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
-	int running, queued;
+	bool running, wait;
 	struct rq_flags rf;
 	unsigned long ncsw;
 	struct rq *rq;
@@ -3301,7 +3365,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && state_mismatch(p, match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3314,10 +3378,12 @@ unsigned long wait_task_inactive(struct
 		rq = task_rq_lock(p, &rf);
 		trace_sched_wait_task(p);
 		running = task_running(rq, p);
-		queued = task_on_rq_queued(p);
+		wait = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+
+		if (!match_state || state_match(p, match_state, &wait))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+
 		task_rq_unlock(rq, p, &rf);
 
 		/*
@@ -3346,7 +3412,7 @@ unsigned long wait_task_inactive(struct
 		 * running right now), it's preempted, and we should
 		 * yield - it could be a while.
 		 */
-		if (unlikely(queued)) {
+		if (unlikely(wait)) {
 			ktime_t to = NSEC_PER_SEC / HZ;
 
 			set_current_state(TASK_UNINTERRUPTIBLE);

