All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
@ 2022-01-27 15:40 Valentin Schneider
  2022-02-01 15:28 ` Dietmar Eggemann
  2022-03-01 15:24 ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  0 siblings, 2 replies; 3+ messages in thread
From: Valentin Schneider @ 2022-01-27 15:40 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: John Keeping, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
	check_class_changed()
	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
	  switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
	 push_rt_task()
	   // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.

Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: John Keeping <john@metanate.com>
---
v1 -> v2: Reworded comments, added DL part (Dietmar)
---
 kernel/sched/deadline.c | 12 ++++++------
 kernel/sched/rt.c       | 32 ++++++++++++++++++++++----------
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d2c072b0ef01..62f0cf842277 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2240,12 +2240,6 @@ static int push_dl_task(struct rq *rq)
 		return 0;
 
 retry:
-	if (is_migration_disabled(next_task))
-		return 0;
-
-	if (WARN_ON(next_task == rq->curr))
-		return 0;
-
 	/*
 	 * If next_task preempts rq->curr, and rq->curr
 	 * can move away, it makes sense to just reschedule
@@ -2258,6 +2252,12 @@ static int push_dl_task(struct rq *rq)
 		return 0;
 	}
 
+	if (is_migration_disabled(next_task))
+		return 0;
+
+	if (WARN_ON(next_task == rq->curr))
+		return 0;
+
 	/* We might release rq lock */
 	get_task_struct(next_task);
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7b4f4fbbb404..14f273c29518 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
 		return 0;
 
 retry:
+	/*
+	 * It's possible that the next_task slipped in of
+	 * higher priority than current. If that's the case
+	 * just reschedule current.
+	 */
+	if (unlikely(next_task->prio < rq->curr->prio)) {
+		resched_curr(rq);
+		return 0;
+	}
+
 	if (is_migration_disabled(next_task)) {
 		struct task_struct *push_task = NULL;
 		int cpu;
@@ -2033,6 +2043,18 @@ static int push_rt_task(struct rq *rq, bool pull)
 		if (!pull || rq->push_busy)
 			return 0;
 
+		/*
+		 * Invoking find_lowest_rq() on anything but an RT task doesn't
+		 * make sense. Per the above priority check, curr has to
+		 * be of higher priority than next_task, so no need to
+		 * reschedule when bailing out.
+		 *
+		 * Note that the stoppers are masqueraded as SCHED_FIFO
+		 * (cf. sched_set_stop_task()), so we can't rely on rt_task().
+		 */
+		if (rq->curr->sched_class != &rt_sched_class)
+			return 0;
+
 		cpu = find_lowest_rq(rq->curr);
 		if (cpu == -1 || cpu == rq->cpu)
 			return 0;
@@ -2057,16 +2079,6 @@ static int push_rt_task(struct rq *rq, bool pull)
 	if (WARN_ON(next_task == rq->curr))
 		return 0;
 
-	/*
-	 * It's possible that the next_task slipped in of
-	 * higher priority than current. If that's the case
-	 * just reschedule current.
-	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
-		resched_curr(rq);
-		return 0;
-	}
-
 	/* We might release rq lock */
 	get_task_struct(next_task);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  2022-01-27 15:40 [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race Valentin Schneider
@ 2022-02-01 15:28 ` Dietmar Eggemann
  2022-03-01 15:24 ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: Dietmar Eggemann @ 2022-02-01 15:28 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel, linux-rt-users
  Cc: John Keeping, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

On 27/01/2022 16:40, Valentin Schneider wrote:
> John reported that push_rt_task() can end up invoking
> find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
> one), which causes mayhem down convert_prio().
> 
> This can happen when current gets demoted to e.g. CFS when releasing an
> rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
> getting the chance to reschedule. Exactly who triggers this work isn't
> entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
> if there are no RT tasks on the local RQ, which means the local CPU can't
> be in the rto_mask.
> 
> My current suspected sequence is something along the lines of the below,
> with the demoted task being current.
> 
>   mark_wakeup_next_waiter()
>     rt_mutex_adjust_prio()
>       rt_mutex_setprio() // deboost originally-CFS task
> 	check_class_changed()
> 	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
> 	  switched_to_fair() // Sets need_resched
>       __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
>       raw_spin_rq_unlock(rq)
> 
>        // need_resched is set, so task_woken_rt() can't
>        // invoke push_rt_tasks(). Best I can come up with is
>        // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
>        // in the rto_mask, and then:
> 
>        <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
> 	 push_rt_task()
> 	   // breakage follows here as rq->curr is CFS
> 
> Move an existing check to check rq->curr vs the next pushable task's
> priority before getting anywhere near find_lowest_rq(). While at it, add an
> explicit sched_class of rq->curr check prior to invoking
> find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
> of next_task's migratability.
> 
> Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
> Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
> Reported-by: John Keeping <john@metanate.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> Tested-by: John Keeping <john@metanate.com>
> ---
> v1 -> v2: Reworded comments, added DL part (Dietmar)
> ---

LGTM.

Rescheduling in case rq->curr has lower prio (including CFS tasks) than
next_task and bailing out in case rq->curr is DL or stop-task prevents
the bug from happening.

The only small issue is the fact that, unlike in push_rt_task(), the DL
logic only compares DL tasks (if (dl_task(rq->curr) ...), so you miss
rescheduling when rq->curr is a lower priority non-DL task.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

[...]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip: sched/core] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  2022-01-27 15:40 [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race Valentin Schneider
  2022-02-01 15:28 ` Dietmar Eggemann
@ 2022-03-01 15:24 ` tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: John Keeping, Valentin Schneider, Peter Zijlstra (Intel),
	Dietmar Eggemann, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     49bef33e4b87b743495627a529029156c6e09530
Gitweb:        https://git.kernel.org/tip/49bef33e4b87b743495627a529029156c6e09530
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Thu, 27 Jan 2022 15:40:59 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00

sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
	check_class_changed()
	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
	  switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
	 push_rt_task()
	   // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.

Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
---
 kernel/sched/deadline.c | 12 ++++++------
 kernel/sched/rt.c       | 32 ++++++++++++++++++++++----------
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d2c072b..62f0cf8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2240,12 +2240,6 @@ static int push_dl_task(struct rq *rq)
 		return 0;
 
 retry:
-	if (is_migration_disabled(next_task))
-		return 0;
-
-	if (WARN_ON(next_task == rq->curr))
-		return 0;
-
 	/*
 	 * If next_task preempts rq->curr, and rq->curr
 	 * can move away, it makes sense to just reschedule
@@ -2258,6 +2252,12 @@ retry:
 		return 0;
 	}
 
+	if (is_migration_disabled(next_task))
+		return 0;
+
+	if (WARN_ON(next_task == rq->curr))
+		return 0;
+
 	/* We might release rq lock */
 	get_task_struct(next_task);
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7b4f4fb..14f273c 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
 		return 0;
 
 retry:
+	/*
+	 * It's possible that the next_task slipped in of
+	 * higher priority than current. If that's the case
+	 * just reschedule current.
+	 */
+	if (unlikely(next_task->prio < rq->curr->prio)) {
+		resched_curr(rq);
+		return 0;
+	}
+
 	if (is_migration_disabled(next_task)) {
 		struct task_struct *push_task = NULL;
 		int cpu;
@@ -2033,6 +2043,18 @@ retry:
 		if (!pull || rq->push_busy)
 			return 0;
 
+		/*
+		 * Invoking find_lowest_rq() on anything but an RT task doesn't
+		 * make sense. Per the above priority check, curr has to
+		 * be of higher priority than next_task, so no need to
+		 * reschedule when bailing out.
+		 *
+		 * Note that the stoppers are masqueraded as SCHED_FIFO
+		 * (cf. sched_set_stop_task()), so we can't rely on rt_task().
+		 */
+		if (rq->curr->sched_class != &rt_sched_class)
+			return 0;
+
 		cpu = find_lowest_rq(rq->curr);
 		if (cpu == -1 || cpu == rq->cpu)
 			return 0;
@@ -2057,16 +2079,6 @@ retry:
 	if (WARN_ON(next_task == rq->curr))
 		return 0;
 
-	/*
-	 * It's possible that the next_task slipped in of
-	 * higher priority than current. If that's the case
-	 * just reschedule current.
-	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
-		resched_curr(rq);
-		return 0;
-	}
-
 	/* We might release rq lock */
 	get_task_struct(next_task);
 

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-03-01 15:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-27 15:40 [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race Valentin Schneider
2022-02-01 15:28 ` Dietmar Eggemann
2022-03-01 15:24 ` [tip: sched/core] " tip-bot2 for Valentin Schneider

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.