All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Valentin Schneider <valentin.schneider@arm.com>,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org
Cc: John Keeping <john@metanate.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
Date: Tue, 1 Feb 2022 16:28:10 +0100	[thread overview]
Message-ID: <c12765d4-6b0e-fd1a-8152-a68e6e56cbb8@arm.com> (raw)
In-Reply-To: <20220127154059.974729-1-valentin.schneider@arm.com>

On 27/01/2022 16:40, Valentin Schneider wrote:
> John reported that push_rt_task() can end up invoking
> find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
> one), which causes mayhem down convert_prio().
> 
> This can happen when current gets demoted to e.g. CFS when releasing an
> rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
> getting the chance to reschedule. Exactly who triggers this work isn't
> entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
> if there are no RT tasks on the local RQ, which means the local CPU can't
> be in the rto_mask.
> 
> My current suspected sequence is something along the lines of the below,
> with the demoted task being current.
> 
>   mark_wakeup_next_waiter()
>     rt_mutex_adjust_prio()
>       rt_mutex_setprio() // deboost originally-CFS task
> 	check_class_changed()
> 	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
> 	  switched_to_fair() // Sets need_resched
>       __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
>       raw_spin_rq_unlock(rq)
> 
>        // need_resched is set, so task_woken_rt() can't
>        // invoke push_rt_tasks(). Best I can come up with is
>        // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
>        // in the rto_mask, and then:
> 
>        <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
> 	 push_rt_task()
> 	   // breakage follows here as rq->curr is CFS
> 
> Move an existing check to check rq->curr vs the next pushable task's
> priority before getting anywhere near find_lowest_rq(). While at it, add an
> explicit sched_class of rq->curr check prior to invoking
> find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
> of next_task's migratability.
> 
> Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
> Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
> Reported-by: John Keeping <john@metanate.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> Tested-by: John Keeping <john@metanate.com>
> ---
> v1 -> v2: Reworded comments, added DL part (Dietmar)
> ---

LGTM.

Rescheduling in case rq->curr has lower prio (including CFS tasks) than
next_task and bailing out in case rq->curr is DL or stop-task prevents
the bug from happening.

The only small issue is the fact that, unlike in push_rt_task(), the DL
logic only compares DL tasks (if (dl_task(rq->curr) ...), so you miss
rescheduling when rq->curr is a lower priority non-DL task.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

[...]

  reply	other threads:[~2022-02-01 15:28 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-27 15:40 [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race Valentin Schneider
2022-02-01 15:28 ` Dietmar Eggemann [this message]
2022-03-01 15:24 ` [tip: sched/core] " tip-bot2 for Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c12765d4-6b0e-fd1a-8152-a68e6e56cbb8@arm.com \
    --to=dietmar.eggemann@arm.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=john@metanate.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.