linux-kernel.vger.kernel.org archive mirror
From: Juri Lelli <juri.lelli@redhat.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, vincent.guittot@linaro.org,
	rostedt@goodmis.org, bristot@redhat.com, bsegall@google.com,
	mgorman@suse.de, Mark Simmons <msimmons@redhat.com>
Subject: Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio
Date: Wed, 7 Jul 2021 10:47:03 +0200
Message-ID: <YOVqB1XKdoZYnn4m@localhost.localdomain>
In-Reply-To: <29c071b5-5dd9-42df-9e00-f3df644eeccc@arm.com>

Hi,

On 06/07/21 16:48, Dietmar Eggemann wrote:
> On 01/07/2021 11:14, Juri Lelli wrote:
> > Double enqueues in rt runqueues (list) have been reported while running
> > a simple test that spawns a number of threads doing a short sleep/run
> > pattern while being concurrently setscheduled between rt and fair class.
> 
> I tried to recreate this in rt-app (with `pi-mutex` resource and
> `pi_enabled=true`) but I can't bring the system into hitting this warning.

So, this is a bit hard to reproduce. I'm attaching the reproducer we
have been using to test the fix. Note that we have seen this on RT (which
is why the repro doesn't need to explicitly use mutexes), but I don't
see why this couldn't in principle happen on !RT as well.

> [...]
> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 0c22cd026440..c84ac1d675f4 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6823,7 +6823,8 @@ static void __setscheduler_params(struct task_struct *p,
> >  
> >  /* Actually do priority change: must hold pi & rq lock. */
> >  static void __setscheduler(struct rq *rq, struct task_struct *p,
> > -			   const struct sched_attr *attr, bool keep_boost)
> > +			   const struct sched_attr *attr, bool keep_boost,
> > +			   int new_effective_prio)
> >  {
> >  	/*
> >  	 * If params can't change scheduling class changes aren't allowed
> > @@ -6840,7 +6841,7 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
> >  	 */
> >  	p->prio = normal_prio(p);
> >  	if (keep_boost)
> > -		p->prio = rt_effective_prio(p, p->prio);
> > +		p->prio = new_effective_prio;
> 
> So in case __sched_setscheduler() is called for p (SCHED_NORMAL, NICE0)
> you want to avoid that this 2. rt_effective_prio() call returns
> p->prio=120 in case the 1. call (in __sched_setscheduler()) did return 0
> (due to pi_task->prio=0 (FIFO rt_priority=99 task))?

Not sure I completely follow your question. But what I'm seeing is that
the top_task prio/class can change (by a concurrent setscheduler call,
for example) between two consecutive rt_effective_prio() calls and this
eventually causes the double enqueue in the rt list.

Now, what I'm not sure about is whether this is fine (as we always
eventually converge to correctness in the PI chain(s)), in which case the
proposed fix is enough, or whether we need to fix this differently.
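
To make the window described above a bit more concrete, here is a toy
user-space model. It is only a sketch and not kernel code: `toy_task`,
`toy_effective_prio()`, `toy_concurrent_change()`, `toy_two_lookups()` and
`toy_single_lookup()` are all made-up names, and the "concurrent"
setscheduler on the top task is simulated by a plain function call between
the two lookups. It assumes the effective prio is simply the numerically
smaller of the requested prio and the top waiter's prio.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; lower numeric value means higher priority. */
struct toy_task {
	int prio;                     /* current effective priority */
	struct toy_task *pi_top_task; /* top waiter boosting us, or NULL */
};

struct toy_result {
	int decided; /* prio used to decide how to (re)queue the task */
	int applied; /* prio actually written into the task */
};

/* Assumed shape of the lookup: min(prio, top waiter's prio). */
static int toy_effective_prio(const struct toy_task *p, int prio)
{
	assert(p != NULL);
	if (p->pi_top_task && p->pi_top_task->prio < prio)
		return p->pi_top_task->prio;
	return prio;
}

/* Stand-in for a concurrent setscheduler call on the top task. */
static void toy_concurrent_change(struct toy_task *p, int top_prio_after)
{
	if (p->pi_top_task)
		p->pi_top_task->prio = top_prio_after;
}

/* Two lookups: the second one can observe a different top task prio. */
static struct toy_result toy_two_lookups(struct toy_task *p, int newprio,
					 int top_prio_after)
{
	struct toy_result r;

	r.decided = toy_effective_prio(p, newprio);
	toy_concurrent_change(p, top_prio_after);
	r.applied = toy_effective_prio(p, newprio);
	p->prio = r.applied;
	return r;
}

/* One lookup: the value used to decide is the value that gets applied. */
static struct toy_result toy_single_lookup(struct toy_task *p, int newprio,
					   int top_prio_after)
{
	struct toy_result r;

	r.decided = toy_effective_prio(p, newprio);
	toy_concurrent_change(p, top_prio_after);
	r.applied = r.decided;
	p->prio = r.applied;
	return r;
}
```

With a top task at prio 0 that gets changed to 120 in between, the first
variant decides on one prio and applies another, while the second variant
stays consistent. That is the idea behind passing new_effective_prio down
instead of calling rt_effective_prio() a second time.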

> >  
> >  	if (dl_prio(p->prio))
> >  		p->sched_class = &dl_sched_class;
> > @@ -6873,7 +6874,7 @@ static int __sched_setscheduler(struct task_struct *p,
> >  	int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
> >  		      MAX_RT_PRIO - 1 - attr->sched_priority;
> >  	int retval, oldprio, oldpolicy = -1, queued, running;
> > -	int new_effective_prio, policy = attr->sched_policy;
> > +	int new_effective_prio = newprio, policy = attr->sched_policy;
> >  	const struct sched_class *prev_class;
> >  	struct callback_head *head;
> >  	struct rq_flags rf;
> > @@ -7072,6 +7073,9 @@ static int __sched_setscheduler(struct task_struct *p,
> >  	oldprio = p->prio;
> >  
> >  	if (pi) {
> > +		newprio = fair_policy(attr->sched_policy) ?
> > +			NICE_TO_PRIO(attr->sched_nice) : newprio;
> > +
> 
> Why is this necessary? p (SCHED_NORMAL) would get newprio=99 now and
> with this it gets [100...120...139] which is still greater than any RT
> (0-98)/DL (-1) prio?

It's needed because we might end up passing newprio (returned in
new_effective_prio) to __setscheduler(), and that needs to be the
"final" nice-scaled value.
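
As a small worked example of the arithmetic, the sketch below recomputes
newprio outside the kernel. The macro values are the usual ones from
include/linux/sched/prio.h (copied here by hand, so treat them as an
assumption); `toy_policy`, `toy_newprio_before()` and `toy_newprio_after()`
are hypothetical helpers that only mimic the expressions in the hunks
quoted above.

```c
#include <assert.h>

/* Assumed to match include/linux/sched/prio.h. */
#define MAX_DL_PRIO	0
#define MAX_RT_PRIO	100
#define NICE_WIDTH	40
#define DEFAULT_PRIO	(MAX_RT_PRIO + NICE_WIDTH / 2)	/* 120 */
#define NICE_TO_PRIO(nice)	((nice) + DEFAULT_PRIO)

/* Hypothetical stand-in for dl_policy()/fair_policy() checks. */
enum toy_policy { TOY_NORMAL, TOY_RT, TOY_DL };

/* newprio as initialised at the top of __sched_setscheduler(). */
static int toy_newprio_before(enum toy_policy policy, int sched_priority)
{
	return policy == TOY_DL ? MAX_DL_PRIO - 1 :
				  MAX_RT_PRIO - 1 - sched_priority;
}

/* newprio after the extra fair_policy() adjustment from the patch. */
static int toy_newprio_after(enum toy_policy policy, int sched_priority,
			     int nice)
{
	int newprio = toy_newprio_before(policy, sched_priority);

	assert(nice >= -20 && nice <= 19);
	if (policy == TOY_NORMAL)
		newprio = NICE_TO_PRIO(nice);
	return newprio;
}
```

For a SCHED_NORMAL request with sched_priority 0 and nice 0 the first
helper gives 99, which is not a prio a fair task should ever end up with,
while the second gives 120. RT requests are unaffected.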

Reproducer (on RT) follows.

Best,
Juri

---
# cat load.c
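#include <unistd.h>
#include <time.h>

/* Short sleep/run pattern: sleep 100us, then spin for a bit, forever. */
int main(void)
{
        struct timespec t, t2;
        /* volatile keeps the busy loop from being optimized away */
        volatile int i;

        t.tv_sec = 0;
        t.tv_nsec = 100000;

        while (1) {
                nanosleep(&t, &t2);
                i = 0;
                while (i < 100000) {
                        i++;
                }
        }
}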

--->8---

# cat setsched.c
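#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        pid_t p;
        struct sched_param spr = { .sched_priority = 50 };
        struct sched_param spo = { .sched_priority = 0 };

        if (argc < 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        p = atoi(argv[1]);

        /* Flip the target between SCHED_RR and SCHED_OTHER in a tight loop. */
        while (1) {
                if (sched_setscheduler(p, SCHED_RR, &spr) ||
                    sched_setscheduler(p, SCHED_OTHER, &spo)) {
                        /* e.g. missing privileges or target gone */
                        perror("sched_setscheduler");
                        return 1;
                }
        }
}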

--->8---

# cat run.sh
#!/bin/bash

gcc -o load ./load.c
gcc -o setsched ./setsched.c
cp load rt_pid
mkdir -p TMP

for AUX in $(seq 36); do
    cp load TMP/load__${AUX}
    ./TMP/load__${AUX} &
done

sleep 1
for AUX in $(seq 18); do
    cp rt_pid TMP/rt_pid__${AUX}
    cp setsched TMP/setsched__${AUX}
    ./TMP/rt_pid__${AUX} &
    ./TMP/setsched__${AUX} $!&
done

--->8---

# cat destroy.sh
pkill load
pkill setsched
pkill rt_pid

rm load setsched rt_pid
rm -rf TMP

