* [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio @ 2021-07-01 9:14 Juri Lelli 2021-07-06 14:48 ` Dietmar Eggemann ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Juri Lelli @ 2021-07-01 9:14 UTC (permalink / raw) To: peterz, mingo Cc: linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Juri Lelli, Mark Simmons Double enqueues in rt runqueues (list) have been reported while running a simple test that spawns a number of threads doing a short sleep/run pattern while being concurrently setscheduled between rt and fair class. ------------[ cut here ]------------ WARNING: CPU: 3 PID: 2825 at kernel/sched/rt.c:1294 enqueue_task_rt+0x355/0x360 ... CPU: 3 PID: 2825 Comm: setsched__13 Tainted: G S 5.12.0-rc3-rt3 #16 Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/17/2018 RIP: 0010:enqueue_task_rt+0x355/0x360 Code: ... RSP: 0018:ffffa63209203e08 EFLAGS: 00010002 RAX: 0000000000000031 RBX: ffff9867cb6298c0 RCX: ffff98679fc67ca0 RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff98679fc67980 RBP: ffff98679fc67740 R08: ffffffff85dcb3a8 R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff86268838 R12: ffff98679fc67740 R13: ffff98679fc67740 R14: ffff9867cb629b40 R15: 000000000000000e FS: 00007f0ce9dbb500(0000) GS:ffff98679f8c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0ce98ba650 CR3: 0000000240250003 CR4: 00000000001706e0 Call Trace: __sched_setscheduler+0x581/0x9d0 _sched_setscheduler+0x63/0xa0 do_sched_setscheduler+0xa0/0x150 __x64_sys_sched_setscheduler+0x1a/0x30 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f0ce98ba65b Code: ... 
RSP: 002b:00007ffea795ab88 EFLAGS: 00000217 ORIG_RAX: 0000000000000090 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ce98ba65b RDX: 00007ffea795aba4 RSI: 0000000000000002 RDI: 0000000000000b08 RBP: 00007ffea795abb0 R08: 00007ffea795c216 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000217 R12: 0000000000400500 R13: 00007ffea795ac90 R14: 0000000000000000 R15: 0000000000000000 ---[ end trace 0000000000000002 ]--- list_add double add: new=ffff9867cb629b40, prev=ffff9867cb629b40, next=ffff98679fc67ca0. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:31! invalid opcode: 0000 [#1] PREEMPT_RT SMP PTI CPU: 3 PID: 2825 Comm: setsched__13 Tainted: G S W 5.12.0-rc3-rt3 #16 Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/17/2018 RIP: 0010:__list_add_valid+0x41/0x50 Code: ... RSP: 0018:ffffa63209203e00 EFLAGS: 00010046 RAX: 0000000000000058 RBX: ffff9867cb6298c0 RCX: 0000000000000003 RDX: 0000000000000000 RSI: ffffffff85d4ff42 RDI: 00000000ffffffff RBP: ffff98679fc67740 R08: ffffffff86270480 R09: 00080000000000ff R10: 0000000079157392 R11: 0000000000000033 R12: ffff98679fc67740 R13: ffff98679fc67740 R14: ffff9867cb629b40 R15: 0000000000000000 FS: 00007f0ce9dbb500(0000) GS:ffff98679f8c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0ce98ba650 CR3: 0000000240250003 CR4: 00000000001706e0 Call Trace: enqueue_task_rt+0x291/0x360 __sched_setscheduler+0x581/0x9d0 _sched_setscheduler+0x63/0xa0 do_sched_setscheduler+0xa0/0x150 __x64_sys_sched_setscheduler+0x1a/0x30 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f0ce98ba65b Code: ... 
RSP: 002b:00007ffea795ab88 EFLAGS: 00000217 ORIG_RAX: 0000000000000090
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ce98ba65b
RDX: 00007ffea795aba4 RSI: 0000000000000002 RDI: 0000000000000b08
RBP: 00007ffea795abb0 R08: 00007ffea795c216 R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000217 R12: 0000000000400500
R13: 00007ffea795ac90 R14: 0000000000000000 R15: 0000000000000000
...
---[ end trace 0000000000000003 ]---

__sched_setscheduler() uses rt_effective_prio() to handle proper queuing
of priority boosted tasks that are setscheduled while being boosted.
rt_effective_prio() is, however, called twice for each
__sched_setscheduler() call: first directly by __sched_setscheduler()
before dequeuing the task, and then by __setscheduler() to actually do
the priority change. If the priority of pi_top_task is concurrently
being changed, the two calls might return different results. If, for
example, the first call returns the same rt priority the task was
running at and the second one returns a fair priority, the task won't be
removed from the rt list (on_list is still set) but will then be
enqueued in the fair runqueue. When it is eventually setscheduled back
to rt, it will be seen as already enqueued and the WARNING/BUG will be
issued.

Fix this by calling rt_effective_prio() only once and then reusing the
return value. Concurrent priority inheritance handling is still safe and
will eventually converge to a new state by following the inheritance
chain(s).

Fixes: ff77e46853598 ("sched/rt: Fix PI handling vs. sched_setscheduler()")
Reported-by: Mark Simmons <msimmons@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
Hi,

This survives the test case (and regression testing), but I'm still not
fully convinced that the statement above, "Concurrent priority
inheritance handling is still safe and will eventually converge to a new
state by following the inheritance chain(s)", is actually sound. Please
take a thorough look.
Best, Juri --- kernel/sched/core.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0c22cd026440..c84ac1d675f4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6823,7 +6823,8 @@ static void __setscheduler_params(struct task_struct *p, /* Actually do priority change: must hold pi & rq lock. */ static void __setscheduler(struct rq *rq, struct task_struct *p, - const struct sched_attr *attr, bool keep_boost) + const struct sched_attr *attr, bool keep_boost, + int new_effective_prio) { /* * If params can't change scheduling class changes aren't allowed @@ -6840,7 +6841,7 @@ static void __setscheduler(struct rq *rq, struct task_struct *p, */ p->prio = normal_prio(p); if (keep_boost) - p->prio = rt_effective_prio(p, p->prio); + p->prio = new_effective_prio; if (dl_prio(p->prio)) p->sched_class = &dl_sched_class; @@ -6873,7 +6874,7 @@ static int __sched_setscheduler(struct task_struct *p, int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 : MAX_RT_PRIO - 1 - attr->sched_priority; int retval, oldprio, oldpolicy = -1, queued, running; - int new_effective_prio, policy = attr->sched_policy; + int new_effective_prio = newprio, policy = attr->sched_policy; const struct sched_class *prev_class; struct callback_head *head; struct rq_flags rf; @@ -7072,6 +7073,9 @@ static int __sched_setscheduler(struct task_struct *p, oldprio = p->prio; if (pi) { + newprio = fair_policy(attr->sched_policy) ? + NICE_TO_PRIO(attr->sched_nice) : newprio; + /* * Take priority boosted tasks into account. If the new * effective priority is unchanged, we just store the new @@ -7093,7 +7097,7 @@ static int __sched_setscheduler(struct task_struct *p, prev_class = p->sched_class; - __setscheduler(rq, p, attr, pi); + __setscheduler(rq, p, attr, pi, new_effective_prio); __setscheduler_uclamp(p, attr); if (queued) { -- 2.31.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-01 9:14 [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio Juri Lelli @ 2021-07-06 14:48 ` Dietmar Eggemann 2021-07-07 8:47 ` Juri Lelli 2021-07-08 10:06 ` Peter Zijlstra 2021-07-08 11:35 ` Peter Zijlstra 2 siblings, 1 reply; 9+ messages in thread From: Dietmar Eggemann @ 2021-07-06 14:48 UTC (permalink / raw) To: Juri Lelli, peterz, mingo Cc: linux-kernel, vincent.guittot, rostedt, bristot, bsegall, mgorman, Mark Simmons On 01/07/2021 11:14, Juri Lelli wrote: > Double enqueues in rt runqueues (list) have been reported while running > a simple test that spawns a number of threads doing a short sleep/run > pattern while being concurrently setscheduled between rt and fair class. I tried to recreate this in rt-app (with `pi-mutex` resource and `pi_enabled=true` but I can't bring the system into hitting this warning. [...] > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 0c22cd026440..c84ac1d675f4 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6823,7 +6823,8 @@ static void __setscheduler_params(struct task_struct *p, > > /* Actually do priority change: must hold pi & rq lock. */ > static void __setscheduler(struct rq *rq, struct task_struct *p, > - const struct sched_attr *attr, bool keep_boost) > + const struct sched_attr *attr, bool keep_boost, > + int new_effective_prio) > { > /* > * If params can't change scheduling class changes aren't allowed > @@ -6840,7 +6841,7 @@ static void __setscheduler(struct rq *rq, struct task_struct *p, > */ > p->prio = normal_prio(p); > if (keep_boost) > - p->prio = rt_effective_prio(p, p->prio); > + p->prio = new_effective_prio; So in case __sched_setscheduler() is called for p (SCHED_NORMAL, NICE0) you want to avoid that this 2. rt_effective_prio() call returns p->prio=120 in case the 1. call (in __sched_setscheduler()) did return 0 (due to pi_task->prio=0 (FIFO rt_priority=99 task))? 
> > if (dl_prio(p->prio)) > p->sched_class = &dl_sched_class; > @@ -6873,7 +6874,7 @@ static int __sched_setscheduler(struct task_struct *p, > int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 : > MAX_RT_PRIO - 1 - attr->sched_priority; > int retval, oldprio, oldpolicy = -1, queued, running; > - int new_effective_prio, policy = attr->sched_policy; > + int new_effective_prio = newprio, policy = attr->sched_policy; > const struct sched_class *prev_class; > struct callback_head *head; > struct rq_flags rf; > @@ -7072,6 +7073,9 @@ static int __sched_setscheduler(struct task_struct *p, > oldprio = p->prio; > > if (pi) { > + newprio = fair_policy(attr->sched_policy) ? > + NICE_TO_PRIO(attr->sched_nice) : newprio; > + Why is this necessary? p (SCHED_NORMAL) would get newprio=99 now and with this it gets [100...120...139] which is still greater than any RT (0-98)/DL (-1) prio? [...] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-06 14:48 ` Dietmar Eggemann @ 2021-07-07 8:47 ` Juri Lelli 0 siblings, 0 replies; 9+ messages in thread From: Juri Lelli @ 2021-07-07 8:47 UTC (permalink / raw) To: Dietmar Eggemann Cc: peterz, mingo, linux-kernel, vincent.guittot, rostedt, bristot, bsegall, mgorman, Mark Simmons Hi, On 06/07/21 16:48, Dietmar Eggemann wrote: > On 01/07/2021 11:14, Juri Lelli wrote: > > Double enqueues in rt runqueues (list) have been reported while running > > a simple test that spawns a number of threads doing a short sleep/run > > pattern while being concurrently setscheduled between rt and fair class. > > I tried to recreate this in rt-app (with `pi-mutex` resource and > `pi_enabled=true` but I can't bring the system into hitting this warning. So, this is a bit hard to reproduce. I'm attaching the reproducer we have been using to test the fix. Note that we have seen this on RT (thus why the repro doesn't need to explicitly use mutexes), but I'm not seeing why this couldn't in principle happen on !RT as well. > [...] > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 0c22cd026440..c84ac1d675f4 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -6823,7 +6823,8 @@ static void __setscheduler_params(struct task_struct *p, > > > > /* Actually do priority change: must hold pi & rq lock. 
*/ > > static void __setscheduler(struct rq *rq, struct task_struct *p, > > - const struct sched_attr *attr, bool keep_boost) > > + const struct sched_attr *attr, bool keep_boost, > > + int new_effective_prio) > > { > > /* > > * If params can't change scheduling class changes aren't allowed > > @@ -6840,7 +6841,7 @@ static void __setscheduler(struct rq *rq, struct task_struct *p, > > */ > > p->prio = normal_prio(p); > > if (keep_boost) > > - p->prio = rt_effective_prio(p, p->prio); > > + p->prio = new_effective_prio; > > So in case __sched_setscheduler() is called for p (SCHED_NORMAL, NICE0) > you want to avoid that this 2. rt_effective_prio() call returns > p->prio=120 in case the 1. call (in __sched_setscheduler()) did return 0 > (due to pi_task->prio=0 (FIFO rt_priority=99 task))? Not sure I completely follow your question. But what I'm seeing is that the top_task prio/class can change (by a concurrent setscheduler call, for example) between two consecutive rt_effective_prio() calls and this eventually causes the double enqueue in the rt list. Now, what I'm not sure about is if this is fine (as we always eventually converge to correctness in the PI chain(s)), and thus the proposed fix, or if we need to fix this differently. > > > > if (dl_prio(p->prio)) > > p->sched_class = &dl_sched_class; > > @@ -6873,7 +6874,7 @@ static int __sched_setscheduler(struct task_struct *p, > > int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 : > > MAX_RT_PRIO - 1 - attr->sched_priority; > > int retval, oldprio, oldpolicy = -1, queued, running; > > - int new_effective_prio, policy = attr->sched_policy; > > + int new_effective_prio = newprio, policy = attr->sched_policy; > > const struct sched_class *prev_class; > > struct callback_head *head; > > struct rq_flags rf; > > @@ -7072,6 +7073,9 @@ static int __sched_setscheduler(struct task_struct *p, > > oldprio = p->prio; > > > > if (pi) { > > + newprio = fair_policy(attr->sched_policy) ? 
> > +			NICE_TO_PRIO(attr->sched_nice) : newprio;
> > +
> 
> Why is this necessary? p (SCHED_NORMAL) would get newprio=99 now and
> with this it gets [100...120...139] which is still greater than any RT
> (0-98)/DL (-1) prio?

It's needed because we might be going to use newprio (returned in
new_effective_prio) with __setscheduler() and that needs to be the
"final" nice scaled value.

Reproducer (on RT) follows.

Best,
Juri

---
# cat load.c
#include <unistd.h>
#include <time.h>

int main(){
	struct timespec t, t2;
	t.tv_sec = 0;
	t.tv_nsec = 100000;
	int i;

	while (1){
		// sleep(1);
		nanosleep(&t, &t2);
		i = 0;
		while(i < 100000){
			i++;
		}
	}
}
--->8---
# cat setsched.c
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[]){
	int ret;
	pid_t p;

	p = atoi(argv[1]);

	struct sched_param spr = { .sched_priority = 50};
	struct sched_param spo = { .sched_priority = 0};

	while(1){
		ret = sched_setscheduler(p, SCHED_RR, &spr);
		ret = sched_setscheduler(p, SCHED_OTHER, &spo);
	}
}
--->8---
# cat run.sh
#!/bin/bash

gcc -o load ./load.c
gcc -o setsched ./setsched.c

cp load rt_pid
mkdir TMP

for AUX in $(seq 36); do
	cp load TMP/load__${AUX}
	./TMP/load__${AUX} &
done

sleep 1

for AUX in $(seq 18); do
	cp rt_pid TMP/rt_pid__${AUX}
	cp setsched TMP/setsched__${AUX}
	./TMP/rt_pid__${AUX} &
	./TMP/setsched__${AUX} $!&
done
--->8---
# cat destroy.sh
pkill load
pkill setsched
pkill rt_pid
rm load setsched rt_pid
rm -rf TMP

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-01 9:14 [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio Juri Lelli 2021-07-06 14:48 ` Dietmar Eggemann @ 2021-07-08 10:06 ` Peter Zijlstra 2021-07-08 10:26 ` Peter Zijlstra 2021-07-08 11:35 ` Peter Zijlstra 2 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2021-07-08 10:06 UTC (permalink / raw) To: Juri Lelli Cc: mingo, linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Mark Simmons On Thu, Jul 01, 2021 at 11:14:31AM +0200, Juri Lelli wrote: > Double enqueues in rt runqueues (list) have been reported while running > a simple test that spawns a number of threads doing a short sleep/run > pattern while being concurrently setscheduled between rt and fair class. > __sched_setscheduler() uses rt_effective_prio() to handle proper queuing > of priority boosted tasks that are setscheduled while being boosted. > rt_effective_prio() is however called twice per each > __sched_setscheduler() call: first directly by __sched_setscheduler() > before dequeuing the task and then by __setscheduler() to actually do > the priority change. If the priority of the pi_top_task is concurrently > being changed however, it might happen that the two calls return > different results. If, for example, the first call returned the same rt > priority the task was running at and the second one a fair priority, the > task won't be removed by the rt list (on_list still set) and then > enqueued in the fair runqueue. When eventually setscheduled back to rt > it will be seen as enqueued already and the WARNING/BUG be issued. > > Fix this by calling rt_effective_prio() only once and then reusing the > return value. Concurrent priority inheritance handling is still safe and > will eventually converge to a new state by following the inheritance > chain(s). > > Fixes: ff77e46853598 ("sched/rt: Fix PI handling vs. sched_setscheduler()") You sure? 
AFAICT that commit only adds the warning that's triggered here, but the double rt_effective_prio() call seems to predate that. Or am I missing something about the on_rq changes in that patch? Commit 0782e63bc6fe seems to introduce the double call.. > --- > kernel/sched/core.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 0c22cd026440..c84ac1d675f4 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6823,7 +6823,8 @@ static void __setscheduler_params(struct task_struct *p, > > /* Actually do priority change: must hold pi & rq lock. */ > static void __setscheduler(struct rq *rq, struct task_struct *p, > - const struct sched_attr *attr, bool keep_boost) > + const struct sched_attr *attr, bool keep_boost, > + int new_effective_prio) > { > /* > * If params can't change scheduling class changes aren't allowed > @@ -6840,7 +6841,7 @@ static void __setscheduler(struct rq *rq, struct task_struct *p, > */ > p->prio = normal_prio(p); > if (keep_boost) > - p->prio = rt_effective_prio(p, p->prio); > + p->prio = new_effective_prio; > > if (dl_prio(p->prio)) > p->sched_class = &dl_sched_class; > @@ -6873,7 +6874,7 @@ static int __sched_setscheduler(struct task_struct *p, > int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 : > MAX_RT_PRIO - 1 - attr->sched_priority; > int retval, oldprio, oldpolicy = -1, queued, running; > - int new_effective_prio, policy = attr->sched_policy; > + int new_effective_prio = newprio, policy = attr->sched_policy; > const struct sched_class *prev_class; > struct callback_head *head; > struct rq_flags rf; > @@ -7072,6 +7073,9 @@ static int __sched_setscheduler(struct task_struct *p, > oldprio = p->prio; > > if (pi) { > + newprio = fair_policy(attr->sched_policy) ? > + NICE_TO_PRIO(attr->sched_nice) : newprio; *groan*.... That completes our local copy of normal_prio() based off of attr. > /* > * Take priority boosted tasks into account. 
If the new > * effective priority is unchanged, we just store the new > @@ -7093,7 +7097,7 @@ static int __sched_setscheduler(struct task_struct *p, > > prev_class = p->sched_class; > > - __setscheduler(rq, p, attr, pi); > + __setscheduler(rq, p, attr, pi, new_effective_prio); > __setscheduler_uclamp(p, attr); > > if (queued) { Slightly larger patch, but perhaps a little cleaner.. still pondering if we can share a little more between __sched_setscheduler() and rt_mutex_setprio(). --- kernel/sched/core.c | 51 +++++++++++++++++++++------------------------------ 1 file changed, 21 insertions(+), 30 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 970b95ff2952..ede10642612c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1983,12 +1983,18 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags) dequeue_task(rq, p, flags); } -/* - * __normal_prio - return the priority that is based on the static prio - */ -static inline int __normal_prio(struct task_struct *p) +static inline int __normal_prio(int policy, int rt_prio, int nice) { - return p->static_prio; + int prio; + + if (dl_policy(policy)) + prio = MAX_DL_PRIO - 1; + else if (rt_policy(policy)) + prio = MAX_RT_PRIO - 1 - rt_prio; + else + prio = NICE_TO_PRIO(nice); + + return prio; } /* @@ -2000,15 +2006,7 @@ static inline int __normal_prio(struct task_struct *p) */ static inline int normal_prio(struct task_struct *p) { - int prio; - - if (task_has_dl_policy(p)) - prio = MAX_DL_PRIO-1; - else if (task_has_rt_policy(p)) - prio = MAX_RT_PRIO-1 - p->rt_priority; - else - prio = __normal_prio(p); - return prio; + return __normal_prio(p->policy, p->rt_priority, PRIO_TO_NICE(p->static_prio)); } /* @@ -4101,7 +4099,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) } else if (PRIO_TO_NICE(p->static_prio) < 0) p->static_prio = NICE_TO_PRIO(0); - p->prio = p->normal_prio = __normal_prio(p); + p->prio = p->normal_prio = p->static_prio; set_load_weight(p, 
false); /* @@ -6828,7 +6826,7 @@ static void __setscheduler_params(struct task_struct *p, /* Actually do priority change: must hold pi & rq lock. */ static void __setscheduler(struct rq *rq, struct task_struct *p, - const struct sched_attr *attr, bool keep_boost) + const struct sched_attr *attr, int newprio) { /* * If params can't change scheduling class changes aren't allowed @@ -6839,13 +6837,7 @@ static void __setscheduler(struct rq *rq, struct task_struct *p, __setscheduler_params(p, attr); - /* - * Keep a potential priority boosting if called from - * sched_setscheduler(). - */ - p->prio = normal_prio(p); - if (keep_boost) - p->prio = rt_effective_prio(p, p->prio); + p->prio = newprio; if (dl_prio(p->prio)) p->sched_class = &dl_sched_class; @@ -6875,10 +6867,8 @@ static int __sched_setscheduler(struct task_struct *p, const struct sched_attr *attr, bool user, bool pi) { - int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 : - MAX_RT_PRIO - 1 - attr->sched_priority; - int retval, oldprio, oldpolicy = -1, queued, running; - int new_effective_prio, policy = attr->sched_policy; + int oldpolicy = -1, policy = attr->sched_policy; + int retval, oldprio, newprio, queued, running; const struct sched_class *prev_class; struct callback_head *head; struct rq_flags rf; @@ -7076,6 +7066,7 @@ static int __sched_setscheduler(struct task_struct *p, p->sched_reset_on_fork = reset_on_fork; oldprio = p->prio; + newprio = __normal_prio(policy, attr->sched_priority, attr->sched_nice); if (pi) { /* * Take priority boosted tasks into account. If the new @@ -7084,8 +7075,8 @@ static int __sched_setscheduler(struct task_struct *p, * the runqueue. This will be done when the task deboost * itself. 
*/ - new_effective_prio = rt_effective_prio(p, newprio); - if (new_effective_prio == oldprio) + newprio = rt_effective_prio(p, newprio); + if (newprio == oldprio) queue_flags &= ~DEQUEUE_MOVE; } @@ -7098,7 +7089,7 @@ static int __sched_setscheduler(struct task_struct *p, prev_class = p->sched_class; - __setscheduler(rq, p, attr, pi); + __setscheduler(rq, p, attr, newprio); __setscheduler_uclamp(p, attr); if (queued) { ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-08 10:06 ` Peter Zijlstra @ 2021-07-08 10:26 ` Peter Zijlstra 2021-07-09 8:33 ` Juri Lelli 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2021-07-08 10:26 UTC (permalink / raw) To: Juri Lelli Cc: mingo, linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Mark Simmons On Thu, Jul 08, 2021 at 12:06:27PM +0200, Peter Zijlstra wrote: > Slightly larger patch, but perhaps a little cleaner.. still pondering if > we can share a little more between __sched_setscheduler() and > rt_mutex_setprio(). Best I can seem to come up with... --- kernel/sched/core.c | 45 +++++++++++++++++---------------------------- 1 file changed, 17 insertions(+), 28 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ede10642612c..c686f0c70656 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6341,6 +6341,18 @@ int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flag } EXPORT_SYMBOL(default_wake_function); +static void __setscheduler_prio(struct task_struct *p, int prio) +{ + if (dl_prio(prio)) + p->sched_class = &dl_sched_class; + else if (rt_prio(prio)) + p->sched_class = &rt_sched_class; + else + p->sched_class = &fair_sched_class; + + p->prio = prio; +} + #ifdef CONFIG_RT_MUTEXES static inline int __rt_effective_prio(struct task_struct *pi_task, int prio) @@ -6456,22 +6468,19 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task) } else { p->dl.pi_se = &p->dl; } - p->sched_class = &dl_sched_class; } else if (rt_prio(prio)) { if (dl_prio(oldprio)) p->dl.pi_se = &p->dl; if (oldprio < prio) queue_flag |= ENQUEUE_HEAD; - p->sched_class = &rt_sched_class; } else { if (dl_prio(oldprio)) p->dl.pi_se = &p->dl; if (rt_prio(oldprio)) p->rt.timeout = 0; - p->sched_class = &fair_sched_class; } - p->prio = prio; + __setscheduler_prio(p, prio); if (queued) enqueue_task(rq, p, queue_flag); @@ -6824,29 
+6833,6 @@ static void __setscheduler_params(struct task_struct *p, set_load_weight(p, true); } -/* Actually do priority change: must hold pi & rq lock. */ -static void __setscheduler(struct rq *rq, struct task_struct *p, - const struct sched_attr *attr, int newprio) -{ - /* - * If params can't change scheduling class changes aren't allowed - * either. - */ - if (attr->sched_flags & SCHED_FLAG_KEEP_PARAMS) - return; - - __setscheduler_params(p, attr); - - p->prio = newprio; - - if (dl_prio(p->prio)) - p->sched_class = &dl_sched_class; - else if (rt_prio(p->prio)) - p->sched_class = &rt_sched_class; - else - p->sched_class = &fair_sched_class; -} - /* * Check the target process has a UID that matches the current process's: */ @@ -7089,7 +7075,10 @@ static int __sched_setscheduler(struct task_struct *p, prev_class = p->sched_class; - __setscheduler(rq, p, attr, newprio); + if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) { + __setscheduler_params(p, attr); + __setscheduler_prio(p, newprio); + } __setscheduler_uclamp(p, attr); if (queued) { ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-08 10:26 ` Peter Zijlstra @ 2021-07-09 8:33 ` Juri Lelli 2021-08-02 7:35 ` Juri Lelli 0 siblings, 1 reply; 9+ messages in thread From: Juri Lelli @ 2021-07-09 8:33 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo, linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Mark Simmons Hi Peter, On 08/07/21 12:26, Peter Zijlstra wrote: > On Thu, Jul 08, 2021 at 12:06:27PM +0200, Peter Zijlstra wrote: > > Slightly larger patch, but perhaps a little cleaner.. still pondering if > > we can share a little more between __sched_setscheduler() and > > rt_mutex_setprio(). > > Best I can seem to come up with... Thanks for the non-lazy version of the fix! Makes sense to me and it looks good from quick testing. I'll be doing more extensive testing and ask Mark (cc-ed) to help with that. :) We'll report back soon-ish. Best, Juri ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-09 8:33 ` Juri Lelli @ 2021-08-02 7:35 ` Juri Lelli 2021-08-02 13:38 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Juri Lelli @ 2021-08-02 7:35 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo, linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Mark Simmons Hi Peter, On 09/07/21 10:33, Juri Lelli wrote: > Hi Peter, > > On 08/07/21 12:26, Peter Zijlstra wrote: > > On Thu, Jul 08, 2021 at 12:06:27PM +0200, Peter Zijlstra wrote: > > > Slightly larger patch, but perhaps a little cleaner.. still pondering if > > > we can share a little more between __sched_setscheduler() and > > > rt_mutex_setprio(). > > > > Best I can seem to come up with... > > Thanks for the non-lazy version of the fix! > > Makes sense to me and it looks good from quick testing. I'll be doing > more extensive testing and ask Mark (cc-ed) to help with that. :) > > We'll report back soon-ish. So, yeah it took a while (mostly my pto :), apologies. However, we managed to confirm your fix works great! May I leave posting it again with changelog etc to you, or would you like I do that (making you the author)? Thanks, Juri ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-08-02 7:35 ` Juri Lelli @ 2021-08-02 13:38 ` Peter Zijlstra 0 siblings, 0 replies; 9+ messages in thread From: Peter Zijlstra @ 2021-08-02 13:38 UTC (permalink / raw) To: Juri Lelli Cc: mingo, linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Mark Simmons On Mon, Aug 02, 2021 at 09:35:50AM +0200, Juri Lelli wrote: > However, we managed to confirm your fix works great! > > May I leave posting it again with changelog etc to you, or would you > like I do that (making you the author)? If you could (re)post a known working version with Changelog that would be much appreciated. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio 2021-07-01 9:14 [PATCH] sched/rt: Fix double enqueue caused by rt_effective_prio Juri Lelli 2021-07-06 14:48 ` Dietmar Eggemann 2021-07-08 10:06 ` Peter Zijlstra @ 2021-07-08 11:35 ` Peter Zijlstra 2 siblings, 0 replies; 9+ messages in thread From: Peter Zijlstra @ 2021-07-08 11:35 UTC (permalink / raw) To: Juri Lelli Cc: mingo, linux-kernel, vincent.guittot, rostedt, dietmar.eggemann, bristot, bsegall, mgorman, Mark Simmons On Thu, Jul 01, 2021 at 11:14:31AM +0200, Juri Lelli wrote: > Looks like this survives the test case (and regression testing), but I'm > still not fully convinced that the above statement "Concurrent priority > inheritance handling is still safe and will eventually converge to a new > state by following the inheritance chain(s)" is actually sound. I think we good there. rt_mutex_adjust_prio_chain() updates ->prio in step 7 and calls rt_mutex_setprio() in step 11. So if we race against __sched_setschedule() and observe the old value, step 11 will follow up and correct us. ^ permalink raw reply [flat|nested] 9+ messages in thread