From: Luca Abeni <luca.abeni@unitn.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>, Juri Lelli <juri.lelli@arm.com>,
Ingo Molnar <mingo@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"juri.lelli@gmail.com" <juri.lelli@gmail.com>
Subject: Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
Date: Fri, 30 Jan 2015 11:25:16 +0100 [thread overview]
Message-ID: <54CB5C0C.7060008@unitn.it> (raw)
In-Reply-To: <20150128140803.GF23038@twins.programming.kicks-ass.net>
Hi Peter,
On 01/28/2015 03:08 PM, Peter Zijlstra wrote:
> On Thu, Jan 15, 2015 at 02:35:46PM +0100, Luca Abeni wrote:
>
>>> >From what I understand we should either modify the tasks run/sleep stats
>>> when we change its parameters or we should schedule a delayed release of
>>> the bandwidth delta (when it reaches its 0-lag point, if thats in the
>>> future).
>
>> I suspect the correct behaviour can be difficult to implement:
>> - When a SCHED_DEADLINE task ends (or changes its scheduling policy to
>> something different), its bandwidth cannot be released immediately,
>> but should be released at the "0-lag time" (which reminds me about the
>> GRUB patches... I had to implement a similar behaviour in those patches :)
>> - The same applies when the task changes its scheduling parameters decreasing
>> its bandwidth. In this case, we also need to update the current runtime (if
>> it is larger than the new runtime, set it to the new maximum value - I think
>> this is the safest thing to do)
>> - When a task changes its parameters to increase its bandwidth, be do not
>> have such problems.
>>
>> As far as I can see, if we apply the runtime / deadline changes starting from
>> the next reservation period we are safe (because the "0-lag time" is always
>> smaller than the current scheduling deadline).
>> This might cause some transient overloads (because if I change the parameters
>> of a task at time t, the update takes action a little bit later - at the next
>> scheduling deadline), but guarantees that a task never consumes more than
>> expected (for example: if a task continuously changes its bandwidth between
>> 0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
>> immediately update dl_se->deadline and dl_se->runtime a task can arrive to
>> consume much more CPU time).
>
>
> OK, how about something like this; it seems to survive your Bug-Test for
> at least 50 cycles.
I tested your patch for 1 day, and it works fine for my testcases (no crashes,
hangs, or strange behaviours).
Also, the comments look ok to me.
Thanks,
Luca
>
> ---
> kernel/sched/core.c | 33 ++++++++++++++++++++++++++++-----
> kernel/sched/deadline.c | 3 ++-
> 2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ade2958a9197..d787d6553d72 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1816,6 +1816,10 @@ void __dl_clear_params(struct task_struct *p)
> dl_se->dl_period = 0;
> dl_se->flags = 0;
> dl_se->dl_bw = 0;
> +
> + dl_se->dl_throttled = 0;
> + dl_se->dl_new = 1;
> + dl_se->dl_yielded = 0;
> }
>
> /*
> @@ -1844,7 +1848,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
> #endif
>
> RB_CLEAR_NODE(&p->dl.rb_node);
> - hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + init_dl_task_timer(&p->dl);
> __dl_clear_params(p);
>
> INIT_LIST_HEAD(&p->rt.run_list);
> @@ -2054,6 +2058,9 @@ static inline int dl_bw_cpus(int i)
> * allocated bandwidth to reflect the new situation.
> *
> * This function is called while holding p's rq->lock.
> + *
> + * XXX we should delay bw change until the task's 0-lag point, see
> + * __setparam_dl().
> */
> static int dl_overflow(struct task_struct *p, int policy,
> const struct sched_attr *attr)
> @@ -3258,15 +3265,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
> {
> struct sched_dl_entity *dl_se = &p->dl;
>
> - init_dl_task_timer(dl_se);
> dl_se->dl_runtime = attr->sched_runtime;
> dl_se->dl_deadline = attr->sched_deadline;
> dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
> dl_se->flags = attr->sched_flags;
> dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> - dl_se->dl_throttled = 0;
> - dl_se->dl_new = 1;
> - dl_se->dl_yielded = 0;
> +
> + /*
> + * Changing the parameters of a task is 'tricky' and we're not doing
> + * the correct thing -- also see task_dead_dl() and switched_from_dl().
> + *
> + * What we SHOULD do is delay the bandwidth release until the 0-lag
> + * point. This would include retaining the task_struct until that time
> + * and change dl_overflow() to not immediately decrement the current
> + * amount.
> + *
> + * Instead we retain the current runtime/deadline and let the new
> + * parameters take effect after the current reservation period lapses.
> + * This is safe (albeit pessimistic) because the 0-lag point is always
> + * before the current scheduling deadline.
> + *
> + * We can still have temporary overloads because we do not delay the
> + * change in bandwidth until that time; so admission control is
> + * not on the safe side. It does however guarantee tasks will never
> + * consume more than promised.
> + */
> }
>
> /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b52092f2636d..726470d47f87 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
> * Since we are TASK_DEAD we won't slip out of the domain!
> */
> raw_spin_lock_irq(&dl_b->lock);
> + /* XXX we should retain the bw until 0-lag */
> dl_b->total_bw -= p->dl.dl_bw;
> raw_spin_unlock_irq(&dl_b->lock);
>
> @@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
>
> static void switched_from_dl(struct rq *rq, struct task_struct *p)
> {
> + /* XXX we should retain the bw until 0-lag */
> cancel_dl_timer(rq, p);
> -
> __dl_clear_params(p);
>
> /*
>
next prev parent reply other threads:[~2015-01-30 10:25 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-29 23:27 Another SCHED_DEADLINE bug (with bisection and possible fix) luca abeni
2015-01-04 22:51 ` Kirill Tkhai
2015-01-05 15:21 ` Luca Abeni
2015-01-05 23:07 ` Kirill Tkhai
2015-01-07 7:01 ` Luca Abeni
2015-01-07 12:29 ` Kirill Tkhai
2015-01-07 12:45 ` Luca Abeni
2015-01-07 13:04 ` Kirill Tkhai
2015-01-07 13:14 ` Luca Abeni
2015-01-09 11:15 ` Luca Abeni
2015-01-09 11:46 ` Kirill Tkhai
2015-01-13 8:10 ` Juri Lelli
2015-01-13 9:26 ` Kirill Tkhai
2015-01-13 14:04 ` Peter Zijlstra
2015-01-14 12:43 ` Kirill Tkhai
2015-01-15 11:23 ` Luca Abeni
2015-01-15 12:23 ` Peter Zijlstra
2015-01-15 13:35 ` Luca Abeni
2015-01-28 14:08 ` Peter Zijlstra
2015-01-28 14:40 ` Luca Abeni
2015-01-30 10:25 ` Luca Abeni [this message]
2015-01-30 10:35 ` Juri Lelli
2015-01-31 9:56 ` Peter Zijlstra
2015-02-02 11:31 ` Juri Lelli
2015-02-02 13:57 ` Luca Abeni
2015-02-04 14:34 ` [tip:sched/core] sched/deadline: Fix deadline parameter modification handling tip-bot for Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54CB5C0C.7060008@unitn.it \
--to=luca.abeni@unitn.it \
--cc=juri.lelli@arm.com \
--cc=juri.lelli@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tkhai@yandex.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).