Another SCHED_DEADLINE bug (with bisection and possible fix)

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Another SCHED_DEADLINE bug (with bisection and possible fix)
@ 2014-12-29 23:27 luca abeni
  2015-01-04 22:51 ` Kirill Tkhai
  0 siblings, 1 reply; 26+ messages in thread
From: luca abeni @ 2014-12-29 23:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Juri Lelli, linux-kernel

Hi all,

when running some experiments on current git master, I noticed a
regression respect to version 3.18 of the kernel: when invoking
sched_setattr() to change the SCHED_DEADLINE parameters of a task that
is already scheduled by SCHED_DEADLINE, it is possible to crash the
system.

The bug can be reproduced with this testcase:
http://disi.unitn.it/~abeni/reclaiming/bug-test.tgz
Uncompress it, enter the "Bug-Test" directory, and type "make test".
After few cycles, my test machine (a laptop with an intel i7 CPU)
becomes unusable, and freezes.

Since I know that 3.18 is not affected by this bug, I tried a bisect,
that pointed to commit 67dfa1b756f250972bde31d65e3f8fde6aeddc5b
(sched/deadline: Implement cancel_dl_timer() to use in
switched_from_dl()).
By looking at that commit, I suspect the problem is that it removes the
following lines from init_dl_task_timer():
-       if (hrtimer_active(timer)) {
-               hrtimer_try_to_cancel(timer);
-               return;
-       }

As a result, when changing the parameters of a SCHED_DEADLINE task
init_dl_task_timer() is invoked again, and it can initialize a pending
timer (not sure why, but this seems to be the cause of the freezes I am
seeing).

So, I modified core.c::__setparam_dl() to invoke init_dl_task_timer()
only if the task is not already scheduled by SCHED_DEADLINE...
Basically, I changed
	init_dl_task_timer(dl_se);
into
	if (p->sched_class != &dl_sched_class) {
		init_dl_task_timer(dl_se);
	}

I am not sure if this is the correct fix, but with this change the
kernel survives my test script (mentioned above), and arrives to 500
cycles (without my patch, it crashes after 2 or 3 cycles).

What do you think? Is my patch correct, or should I fix the issue in a
different way?

			Thanks,
				Luca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2014-12-29 23:27 Another SCHED_DEADLINE bug (with bisection and possible fix) luca abeni
@ 2015-01-04 22:51 ` Kirill Tkhai
  2015-01-05 15:21   ` Luca Abeni
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-04 22:51 UTC (permalink / raw)
  To: luca abeni, Peter Zijlstra; +Cc: Ingo Molnar, Juri Lelli, linux-kernel

Hi, Luca,

I've just notived this.

30.12.2014, 02:27, "luca abeni" <luca.abeni@unitn.it>:
> Hi all,
>
> when running some experiments on current git master, I noticed a
> regression respect to version 3.18 of the kernel: when invoking
> sched_setattr() to change the SCHED_DEADLINE parameters of a task that
> is already scheduled by SCHED_DEADLINE, it is possible to crash the
> system.
>
> The bug can be reproduced with this testcase:
> http://disi.unitn.it/~abeni/reclaiming/bug-test.tgz
> Uncompress it, enter the "Bug-Test" directory, and type "make test".
> After few cycles, my test machine (a laptop with an intel i7 CPU)
> becomes unusable, and freezes.
>
> Since I know that 3.18 is not affected by this bug, I tried a bisect,
> that pointed to commit 67dfa1b756f250972bde31d65e3f8fde6aeddc5b
> (sched/deadline: Implement cancel_dl_timer() to use in
> switched_from_dl()).
> By looking at that commit, I suspect the problem is that it removes the
> following lines from init_dl_task_timer():
> -       if (hrtimer_active(timer)) {
> -               hrtimer_try_to_cancel(timer);
> -               return;
> -       }
>
> As a result, when changing the parameters of a SCHED_DEADLINE task
> init_dl_task_timer() is invoked again, and it can initialize a pending
> timer (not sure why, but this seems to be the cause of the freezes I am
> seeing).
>
> So, I modified core.c::__setparam_dl() to invoke init_dl_task_timer()
> only if the task is not already scheduled by SCHED_DEADLINE...
> Basically, I changed
>         init_dl_task_timer(dl_se);
> into
>         if (p->sched_class != &dl_sched_class) {
>                 init_dl_task_timer(dl_se);
>         }
>
> I am not sure if this is the correct fix, but with this change the
> kernel survives my test script (mentioned above), and arrives to 500
> cycles (without my patch, it crashes after 2 or 3 cycles).
>
> What do you think? Is my patch correct, or should I fix the issue in a
> different way?

I think we should do something like below.

hrtimer_init() is already called in sched_fork(), so we shouldn't do this
twice. Also, we shouldn't reinit dl_throttled etc if timer is pending,
and we should prevent a race here.

This passes your test (I waited for 30 cycles), but there's no SOB,
because I need a little time to check everything once again.

 include/linux/sched/deadline.h |  2 ++
 kernel/sched/core.c            | 12 ++++++++----
 kernel/sched/deadline.c        | 10 +---------
 kernel/sched/sched.h           |  1 -
 4 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 9d303b8..c70a7fc 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -21,4 +21,6 @@ static inline int dl_task(struct task_struct *p)
 	return dl_prio(p->prio);
 }
 
+extern enum hrtimer_restart dl_task_timer(struct hrtimer *timer);
+
 #endif /* _SCHED_DEADLINE_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc0..a388cc8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1840,6 +1840,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	p->dl.dl_timer.function = dl_task_timer;
 	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
@@ -3250,16 +3251,19 @@ static void
 __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
+        struct hrtimer *timer = &dl_se->dl_timer;
+
+	if (!hrtimer_active(timer) || hrtimer_try_to_cancel(timer) != -1) {
+		dl_se->dl_throttled = 0;
+		dl_se->dl_new = 1;
+		dl_se->dl_yielded = 0;
+	}
 
-	init_dl_task_timer(dl_se);
 	dl_se->dl_runtime = attr->sched_runtime;
 	dl_se->dl_deadline = attr->sched_deadline;
 	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
 	dl_se->flags = attr->sched_flags;
 	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
-	dl_se->dl_throttled = 0;
-	dl_se->dl_new = 1;
-	dl_se->dl_yielded = 0;
 }
 
 /*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e5db8c6..8649bd6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -500,7 +500,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
  * updating (and the queueing back to dl_rq) will be done by the
  * next call to enqueue_task_dl().
  */
-static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
+enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 {
 	struct sched_dl_entity *dl_se = container_of(timer,
 						     struct sched_dl_entity,
@@ -559,14 +559,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void init_dl_task_timer(struct sched_dl_entity *dl_se)
-{
-	struct hrtimer *timer = &dl_se->dl_timer;
-
-	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	timer->function = dl_task_timer;
-}
-
 static
 int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a2a45c..ad3a2f0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1257,7 +1257,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 
 extern struct dl_bandwidth def_dl_bandwidth;
 extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
-extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 
 unsigned long to_ratio(u64 period, u64 runtime);
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-04 22:51 ` Kirill Tkhai
@ 2015-01-05 15:21   ` Luca Abeni
  2015-01-05 23:07     ` Kirill Tkhai
  0 siblings, 1 reply; 26+ messages in thread
From: Luca Abeni @ 2015-01-05 15:21 UTC (permalink / raw)
  To: Kirill Tkhai, Peter Zijlstra; +Cc: Ingo Molnar, Juri Lelli, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3466 bytes --]

Hi Kirill,

On 01/04/2015 11:51 PM, Kirill Tkhai wrote:
> Hi, Luca,
>
> I've just notived this.
[...]
> I think we should do something like below.
>
> hrtimer_init() is already called in sched_fork(), so we shouldn't do this
> twice. Also, we shouldn't reinit dl_throttled etc if timer is pending,
> and we should prevent a race here.
Thanks for the comments (I did not notice that hrtimer_init() was called twice)
and for the patch. I'll test it in the next days.

For reference, I attach the patch I am using locally (based on what I suggested
in my previous mail) and seems to work fine here.

Based on your comments, I suspect my patch can be further simplified by moving
the call to init_dl_task_timer() in __sched_fork().

[...]
> @@ -3250,16 +3251,19 @@ static void
>   __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
>   {
>   	struct sched_dl_entity *dl_se = &p->dl;
> +        struct hrtimer *timer = &dl_se->dl_timer;
> +
> +	if (!hrtimer_active(timer) || hrtimer_try_to_cancel(timer) != -1) {
Just for the sake of curiosity, why trying to cancel the timer
("|| hrtimer_try_to_cancel(timer)") here? If it is active, cannot we leave it
active (without touching dl_throttled, dl_new and dl_yielded)?

I mean: if I try to change the parameters of a task when it is throttled, I'd like
it to stay throttled until the end of the reservation period... Or am I missing
something?


				Thanks,
					Luca

> +		dl_se->dl_throttled = 0;
> +		dl_se->dl_new = 1;
> +		dl_se->dl_yielded = 0;
> +	}
>
> -	init_dl_task_timer(dl_se);
>   	dl_se->dl_runtime = attr->sched_runtime;
>   	dl_se->dl_deadline = attr->sched_deadline;
>   	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
>   	dl_se->flags = attr->sched_flags;
>   	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> -	dl_se->dl_throttled = 0;
> -	dl_se->dl_new = 1;
> -	dl_se->dl_yielded = 0;
>   }
>
>   /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index e5db8c6..8649bd6 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -500,7 +500,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
>    * updating (and the queueing back to dl_rq) will be done by the
>    * next call to enqueue_task_dl().
>    */
> -static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> +enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>   {
>   	struct sched_dl_entity *dl_se = container_of(timer,
>   						     struct sched_dl_entity,
> @@ -559,14 +559,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>   	return HRTIMER_NORESTART;
>   }
>
> -void init_dl_task_timer(struct sched_dl_entity *dl_se)
> -{
> -	struct hrtimer *timer = &dl_se->dl_timer;
> -
> -	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -	timer->function = dl_task_timer;
> -}
> -
>   static
>   int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
>   {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9a2a45c..ad3a2f0 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1257,7 +1257,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
>
>   extern struct dl_bandwidth def_dl_bandwidth;
>   extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
> -extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
>
>   unsigned long to_ratio(u64 period, u64 runtime);
>
>


[-- Attachment #2: 0003-Do-not-initialize-the-deadline-timer-if-it-is-alread.patch --]
[-- Type: text/x-diff, Size: 1543 bytes --]

>From 7a0e6747c40cf9186f3645eb94408090ab11936a Mon Sep 17 00:00:00 2001
From: Luca Abeni <luca.abeni@unitn.it>
Date: Sat, 27 Dec 2014 18:20:57 +0100
Subject: [PATCH 03/11] Do not initialize the deadline timer if it is already
 initialized

After commit 67dfa1b756f250972bde31d65e3f8fde6aeddc5b changing the
parameters of a SCHED_DEADLINE tasks might crash the system. This
happens because 67dfa1b756f250972bde31d65e3f8fde6aeddc5b removed the
following lines from init_dl_task_timer():
-       if (hrtimer_active(timer)) {
-               hrtimer_try_to_cancel(timer);
-               return;
-       }

As a result, if sched_setattr() is invoked to change the parameters
(runtime or period) of a SCHED_DEADLINE task, init_dl_task_timer()
might end up initializing a pending timer.

Fix this problem by modifying __setparam_dl() to call init_dl_task_timer()
only if the task is not already a SCHED_DEADLINE one.
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b5797b7..470111c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3251,7 +3251,9 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
 
-	init_dl_task_timer(dl_se);
+        if (p->sched_class != &dl_sched_class) {
+		init_dl_task_timer(dl_se);
+	}
 	dl_se->dl_runtime = attr->sched_runtime;
 	dl_se->dl_deadline = attr->sched_deadline;
 	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-05 15:21   ` Luca Abeni
@ 2015-01-05 23:07     ` Kirill Tkhai
  2015-01-07  7:01       ` Luca Abeni
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-05 23:07 UTC (permalink / raw)
  To: Luca Abeni; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

On Пн, 2015-01-05 at 16:21 +0100, Luca Abeni wrote:
> Hi Kirill,
> 
> On 01/04/2015 11:51 PM, Kirill Tkhai wrote:
> > Hi, Luca,
> >
> > I've just notived this.
> [...]
> > I think we should do something like below.
> >
> > hrtimer_init() is already called in sched_fork(), so we shouldn't do this
> > twice. Also, we shouldn't reinit dl_throttled etc if timer is pending,
> > and we should prevent a race here.
> Thanks for the comments (I did not notice that hrtimer_init() was called twice)
> and for the patch. I'll test it in the next days.
> 
> For reference, I attach the patch I am using locally (based on what I suggested
> in my previous mail) and seems to work fine here.
> 
> Based on your comments, I suspect my patch can be further simplified by moving
> the call to init_dl_task_timer() in __sched_fork().

It seems this way has problems. The first one is that task may become throttled
again, and we will start dl_timer again. The second is that it's better to minimize
number of combination of situations we have. Let's keep only one combination:
timer is set <-> task is throttled. This makes easier the further development
(just makes less of combinations we have to keep in mind).

> [...]
> > @@ -3250,16 +3251,19 @@ static void
> >   __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
> >   {
> >   	struct sched_dl_entity *dl_se = &p->dl;
> > +        struct hrtimer *timer = &dl_se->dl_timer;
> > +
> > +	if (!hrtimer_active(timer) || hrtimer_try_to_cancel(timer) != -1) {
> Just for the sake of curiosity, why trying to cancel the timer
> ("|| hrtimer_try_to_cancel(timer)") here? If it is active, cannot we leave it
> active (without touching dl_throttled, dl_new and dl_yielded)?
> 
> I mean: if I try to change the parameters of a task when it is throttled, I'd like
> it to stay throttled until the end of the reservation period... Or am I missing
> something?

I think that when people change task's parameters, they want the kernel reacts
on this immediately. For example, you want to kill throttled deadline task.
You change parameters, but nothing happens. I think all developers had this
use case when they were debugging deadline class.

> 
> 
> 				Thanks,
> 					Luca
> 
> > +		dl_se->dl_throttled = 0;
> > +		dl_se->dl_new = 1;
> > +		dl_se->dl_yielded = 0;
> > +	}
> >
> > -	init_dl_task_timer(dl_se);
> >   	dl_se->dl_runtime = attr->sched_runtime;
> >   	dl_se->dl_deadline = attr->sched_deadline;
> >   	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
> >   	dl_se->flags = attr->sched_flags;
> >   	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> > -	dl_se->dl_throttled = 0;
> > -	dl_se->dl_new = 1;
> > -	dl_se->dl_yielded = 0;
> >   }
> >
> >   /*
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index e5db8c6..8649bd6 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -500,7 +500,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
> >    * updating (and the queueing back to dl_rq) will be done by the
> >    * next call to enqueue_task_dl().
> >    */
> > -static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> > +enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> >   {
> >   	struct sched_dl_entity *dl_se = container_of(timer,
> >   						     struct sched_dl_entity,
> > @@ -559,14 +559,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> >   	return HRTIMER_NORESTART;
> >   }
> >
> > -void init_dl_task_timer(struct sched_dl_entity *dl_se)
> > -{
> > -	struct hrtimer *timer = &dl_se->dl_timer;
> > -
> > -	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > -	timer->function = dl_task_timer;
> > -}
> > -
> >   static
> >   int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
> >   {
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 9a2a45c..ad3a2f0 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1257,7 +1257,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
> >
> >   extern struct dl_bandwidth def_dl_bandwidth;
> >   extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
> > -extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
> >
> >   unsigned long to_ratio(u64 period, u64 runtime);
> >
> >
> 

Please, try this one instead of the patch I sent yesterday.

[PATCH]sched/dl: Prevent initialization of already set dl_timer

__setparam_dl() may be called for a task which is already of
deadline class. If the task is throttled, we corrupt its dl_timer
when we are doing hrtimer_init() in init_dl_task_timer().

It seems that__setparam_dl() is not the place where we want
to use cancel_dl_timer(). It may unlock rq while the call chain
is long, and some of previous functions in the chain may be
unhappy for this reason (now and in the future). So, we just
try to cancel the timer, and in some situations we fail to do that.

If hrtimer_try_to_cancel() fails, than the handler is being
executed on other processor, and it's waiting for rq->lock.
The most probably the handler will take the lock right after
we release it.

So, let's pass the actions, which we do here usually here,
to the handler. It will unthrottle and queue us properly.

Reported-by: Luca Abeni <luca.abeni@unitn.it>
Fixes: 67dfa1b756f2 "sched/deadline: Implement cancel_dl_timer() to use ..."
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>

 include/linux/sched/deadline.h |  2 ++
 kernel/sched/core.c            | 21 +++++++++++++++++----
 kernel/sched/deadline.c        | 10 +---------
 kernel/sched/sched.h           |  1 -
 4 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 9d303b8..c70a7fc 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -21,4 +21,6 @@ static inline int dl_task(struct task_struct *p)
 	return dl_prio(p->prio);
 }
 
+extern enum hrtimer_restart dl_task_timer(struct hrtimer *timer);
+
 #endif /* _SCHED_DEADLINE_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc0..8fc8a2d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1840,6 +1840,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	p->dl.dl_timer.function = dl_task_timer;
 	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
@@ -3250,16 +3251,28 @@ static void
 __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
+        struct hrtimer *timer = &dl_se->dl_timer;
+	struct rq *rq = task_rq(p);
+
+	/*
+	 * There may be a race with dl_task_timer. If it's pending and
+	 * we've failed to cancel it, timer's handler will unthrottle
+	 * the task right after p's rq is unlocked.
+	 */
+	if (!hrtimer_active(timer) || hrtimer_try_to_cancel(timer) != -1) {
+		dl_se->dl_throttled = 0;
+		dl_se->dl_new = 1;
+		dl_se->dl_yielded = 0;
+	} else {
+		dl_se->deadline = rq_clock(rq) + attr->sched_deadline;
+		dl_se->runtime = attr->sched_runtime;
+	}
 
-	init_dl_task_timer(dl_se);
 	dl_se->dl_runtime = attr->sched_runtime;
 	dl_se->dl_deadline = attr->sched_deadline;
 	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
 	dl_se->flags = attr->sched_flags;
 	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
-	dl_se->dl_throttled = 0;
-	dl_se->dl_new = 1;
-	dl_se->dl_yielded = 0;
 }
 
 /*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e5db8c6..8649bd6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -500,7 +500,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
  * updating (and the queueing back to dl_rq) will be done by the
  * next call to enqueue_task_dl().
  */
-static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
+enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 {
 	struct sched_dl_entity *dl_se = container_of(timer,
 						     struct sched_dl_entity,
@@ -559,14 +559,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void init_dl_task_timer(struct sched_dl_entity *dl_se)
-{
-	struct hrtimer *timer = &dl_se->dl_timer;
-
-	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	timer->function = dl_task_timer;
-}
-
 static
 int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a2a45c..ad3a2f0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1257,7 +1257,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 
 extern struct dl_bandwidth def_dl_bandwidth;
 extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
-extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 
 unsigned long to_ratio(u64 period, u64 runtime);
 



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-05 23:07     ` Kirill Tkhai
@ 2015-01-07  7:01       ` Luca Abeni
  2015-01-07 12:29         ` Kirill Tkhai
  0 siblings, 1 reply; 26+ messages in thread
From: Luca Abeni @ 2015-01-07  7:01 UTC (permalink / raw)
  To: tkhai; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

Hi Kirill,

On Tue, 06 Jan 2015 02:07:21 +0300
Kirill Tkhai <tkhai@yandex.ru> wrote:

> On Пн, 2015-01-05 at 16:21 +0100, Luca Abeni wrote:
[...]
> > For reference, I attach the patch I am using locally (based on what
> > I suggested in my previous mail) and seems to work fine here.
> > 
> > Based on your comments, I suspect my patch can be further
> > simplified by moving the call to init_dl_task_timer() in
> > __sched_fork().
> 
> It seems this way has problems. The first one is that task may become
> throttled again, and we will start dl_timer again.
Well, in my understanding if I change the parameters of a
SCHED_DEADLINE task when it is throttled, it stays throttled... So, the
task might not become throttled again before the dl timer fires.
So, I hoped this problem does not exist. But I might be wrong.

> The second is that
> it's better to minimize number of combination of situations we have.
> Let's keep only one combination: timer is set <-> task is throttled.
Yes, this was my goal too... So, if I change the parameters of a task
when it is throttled, I leave dl_throttled set to 1 and I leave the
timer active.

[...]
> > > @@ -3250,16 +3251,19 @@ static void
> > >   __setparam_dl(struct task_struct *p, const struct sched_attr
> > > *attr) {
> > >   	struct sched_dl_entity *dl_se = &p->dl;
> > > +        struct hrtimer *timer = &dl_se->dl_timer;
> > > +
> > > +	if (!hrtimer_active(timer) ||
> > > hrtimer_try_to_cancel(timer) != -1) {
> > Just for the sake of curiosity, why trying to cancel the timer
> > ("|| hrtimer_try_to_cancel(timer)") here? If it is active, cannot
> > we leave it active (without touching dl_throttled, dl_new and
> > dl_yielded)?
> > 
> > I mean: if I try to change the parameters of a task when it is
> > throttled, I'd like it to stay throttled until the end of the
> > reservation period... Or am I missing something?
> 
> I think that when people change task's parameters, they want the
> kernel reacts on this immediately. For example, you want to kill
> throttled deadline task. You change parameters, but nothing happens.
> I think all developers had this use case when they were debugging
> deadline class.
I see... Different people have different requirements :)
My goal was to do something like adaptive scheduling (or scheduling
tasks with mode changes), so I did not want that changing the
scheduling parameters of a task affected the scheduling of the other
tasks... But if a task exits the throttled state when I change its
parameters, it might consume much more than the reserved CPU time.
Also, I suspect this kind of approach can be exploited by malicious
users: if I create a task with runtime 30ms and period 100ms, and I
change its scheduling parameters (to runtime=29ms and back) frequently
enough, I can consume much more than 30% of the CPU time...

Anyway, I am fine with every patch that fixes the bug :)



			Thanks,
				Luca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-07  7:01       ` Luca Abeni
@ 2015-01-07 12:29         ` Kirill Tkhai
  2015-01-07 12:45           ` Luca Abeni
  2015-01-13  8:10           ` Juri Lelli
  0 siblings, 2 replies; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-07 12:29 UTC (permalink / raw)
  To: Luca Abeni; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

On Ср, 2015-01-07 at 08:01 +0100, Luca Abeni wrote:
> Hi Kirill,
> 
> On Tue, 06 Jan 2015 02:07:21 +0300
> Kirill Tkhai <tkhai@yandex.ru> wrote:
> 
> > On Пн, 2015-01-05 at 16:21 +0100, Luca Abeni wrote:
> [...]
> > > For reference, I attach the patch I am using locally (based on what
> > > I suggested in my previous mail) and seems to work fine here.
> > > 
> > > Based on your comments, I suspect my patch can be further
> > > simplified by moving the call to init_dl_task_timer() in
> > > __sched_fork().
> > 
> > It seems this way has problems. The first one is that task may become
> > throttled again, and we will start dl_timer again.
> Well, in my understanding if I change the parameters of a
> SCHED_DEADLINE task when it is throttled, it stays throttled... So, the
> task might not become throttled again before the dl timer fires.
> So, I hoped this problem does not exist. But I might be wrong.

You keep zeroing of dl_se->dl_throttled, and further enqueue_task()
places it on the dl_rq. So, further update_curr_dl() may make it throttled
again, and it will try to start dl_timer (which is already set).

> > The second is that
> > it's better to minimize number of combination of situations we have.
> > Let's keep only one combination: timer is set <-> task is throttled.
> Yes, this was my goal too... So, if I change the parameters of a task
> when it is throttled, I leave dl_throttled set to 1 and I leave the
> timer active.

As I see,

dl_se->dl_throttled = 0;

is still in __setparam_dl() after your patch, so you do not leave
it set to 1.

> 
> [...]
> > > > @@ -3250,16 +3251,19 @@ static void
> > > >   __setparam_dl(struct task_struct *p, const struct sched_attr
> > > > *attr) {
> > > >   	struct sched_dl_entity *dl_se = &p->dl;
> > > > +        struct hrtimer *timer = &dl_se->dl_timer;
> > > > +
> > > > +	if (!hrtimer_active(timer) ||
> > > > hrtimer_try_to_cancel(timer) != -1) {
> > > Just for the sake of curiosity, why trying to cancel the timer
> > > ("|| hrtimer_try_to_cancel(timer)") here? If it is active, cannot
> > > we leave it active (without touching dl_throttled, dl_new and
> > > dl_yielded)?
> > > 
> > > I mean: if I try to change the parameters of a task when it is
> > > throttled, I'd like it to stay throttled until the end of the
> > > reservation period... Or am I missing something?
> > 
> > I think that when people change task's parameters, they want the
> > kernel reacts on this immediately. For example, you want to kill
> > throttled deadline task. You change parameters, but nothing happens.
> > I think all developers had this use case when they were debugging
> > deadline class.
> I see... Different people have different requirements :)
> My goal was to do something like adaptive scheduling (or scheduling
> tasks with mode changes), so I did not want that changing the
> scheduling parameters of a task affected the scheduling of the other
> tasks... But if a task exits the throttled state when I change its
> parameters, it might consume much more than the reserved CPU time.
> Also, I suspect this kind of approach can be exploited by malicious
> users: if I create a task with runtime 30ms and period 100ms, and I
> change its scheduling parameters (to runtime=29ms and back) frequently
> enough, I can consume much more than 30% of the CPU time...
> 
> Anyway, I am fine with every patch that fixes the bug :)

Deadline class requires root privileges. So, I do not see a problem
here. Please, see __sched_setscheduler().

If in the future we allow non-privileged users to increase deadline,
we will reflect that in __setparam_dl() too.

Thanks,
Kirill.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-07 12:29         ` Kirill Tkhai
@ 2015-01-07 12:45           ` Luca Abeni
  2015-01-07 13:04             ` Kirill Tkhai
  2015-01-13  8:10           ` Juri Lelli
  1 sibling, 1 reply; 26+ messages in thread
From: Luca Abeni @ 2015-01-07 12:45 UTC (permalink / raw)
  To: tkhai; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

On 01/07/2015 01:29 PM, Kirill Tkhai wrote:
[...]
>>>> Based on your comments, I suspect my patch can be further
>>>> simplified by moving the call to init_dl_task_timer() in
>>>> __sched_fork().
>>>
>>> It seems this way has problems. The first one is that task may become
>>> throttled again, and we will start dl_timer again.
>> Well, in my understanding if I change the parameters of a
>> SCHED_DEADLINE task when it is throttled, it stays throttled... So, the
>> task might not become throttled again before the dl timer fires.
>> So, I hoped this problem does not exist. But I might be wrong.
>
> You keep zeroing of dl_se->dl_throttled
Right... Now that you point this out, I realize that dl_se->dl_throttled = 0
should be inside the if().

> and further enqueue_task() places it on the dl_rq.
I was under the impression that no further enqueue_task() will happen (since
the task is throttled, it is not on runqueue, so __sched_setscheduler() will
not dequeue/enqueue it).
But I am probably missing something else :)

>>> The second is that
>>> it's better to minimize number of combination of situations we have.
>>> Let's keep only one combination: timer is set <-> task is throttled.
>> Yes, this was my goal too... So, if I change the parameters of a task
>> when it is throttled, I leave dl_throttled set to 1 and I leave the
>> timer active.
>
> As I see,
>
> dl_se->dl_throttled = 0;
>
> is still in __setparam_dl() after your patch, so you do not leave
> it set to 1.
You are right, my fault.

[...]
>>> I think that when people change task's parameters, they want the
>>> kernel reacts on this immediately. For example, you want to kill
>>> throttled deadline task. You change parameters, but nothing happens.
>>> I think all developers had this use case when they were debugging
>>> deadline class.
>> I see... Different people have different requirements :)
>> My goal was to do something like adaptive scheduling (or scheduling
>> tasks with mode changes), so I did not want that changing the
>> scheduling parameters of a task affected the scheduling of the other
>> tasks... But if a task exits the throttled state when I change its
>> parameters, it might consume much more than the reserved CPU time.
>> Also, I suspect this kind of approach can be exploited by malicious
>> users: if I create a task with runtime 30ms and period 100ms, and I
>> change its scheduling parameters (to runtime=29ms and back) frequently
>> enough, I can consume much more than 30% of the CPU time...
>>
>> Anyway, I am fine with every patch that fixes the bug :)
>
> Deadline class requires root privileges. So, I do not see a problem
> here. Please, see __sched_setscheduler().
I know... But the final goal is to allow non-root users to use SCHED_DEADLINE,
so I was thinking about future problems.


> If in the future we allow non-privileged users to increase deadline,
> we will reflect that in __setparam_dl() too.
Ok.



			Thanks,
				Luca


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-07 12:45           ` Luca Abeni
@ 2015-01-07 13:04             ` Kirill Tkhai
  2015-01-07 13:14               ` Luca Abeni
  2015-01-09 11:15               ` Luca Abeni
  0 siblings, 2 replies; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-07 13:04 UTC (permalink / raw)
  To: Luca Abeni; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

On Ср, 2015-01-07 at 13:45 +0100, Luca Abeni wrote:
> On 01/07/2015 01:29 PM, Kirill Tkhai wrote:
> [...]
> >>>> Based on your comments, I suspect my patch can be further
> >>>> simplified by moving the call to init_dl_task_timer() in
> >>>> __sched_fork().
> >>>
> >>> It seems this way has problems. The first one is that task may become
> >>> throttled again, and we will start dl_timer again.
> >> Well, in my understanding if I change the parameters of a
> >> SCHED_DEADLINE task when it is throttled, it stays throttled... So, the
> >> task might not become throttled again before the dl timer fires.
> >> So, I hoped this problem does not exist. But I might be wrong.
> >
> > You keep zeroing of dl_se->dl_throttled
> Right... Now that you point this out, I realize that dl_se->dl_throttled = 0
> should be inside the if().
> 
> > and further enqueue_task() places it on the dl_rq.
> I was under the impression that no further enqueue_task() will happen (since
> the task is throttled, it is not on runqueue, so __sched_setscheduler() will
> not dequeue/enqueue it).
> But I am probably missing something else :)

We have two concept of "on runqueue". The first one is rq->on_rq. It means
that a task is "queued". The second is on_dl_rq(dl_se).

When task is not "queued", it's always not on dl_rq.

When task is "queued" it may be in two states:
1)on_dl_rq() -- this means the task is not throttled;
2)!on_dl_rq() -- is task as throttled.

So when we are discussing about a throttled task, the task is "queued". If
you clear dl_throttled, __sched_setscheduler() places it back it the both
meaning: on_rq and on_dl_rq, and the task becomes available for picking
in __schedule().

> 
> >>> The second is that
> >>> it's better to minimize number of combination of situations we have.
> >>> Let's keep only one combination: timer is set <-> task is throttled.
> >> Yes, this was my goal too... So, if I change the parameters of a task
> >> when it is throttled, I leave dl_throttled set to 1 and I leave the
> >> timer active.
> >
> > As I see,
> >
> > dl_se->dl_throttled = 0;
> >
> > is still in __setparam_dl() after your patch, so you do not leave
> > it set to 1.
> You are right, my fault.
> 
> [...]
> >>> I think that when people change task's parameters, they want the
> >>> kernel reacts on this immediately. For example, you want to kill
> >>> throttled deadline task. You change parameters, but nothing happens.
> >>> I think all developers had this use case when they were debugging
> >>> deadline class.
> >> I see... Different people have different requirements :)
> >> My goal was to do something like adaptive scheduling (or scheduling
> >> tasks with mode changes), so I did not want that changing the
> >> scheduling parameters of a task affected the scheduling of the other
> >> tasks... But if a task exits the throttled state when I change its
> >> parameters, it might consume much more than the reserved CPU time.
> >> Also, I suspect this kind of approach can be exploited by malicious
> >> users: if I create a task with runtime 30ms and period 100ms, and I
> >> change its scheduling parameters (to runtime=29ms and back) frequently
> >> enough, I can consume much more than 30% of the CPU time...
> >>
> >> Anyway, I am fine with every patch that fixes the bug :)
> >
> > Deadline class requires root privileges. So, I do not see a problem
> > here. Please, see __sched_setscheduler().
> I know... But the final goal is to allow non-root users to use SCHED_DEADLINE,
> so I was thinking about future problems.

I think everything may change many times before we implement that. It's better
to keep the code in the consistent state.

> > If in the future we allow non-privileged users to increase deadline,
> > we will reflect that in __setparam_dl() too.
> Ok.

Does my patch help you? It helps me, but anyway I need your confirmation.

Thanks,
Kirill.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-07 13:04             ` Kirill Tkhai
@ 2015-01-07 13:14               ` Luca Abeni
  2015-01-09 11:15               ` Luca Abeni
  1 sibling, 0 replies; 26+ messages in thread
From: Luca Abeni @ 2015-01-07 13:14 UTC (permalink / raw)
  To: tkhai; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

On 01/07/2015 02:04 PM, Kirill Tkhai wrote:
[...]
>>> and further enqueue_task() places it on the dl_rq.
>> I was under the impression that no further enqueue_task() will happen (since
>> the task is throttled, it is not on runqueue, so __sched_setscheduler() will
>> not dequeue/enqueue it).
>> But I am probably missing something else :)
>
> We have two concept of "on runqueue". The first one is rq->on_rq. It means
> that a task is "queued". The second is on_dl_rq(dl_se).
>
> When task is not "queued", it's always not on dl_rq.
>
> When task is "queued" it may be in two states:
> 1)on_dl_rq() -- this means the task is not throttled;
> 2)!on_dl_rq() -- is task as throttled.
>
> So when we are discussing about a throttled task, the task is "queued". If
> you clear dl_throttled, __sched_setscheduler() places it back it the both
> meaning: on_rq and on_dl_rq, and the task becomes available for picking
> in __schedule().
Ah, I see. Thanks for explaining! Now, everything is more clear and I agree
with you.

[...]
> Does my patch help you? It helps me, but anyway I need your confirmation.
I am just back from vacations, and I had no time to test it yet... I hope to
test it before the end of the week, and I'll let you know (but now I am convinced
that it should help).



			Thanks again,
				Luca


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-07 13:04             ` Kirill Tkhai
  2015-01-07 13:14               ` Luca Abeni
@ 2015-01-09 11:15               ` Luca Abeni
  2015-01-09 11:46                 ` Kirill Tkhai
  1 sibling, 1 reply; 26+ messages in thread
From: Luca Abeni @ 2015-01-09 11:15 UTC (permalink / raw)
  To: tkhai; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel

Hi Kirill,

On 01/07/2015 02:04 PM, Kirill Tkhai wrote:
[...]
>>> If in the future we allow non-privileged users to increase deadline,
>>> we will reflect that in __setparam_dl() too.
>> Ok.
>
> Does my patch help you? It helps me, but anyway I need your confirmation.
Sorry about the delay... Anyway, I finally had time to test your patch, and
as expected it fixes the hang I was seeing.



				Thanks,
					Luca


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-09 11:15               ` Luca Abeni
@ 2015-01-09 11:46                 ` Kirill Tkhai
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-09 11:46 UTC (permalink / raw)
  To: Luca Abeni; +Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, linux-kernel


09.01.2015, 14:15, "Luca Abeni" <luca.abeni@unitn.it>:
> Hi Kirill,
>
> On 01/07/2015 02:04 PM, Kirill Tkhai wrote:
> [...]
>>>>  If in the future we allow non-privileged users to increase deadline,
>>>>  we will reflect that in __setparam_dl() too.
>>>  Ok.
>>  Does my patch help you? It helps me, but anyway I need your confirmation.
>
> Sorry about the delay... Anyway, I finally had time to test your patch, and
> as expected it fixes the hang I was seeing.

Thanks for the testing.

Kirill

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-07 12:29         ` Kirill Tkhai
  2015-01-07 12:45           ` Luca Abeni
@ 2015-01-13  8:10           ` Juri Lelli
  2015-01-13  9:26             ` Kirill Tkhai
  1 sibling, 1 reply; 26+ messages in thread
From: Juri Lelli @ 2015-01-13  8:10 UTC (permalink / raw)
  To: tkhai, Luca Abeni; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, juri.lelli

Hi all,

really sorry for the huge delay in replying to this! :(

On 07/01/2015 12:29, Kirill Tkhai wrote:
> On Ср, 2015-01-07 at 08:01 +0100, Luca Abeni wrote:
>> Hi Kirill,
>>
>> On Tue, 06 Jan 2015 02:07:21 +0300
>> Kirill Tkhai <tkhai@yandex.ru> wrote:
>>
>>> On Пн, 2015-01-05 at 16:21 +0100, Luca Abeni wrote:
>> [...]
>>>> For reference, I attach the patch I am using locally (based on what
>>>> I suggested in my previous mail) and seems to work fine here.
>>>>
>>>> Based on your comments, I suspect my patch can be further
>>>> simplified by moving the call to init_dl_task_timer() in
>>>> __sched_fork().
>>>
>>> It seems this way has problems. The first one is that task may become
>>> throttled again, and we will start dl_timer again.
>> Well, in my understanding if I change the parameters of a
>> SCHED_DEADLINE task when it is throttled, it stays throttled... So, the
>> task might not become throttled again before the dl timer fires.
>> So, I hoped this problem does not exist. But I might be wrong.
> 
> You keep zeroing of dl_se->dl_throttled, and further enqueue_task()
> places it on the dl_rq. So, further update_curr_dl() may make it throttled
> again, and it will try to start dl_timer (which is already set).
> 
>>> The second is that
>>> it's better to minimize number of combination of situations we have.
>>> Let's keep only one combination: timer is set <-> task is throttled.
>> Yes, this was my goal too... So, if I change the parameters of a task
>> when it is throttled, I leave dl_throttled set to 1 and I leave the
>> timer active.
> 
> As I see,
> 
> dl_se->dl_throttled = 0;
> 
> is still in __setparam_dl() after your patch, so you do not leave
> it set to 1.
> 
>>
>> [...]
>>>>> @@ -3250,16 +3251,19 @@ static void
>>>>>   __setparam_dl(struct task_struct *p, const struct sched_attr
>>>>> *attr) {
>>>>>   	struct sched_dl_entity *dl_se = &p->dl;
>>>>> +        struct hrtimer *timer = &dl_se->dl_timer;
>>>>> +
>>>>> +	if (!hrtimer_active(timer) ||
>>>>> hrtimer_try_to_cancel(timer) != -1) {
>>>> Just for the sake of curiosity, why trying to cancel the timer
>>>> ("|| hrtimer_try_to_cancel(timer)") here? If it is active, cannot
>>>> we leave it active (without touching dl_throttled, dl_new and
>>>> dl_yielded)?
>>>>
>>>> I mean: if I try to change the parameters of a task when it is
>>>> throttled, I'd like it to stay throttled until the end of the
>>>> reservation period... Or am I missing something?
>>>
>>> I think that when people change task's parameters, they want the
>>> kernel reacts on this immediately. For example, you want to kill
>>> throttled deadline task. You change parameters, but nothing happens.
>>> I think all developers had this use case when they were debugging
>>> deadline class.
>> I see... Different people have different requirements :)
>> My goal was to do something like adaptive scheduling (or scheduling
>> tasks with mode changes), so I did not want that changing the
>> scheduling parameters of a task affected the scheduling of the other
>> tasks... But if a task exits the throttled state when I change its
>> parameters, it might consume much more than the reserved CPU time.
>> Also, I suspect this kind of approach can be exploited by malicious
>> users: if I create a task with runtime 30ms and period 100ms, and I
>> change its scheduling parameters (to runtime=29ms and back) frequently
>> enough, I can consume much more than 30% of the CPU time...
>>

Well, I'm inclined to agree to Luca's viewpoint. We should not change
parameters of a throttled task or we may affect other tasks.

>> Anyway, I am fine with every patch that fixes the bug :)
> 
> Deadline class requires root privileges. So, I do not see a problem
> here. Please, see __sched_setscheduler().
> 
> If in the future we allow non-privileged users to increase deadline,
> we will reflect that in __setparam_dl() too.
> 

I'd say it is better to implement the right behavior even for root, so
that we will find it right when we'll grant access to non root users
too. Also, even if root can do everything, we always try not to break
guarantees that come with admission control (root or non root that is).

Thanks a lot,

- Juri


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-13  8:10           ` Juri Lelli
@ 2015-01-13  9:26             ` Kirill Tkhai
  2015-01-13 14:04               ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-13  9:26 UTC (permalink / raw)
  To: Juri Lelli, Luca Abeni
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, juri.lelli

Hi, Juri,

13.01.2015, 11:10, "Juri Lelli" <juri.lelli@arm.com>:
> Hi all,
>
> really sorry for the huge delay in replying to this! :(
>
> On 07/01/2015 12:29, Kirill Tkhai wrote:
>>  On Ср, 2015-01-07 at 08:01 +0100, Luca Abeni wrote:
>>>  Hi Kirill,
>>>
>>>  On Tue, 06 Jan 2015 02:07:21 +0300
>>>  Kirill Tkhai <tkhai@yandex.ru> wrote:
>>>>  On Пн, 2015-01-05 at 16:21 +0100, Luca Abeni wrote:
>>>  [...]
>>>>>  For reference, I attach the patch I am using locally (based on what
>>>>>  I suggested in my previous mail) and seems to work fine here.
>>>>>
>>>>>  Based on your comments, I suspect my patch can be further
>>>>>  simplified by moving the call to init_dl_task_timer() in
>>>>>  __sched_fork().
>>>>  It seems this way has problems. The first one is that task may become
>>>>  throttled again, and we will start dl_timer again.
>>>  Well, in my understanding if I change the parameters of a
>>>  SCHED_DEADLINE task when it is throttled, it stays throttled... So, the
>>>  task might not become throttled again before the dl timer fires.
>>>  So, I hoped this problem does not exist. But I might be wrong.
>>  You keep zeroing of dl_se->dl_throttled, and further enqueue_task()
>>  places it on the dl_rq. So, further update_curr_dl() may make it throttled
>>  again, and it will try to start dl_timer (which is already set).
>>>>  The second is that
>>>>  it's better to minimize number of combination of situations we have.
>>>>  Let's keep only one combination: timer is set <-> task is throttled.
>>>  Yes, this was my goal too... So, if I change the parameters of a task
>>>  when it is throttled, I leave dl_throttled set to 1 and I leave the
>>>  timer active.
>>  As I see,
>>
>>  dl_se->dl_throttled = 0;
>>
>>  is still in __setparam_dl() after your patch, so you do not leave
>>  it set to 1.
>>>  [...]
>>>>>>  @@ -3250,16 +3251,19 @@ static void
>>>>>>    __setparam_dl(struct task_struct *p, const struct sched_attr
>>>>>>  *attr) {
>>>>>>            struct sched_dl_entity *dl_se = &p->dl;
>>>>>>  +        struct hrtimer *timer = &dl_se->dl_timer;
>>>>>>  +
>>>>>>  + if (!hrtimer_active(timer) ||
>>>>>>  hrtimer_try_to_cancel(timer) != -1) {
>>>>>  Just for the sake of curiosity, why trying to cancel the timer
>>>>>  ("|| hrtimer_try_to_cancel(timer)") here? If it is active, cannot
>>>>>  we leave it active (without touching dl_throttled, dl_new and
>>>>>  dl_yielded)?
>>>>>
>>>>>  I mean: if I try to change the parameters of a task when it is
>>>>>  throttled, I'd like it to stay throttled until the end of the
>>>>>  reservation period... Or am I missing something?
>>>>  I think that when people change task's parameters, they want the
>>>>  kernel reacts on this immediately. For example, you want to kill
>>>>  throttled deadline task. You change parameters, but nothing happens.
>>>>  I think all developers had this use case when they were debugging
>>>>  deadline class.
>>>  I see... Different people have different requirements :)
>>>  My goal was to do something like adaptive scheduling (or scheduling
>>>  tasks with mode changes), so I did not want that changing the
>>>  scheduling parameters of a task affected the scheduling of the other
>>>  tasks... But if a task exits the throttled state when I change its
>>>  parameters, it might consume much more than the reserved CPU time.
>>>  Also, I suspect this kind of approach can be exploited by malicious
>>>  users: if I create a task with runtime 30ms and period 100ms, and I
>>>  change its scheduling parameters (to runtime=29ms and back) frequently
>>>  enough, I can consume much more than 30% of the CPU time...
>
> Well, I'm inclined to agree to Luca's viewpoint. We should not change
> parameters of a throttled task or we may affect other tasks.

Could you explain your viewpoint more? How does this affects other tasks?

As I understand, in __setparam_dl() we are sure that there is enough
dl_bw. In __sched_setscheduler() we call it after dl_overflow() check.

>>>  Anyway, I am fine with every patch that fixes the bug :)
>>  Deadline class requires root privileges. So, I do not see a problem
>>  here. Please, see __sched_setscheduler().
>>
>>  If in the future we allow non-privileged users to increase deadline,
>>  we will reflect that in __setparam_dl() too.
>
> I'd say it is better to implement the right behavior even for root, so
> that we will find it right when we'll grant access to non root users
> too. Also, even if root can do everything, we always try not to break
> guarantees that come with admission control (root or non root that is).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-13  9:26             ` Kirill Tkhai
@ 2015-01-13 14:04               ` Peter Zijlstra
  2015-01-14 12:43                 ` Kirill Tkhai
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-13 14:04 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Juri Lelli, Luca Abeni, Ingo Molnar, linux-kernel, juri.lelli

On Tue, Jan 12, 2015 at 12:26:40PM +0300, Kirill Tkhai wrote:
> > Well, I'm inclined to agree to Luca's viewpoint. We should not change
> > parameters of a throttled task or we may affect other tasks.
> 
> Could you explain your viewpoint more? How does this affects other tasks?

I agree with Juri and Luca here. Luca gave an example that DoS's
SCHED_DEADLINE.

In Luca's example we'd constantly call sched_setattr() on a task, which
results in it never throttling and that will affect other tasks, because
if we're running, they cannot be.

> As I understand, in __setparam_dl() we are sure that there is enough
> dl_bw. In __sched_setscheduler() we call it after dl_overflow() check.

Yes, _but_, as per the above. BW includes the sleep time, if we fail to
sleep its all pointless.

We got the runtime (before the throttle) under the promise that we'd go
sleep. When we change our parameters while being throttled we should
adjust the throttle time accordingly. If say we changed from 30% to 35%
we are allowed to sleep less. Now sleeping more than strictly required
is only detrimental to ourselves, so that is not (too) critical.

However, the other way around, if we change from 35% to 30% we should
now effectively sleep longer (for the same runtime we already consumed),
and we must do this, because admission control will instantly assume the
5% change in bandwidth to be available. If we sleep short, this BW is
not in fact available and we break our guarantees.

> >>>  Anyway, I am fine with every patch that fixes the bug :)
> >>  Deadline class requires root privileges. So, I do not see a problem
> >>  here. Please, see __sched_setscheduler().

Its not a priv issue, not doing this makes it impossible to use
SCHED_DEADLINE in the intended way, even for root.

Say we have a userspace task that evaluates and changes runtime
parameters for other tasks (basically what Luca is doing IIRC), and the
changes keep resetting the sleep time, the whole guarantee system comes
down, rendering the deadline system useless.

> >>  If in the future we allow non-privileged users to increase deadline,
> >>  we will reflect that in __setparam_dl() too.
> >
> > I'd say it is better to implement the right behavior even for root, so
> > that we will find it right when we'll grant access to non root users
> > too. Also, even if root can do everything, we always try not to break
> > guarantees that come with admission control (root or non root that is).

Just so.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-13 14:04               ` Peter Zijlstra
@ 2015-01-14 12:43                 ` Kirill Tkhai
  2015-01-15 11:23                   ` Luca Abeni
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill Tkhai @ 2015-01-14 12:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Luca Abeni, Ingo Molnar, linux-kernel, juri.lelli

13.01.2015, 17:04, "Peter Zijlstra" <peterz@infradead.org>:
> On Tue, Jan 12, 2015 at 12:26:40PM +0300, Kirill Tkhai wrote:
>>>  Well, I'm inclined to agree to Luca's viewpoint. We should not change
>>>  parameters of a throttled task or we may affect other tasks.
>>  Could you explain your viewpoint more? How does this affects other tasks?
>
> I agree with Juri and Luca here. Luca gave an example that DoS's
> SCHED_DEADLINE.
>
> In Luca's example we'd constantly call sched_setattr() on a task, which
> results in it never throttling and that will affect other tasks, because
> if we're running, they cannot be.
>>  As I understand, in __setparam_dl() we are sure that there is enough
>>  dl_bw. In __sched_setscheduler() we call it after dl_overflow() check.
>
> Yes, _but_, as per the above. BW includes the sleep time, if we fail to
> sleep its all pointless.
>
> We got the runtime (before the throttle) under the promise that we'd go
> sleep. When we change our parameters while being throttled we should
> adjust the throttle time accordingly. If say we changed from 30% to 35%
> we are allowed to sleep less. Now sleeping more than strictly required
> is only detrimental to ourselves, so that is not (too) critical.
>
> However, the other way around, if we change from 35% to 30% we should
> now effectively sleep longer (for the same runtime we already consumed),
> and we must do this, because admission control will instantly assume the
> 5% change in bandwidth to be available. If we sleep short, this BW is
> not in fact available and we break our guarantees.
>>>>>   Anyway, I am fine with every patch that fixes the bug :)
>>>>   Deadline class requires root privileges. So, I do not see a problem
>>>>   here. Please, see __sched_setscheduler().
>
> Its not a priv issue, not doing this makes it impossible to use
> SCHED_DEADLINE in the intended way, even for root.
>
> Say we have a userspace task that evaluates and changes runtime
> parameters for other tasks (basically what Luca is doing IIRC), and the
> changes keep resetting the sleep time, the whole guarantee system comes
> down, rendering the deadline system useless.
>>>>   If in the future we allow non-privileged users to increase deadline,
>>>>   we will reflect that in __setparam_dl() too.
>>>  I'd say it is better to implement the right behavior even for root, so
>>>  that we will find it right when we'll grant access to non root users
>>>  too. Also, even if root can do everything, we always try not to break
>>>  guarantees that come with admission control (root or non root that is).
>
> Just so.

How about something like this?

 include/linux/sched/deadline.h |  2 ++
 kernel/sched/core.c            | 33 +++++++++++++++++++++++++++++----
 kernel/sched/deadline.c        | 10 +---------
 kernel/sched/sched.h           |  1 -
 4 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 9d303b8..c70a7fc 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -21,4 +21,6 @@ static inline int dl_task(struct task_struct *p)
 	return dl_prio(p->prio);
 }
 
+extern enum hrtimer_restart dl_task_timer(struct hrtimer *timer);
+
 #endif /* _SCHED_DEADLINE_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc0..edf9d91 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1809,6 +1809,8 @@ void __dl_clear_params(struct task_struct *p)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
 
+	dl_se->dl_throttled = 0;
+	dl_se->dl_yielded = 0;
 	dl_se->dl_runtime = 0;
 	dl_se->dl_deadline = 0;
 	dl_se->dl_period = 0;
@@ -1840,6 +1842,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	p->dl.dl_timer.function = dl_task_timer;
 	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
@@ -3250,16 +3253,38 @@ static void
 __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
+	struct hrtimer *timer = &dl_se->dl_timer;
+	struct rq *rq = task_rq(p);
+	int pending = 0;
+
+	if (hrtimer_is_queued(timer)) {
+		/*
+		 * Not need smp_rmb() here, synchronization points
+		 * are rq lock in our caller and in dl_task_timer().
+		 */
+		pending = 1;
+	} else if (hrtimer_callback_running(timer)) {
+		smp_rmb(); /* Pairs with rq lock in timer handler */
+
+		/* Has the timer handler already finished? */
+		if (dl_se->dl_throttled)
+			pending = 1; /* No */
+	}
+
+	if (!pending) {
+		BUG_ON(dl_se->dl_throttled);
+		dl_se->dl_new = 1;
+	} else {
+		dl_se->deadline = rq_clock(rq) + attr->sched_deadline;
+		dl_se->runtime = attr->sched_runtime;
+	}
 
-	init_dl_task_timer(dl_se);
+	dl_se->dl_yielded = 0;
 	dl_se->dl_runtime = attr->sched_runtime;
 	dl_se->dl_deadline = attr->sched_deadline;
 	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
 	dl_se->flags = attr->sched_flags;
 	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
-	dl_se->dl_throttled = 0;
-	dl_se->dl_new = 1;
-	dl_se->dl_yielded = 0;
 }
 
 /*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b52092f..b457ca7 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -500,7 +500,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
  * updating (and the queueing back to dl_rq) will be done by the
  * next call to enqueue_task_dl().
  */
-static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
+enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 {
 	struct sched_dl_entity *dl_se = container_of(timer,
 						     struct sched_dl_entity,
@@ -559,14 +559,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void init_dl_task_timer(struct sched_dl_entity *dl_se)
-{
-	struct hrtimer *timer = &dl_se->dl_timer;
-
-	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	timer->function = dl_task_timer;
-}
-
 static
 int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a2a45c..ad3a2f0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1257,7 +1257,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 
 extern struct dl_bandwidth def_dl_bandwidth;
 extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
-extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 
 unsigned long to_ratio(u64 period, u64 runtime);
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-14 12:43                 ` Kirill Tkhai
@ 2015-01-15 11:23                   ` Luca Abeni
  2015-01-15 12:23                     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Luca Abeni @ 2015-01-15 11:23 UTC (permalink / raw)
  To: Kirill Tkhai, Peter Zijlstra
  Cc: Juri Lelli, Ingo Molnar, linux-kernel, juri.lelli

Hi Kirill,

On 01/14/2015 01:43 PM, Kirill Tkhai wrote:
[...]
>> Say we have a userspace task that evaluates and changes runtime
>> parameters for other tasks (basically what Luca is doing IIRC), and the
>> changes keep resetting the sleep time, the whole guarantee system comes
>> down, rendering the deadline system useless.
>>>>>    If in the future we allow non-privileged users to increase deadline,
>>>>>    we will reflect that in __setparam_dl() too.
>>>>   I'd say it is better to implement the right behavior even for root, so
>>>>   that we will find it right when we'll grant access to non root users
>>>>   too. Also, even if root can do everything, we always try not to break
>>>>   guarantees that come with admission control (root or non root that is).
>>
>> Just so.
>
> How about something like this?
There are some parts of the patch that I do not understand (for example:
if I understand well, if the task is not throttled you set dl_new to 1...
And if it is throttled you change its current runtime and scheduling deadline...
Why?)
Anyway, I tested the patch and I confirm that it fixes the hang I originally
reported.


			Luca
>
>   include/linux/sched/deadline.h |  2 ++
>   kernel/sched/core.c            | 33 +++++++++++++++++++++++++++++----
>   kernel/sched/deadline.c        | 10 +---------
>   kernel/sched/sched.h           |  1 -
>   4 files changed, 32 insertions(+), 14 deletions(-)
> diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> index 9d303b8..c70a7fc 100644
> --- a/include/linux/sched/deadline.h
> +++ b/include/linux/sched/deadline.h
> @@ -21,4 +21,6 @@ static inline int dl_task(struct task_struct *p)
>   	return dl_prio(p->prio);
>   }
>
> +extern enum hrtimer_restart dl_task_timer(struct hrtimer *timer);
> +
>   #endif /* _SCHED_DEADLINE_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c0accc0..edf9d91 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1809,6 +1809,8 @@ void __dl_clear_params(struct task_struct *p)
>   {
>   	struct sched_dl_entity *dl_se = &p->dl;
>
> +	dl_se->dl_throttled = 0;
> +	dl_se->dl_yielded = 0;
>   	dl_se->dl_runtime = 0;
>   	dl_se->dl_deadline = 0;
>   	dl_se->dl_period = 0;
> @@ -1840,6 +1842,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>
>   	RB_CLEAR_NODE(&p->dl.rb_node);
>   	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	p->dl.dl_timer.function = dl_task_timer;
>   	__dl_clear_params(p);
>
>   	INIT_LIST_HEAD(&p->rt.run_list);
> @@ -3250,16 +3253,38 @@ static void
>   __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
>   {
>   	struct sched_dl_entity *dl_se = &p->dl;
> +	struct hrtimer *timer = &dl_se->dl_timer;
> +	struct rq *rq = task_rq(p);
> +	int pending = 0;
> +
> +	if (hrtimer_is_queued(timer)) {
> +		/*
> +		 * Not need smp_rmb() here, synchronization points
> +		 * are rq lock in our caller and in dl_task_timer().
> +		 */
> +		pending = 1;
> +	} else if (hrtimer_callback_running(timer)) {
> +		smp_rmb(); /* Pairs with rq lock in timer handler */
> +
> +		/* Has the timer handler already finished? */
> +		if (dl_se->dl_throttled)
> +			pending = 1; /* No */
> +	}
> +
> +	if (!pending) {
> +		BUG_ON(dl_se->dl_throttled);
> +		dl_se->dl_new = 1;
> +	} else {
> +		dl_se->deadline = rq_clock(rq) + attr->sched_deadline;
> +		dl_se->runtime = attr->sched_runtime;
> +	}
>
> -	init_dl_task_timer(dl_se);
> +	dl_se->dl_yielded = 0;
>   	dl_se->dl_runtime = attr->sched_runtime;
>   	dl_se->dl_deadline = attr->sched_deadline;
>   	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
>   	dl_se->flags = attr->sched_flags;
>   	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> -	dl_se->dl_throttled = 0;
> -	dl_se->dl_new = 1;
> -	dl_se->dl_yielded = 0;
>   }
>
>   /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b52092f..b457ca7 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -500,7 +500,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
>    * updating (and the queueing back to dl_rq) will be done by the
>    * next call to enqueue_task_dl().
>    */
> -static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> +enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>   {
>   	struct sched_dl_entity *dl_se = container_of(timer,
>   						     struct sched_dl_entity,
> @@ -559,14 +559,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>   	return HRTIMER_NORESTART;
>   }
>
> -void init_dl_task_timer(struct sched_dl_entity *dl_se)
> -{
> -	struct hrtimer *timer = &dl_se->dl_timer;
> -
> -	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -	timer->function = dl_task_timer;
> -}
> -
>   static
>   int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
>   {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9a2a45c..ad3a2f0 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1257,7 +1257,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
>
>   extern struct dl_bandwidth def_dl_bandwidth;
>   extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
> -extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
>
>   unsigned long to_ratio(u64 period, u64 runtime);
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-15 11:23                   ` Luca Abeni
@ 2015-01-15 12:23                     ` Peter Zijlstra
  2015-01-15 13:35                       ` Luca Abeni
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-15 12:23 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Kirill Tkhai, Juri Lelli, Ingo Molnar, linux-kernel, juri.lelli

On Thu, Jan 15, 2015 at 12:23:43PM +0100, Luca Abeni wrote:
> There are some parts of the patch that I do not understand (for example:
> if I understand well, if the task is not throttled you set dl_new to 1...
> And if it is throttled you change its current runtime and scheduling deadline...
> Why?)
> Anyway, I tested the patch and I confirm that it fixes the hang I originally
> reported.

I'll go look at the patch in a wee bit. But Juri, Luca, could you
describe the correct behaviour?

>From what I understand we should either modify the tasks run/sleep stats
when we change its parameters or we should schedule a delayed release of
the bandwidth delta (when it reaches its 0-lag point, if thats in the
future).

Doing neither will allow one to game the reservation system afaict.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-15 12:23                     ` Peter Zijlstra
@ 2015-01-15 13:35                       ` Luca Abeni
  2015-01-28 14:08                         ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Luca Abeni @ 2015-01-15 13:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill Tkhai, Juri Lelli, Ingo Molnar, linux-kernel, juri.lelli

Hi Peter,

On 01/15/2015 01:23 PM, Peter Zijlstra wrote:
> On Thu, Jan 15, 2015 at 12:23:43PM +0100, Luca Abeni wrote:
>> There are some parts of the patch that I do not understand (for example:
>> if I understand well, if the task is not throttled you set dl_new to 1...
>> And if it is throttled you change its current runtime and scheduling deadline...
>> Why?)
>> Anyway, I tested the patch and I confirm that it fixes the hang I originally
>> reported.
>
> I'll go look at the patch in a wee bit. But Juri, Luca, could you
> describe the correct behaviour?
>
>>From what I understand we should either modify the tasks run/sleep stats
> when we change its parameters or we should schedule a delayed release of
> the bandwidth delta (when it reaches its 0-lag point, if thats in the
> future).
I suspect the correct behaviour can be difficult to implement:
- When a SCHED_DEADLINE task ends (or changes its scheduling policy to
   something different), its bandwidth cannot be released immediately,
   but should be released at the "0-lag time" (which reminds me about the
   GRUB patches... I had to implement a similar behaviour in those patches :)
- The same applies when the task changes its scheduling parameters decreasing
   its bandwidth. In this case, we also need to update the current runtime (if
   it is larger than the new runtime, set it to the new maximum value - I think
   this is the safest thing to do)
- When a task changes its parameters to increase its bandwidth, be do not
   have such problems.

As far as I can see, if we apply the runtime / deadline changes starting from
the next reservation period we are safe (because the "0-lag time" is always
smaller than the current scheduling deadline).
This might cause some transient overloads (because if I change the parameters
of a task at time t, the update takes action a little bit later - at the next
scheduling deadline), but guarantees that a task never consumes more than
expected (for example: if a task continuously changes its bandwidth between
0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
immediately update dl_se->deadline and dl_se->runtime a task can arrive to
consume much more CPU time).


				Luca

>
> Doing neither will allow one to game the reservation system afaict.
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-15 13:35                       ` Luca Abeni
@ 2015-01-28 14:08                         ` Peter Zijlstra
  2015-01-28 14:40                           ` Luca Abeni
                                             ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-28 14:08 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Kirill Tkhai, Juri Lelli, Ingo Molnar, linux-kernel, juri.lelli

On Thu, Jan 15, 2015 at 02:35:46PM +0100, Luca Abeni wrote:

> >>From what I understand we should either modify the tasks run/sleep stats
> >when we change its parameters or we should schedule a delayed release of
> >the bandwidth delta (when it reaches its 0-lag point, if thats in the
> >future).

> I suspect the correct behaviour can be difficult to implement:
> - When a SCHED_DEADLINE task ends (or changes its scheduling policy to
>   something different), its bandwidth cannot be released immediately,
>   but should be released at the "0-lag time" (which reminds me about the
>   GRUB patches... I had to implement a similar behaviour in those patches :)
> - The same applies when the task changes its scheduling parameters decreasing
>   its bandwidth. In this case, we also need to update the current runtime (if
>   it is larger than the new runtime, set it to the new maximum value - I think
>   this is the safest thing to do)
> - When a task changes its parameters to increase its bandwidth, be do not
>   have such problems.
> 
> As far as I can see, if we apply the runtime / deadline changes starting from
> the next reservation period we are safe (because the "0-lag time" is always
> smaller than the current scheduling deadline).
> This might cause some transient overloads (because if I change the parameters
> of a task at time t, the update takes action a little bit later - at the next
> scheduling deadline), but guarantees that a task never consumes more than
> expected (for example: if a task continuously changes its bandwidth between
> 0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
> immediately update dl_se->deadline and dl_se->runtime a task can arrive to
> consume much more CPU time).


OK, how about something like this; it seems to survive your Bug-Test for
at least 50 cycles.

---
 kernel/sched/core.c     | 33 ++++++++++++++++++++++++++++-----
 kernel/sched/deadline.c |  3 ++-
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ade2958a9197..d787d6553d72 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1816,6 +1816,10 @@ void __dl_clear_params(struct task_struct *p)
 	dl_se->dl_period = 0;
 	dl_se->flags = 0;
 	dl_se->dl_bw = 0;
+
+	dl_se->dl_throttled = 0;
+	dl_se->dl_new = 1;
+	dl_se->dl_yielded = 0;
 }
 
 /*
@@ -1844,7 +1848,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 #endif
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
-	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	init_dl_task_timer(&p->dl);
 	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
@@ -2054,6 +2058,9 @@ static inline int dl_bw_cpus(int i)
  * allocated bandwidth to reflect the new situation.
  *
  * This function is called while holding p's rq->lock.
+ *
+ * XXX we should delay bw change until the task's 0-lag point, see
+ * __setparam_dl().
  */
 static int dl_overflow(struct task_struct *p, int policy,
 		       const struct sched_attr *attr)
@@ -3258,15 +3265,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
 
-	init_dl_task_timer(dl_se);
 	dl_se->dl_runtime = attr->sched_runtime;
 	dl_se->dl_deadline = attr->sched_deadline;
 	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
 	dl_se->flags = attr->sched_flags;
 	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
-	dl_se->dl_throttled = 0;
-	dl_se->dl_new = 1;
-	dl_se->dl_yielded = 0;
+
+	/*
+	 * Changing the parameters of a task is 'tricky' and we're not doing
+	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
+	 *
+	 * What we SHOULD do is delay the bandwidth release until the 0-lag
+	 * point. This would include retaining the task_struct until that time
+	 * and change dl_overflow() to not immediately decrement the current
+	 * amount.
+	 *
+	 * Instead we retain the current runtime/deadline and let the new
+	 * parameters take effect after the current reservation period lapses.
+	 * This is safe (albeit pessimistic) because the 0-lag point is always
+	 * before the current scheduling deadline.
+	 *
+	 * We can still have temporary overloads because we do not delay the
+	 * change in bandwidth until that time; so admission control is
+	 * not on the safe side. It does however guarantee tasks will never
+	 * consume more than promised.
+	 */
 }
 
 /*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b52092f2636d..726470d47f87 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
 	 * Since we are TASK_DEAD we won't slip out of the domain!
 	 */
 	raw_spin_lock_irq(&dl_b->lock);
+	/* XXX we should retain the bw until 0-lag */
 	dl_b->total_bw -= p->dl.dl_bw;
 	raw_spin_unlock_irq(&dl_b->lock);
 
@@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
 
 static void switched_from_dl(struct rq *rq, struct task_struct *p)
 {
+	/* XXX we should retain the bw until 0-lag */
 	cancel_dl_timer(rq, p);
-
 	__dl_clear_params(p);
 
 	/*

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-28 14:08                         ` Peter Zijlstra
@ 2015-01-28 14:40                           ` Luca Abeni
  2015-01-30 10:25                           ` Luca Abeni
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Luca Abeni @ 2015-01-28 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill Tkhai, Juri Lelli, Ingo Molnar, linux-kernel, juri.lelli

Hi Peter,

On 01/28/2015 03:08 PM, Peter Zijlstra wrote:
> On Thu, Jan 15, 2015 at 02:35:46PM +0100, Luca Abeni wrote:
>
>>> >From what I understand we should either modify the tasks run/sleep stats
>>> when we change its parameters or we should schedule a delayed release of
>>> the bandwidth delta (when it reaches its 0-lag point, if thats in the
>>> future).
>
>> I suspect the correct behaviour can be difficult to implement:
>> - When a SCHED_DEADLINE task ends (or changes its scheduling policy to
>>    something different), its bandwidth cannot be released immediately,
>>    but should be released at the "0-lag time" (which reminds me about the
>>    GRUB patches... I had to implement a similar behaviour in those patches :)
>> - The same applies when the task changes its scheduling parameters decreasing
>>    its bandwidth. In this case, we also need to update the current runtime (if
>>    it is larger than the new runtime, set it to the new maximum value - I think
>>    this is the safest thing to do)
>> - When a task changes its parameters to increase its bandwidth, be do not
>>    have such problems.
>>
>> As far as I can see, if we apply the runtime / deadline changes starting from
>> the next reservation period we are safe (because the "0-lag time" is always
>> smaller than the current scheduling deadline).
>> This might cause some transient overloads (because if I change the parameters
>> of a task at time t, the update takes action a little bit later - at the next
>> scheduling deadline), but guarantees that a task never consumes more than
>> expected (for example: if a task continuously changes its bandwidth between
>> 0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
>> immediately update dl_se->deadline and dl_se->runtime a task can arrive to
>> consume much more CPU time).
>
>
> OK, how about something like this; it seems to survive your Bug-Test for
> at least 50 cycles.
Thanks; tomorrow I'll test this patch and I'll let you know how it works in
my setup (but looking at it I am pretty sure it will solve my issue).


			Thanks,
				Luca

>
> ---
>   kernel/sched/core.c     | 33 ++++++++++++++++++++++++++++-----
>   kernel/sched/deadline.c |  3 ++-
>   2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ade2958a9197..d787d6553d72 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1816,6 +1816,10 @@ void __dl_clear_params(struct task_struct *p)
>   	dl_se->dl_period = 0;
>   	dl_se->flags = 0;
>   	dl_se->dl_bw = 0;
> +
> +	dl_se->dl_throttled = 0;
> +	dl_se->dl_new = 1;
> +	dl_se->dl_yielded = 0;
>   }
>
>   /*
> @@ -1844,7 +1848,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>   #endif
>
>   	RB_CLEAR_NODE(&p->dl.rb_node);
> -	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	init_dl_task_timer(&p->dl);
>   	__dl_clear_params(p);
>
>   	INIT_LIST_HEAD(&p->rt.run_list);
> @@ -2054,6 +2058,9 @@ static inline int dl_bw_cpus(int i)
>    * allocated bandwidth to reflect the new situation.
>    *
>    * This function is called while holding p's rq->lock.
> + *
> + * XXX we should delay bw change until the task's 0-lag point, see
> + * __setparam_dl().
>    */
>   static int dl_overflow(struct task_struct *p, int policy,
>   		       const struct sched_attr *attr)
> @@ -3258,15 +3265,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
>   {
>   	struct sched_dl_entity *dl_se = &p->dl;
>
> -	init_dl_task_timer(dl_se);
>   	dl_se->dl_runtime = attr->sched_runtime;
>   	dl_se->dl_deadline = attr->sched_deadline;
>   	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
>   	dl_se->flags = attr->sched_flags;
>   	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> -	dl_se->dl_throttled = 0;
> -	dl_se->dl_new = 1;
> -	dl_se->dl_yielded = 0;
> +
> +	/*
> +	 * Changing the parameters of a task is 'tricky' and we're not doing
> +	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
> +	 *
> +	 * What we SHOULD do is delay the bandwidth release until the 0-lag
> +	 * point. This would include retaining the task_struct until that time
> +	 * and change dl_overflow() to not immediately decrement the current
> +	 * amount.
> +	 *
> +	 * Instead we retain the current runtime/deadline and let the new
> +	 * parameters take effect after the current reservation period lapses.
> +	 * This is safe (albeit pessimistic) because the 0-lag point is always
> +	 * before the current scheduling deadline.
> +	 *
> +	 * We can still have temporary overloads because we do not delay the
> +	 * change in bandwidth until that time; so admission control is
> +	 * not on the safe side. It does however guarantee tasks will never
> +	 * consume more than promised.
> +	 */
>   }
>
>   /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b52092f2636d..726470d47f87 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
>   	 * Since we are TASK_DEAD we won't slip out of the domain!
>   	 */
>   	raw_spin_lock_irq(&dl_b->lock);
> +	/* XXX we should retain the bw until 0-lag */
>   	dl_b->total_bw -= p->dl.dl_bw;
>   	raw_spin_unlock_irq(&dl_b->lock);
>
> @@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
>
>   static void switched_from_dl(struct rq *rq, struct task_struct *p)
>   {
> +	/* XXX we should retain the bw until 0-lag */
>   	cancel_dl_timer(rq, p);
> -
>   	__dl_clear_params(p);
>
>   	/*
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-28 14:08                         ` Peter Zijlstra
  2015-01-28 14:40                           ` Luca Abeni
@ 2015-01-30 10:25                           ` Luca Abeni
  2015-01-30 10:35                           ` Juri Lelli
  2015-02-04 14:34                           ` [tip:sched/core] sched/deadline: Fix deadline parameter modification handling tip-bot for Peter Zijlstra
  3 siblings, 0 replies; 26+ messages in thread
From: Luca Abeni @ 2015-01-30 10:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill Tkhai, Juri Lelli, Ingo Molnar, linux-kernel, juri.lelli

Hi Peter,

On 01/28/2015 03:08 PM, Peter Zijlstra wrote:
> On Thu, Jan 15, 2015 at 02:35:46PM +0100, Luca Abeni wrote:
>
>>> >From what I understand we should either modify the tasks run/sleep stats
>>> when we change its parameters or we should schedule a delayed release of
>>> the bandwidth delta (when it reaches its 0-lag point, if thats in the
>>> future).
>
>> I suspect the correct behaviour can be difficult to implement:
>> - When a SCHED_DEADLINE task ends (or changes its scheduling policy to
>>    something different), its bandwidth cannot be released immediately,
>>    but should be released at the "0-lag time" (which reminds me about the
>>    GRUB patches... I had to implement a similar behaviour in those patches :)
>> - The same applies when the task changes its scheduling parameters decreasing
>>    its bandwidth. In this case, we also need to update the current runtime (if
>>    it is larger than the new runtime, set it to the new maximum value - I think
>>    this is the safest thing to do)
>> - When a task changes its parameters to increase its bandwidth, be do not
>>    have such problems.
>>
>> As far as I can see, if we apply the runtime / deadline changes starting from
>> the next reservation period we are safe (because the "0-lag time" is always
>> smaller than the current scheduling deadline).
>> This might cause some transient overloads (because if I change the parameters
>> of a task at time t, the update takes action a little bit later - at the next
>> scheduling deadline), but guarantees that a task never consumes more than
>> expected (for example: if a task continuously changes its bandwidth between
>> 0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
>> immediately update dl_se->deadline and dl_se->runtime a task can arrive to
>> consume much more CPU time).
>
>
> OK, how about something like this; it seems to survive your Bug-Test for
> at least 50 cycles.
I tested your patch for 1 day, and it works fine for my testcases (no crashes,
hangs, or strange behaviours).
Also, the comments look ok to me.


			Thanks,
				Luca

>
> ---
>   kernel/sched/core.c     | 33 ++++++++++++++++++++++++++++-----
>   kernel/sched/deadline.c |  3 ++-
>   2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ade2958a9197..d787d6553d72 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1816,6 +1816,10 @@ void __dl_clear_params(struct task_struct *p)
>   	dl_se->dl_period = 0;
>   	dl_se->flags = 0;
>   	dl_se->dl_bw = 0;
> +
> +	dl_se->dl_throttled = 0;
> +	dl_se->dl_new = 1;
> +	dl_se->dl_yielded = 0;
>   }
>
>   /*
> @@ -1844,7 +1848,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>   #endif
>
>   	RB_CLEAR_NODE(&p->dl.rb_node);
> -	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	init_dl_task_timer(&p->dl);
>   	__dl_clear_params(p);
>
>   	INIT_LIST_HEAD(&p->rt.run_list);
> @@ -2054,6 +2058,9 @@ static inline int dl_bw_cpus(int i)
>    * allocated bandwidth to reflect the new situation.
>    *
>    * This function is called while holding p's rq->lock.
> + *
> + * XXX we should delay bw change until the task's 0-lag point, see
> + * __setparam_dl().
>    */
>   static int dl_overflow(struct task_struct *p, int policy,
>   		       const struct sched_attr *attr)
> @@ -3258,15 +3265,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
>   {
>   	struct sched_dl_entity *dl_se = &p->dl;
>
> -	init_dl_task_timer(dl_se);
>   	dl_se->dl_runtime = attr->sched_runtime;
>   	dl_se->dl_deadline = attr->sched_deadline;
>   	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
>   	dl_se->flags = attr->sched_flags;
>   	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> -	dl_se->dl_throttled = 0;
> -	dl_se->dl_new = 1;
> -	dl_se->dl_yielded = 0;
> +
> +	/*
> +	 * Changing the parameters of a task is 'tricky' and we're not doing
> +	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
> +	 *
> +	 * What we SHOULD do is delay the bandwidth release until the 0-lag
> +	 * point. This would include retaining the task_struct until that time
> +	 * and change dl_overflow() to not immediately decrement the current
> +	 * amount.
> +	 *
> +	 * Instead we retain the current runtime/deadline and let the new
> +	 * parameters take effect after the current reservation period lapses.
> +	 * This is safe (albeit pessimistic) because the 0-lag point is always
> +	 * before the current scheduling deadline.
> +	 *
> +	 * We can still have temporary overloads because we do not delay the
> +	 * change in bandwidth until that time; so admission control is
> +	 * not on the safe side. It does however guarantee tasks will never
> +	 * consume more than promised.
> +	 */
>   }
>
>   /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b52092f2636d..726470d47f87 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
>   	 * Since we are TASK_DEAD we won't slip out of the domain!
>   	 */
>   	raw_spin_lock_irq(&dl_b->lock);
> +	/* XXX we should retain the bw until 0-lag */
>   	dl_b->total_bw -= p->dl.dl_bw;
>   	raw_spin_unlock_irq(&dl_b->lock);
>
> @@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
>
>   static void switched_from_dl(struct rq *rq, struct task_struct *p)
>   {
> +	/* XXX we should retain the bw until 0-lag */
>   	cancel_dl_timer(rq, p);
> -
>   	__dl_clear_params(p);
>
>   	/*
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-28 14:08                         ` Peter Zijlstra
  2015-01-28 14:40                           ` Luca Abeni
  2015-01-30 10:25                           ` Luca Abeni
@ 2015-01-30 10:35                           ` Juri Lelli
  2015-01-31  9:56                             ` Peter Zijlstra
  2015-02-04 14:34                           ` [tip:sched/core] sched/deadline: Fix deadline parameter modification handling tip-bot for Peter Zijlstra
  3 siblings, 1 reply; 26+ messages in thread
From: Juri Lelli @ 2015-01-30 10:35 UTC (permalink / raw)
  To: Peter Zijlstra, Luca Abeni
  Cc: Kirill Tkhai, Juri Lelli, Ingo Molnar, linux-kernel

Hi Peter,

On 28/01/2015 14:08, Peter Zijlstra wrote:
> On Thu, Jan 15, 2015 at 02:35:46PM +0100, Luca Abeni wrote:
> 
>>> >From what I understand we should either modify the tasks run/sleep stats
>>> when we change its parameters or we should schedule a delayed release of
>>> the bandwidth delta (when it reaches its 0-lag point, if thats in the
>>> future).
> 
>> I suspect the correct behaviour can be difficult to implement:
>> - When a SCHED_DEADLINE task ends (or changes its scheduling policy to
>>   something different), its bandwidth cannot be released immediately,
>>   but should be released at the "0-lag time" (which reminds me about the
>>   GRUB patches... I had to implement a similar behaviour in those patches :)
>> - The same applies when the task changes its scheduling parameters decreasing
>>   its bandwidth. In this case, we also need to update the current runtime (if
>>   it is larger than the new runtime, set it to the new maximum value - I think
>>   this is the safest thing to do)
>> - When a task changes its parameters to increase its bandwidth, be do not
>>   have such problems.
>>
>> As far as I can see, if we apply the runtime / deadline changes starting from
>> the next reservation period we are safe (because the "0-lag time" is always
>> smaller than the current scheduling deadline).
>> This might cause some transient overloads (because if I change the parameters
>> of a task at time t, the update takes action a little bit later - at the next
>> scheduling deadline), but guarantees that a task never consumes more than
>> expected (for example: if a task continuously changes its bandwidth between
>> 0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
>> immediately update dl_se->deadline and dl_se->runtime a task can arrive to
>> consume much more CPU time).
> 
> 
> OK, how about something like this; it seems to survive your Bug-Test for
> at least 50 cycles.
> 

Thanks for the patch!

> ---
>  kernel/sched/core.c     | 33 ++++++++++++++++++++++++++++-----
>  kernel/sched/deadline.c |  3 ++-
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ade2958a9197..d787d6553d72 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1816,6 +1816,10 @@ void __dl_clear_params(struct task_struct *p)
>  	dl_se->dl_period = 0;
>  	dl_se->flags = 0;
>  	dl_se->dl_bw = 0;
> +
> +	dl_se->dl_throttled = 0;
> +	dl_se->dl_new = 1;
> +	dl_se->dl_yielded = 0;
>  }
>  
>  /*
> @@ -1844,7 +1848,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>  #endif
>  
>  	RB_CLEAR_NODE(&p->dl.rb_node);
> -	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	init_dl_task_timer(&p->dl);
>  	__dl_clear_params(p);
>  
>  	INIT_LIST_HEAD(&p->rt.run_list);
> @@ -2054,6 +2058,9 @@ static inline int dl_bw_cpus(int i)
>   * allocated bandwidth to reflect the new situation.
>   *
>   * This function is called while holding p's rq->lock.
> + *
> + * XXX we should delay bw change until the task's 0-lag point, see
> + * __setparam_dl().
>   */
>  static int dl_overflow(struct task_struct *p, int policy,
>  		       const struct sched_attr *attr)
> @@ -3258,15 +3265,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
>  {
>  	struct sched_dl_entity *dl_se = &p->dl;
>  
> -	init_dl_task_timer(dl_se);
>  	dl_se->dl_runtime = attr->sched_runtime;
>  	dl_se->dl_deadline = attr->sched_deadline;
>  	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
>  	dl_se->flags = attr->sched_flags;
>  	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> -	dl_se->dl_throttled = 0;
> -	dl_se->dl_new = 1;
> -	dl_se->dl_yielded = 0;
> +
> +	/*
> +	 * Changing the parameters of a task is 'tricky' and we're not doing
> +	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
> +	 *
> +	 * What we SHOULD do is delay the bandwidth release until the 0-lag
> +	 * point. This would include retaining the task_struct until that time
> +	 * and change dl_overflow() to not immediately decrement the current
> +	 * amount.
> +	 *
> +	 * Instead we retain the current runtime/deadline and let the new
> +	 * parameters take effect after the current reservation period lapses.
> +	 * This is safe (albeit pessimistic) because the 0-lag point is always
> +	 * before the current scheduling deadline.
> +	 *
> +	 * We can still have temporary overloads because we do not delay the
> +	 * change in bandwidth until that time; so admission control is
> +	 * not on the safe side. It does however guarantee tasks will never
> +	 * consume more than promised.

So, we do the safe thing only in case of throttling. I guess is more than
ok for now, while we hopefully find some spare cycle to implement a
complete solution :/.

Thanks a lot!

Best,

- Juri

> +	 */
>  }
>  
>  /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b52092f2636d..726470d47f87 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
>  	 * Since we are TASK_DEAD we won't slip out of the domain!
>  	 */
>  	raw_spin_lock_irq(&dl_b->lock);
> +	/* XXX we should retain the bw until 0-lag */
>  	dl_b->total_bw -= p->dl.dl_bw;
>  	raw_spin_unlock_irq(&dl_b->lock);
>  
> @@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
>  
>  static void switched_from_dl(struct rq *rq, struct task_struct *p)
>  {
> +	/* XXX we should retain the bw until 0-lag */
>  	cancel_dl_timer(rq, p);
> -
>  	__dl_clear_params(p);
>  
>  	/*
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-30 10:35                           ` Juri Lelli
@ 2015-01-31  9:56                             ` Peter Zijlstra
  2015-02-02 11:31                               ` Juri Lelli
  2015-02-02 13:57                               ` Luca Abeni
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-31  9:56 UTC (permalink / raw)
  To: Juri Lelli; +Cc: Luca Abeni, Kirill Tkhai, Ingo Molnar, linux-kernel

On Fri, Jan 30, 2015 at 10:35:02AM +0000, Juri Lelli wrote:
> So, we do the safe thing only in case of throttling.

No, even for the !throttle aka running tasks. We only use
dl_{runtime,deadline,period} for replenishment, until that time we
observe the old runtime/deadline set by the previous replenishment.

> I guess is more than
> ok for now, while we hopefully find some spare cycle to implement a
> complete solution :/.

Yeah, I bet the fun part is computing the 0-lag across the entire root
domain, per-cpu 0-lag isn't correct afaict.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-31  9:56                             ` Peter Zijlstra
@ 2015-02-02 11:31                               ` Juri Lelli
  2015-02-02 13:57                               ` Luca Abeni
  1 sibling, 0 replies; 26+ messages in thread
From: Juri Lelli @ 2015-02-02 11:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Luca Abeni, Kirill Tkhai, Ingo Molnar, linux-kernel

On 31/01/2015 09:56, Peter Zijlstra wrote:
> On Fri, Jan 30, 2015 at 10:35:02AM +0000, Juri Lelli wrote:
>> So, we do the safe thing only in case of throttling.
> 
> No, even for the !throttle aka running tasks. We only use
> dl_{runtime,deadline,period} for replenishment, until that time we
> observe the old runtime/deadline set by the previous replenishment.
> 

Oh, right. We set dl_new in __dl_clear_params(), nice.

Thanks,

- Juri

>> I guess is more than
>> ok for now, while we hopefully find some spare cycle to implement a
>> complete solution :/.
> 
> Yeah, I bet the fun part is computing the 0-lag across the entire root
> domain, per-cpu 0-lag isn't correct afaict.
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another SCHED_DEADLINE bug (with bisection and possible fix)
  2015-01-31  9:56                             ` Peter Zijlstra
  2015-02-02 11:31                               ` Juri Lelli
@ 2015-02-02 13:57                               ` Luca Abeni
  1 sibling, 0 replies; 26+ messages in thread
From: Luca Abeni @ 2015-02-02 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Juri Lelli; +Cc: Kirill Tkhai, Ingo Molnar, linux-kernel

Hi Peter,

On 01/31/2015 10:56 AM, Peter Zijlstra wrote:
> On Fri, Jan 30, 2015 at 10:35:02AM +0000, Juri Lelli wrote:
>> So, we do the safe thing only in case of throttling.
>
> No, even for the !throttle aka running tasks. We only use
> dl_{runtime,deadline,period} for replenishment, until that time we
> observe the old runtime/deadline set by the previous replenishment.
>
>> I guess is more than
>> ok for now, while we hopefully find some spare cycle to implement a
>> complete solution :/.
>
> Yeah, I bet the fun part is computing the 0-lag across the entire root
> domain, per-cpu 0-lag isn't correct afaict.
Uhm... This is an interesting problem.

I _suspect_ the 0-lag time does not depend on the number of CPUs. In other
words: I _suspect_ that when you kill a SCHED_DEADLINE task its bandwidth
should released at a time
	t0 = scheduling_deadline - current_budget / maximum_budget * period
and this time is not affected by the fact that the task is scheduled by
global EDF or by EDF on a single core.
But I have no proof about this (and I changed my mind on this multiple
times :).


On a side note: as far as I can see, releasing the bandwidth at the end
of the current reservation period (that is, when the time is equal to the
current scheduling deadline) should be safe.
Basically, by doing this we assume that the task already consumed all of
its budget for the current reservation period.



				Luca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip:sched/core] sched/deadline: Fix deadline parameter modification handling
  2015-01-28 14:08                         ` Peter Zijlstra
                                             ` (2 preceding siblings ...)
  2015-01-30 10:35                           ` Juri Lelli
@ 2015-02-04 14:34                           ` tip-bot for Peter Zijlstra
  3 siblings, 0 replies; 26+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-02-04 14:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, juri.lelli, luca.abeni, mingo, stable, hpa,
	torvalds, tkhai, peterz

Commit-ID:  40767b0dc768060266d261b4a330164b4be53f7c
Gitweb:     http://git.kernel.org/tip/40767b0dc768060266d261b4a330164b4be53f7c
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 28 Jan 2015 15:08:03 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 4 Feb 2015 07:42:48 +0100

sched/deadline: Fix deadline parameter modification handling

Commit 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to
use in switched_from_dl()") removed the hrtimer_try_cancel() function
call out from init_dl_task_timer(), which gets called from
__setparam_dl().

The result is that we can now re-init the timer while its active --
this is bad and corrupts timer state.

Furthermore; changing the parameters of an active deadline task is
tricky in that you want to maintain guarantees, while immediately
effective change would allow one to circumvent the CBS guarantees --
this too is bad, as one (bad) task should not be able to affect the
others.

Rework things to avoid both problems. We only need to initialize the
timer once, so move that to __sched_fork() for new tasks.

Then make sure __setparam_dl() doesn't affect the current running
state but only updates the parameters used to calculate the next
scheduling period -- this guarantees the CBS functions as expected
(albeit slightly pessimistic).

This however means we need to make sure __dl_clear_params() needs to
reset the active state otherwise new (and tasks flipping between
classes) will not properly (re)compute their first instance.

Todo: close class flipping CBS hole.
Todo: implement delayed BW release.

Reported-by: Luca Abeni <luca.abeni@unitn.it>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Luca Abeni <luca.abeni@unitn.it>
Fixes: 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150128140803.GF23038@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c     | 33 ++++++++++++++++++++++++++++-----
 kernel/sched/deadline.c |  3 ++-
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c86687..9e83809 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1814,6 +1814,10 @@ void __dl_clear_params(struct task_struct *p)
 	dl_se->dl_period = 0;
 	dl_se->flags = 0;
 	dl_se->dl_bw = 0;
+
+	dl_se->dl_throttled = 0;
+	dl_se->dl_new = 1;
+	dl_se->dl_yielded = 0;
 }
 
 /*
@@ -1839,7 +1843,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 #endif
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
-	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	init_dl_task_timer(&p->dl);
 	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
@@ -2049,6 +2053,9 @@ static inline int dl_bw_cpus(int i)
  * allocated bandwidth to reflect the new situation.
  *
  * This function is called while holding p's rq->lock.
+ *
+ * XXX we should delay bw change until the task's 0-lag point, see
+ * __setparam_dl().
  */
 static int dl_overflow(struct task_struct *p, int policy,
 		       const struct sched_attr *attr)
@@ -3251,15 +3258,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
 
-	init_dl_task_timer(dl_se);
 	dl_se->dl_runtime = attr->sched_runtime;
 	dl_se->dl_deadline = attr->sched_deadline;
 	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
 	dl_se->flags = attr->sched_flags;
 	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
-	dl_se->dl_throttled = 0;
-	dl_se->dl_new = 1;
-	dl_se->dl_yielded = 0;
+
+	/*
+	 * Changing the parameters of a task is 'tricky' and we're not doing
+	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
+	 *
+	 * What we SHOULD do is delay the bandwidth release until the 0-lag
+	 * point. This would include retaining the task_struct until that time
+	 * and change dl_overflow() to not immediately decrement the current
+	 * amount.
+	 *
+	 * Instead we retain the current runtime/deadline and let the new
+	 * parameters take effect after the current reservation period lapses.
+	 * This is safe (albeit pessimistic) because the 0-lag point is always
+	 * before the current scheduling deadline.
+	 *
+	 * We can still have temporary overloads because we do not delay the
+	 * change in bandwidth until that time; so admission control is
+	 * not on the safe side. It does however guarantee tasks will never
+	 * consume more than promised.
+	 */
 }
 
 /*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b52092f..726470d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
 	 * Since we are TASK_DEAD we won't slip out of the domain!
 	 */
 	raw_spin_lock_irq(&dl_b->lock);
+	/* XXX we should retain the bw until 0-lag */
 	dl_b->total_bw -= p->dl.dl_bw;
 	raw_spin_unlock_irq(&dl_b->lock);
 
@@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
 
 static void switched_from_dl(struct rq *rq, struct task_struct *p)
 {
+	/* XXX we should retain the bw until 0-lag */
 	cancel_dl_timer(rq, p);
-
 	__dl_clear_params(p);
 
 	/*

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-02-04 14:35 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-29 23:27 Another SCHED_DEADLINE bug (with bisection and possible fix) luca abeni
2015-01-04 22:51 ` Kirill Tkhai
2015-01-05 15:21   ` Luca Abeni
2015-01-05 23:07     ` Kirill Tkhai
2015-01-07  7:01       ` Luca Abeni
2015-01-07 12:29         ` Kirill Tkhai
2015-01-07 12:45           ` Luca Abeni
2015-01-07 13:04             ` Kirill Tkhai
2015-01-07 13:14               ` Luca Abeni
2015-01-09 11:15               ` Luca Abeni
2015-01-09 11:46                 ` Kirill Tkhai
2015-01-13  8:10           ` Juri Lelli
2015-01-13  9:26             ` Kirill Tkhai
2015-01-13 14:04               ` Peter Zijlstra
2015-01-14 12:43                 ` Kirill Tkhai
2015-01-15 11:23                   ` Luca Abeni
2015-01-15 12:23                     ` Peter Zijlstra
2015-01-15 13:35                       ` Luca Abeni
2015-01-28 14:08                         ` Peter Zijlstra
2015-01-28 14:40                           ` Luca Abeni
2015-01-30 10:25                           ` Luca Abeni
2015-01-30 10:35                           ` Juri Lelli
2015-01-31  9:56                             ` Peter Zijlstra
2015-02-02 11:31                               ` Juri Lelli
2015-02-02 13:57                               ` Luca Abeni
2015-02-04 14:34                           ` [tip:sched/core] sched/deadline: Fix deadline parameter modification handling tip-bot for Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).