All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/deadline: Fix a bug in dl_overflow()
@ 2016-04-14 12:19 Xunlei Pang
  2016-04-15  7:07 ` Juri Lelli
  2016-04-23 12:59 ` [tip:sched/core] " tip-bot for Xunlei Pang
  0 siblings, 2 replies; 4+ messages in thread
From: Xunlei Pang @ 2016-04-14 12:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Juri Lelli, Ingo Molnar, Steven Rostedt, Xunlei Pang

I got a minus(very big) dl_b->total_bw during my deadline tests.

    # grep dl /proc/sched_debug
    dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -222297900

Something unusual must have happened.

After some digging, I finally noticed that when changing a deadline
task to normal(cfs), and changing it back to deadline immediately,
after it died, we will got the wrong dl_bw->total_bw.

The root cause is in dl_overflow(), it has:
    if (new_bw == p->dl.dl_bw)
    	return 0;

1) When a deadline task is changed to !deadline task, it will start
   dl timer in switched_from_dl(), and retain previous deadline parameter
   till the timer expires.
2) If we change it back to deadline with the same bandwidth parameter
   before the timer expires, as it keeps the old bandwidth although it
   is not a deadline task. dl_overflow() simply returns success without
   updating the right data, and got the wrong dl_bw->total_bw.

The solution is simple, if @p is not deadline, don't return.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4a2c79d..5988fee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2378,7 +2378,8 @@ static int dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;
 
-	if (new_bw == p->dl.dl_bw)
+	/* !deadline task may carry old deadline bandwidth */
+	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
 		return 0;
 
 	/*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/deadline: Fix a bug in dl_overflow()
  2016-04-14 12:19 [PATCH] sched/deadline: Fix a bug in dl_overflow() Xunlei Pang
@ 2016-04-15  7:07 ` Juri Lelli
  2016-04-15  8:13   ` Xunlei Pang
  2016-04-23 12:59 ` [tip:sched/core] " tip-bot for Xunlei Pang
  1 sibling, 1 reply; 4+ messages in thread
From: Juri Lelli @ 2016-04-15  7:07 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Steven Rostedt, luca abeni

[+Luca]

Hi,

On 14/04/16 20:19, Xunlei Pang wrote:
> I got a minus(very big) dl_b->total_bw during my deadline tests.
> 
>     # grep dl /proc/sched_debug
>     dl_rq[0]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : -222297900
> 
> Something unusual must have happened.
> 
> After some digging, I finally noticed that when changing a deadline
> task to normal(cfs), and changing it back to deadline immediately,
> after it died, we will got the wrong dl_bw->total_bw.
> 
> The root cause is in dl_overflow(), it has:
>     if (new_bw == p->dl.dl_bw)
>     	return 0;
> 
> 1) When a deadline task is changed to !deadline task, it will start
>    dl timer in switched_from_dl(), and retain previous deadline parameter
>    till the timer expires.
> 2) If we change it back to deadline with the same bandwidth parameter
>    before the timer expires, as it keeps the old bandwidth although it
>    is not a deadline task. dl_overflow() simply returns success without
>    updating the right data, and got the wrong dl_bw->total_bw.
> 
> The solution is simple, if @p is not deadline, don't return.
> 
> Signed-off-by: Xunlei Pang <xlpang@redhat.com>
> ---
>  kernel/sched/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4a2c79d..5988fee 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2378,7 +2378,8 @@ static int dl_overflow(struct task_struct *p, int policy,
>  	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
>  	int cpus, err = -1;
>  
> -	if (new_bw == p->dl.dl_bw)
> +	/* !deadline task may carry old deadline bandwidth */
> +	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))

Right. I got the same patch that I believe Luca is be already using for
his tests (and he also put together the changelog). I never managed to
send it out, sorry about that. We can take yours, mine follows just in
case we want to take something from the changelog or we want to reverse
the if condition.

Thanks,

- Juri

--->8---

>From 4bf38111bd9383035e03d3dc3d42011aaa9e26e7 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@arm.com>
Date: Thu, 25 Feb 2016 15:50:42 +0100
Subject: [PATCH 2/3] fix a bug in the -deadline utilization tracking mechanism

Currently, a task doing
	while(1) {
		switch to SCHED_DEADLINE
		switch to SCHED_OTHER
	}
brings dl_b->total_bw below 0.
This happens because when the task switches back from SCHED_DEADLINE
to SCHED_OTHER, switched_from_dl() does not clear its deadline
parameters (they will be cleared by the deadline timer when it fires).
But dl_overflow() removes its utilization from dl_b->total_bw.
When the task switches back to SCHED_DEADLINE, the
	if (new_bw == p->dl.dl_bw)
check in dl_overflow() prevents __dl_add() from being called, and
so when the task switches back to SCHED_OTHER dl_b->total_bw becomes
negative.
This patch changes the check so that if the task is switching from
SCHED_OTHER to SCHED_DEADLINE __dl_add() is correctly invoked.
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9503d59..d59fa20 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2432,7 +2432,7 @@ static int dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;

-	if (new_bw == p->dl.dl_bw)
+	if (task_has_dl_policy(p) && new_bw == p->dl.dl_bw)
 		return 0;

 	/*
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/deadline: Fix a bug in dl_overflow()
  2016-04-15  7:07 ` Juri Lelli
@ 2016-04-15  8:13   ` Xunlei Pang
  0 siblings, 0 replies; 4+ messages in thread
From: Xunlei Pang @ 2016-04-15  8:13 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Steven Rostedt, luca abeni

On 2016/04/15 at 15:07, Juri Lelli wrote:
> [+Luca]
>
> Hi,
>
> On 14/04/16 20:19, Xunlei Pang wrote:
>> I got a minus(very big) dl_b->total_bw during my deadline tests.
>>
>>     # grep dl /proc/sched_debug
>>     dl_rq[0]:
>>     .dl_nr_running                 : 0
>>     .dl_bw->bw                     : 996147
>>     .dl_bw->total_bw               : -222297900
>>
>> Something unusual must have happened.
>>
>> After some digging, I finally noticed that when changing a deadline
>> task to normal(cfs), and changing it back to deadline immediately,
>> after it died, we will got the wrong dl_bw->total_bw.
>>
>> The root cause is in dl_overflow(), it has:
>>     if (new_bw == p->dl.dl_bw)
>>     	return 0;
>>
>> 1) When a deadline task is changed to !deadline task, it will start
>>    dl timer in switched_from_dl(), and retain previous deadline parameter
>>    till the timer expires.
>> 2) If we change it back to deadline with the same bandwidth parameter
>>    before the timer expires, as it keeps the old bandwidth although it
>>    is not a deadline task. dl_overflow() simply returns success without
>>    updating the right data, and got the wrong dl_bw->total_bw.
>>
>> The solution is simple, if @p is not deadline, don't return.
>>
>> Signed-off-by: Xunlei Pang <xlpang@redhat.com>
>> ---
>>  kernel/sched/core.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 4a2c79d..5988fee 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2378,7 +2378,8 @@ static int dl_overflow(struct task_struct *p, int policy,
>>  	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
>>  	int cpus, err = -1;
>>  
>> -	if (new_bw == p->dl.dl_bw)
>> +	/* !deadline task may carry old deadline bandwidth */
>> +	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
> Right. I got the same patch that I believe Luca is be already using for
> his tests (and he also put together the changelog). I never managed to
> send it out, sorry about that. We can take yours, mine follows just in
> case we want to take something from the changelog or we want to reverse
> the if condition.

Ah, exactly the same issue. Thanks for your feedback :-)

Regards,
Xunlei

>
> Thanks,
>
> - Juri
>
> --->8---
>
> From 4bf38111bd9383035e03d3dc3d42011aaa9e26e7 Mon Sep 17 00:00:00 2001
> From: Juri Lelli <juri.lelli@arm.com>
> Date: Thu, 25 Feb 2016 15:50:42 +0100
> Subject: [PATCH 2/3] fix a bug in the -deadline utilization tracking mechanism
>
> Currently, a task doing
> 	while(1) {
> 		switch to SCHED_DEADLINE
> 		switch to SCHED_OTHER
> 	}
> brings dl_b->total_bw below 0.
> This happens because when the task switches back from SCHED_DEADLINE
> to SCHED_OTHER, switched_from_dl() does not clear its deadline
> parameters (they will be cleared by the deadline timer when it fires).
> But dl_overflow() removes its utilization from dl_b->total_bw.
> When the task switches back to SCHED_DEADLINE, the
> 	if (new_bw == p->dl.dl_bw)
> check in dl_overflow() prevents __dl_add() from being called, and
> so when the task switches back to SCHED_OTHER dl_b->total_bw becomes
> negative.
> This patch changes the check so that if the task is switching from
> SCHED_OTHER to SCHED_DEADLINE __dl_add() is correctly invoked.
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9503d59..d59fa20 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2432,7 +2432,7 @@ static int dl_overflow(struct task_struct *p, int policy,
>  	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
>  	int cpus, err = -1;
>
> -	if (new_bw == p->dl.dl_bw)
> +	if (task_has_dl_policy(p) && new_bw == p->dl.dl_bw)
>  		return 0;
>
>  	/*

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip:sched/core] sched/deadline: Fix a bug in dl_overflow()
  2016-04-14 12:19 [PATCH] sched/deadline: Fix a bug in dl_overflow() Xunlei Pang
  2016-04-15  7:07 ` Juri Lelli
@ 2016-04-23 12:59 ` tip-bot for Xunlei Pang
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Xunlei Pang @ 2016-04-23 12:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: xlpang, rostedt, tglx, linux-kernel, efault, mingo, hpa,
	juri.lelli, peterz

Commit-ID:  fec148c000d0f9ac21679601722811eb60b4cc52
Gitweb:     http://git.kernel.org/tip/fec148c000d0f9ac21679601722811eb60b4cc52
Author:     Xunlei Pang <xlpang@redhat.com>
AuthorDate: Thu, 14 Apr 2016 20:19:28 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 23 Apr 2016 14:20:43 +0200

sched/deadline: Fix a bug in dl_overflow()

I got a minus(very big) dl_b->total_bw during my deadline tests.

    # grep dl /proc/sched_debug
    dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -222297900

Something unusual must have happened.

After some digging, I finally noticed that when changing a deadline
task to normal(cfs), and changing it back to deadline immediately,
after it died, we will got the wrong dl_bw->total_bw.

The root cause is in dl_overflow(), it has:
    if (new_bw == p->dl.dl_bw)
	return 0;

1) When a deadline task is changed to !deadline task, it will start
   dl timer in switched_from_dl(), and retain previous deadline parameter
   till the timer expires.

2) If we change it back to deadline with the same bandwidth parameter
   before the timer expires, as it keeps the old bandwidth although it
   is not a deadline task. dl_overflow() simply returns success without
   updating the right data, and got the wrong dl_bw->total_bw.

The solution is simple, if @p is not deadline, don't return.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460636368-1993-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 71dffbb..9d84d60 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2378,7 +2378,8 @@ static int dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;
 
-	if (new_bw == p->dl.dl_bw)
+	/* !deadline task may carry old deadline bandwidth */
+	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
 		return 0;
 
 	/*

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-23 13:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-14 12:19 [PATCH] sched/deadline: Fix a bug in dl_overflow() Xunlei Pang
2016-04-15  7:07 ` Juri Lelli
2016-04-15  8:13   ` Xunlei Pang
2016-04-23 12:59 ` [tip:sched/core] " tip-bot for Xunlei Pang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.